





© Lee Harvey 2012–2018

Page updated 5 March, 2018

Citation reference: Harvey, L., 2012–2018, Researching the Real World, available at
All rights belong to author.


A Guide to Methodology

2. Orientation

2.2 Positivism


2.2.2 Elements of the positivistic approach
Despite these disagreements about the precise nature of 'the scientific method', most positivist sociologists agree with the following basic elements of the positivistic approach.

For most positivist sociologists, theory develops through a process of making assertions (known as hypotheses) about the relationship between two or more factors and then testing whether these assertions can be 'falsified' when data is collected and analysed. The process of falsifying requires that 'appropriate' data is collected and that all theoretically sensible explanations of the relationship between the factors are tested out to make sure that they really do relate to each other. Paul Lazarsfeld and his colleagues at Columbia University (collectively referred to as the Columbia School) have clearly set out the positivistic approach to sociology (Lazarsfeld and Rosenberg, 1955; Lazarsfeld, Pasanella and Rosenberg, 1972).

The elements are as follows:
1. reviewing existing theory and establishing a hypothesis;
2. operationalising concepts;
3. collecting data;
4. testing the hypothesis using multivariate analysis;
5. generalising from the results and suggesting changes to theory and new hypotheses to test.

Theory and hypotheses
A theory, as we have seen in Chapter 1, is a more-or-less sophisticated or complex account of how or why aspects of the social world are as they are. Durkheim (see CASE STUDY, Durkheim) had a fairly sophisticated theory to account for suicide. Although Durkheim gathered suicide statistics and examined them, his study did not, as we have suggested, actually start from the empirical data.

He already had some notions about the causes of suicide. Indeed, he had a clear ontological starting point that people are social beings and external factors impinge on the life and actions of individuals. Furthermore, his study of suicide was based on an epistemological presupposition that it is possible to establish the causes of social phenomena, in this case the causes of suicide. Durkheim's preconceptions about the transition from traditional to modern societies also gave him some hunches about the causes of suicide. He examined the statistics in an attempt to refine the vague theory he was developing.

Durkheim needed evidence (provided by suicide statistics) not only to make his theory convincing but also to help him develop the theory. This he did by testing out a series of hypotheses. These are simple statements that deal with a small part of the theory. For example, Durkheim thought that religion might be a factor affecting the extent of suicide. So, in effect, he set up the following hypothesis: different religious groups have different rates of suicide.

Operationalising concepts
Positivistic sociology, in its attempt to determine causal relations, needs to 'measure' factors so that they can be compared. For example, the rate of suicide amongst one religious group compared to that amongst another. To be amenable to measurement, theoretical concepts need to be defined as identifiable 'things', such as observations, statements of opinion or historical data. This is known as 'operationalising' concepts.

In Durkheim's hypothesis (see CASE STUDY, Durkheim), we would need to operationalise the concepts of 'religious group' and 'suicide'. What constitutes a religious group? Which religious groups are included in the analysis? How do we define whether a death is a suicide or not? All these points have to be clarified so that we can actually measure suicide rates among different groups.

Durkheim noticed that suicide rates were higher for Protestants than for Catholics or Jewish people. Durkheim's main analysis compared rates of suicide between Protestants and Catholics. Suicide rates relate to different countries, not different religious groups, so Durkheim had to make comparisons, for example, by comparing countries that were predominantly Catholic with those that were mainly Protestant.

Durkheim was relying on existing suicide statistics; he was in no position to collect his own first-hand data. He was thus restricted to the definitions of suicide that are used when deaths are officially classified as suicide. It has subsequently been argued that the official statistics in Catholic countries underrepresent the number of suicides because, in Catholicism, suicide is regarded as a sin and, where possible, the person's death is reclassified.

When a concept is 'operationalised', that is, specified in a way that allows you to measure it, it is known as a 'variable'. It is called a variable because it varies. A variable must have at least two 'values'. So in Durkheim's study, 'religion' as a concept becomes operationalised as 'religious denomination', which has values such as 'Protestant', 'Catholic', 'Jewish' and so on. 'Suicide' as a concept becomes operationalised as 'official suicide' and has just two values: 'death designated as suicide' and 'death not designated as suicide'. (See Section 8.3.5 for more on operationalisation.)
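As a rough illustration of what operationalising involves, the sketch below turns 'religious denomination' into a variable with a fixed set of values and 'official suicide' into a measurable rate per 100,000 people. The function name, the dictionary layout and all the figures are invented for illustration; they are not Durkheim's data.

```python
# Values of the variable 'religious denomination' (an illustrative choice).
DENOMINATIONS = ("Protestant", "Catholic", "Jewish")


def suicide_rate_per_100k(official_suicides, population):
    """Deaths officially designated as suicide, per 100,000 people."""
    return 100_000 * official_suicides / population


# Invented figures, used only to show the calculation.
groups = {
    "Protestant": {"official_suicides": 30, "population": 200_000},
    "Catholic": {"official_suicides": 12, "population": 200_000},
    "Jewish": {"official_suicides": 2, "population": 50_000},
}

rates = {
    name: suicide_rate_per_100k(g["official_suicides"], g["population"])
    for name, g in groups.items()
}

for name in DENOMINATIONS:
    print(f"{name}: {rates[name]:.1f} official suicides per 100,000")
```

Once the concepts are expressed in this way, the rates for different groups can be compared directly, which is what the hypothesis requires.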

Activity 2.2.3
Suppose you have been commissioned to test the theory that there is a relationship between social class and educational achievement. In small groups, identify the dependent and the independent variables. Operationalise the concepts of 'class' and 'educational achievement'. When you have finished compare your operationalisation with other groups in your class. Make a list of other relevant independent variables that might affect educational achievement.
Time 20–40 minutes.

Positivist data collection
The key to data collection for positivist sociologists is to obtain data that can be used to check the hypothesis. Ideally, this data should not be biased (see 1.10.4) but should give a 'fair' chance of the hypothesis being rejected or accepted. In that sense, the data used in hypothesis testing must meet similar criteria to the natural scientific experiment. The problem, for positivist sociologists, is that it is not possible to control the social environment in the same way that a natural scientist can apparently control the laboratory environment.

Furthermore, a positivist sociologist is not always able to 'observe', or obtain information about, the 'perfect' setting that would enable a 'fair' test of a hypothesis. Either the data cannot be obtained or it is biased in some way.

Durkheim, for example, wanted data on the suicide rates of religious groups but had to settle for national suicide rates and compare the rates in Catholic countries with those in Protestant countries (see Case Study, Durkheim).

Bias in data occurs when it does not 'represent' the population that it is intended to represent. For example, if data on suicide rates in a country related only to those people who live in cities (perhaps because it is too difficult to collect data from outlying country areas), then the data would be biased as it would not represent the whole country. In most cases, the question of representativeness arises when a sample of people is used to provide information about a whole population (see 1.10.3).

For example, opinion polls do not ask everyone in the country who they would vote for at the next general election. Instead, only a relatively small sample of about 1000 people is questioned. If the sample has an appropriate proportion of males and females, of ethnic groups and of people from different socio-economic backgrounds (reflecting the population as a whole) and is spread across the country, then the result of the survey will be very close to the result that would have been obtained from the entire population. It is said to be a 'representative' sample.

If, on the other hand, most of the sample had consisted of people living in the capital city, then it would not be a representative sample of the country and the result would be biased towards the views of people living in the capital. Equally, if no women had agreed to answer the questions, the sample would be biased towards males and again would not be representative of the intended population; there would, in this case, be what is known as 'non-response bias'.
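The contrast between a representative and a biased sample can be sketched with a small simulation. Everything here is hypothetical: the population size, the share living in the capital and the levels of support for 'party A' are invented, and a simple random sample is assumed as one way of obtaining an approximately representative sample.

```python
import random


def build_population():
    """Hypothetical population of (lives_in_capital, supports_party_a) pairs."""
    capital = [(True, True)] * 14_000 + [(True, False)] * 6_000
    elsewhere = [(False, True)] * 32_000 + [(False, False)] * 48_000
    return capital + elsewhere


def support_share(people):
    """Proportion of the given people who support party A."""
    return sum(1 for _, supports in people if supports) / len(people)


rng = random.Random(0)  # fixed seed so the run is repeatable
population = build_population()

# A simple random sample drawn from the whole country.
random_sample = rng.sample(population, 1000)

# A sample drawn only from people living in the capital.
capital_residents = [p for p in population if p[0]]
capital_sample = rng.sample(capital_residents, 1000)

true_share = support_share(population)
random_share = support_share(random_sample)
capital_share = support_share(capital_sample)

print(f"whole population: {true_share:.3f}")
print(f"random sample:    {random_share:.3f}")
print(f"capital only:     {capital_share:.3f}")
```

In this invented population the random sample should land close to the population figure, while the capital-only sample should overstate support because opinion in the capital differs from opinion elsewhere.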

(Note that a 'population' does not necessarily mean the population of a whole country. If you were investigating the views of patients in a particular hospital, for example, the 'population' would be all the patients in the hospital on a specific date.)

Positivist sociologists look for four things from data used to test hypotheses: validity, reliability, accuracy and representativeness (see 1.8.2, 1.9, 1.10).

Validity: does the data being collected actually measure the concept being investigated? (For example, does Durkheim's use of national suicide rates measure the extent of suicide in different religious communities?)

Reliability: is the data being collected in the same way each time? (For example, did all the countries Durkheim examined compile suicide rates in the same way?)

Accuracy: is the data recorded accurately, that is, without making mistakes? (Durkheim, of course, could not have known if the clerks of the day had made any mistakes in compiling the figures.)

Representativeness: does the data represent the 'population' it is supposed to represent? Are the proportions of the different categories of people in the sample similar to the proportions of these different categories in the population that is being explored? (For example, in Durkheim's case, the data was for entire countries so there was no question of whether the data was a representative sample. On the other hand, were the statistics for the years he examined representative of the suicide rates in other years?)

Reliability and validity, it should be noted, are quite distinct features of data collection. It is, for example, possible to obtain the same or very similar results time and time again yet still fail to measure what was intended to be measured.

Positivist sociologists use a variety of methods to collect data. These include the use of already-collected statistics (such as the official statistics used by Durkheim), social surveys (in which a representative sample of respondents are asked the same questions), analysis of documents or records, systematic observation, or even controlled experimental situations (such as the use of one-way mirrors to observe groups of experimental subjects).

In most cases, although not all, positivist sociologists try to end up with some form of quantitative data so that they can compare numbers in different categories. To develop a causal analysis, Durkheim compared the rate of suicide amongst Protestants with the rates amongst Catholics and other religious groups.

Although, in theory, the hypothesis precedes the data collection, in practice, the hypothesis has often to be amended to fit the reality of the data available, or the limitation on what it is possible to collect. So, in effect, Durkheim's hypothesis had to be changed to: Protestant countries have higher suicide rates than Catholic countries. He then, in effect, had to 'operationalise' this by defining what constituted 'a Catholic country' and 'a Protestant country'.

Multivariate analysis
Multivariate analysis sounds complicated and many novice researchers are put off by the concept because they think it is about the use of complicated statistical procedures. On the contrary, the principles of multivariate analysis are straightforward. Multivariate analysis is the basis of the analytical process used by positivist social scientists. It translates the principle of falsificationism into practice in a social setting.

It operates as follows. A hypothesis is set up that asserts that one variable is dependent on another variable. For example, the rate of suicide is dependent on religion (see Case Study, Durkheim). There is a 'dependent' variable ('suicide') and an 'independent' variable ('religion'). Being clear which is dependent and which independent is important as it shows the direction of the expected causal link. Suicide depends on religion; religion does not depend on suicide (it would be too late by then!).

The data that is collected is used to test out the relationship between the two variables (this is known as a 'bivariate' relationship). Durkheim's data, for example, showed that the suicide rate in Germany (a Protestant country) was higher than in Italy (a Catholic country). Similarly, he could have shown that the average suicide rate in all Protestant countries is higher than the average suicide rate in all Catholic countries. This would suggest that the bivariate 'relationship', suicide depends on religion, is probably correct.

However, for this to be a 'fair test', we need to take account of other factors that may distort the findings. It may be that it is nationality, not religion, that affects the rate of suicide. Nationality would be a third variable; it would be another 'independent' variable in this case. Durkheim tested out the possibility that nationality affected suicide rates by comparing the rates in the Catholic areas of Germany with the rates for the Protestant areas and showed that the Catholic rates were still lower. So, it seems likely that nationality is not important.

What has happened here is that the bivariate relationship between 'suicide' and 'religion' has been elaborated by using another variable 'nationality'. Nationality was used to see if the relationship between suicide and religion would disappear, or, as it is known in multivariate terms, to test whether the suicide-religion relationship was 'spurious'.

The independent variable 'nationality' is known as a 'control' variable as it controls the setting in which the suicide rates are compared, in this case, to just one country. The data showed that the suicide-religion relationship was not spurious when controlling for nationality.

The principle of multivariate analysis, then, is as follows.
1. Specify a relationship between X (dependent variable) and Y (independent variable).
2. Collect data to see if X is related to Y.
3. If so, test to see if the relationship between X and Y is spurious by testing for one or more other independent control variables, to see if the original relationship between X and Y disappears.
4. If it does, the original relationship is 'spurious'. If it does not, then we have more confidence that the relationship is 'real' and can be used to develop a theory.
5. So, in essence, multivariate analysis moves the theory forward by attempting to disprove relationships.
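The steps above can be sketched in a few lines of code. This is a minimal illustration, not a statistical procedure: the two datasets are invented (neither is Durkheim's data), the function names are hypothetical, and the `min_gap` threshold is an arbitrary stand-in for the significance tests a real analysis would use. X is the suicide rate, Y is religion and the control variable is country.

```python
def rate_per_100k(rows):
    """Suicides per 100,000 people across the given rows (X)."""
    suicides = sum(r["suicides"] for r in rows)
    population = sum(r["population"] for r in rows)
    return 100_000 * suicides / population


def rates_by_religion(rows):
    """Step 2: the bivariate comparison of X across the values of Y."""
    religions = sorted({r["religion"] for r in rows})
    return {
        rel: rate_per_100k([r for r in rows if r["religion"] == rel])
        for rel in religions
    }


def rates_within_strata(rows, control):
    """Step 3: repeat the comparison inside each value of the control variable."""
    strata = sorted({r[control] for r in rows})
    return {
        s: rates_by_religion([r for r in rows if r[control] == s])
        for s in strata
    }


def is_spurious(rows, control, min_gap=1.0):
    """Step 4: treat the relationship as spurious if the Protestant-Catholic
    gap falls below min_gap (an arbitrary threshold) in every stratum."""
    return all(
        abs(rates["Protestant"] - rates["Catholic"]) < min_gap
        for rates in rates_within_strata(rows, control).values()
    )


def row(country, religion, suicides, population):
    return {"country": country, "religion": religion,
            "suicides": suicides, "population": population}


# Invented data in which the gap survives within each country.
persisting = [
    row("A", "Protestant", 40, 200_000), row("A", "Catholic", 10, 100_000),
    row("B", "Protestant", 9, 50_000), row("B", "Catholic", 24, 300_000),
]

# Invented data in which the gap vanishes once country is controlled.
vanishing = [
    row("A", "Protestant", 40, 200_000), row("A", "Catholic", 20, 100_000),
    row("B", "Protestant", 4, 50_000), row("B", "Catholic", 24, 300_000),
]

for label, data in (("persisting", persisting), ("vanishing", vanishing)):
    print(label, rates_by_religion(data), "spurious:", is_spurious(data, "country"))
```

In the second dataset Protestants have a higher overall rate only because more of them live in the country with the higher rate; within each country the two groups are identical, so the bivariate relationship is spurious.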

The process defined here is the basic principle of multivariate analysis. The approach can be, and often is, made more complicated by attempting to build and test out complex causal models.

Another stage of multivariate analysis is to 'specify' the relationship. Specification can take two forms. The first shows how a non-spurious relationship can vary in different circumstances. For example, Durkheim's analysis showed that, irrespective of religion, suicide rates were higher in cities than in villages. The second develops the model to show how more than one independent variable relates to the dependent variable. For example, Durkheim showed that being married and having children also affected the rate of suicide.

So, multivariate analysis does not just test a bivariate relationship; it also attempts to elaborate the underlying relationship.

In the section on social surveys (Part 8), multivariate analysis is considered in more detail.

Sampling error
The real complication, even for a simple analysis, is that the positivist sociologist is usually dealing with samples. Even if the sample is not biased, the sample data will not exactly match the population data. There will be some variation.

For example, if you toss a coin 20 times you would expect it to come up 'heads' on 10 occasions, assuming it is an unbiased coin. However, in practice it will often not be heads exactly 10 times out of 20; it will vary, mostly somewhere between 8 and 12 times. It is extremely unlikely, although possible, that if you tossed it 20 times they would all be heads (in fact this is likely to occur only about once in a million sets of tosses).
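The coin-tossing figures can be checked with the binomial distribution. The sketch below assumes a fair coin and independent tosses; the function names are chosen here for illustration.

```python
from math import comb


def prob_heads(k, n=20, p=0.5):
    """Probability of exactly k heads in n independent tosses."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_heads_between(low, high, n=20, p=0.5):
    """Probability that the number of heads is between low and high inclusive."""
    return sum(prob_heads(k, n, p) for k in range(low, high + 1))


print(f"exactly 10 heads:  {prob_heads(10):.3f}")
print(f"8 to 12 heads:     {prob_heads_between(8, 12):.3f}")
print(f"all 20 heads:      {prob_heads(20):.2e}")
print(f"one in how many:   {1 / prob_heads(20):,.0f}")
```

Exactly 10 heads turns up in fewer than one set of tosses in five, whereas a result between 8 and 12 covers roughly three-quarters of sets. Twenty heads in a row has probability 1 in 2^20, which is 1 in 1,048,576.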

So although a representative (thus, non-biased) sample will closely reflect the population, it will not necessarily be identical; nor, on the other hand, is it likely to be very different. So any sample will have what is known as 'sampling error'. This is not 'bias' but the variation due to choosing a sample rather than including the whole population. This is where statistical procedures tend to come into the analysis. They are used to distinguish between a 'real' difference in results and one that could have resulted from 'sampling error'.

For example, if we have a hypothesis that says 'Jill is better at getting heads than Jack' and we had sample data that showed Jill getting 11 heads from 20 tosses and Jack getting 9, would we say that Jill is better? No, we would not. The evidence is not conclusive because we would expect that degree of variation. Anyone tossing a coin 20 times could easily get 9 or 11 heads. In short, the difference is not big enough to be convincing.

Statistical analyses help us determine what constitutes what is known as a 'significant' difference, that is, one that leads us to think there is a real difference not just a variation due to taking a sample.
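One way to see why a lead of two heads is unconvincing is to work out how often such a lead would arise by chance alone. The sketch below assumes that Jill and Jack each toss a fair coin 20 times, independently, and computes the exact probability that Jill finishes at least a given number of heads ahead. The function names are illustrative, and no particular significance threshold is implied by the source.

```python
from math import comb


def heads_pmf(k, n=20):
    """Probability of exactly k heads in n tosses of a fair coin."""
    return comb(n, k) / 2**n


def prob_lead_at_least(d, n=20):
    """Probability that Jill gets at least d more heads than Jack
    when both toss a fair coin n times (so neither is 'better')."""
    return sum(
        heads_pmf(jill, n) * heads_pmf(jack, n)
        for jill in range(n + 1)
        for jack in range(n + 1)
        if jill - jack >= d
    )


# How often would chance alone produce a lead of this size?
for lead in range(0, 11, 2):
    print(f"lead of {lead:2d} or more: {prob_lead_at_least(lead):.4f}")
```

A lead of two or more, as in the example of 11 heads against 9, arises by chance in roughly a third of such contests, which is why it is not treated as evidence of a real difference. Larger leads become progressively rarer under chance alone.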

Study Point
How big a difference in the number of heads that Jill and Jack get out of 20 tosses of a coin would convince you that one is better than the other?


Multivariate analysis can be used to develop complex models using 'advanced' statistical procedures that take sampling error into account. However, modern computing means that there is no longer a problem of having to spend considerable time computing statistics; it is much more important to understand the purpose and limitations of statistical procedures (Part 8, Social Surveys, refers to these issues). The important thing is to understand the principles of multivariate analysis and the difference between sampling error and bias.

Generalising results and developing theory
Finally, positivistic social researchers attempt to generalise from the results of their enquiries (see also Section 1.10.1). They need to be able to suggest that the theoretical relationships they have discovered in their sample are applicable to a wider range of circumstances. If they cannot make a convincing case about the representativeness or wider applicability of their analysis then they are unable to suggest that modifications be made to theory, or that they have further confirmed a theory.

Positivists often undertake large-scale surveys so that they have large amounts of data that appear to be convincing enough to enable them to make generalisations about the whole population on the basis of their findings. After all, the aim is to find 'universal laws' (usually in inductive studies) or at least to propose theories that are widely applicable (in deductivist studies).

Furthermore, analysing the data (multivariate analysis) and generalising the results is not the end of the process. The results have to be linked back to theory. This is not a mechanical process and requires further conceptual thought.

For example, Durkheim's analysis (see Case Study, Durkheim) showed how marriage and children as well as religion related to suicide. His theorising, however, involved a conceptual leap not directly provided by the data. His notion, for example, of egoistic suicide being caused by a lack of integration into society was a conceptual leap from the data that showed that Protestantism (a more individualistic religion) and a lack of family were linked to suicide.

