© Lee Harvey 2012–2019

Page updated 15 June, 2019

Citation reference: Harvey, L., 2012–2019, Researching the Real World, available at
All rights belong to author.


A Guide to Methodology

8. Surveys

8.1 Introduction to surveys
8.2 Methodological approaches
8.3 Doing survey research

8.3.1 Aims and purpose
8.3.2 Background to the research
8.3.3 Feasibility
8.3.4 Hypotheses
8.3.5 Operationalisation
8.3.6 How will data be collected and what are the key relationships?
8.3.7 Designing the research instrument
8.3.8 Pilot survey
8.3.9 Sampling
8.3.10 Questionnaire distribution and interviewing
8.3.11 Coding data
8.3.12 Analysis
  Response rate
  Frequency tables
  Graphical representation
  Measures of central tendency (averages)
  Levels of measurement
  Crosstabulation
  Measures of dispersion
  Generalising from samples
  Dealing with sampling error
  Confidence limits
  Statistical significance
  Association
  Summary of significance testing and association: an example

8.3.13 Hypothesis testing
8.3.14 Significance tests
8.3.15 Report writing

8.4 Summary and conclusion

Activity Association
Measuring the degree of a relationship between two variables is done by using measures of association. They provide a measure of the extent to which two variables are interrelated.

Imagine driving a car. The harder you press down on the accelerator the faster the car goes. Variable A (pressing the accelerator) is directly related to variable B (speed of the car). The measure of association between A and B in this example will be approximately 100 per cent. On the other hand, the speed of the car will not be affected in the slightest by pressing on the horn, however hard you press. So variable B (speed of the car) will be unrelated to variable C (pressing the horn). The measure of association, in this case between B and C, will be zero.

Conventionally, all statistical measures of association generate a value (called a coefficient) whose size lies between 0 and 1. A coefficient of 0 means that there is no observed association at all and a coefficient of 1 means that the two variables are perfectly related, that is, that changes in the independent variable are perfectly matched by changes in the dependent variable. Some measures, such as the correlation coefficient, also carry a sign: a negative value indicates an inverse relationship. In practice, in the social world, the degree of association is somewhere between 0 and 1.

For example, using the CASE STUDY example we could work out the relationship between hostility towards gays and age of respondent to see if hostility decreases as the respondent gets older. This would be another way of assessing the part of hypothesis 4 (see CASE STUDY hypotheses) which suggests that age affects attitudes. There are two ways of doing this. The first is to calculate the measure of association called the correlation coefficient. (Its full name is Pearson's Product Moment Correlation Coefficient and it is represented by the letter 'r'.)

The correlation coefficient can only be used if both variables are at least interval scale. Age is clearly interval; but what about hostility to gays? There are several different aspects of hostility. We could compute a hostility score by counting the number of hostile responses that occur for the following questions:

Do you think homosexuality should be made illegal? (That is, score 1 if V23 is 'yes'.)
Do you think acts of an affectionate nature between homosexuals should be confined to their own home? (That is, score 1 if V25 is 'yes'.)
Do you agree with two gay men rearing children in a family situation? (That is, score 1 if V28 is 'no'.)
Do you think the age of consent should be raised? (That is, score 1 if V22 is greater than 20.)

The score for each respondent thus ranges from 0 (non-hostile) to 4 (very hostile). This can be treated as interval scale data, although you should note that the questions might not be regarded as of equal weight. This new 'hostility score' variable will be labelled V35.
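The scoring rule above can be sketched in a few lines of Python. This is only an illustration: the way answers are coded here (the strings 'yes' and 'no', and V22 as a number of years), the function name hostility_score and the example respondent are all assumptions, not taken from the CASE STUDY data file.

```python
def hostility_score(respondent):
    """Count hostile responses (0 to 4) for one respondent.

    `respondent` is a dict keyed by variable name. The 'yes'/'no' codings
    and a numeric V22 are assumptions made for this sketch.
    """
    score = 0
    if respondent["V23"] == "yes":   # homosexuality should be made illegal
        score += 1
    if respondent["V25"] == "yes":   # affection should be confined to the home
        score += 1
    if respondent["V28"] == "no":    # disagrees with two gay men rearing children
        score += 1
    if respondent["V22"] > 20:       # preferred age of consent is above 20
        score += 1
    return score


# Hypothetical respondent, invented for illustration only
example = {"V23": "no", "V25": "yes", "V28": "no", "V22": 18}
v35 = hostility_score(example)
```

Each hostile answer simply adds 1, which is why the questions carry equal weight in the resulting score.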

It is thus possible to calculate the correlation coefficient of hostility score and age. The mechanics of how to calculate the coefficient are not explained here, as most statistical programs will work it out and nearly all statistics textbooks explain how to do it.

What it involves, in this example, is comparing each person's age with their hostility score and seeing to what extent changes in age are reflected in changes in score. If all the young people had a high score and all the old people had a low score then we would have a high degree of association between age and hostility score and the size of the correlation coefficient would be near to 1. (In fact it would be near to -1 as the relationship is inverse, that is, like braking in a car: the harder you push the brake the slower the car goes.) If, on the other hand, some young people and some old people had high scores and similarly some young people and some old people had low scores then there would not be much association at all and the coefficient would be near to zero.
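As a rough illustration of the comparison described above, the following Python sketch computes Pearson's r directly from its usual definition (the co-variation of the two variables divided by the product of their spreads). The ages and scores are invented for the illustration and the function name pearson_r is an assumption; a statistics package would normally do this for you.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (co-variation)
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (spread of each variable)
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / sqrt(spread_x * spread_y)


# Hypothetical data: five respondents aged 16 to 20
ages = [16, 17, 18, 19, 20]
inverse_scores = [4, 3, 2, 1, 0]   # hostility falls steadily with age
mixed_scores = [4, 0, 2, 0, 4]     # high and low scores at both ends of the age range

r_inverse = pearson_r(ages, inverse_scores)   # perfectly inverse relationship
r_mixed = pearson_r(ages, mixed_scores)       # no linear relationship
```

With these invented figures the first coefficient is negative and as large as it can be, while the second is zero, matching the two situations described in the text.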

The correlation coefficient for age (V30) and hostility score (V35) is -0.0633. You can check this from the CASE STUDY data file. It can be calculated easily and quickly by computer but will be time consuming if done by hand. What does this figure of -0.0633 mean? The minus indicates that as the respondents get older the level of hostility decreases. However, the actual size of the coefficient is very small, only 0.0633, which means that there is very little association at all between age and hostility.

1. Using the CASE STUDY data file and a computer, work out the correlation coefficient of hostility score for lesbians (V36) and age (V30).
2. Compare the results for lesbians with the one shown above for gays.
3. Similarly, work out the correlation coefficient of hostility score and sex of respondent (V31) for gays and for lesbians and compare the results.
NOTE: If you are doing this activity you will need to compute two new variables for each person in the sample, hostility score for gays (V35) and hostility score for lesbians (V36). This will be time consuming by hand but can be performed very quickly using a computer program.

An alternative way of measuring the relationship between age and hostility score would be to construct a crosstabulation of hostility score and age. Then work out the degree of association within the crosstabulated table.
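Constructing a crosstabulation amounts to counting how many respondents fall into each combination of the two variables. A minimal Python sketch follows; the function name crosstab, the score values and the age labels are invented for illustration and are not the CASE STUDY figures.

```python
from collections import Counter


def crosstab(row_values, col_values):
    """Return (row_labels, col_labels, table), where table[i][j] is a count."""
    counts = Counter(zip(row_values, col_values))
    row_labels = sorted(set(row_values))
    col_labels = sorted(set(col_values))
    # Counter returns 0 for combinations that never occur
    table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table


# Hypothetical respondents: hostility score and age group for each person
scores = [0, 1, 1, 2, 0, 4, 1, 0]
age_groups = ["16 or 17", "16 or 17", "19 or 20", "19 or 20",
              "16 or 17", "19 or 20", "16 or 17", "19 or 20"]

row_labels, col_labels, table = crosstab(scores, age_groups)
```

Each row of the resulting table is a hostility score and each column an age group; measures such as chi-square and lambda are then computed from these counts.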

There are several different measures of association for crosstabulated data. A useful one is a coefficient called lambda, which can be used for any scale of data. It is fairly easy to compute and most statistical programs will provide it as an option when computing crosstabulations.

The table below is typical of a crosstabulation for age and hostility score that would be generated by a computer program. In this case it has the values of chi-square and lambda at the bottom. The following explains how to read the statistical data provided and what they mean.


Table Crosstabulation V35 'Anti-gay hostility score' by V30 'Age in years'

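[Table body not reproduced here. Each cell shows the row % and column % for a hostility score (V35) within an age group (V30); the surviving column headings are '16 or 17', '19 or 20' and 'Row total'.]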

Chi-square = 7.76994, D.F.= 8, Significance = 0.4563
Cells with EF<5 = 6 of 15 (40%)

Lambda symmetric = 0.0744
Lambda with V35 dependent= 0.0625
Lambda with V30 dependent= 0.0877


First, chi-square: the calculated value is 7.76994. This in itself tells you very little, because chi-square gets bigger the larger the table, irrespective of whether or not there is any statistical significance. The column marked D.F. stands for 'degrees of freedom' and in this case there are eight. The degrees of freedom are vital when converting a chi-square result into a significance value and it is necessary to know the degrees of freedom if looking up the chi-square in a set of tables. In this case it is additional information that you do not really need as the computer has worked out the significance level (that is, 0.4563).
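To make the chi-square value and its degrees of freedom less mysterious, here is a minimal Python sketch of the standard calculation: expected frequencies come from the row and column totals, and the degrees of freedom are (rows - 1) x (columns - 1). The observed counts are invented and the function name chi_square is an assumption; the sketch also assumes no row or column total is zero.

```python
def chi_square(table):
    """Return (chi-square statistic, degrees of freedom, expected frequencies)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected frequency of a cell = row total x column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((obs - exp) ** 2 / exp
               for obs_row, exp_row in zip(table, expected)
               for obs, exp in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


# Hypothetical 2 x 2 table of observed counts
observed = [[10, 20], [30, 40]]
stat, df, expected = chi_square(observed)

# A table with 5 rows and 3 columns (15 cells) has 8 degrees of freedom
_, df_five_by_three, _ = chi_square([[1, 1, 1]] * 5)
```

The last line shows why a 15-cell table, such as five hostility scores by three age groups, would be reported with D.F. = 8.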

A significance value of 0.4563 (as explained in the section on statistical significance) means that differences at least as large as those in the sample (that is, differences in hostility scores for each age group) would arise from sampling error alone about 45.63% of the time. This is far higher than the conventional 5% cut off point. Therefore, in this case it is quite possible that any differences in the sample are the result of random sampling error.
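The significance value reported by the program can be checked from the chi-square value and the degrees of freedom. The Python sketch below uses a closed-form expression for the upper tail of the chi-square distribution that holds only when the degrees of freedom are even (as they are here); for odd values a statistics package or printed tables would be needed. The function name is an assumption made for this sketch.

```python
from math import exp, factorial


def chi_square_significance_even_df(stat, df):
    """Upper-tail probability of chi-square for a positive, even number of df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = stat / 2
    # P(X > stat) = exp(-stat/2) * sum of (stat/2)^k / k! for k = 0 .. df/2 - 1
    return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))


# Values from the crosstabulation output: chi-square = 7.76994, D.F. = 8
significance = chi_square_significance_even_df(7.76994, 8)

# Compare with the conventional 5% cut off point
significant_at_5_percent = significance < 0.05
```

The result agrees with the 0.4563 printed under the table, and because it is well above 0.05 the differences are not statistically significant.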

The last piece of information about the chi-square is the 'number of cells with expected frequency less than 5'. In this case it is 6 out of 15 cells, which is 40%. It is conventional to treat chi-square values with great caution if the percentage of cells with 'expected frequency less than 5' is more than 20%. In this case, the conclusion would be that the chi-square statistic is unreliable because of the large proportion of low expected frequencies.
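The check on low expected frequencies can also be sketched in Python. The observed counts below are invented, and the function name low_expected_share and the threshold parameter are assumptions; the 20% rule applied at the end is the convention described above.

```python
def low_expected_share(table, threshold=5):
    """Return (cells below threshold, total cells, proportion below threshold)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected frequency of a cell = row total x column total / grand total
    expected = [r * c / n for r in row_totals for c in col_totals]
    low = sum(1 for e in expected if e < threshold)
    return low, len(expected), low / len(expected)


# Hypothetical 2 x 2 table of observed counts
observed = [[2, 8], [3, 37]]
low, cells, share = low_expected_share(observed)

# Treat chi-square with great caution if more than 20% of cells are low
unreliable = share > 0.20

# The same rule applied to the figures in the text: 6 of 15 cells
text_example_unreliable = (6 / 15) > 0.20
```

Note that the rule concerns expected frequencies, not the observed counts in the cells.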

Second, lambda: lambda is a measure of association that can be used for any scale of data. There are three values for lambda in the example. The first has age (V30) dependent = 0.08772. The second has hostility score (V35) dependent = 0.06250. The third is a 'symmetric' value, which is a weighted average of the other two = 0.07438. The value required in this example is the one where hostility score (V35) is dependent, that is, 0.0625.
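For readers who want to see where the three lambda values come from, the following Python sketch uses the usual Goodman and Kruskal definition: lambda is the proportion by which errors in guessing the dependent variable are reduced once the other variable is known. The counts are invented and the function names are assumptions; the sketch assumes the cases are not all in a single row or column.

```python
def lambda_row_dependent(table):
    """Goodman and Kruskal's lambda with the row variable as dependent."""
    n = sum(sum(row) for row in table)
    max_row_total = max(sum(row) for row in table)
    # Best guess within each column is that column's largest cell
    sum_col_max = sum(max(col) for col in zip(*table))
    return (sum_col_max - max_row_total) / (n - max_row_total)


def lambda_symmetric(table):
    """Symmetric lambda, pooling both directions of prediction."""
    n = sum(sum(row) for row in table)
    max_row_total = max(sum(row) for row in table)
    max_col_total = max(sum(col) for col in zip(*table))
    sum_col_max = sum(max(col) for col in zip(*table))
    sum_row_max = sum(max(row) for row in table)
    return ((sum_col_max + sum_row_max - max_row_total - max_col_total)
            / (2 * n - max_row_total - max_col_total))


# Hypothetical 2 x 2 table of observed counts
observed = [[20, 10], [5, 15]]
transposed = [list(col) for col in zip(*observed)]

lam_rows = lambda_row_dependent(observed)     # row variable dependent
lam_cols = lambda_row_dependent(transposed)   # column variable dependent
lam_sym = lambda_symmetric(observed)
```

Swapping rows and columns gives the lambda for the other dependent variable, and the symmetric value always lies between the two.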

What does this tell us? It tells us that there is very little association at all between age and hostility score. Note that the lambda value was 0.06250 and the Pearson Correlation Coefficient worked out earlier was -0.0633. They are not the same, and are unlikely to be, as they are computed in different ways. They are, however, quite close in size and you would expect them to be approximately the same size, given that they are measuring association between the same two variables. Remember that the lambda coefficient can be used on ordinal scale data but interval scale data is needed for the Pearson Product Moment Correlation Coefficient.

Using a computer with a statistics package, analyse the CASE STUDY data file to explore the following by constructing crosstabulations and examining the statistics generated by the program.
1. Is there a statistically significant difference in hostility scores for men and women?
2. Is the association between anti-gay hostility and sex greater than that between anti-gay hostility and age? Comment on your results in relation to hypothesis 4 (see CASE STUDY Attitudes towards homosexuality: Hypotheses).
3. Construct the crosstabulation for legalisation of homosexuality (V23) and gay acquaintances (V26). Does the value of the chi-square statistic suggest that having gay acquaintances affects views on making homosexuality illegal?


See also DATA ANALYSIS: A BRIEF INTRODUCTION Sections 12-14
