RESEARCHING THE REAL WORLD




© Lee Harvey 2012–2019

Page updated 25 January, 2019

Citation reference: Harvey, L., 2012–2019, Researching the Real World, available at qualityresearchinternational.com/methodology
All rights belong to author.


 

A Guide to Methodology

8. Surveys


8.3.12.5 Levels of measurement
Where data can only be grouped into broad categories of agreement such as 'yes' or 'no', or into religious categories such as 'Christian', 'Muslim', 'Sikh' or 'Atheist', then we have nominal data. People are put into named categories but there is no sense in which these categories can be put in any order. This is the lowest level of data.

Where the categories can be put in some sort of order then the data is said to be ordinal. Questions where people rank their preferences are examples of ordinal data, as are answers to questions where the choices range from 'strongly agree' through 'agree' and 'disagree' to 'strongly disagree'. Ordinal data is said to be of a higher level than nominal data.

Where the categories are ordered and the gaps between adjacent categories are of equal size then the data is said to be interval scale. Temperature in degrees Celsius is an example of an interval scale. Incomes measured in numbers of dollars or pounds also have equal intervals, although, having a meaningful zero, they additionally qualify as ratio scale. Interval scale data is of a higher level than ordinal data.

When interval data has a meaningful zero value, then the data is referred to as ratio scale because the sizes of different scores can be meaningfully compared as ratios. For example, a measurement in miles is ratio scale because zero miles means no distance, so ten miles is twice as far as five miles. Similarly, income measured in numbers of euros is ratio scale, as is weight in kilos.
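The difference between interval and ratio scales can be illustrated with a short sketch in Python. The temperatures and distances used here are invented for illustration and are not taken from the example survey. The sketch suggests that differences survive a change of zero point, whereas ratios are only stable when the scale has a true zero.

```python
# Illustrative sketch only: the values below are assumptions, not survey data.

KM_PER_MILE = 1.609344  # exact definition of the international mile


def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius (interval scale) to kelvin (ratio scale with a true zero)."""
    return celsius + 273.15


cool_c, warm_c = 10.0, 20.0

# Differences are meaningful on an interval scale: the gap is 10 degrees
# whether it is measured in Celsius or in kelvin.
difference_c = warm_c - cool_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are not meaningful on an interval scale: 20 C looks like "twice" 10 C,
# but measured from absolute zero the ratio is only about 1.035.
naive_ratio = warm_c / cool_c
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# Distance has a true zero, so the ratio is the same in miles and kilometres.
miles_a, miles_b = 5.0, 10.0
ratio_miles = miles_b / miles_a
ratio_km = (miles_b * KM_PER_MILE) / (miles_a * KM_PER_MILE)

print(f"Difference: {difference_c:.1f} C, {difference_k:.1f} K")
print(f"Ratio in Celsius: {naive_ratio:.3f}; ratio in kelvin: {true_ratio:.3f}")
print(f"Ratio in miles: {ratio_miles:.3f}; ratio in km: {ratio_km:.3f}")
```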

A Likert scale running from 1 ('very poor') through 2 ('poor') up to 5 ('very good') is an ordinal scale. It is sometimes construed to be interval, although the equality of the intervals can only be assumed, but it is never a ratio scale because zero has no meaning on this scale.

Activity 8.3.12.2
What scale of data are variables V1, V9, V30, V31 and V34 in the example survey?

In deciding the best average you need to take account of the scale of the data. You need at least interval scale data to work out an arithmetic mean, as the actual size of each value is important.

You need at least ordinal scale data to work out the median because you need to put the data in rank order. You cannot work out the median religion, for example, as religions cannot be ranked in order.

The mode can be computed for nominal or higher-level data. If you can put things in categories you can work out which category has the most in it. So you could work out the modal religion for a sample of people.
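The rules above can be summarised in a minimal sketch using Python's standard `statistics` module. The `Level` class, the `permissible_averages` function and all of the data lists are hypothetical names and values introduced for illustration; they are not part of the example survey.

```python
from enum import IntEnum
from statistics import mean, median_low, mode


class Level(IntEnum):
    """Levels of measurement, ordered from lowest to highest (hypothetical helper)."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


def permissible_averages(level: Level) -> list[str]:
    """Return the averages that are meaningful at a given level of measurement."""
    averages = ["mode"]                      # any categorised data has a mode
    if level >= Level.ORDINAL:
        averages.append("median")            # needs rank order
    if level >= Level.INTERVAL:
        averages.append("arithmetic mean")   # needs the actual size of each value
    return averages


# Nominal data (invented): only the mode is meaningful.
religions = ["Christian", "Muslim", "Christian", "Sikh", "Atheist", "Christian"]
modal_religion = mode(religions)

# Ordinal data (invented): codes 1 to 4 record rank order only.
# median_low always returns an actual category rather than averaging two codes.
agreement_codes = [1, 2, 2, 3, 4, 4, 4]
median_agreement = median_low(agreement_codes)

# Ratio data (invented ages in years): all three averages are available.
ages = [16, 18, 18, 21, 30]
mean_age = mean(ages)

print(permissible_averages(Level.NOMINAL))
print(permissible_averages(Level.ORDINAL))
print(permissible_averages(Level.RATIO))
print(modal_religion, median_agreement, mean_age)
```

Using `median_low` rather than `median` for the ordinal codes is a design choice: with an even number of responses, `median` would average the two middle codes, which presumes equal intervals that ordinal data does not guarantee.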

The example data in Table 8.3.12.3 is ratio scale as it measures age in years. So we can use any of the three measures meaningfully. As it is high-level data, the first preference is the arithmetic mean because it takes account of all the values. However, we might want to think about the median in this case because the mean is being made artificially high by the half-dozen very high ages. They have tended to drag the value of the mean upwards because they are so disproportionately large. The median, on the other hand, simply regards these large values as 'above the middle' and disregards their actual size. So where an interval or ratio scale data distribution is skewed to one side then it is worth considering using the median as a better indication of average than the arithmetic mean.
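The effect of a few very large values on the mean can be seen in a small sketch. The ages below are invented for illustration and are not the values in Table 8.3.12.3; the variable names are likewise assumptions.

```python
from statistics import mean, median

# Hypothetical suggested ages: most cluster in the late teens (invented data).
typical_ages = [16, 16, 17, 18, 18, 18, 19, 21, 21]

# The same responses plus two disproportionately high values.
skewed_ages = typical_ages + [65, 70]

mean_typical = mean(typical_ages)
mean_skewed = mean(skewed_ages)
median_typical = median(typical_ages)
median_skewed = median(skewed_ages)

# The mean is dragged upwards by the two high values; the median treats them
# simply as 'above the middle' and does not move.
print(f"Mean without high values:   {mean_typical:.1f}")
print(f"Mean with high values:      {mean_skewed:.1f}")
print(f"Median without high values: {median_typical}")
print(f"Median with high values:    {median_skewed}")
```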

The mode would not normally be used in this circumstance as it does not take into account the other values at all. It is more or less by chance that it happened to be the same as the median in this case and it could easily have been '16' or even '21'. The mode is really only of any use with interval scale data when the distribution has a single peak, unlike Table 8.3.12.3, which has three separate peaks.
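A short sketch shows how unstable the mode is when a distribution has several peaks. The ages are invented to give three equal peaks and are not the data in Table 8.3.12.3. `multimode` is available in Python's standard `statistics` module from version 3.8.

```python
from collections import Counter
from statistics import mode, multimode

# Hypothetical ages with three equally high peaks at 16, 18 and 21 (invented data).
ages = [16, 16, 16, 17, 18, 18, 18, 19, 21, 21, 21]

counts = Counter(ages)    # frequency of each suggested age
peaks = multimode(ages)   # every value that shares the highest frequency

# A single extra response is enough to decide which peak becomes 'the' mode.
mode_if_one_more_18 = mode(ages + [18])
mode_if_one_more_16 = mode(ages + [16])
mode_if_one_more_21 = mode(ages + [21])

print("Frequencies:", dict(counts))
print("Peaks:", peaks)
print("Mode after one extra answer of 18, 16 or 21:",
      mode_if_one_more_18, mode_if_one_more_16, mode_if_one_more_21)
```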

Activity 8.3.12.3
Compute the arithmetic mean, median and mode for the age of consent for women (V21) and compare them with those for men (V22).

The following sums up the analysis of hypothesis 1 of the CASE STUDY thus far. Hypothesis 1 posits hostility towards gays and lesbians; it has three parts. First, that homosexuality should be illegal. The analysis has shown that there are twice as many people who think homosexuality should be legal as there are who think it should be made illegal.

Second, that the age limit for consenting adults should be raised. The analysis shows that half the entire sample thought that the age limit for consenting gays and lesbians should be reduced. The median age suggested by the sample was 18 for both gays and lesbians.

The third item was acceptance of gay and lesbian families. The majority of the sample were opposed to gays (74.6%) and lesbians (69.6%) raising children in a family situation.

You can check these figures for yourself using the CASE STUDY data file.

Respondents were invited to make general comments at the end of the questionnaire and some took the opportunity to explain the reasons for their answer to this question. Where appropriate, any relevant comments provided by respondents should be used to elaborate the analysis: the report does not just have to be statistical.

The main reasons given for not agreeing to gay couples raising children were that the child would grow up to see homosexuality as normal and would not have the benefit of a conventional, heterosexual family upbringing. Some people commented that having gay parents would lead to the child growing up to be gay because that would be the only relationship of which the child would be aware. In these comments there was an implicit assumption that socialisation which promoted positive images of homosexuality was undesirable. This clearly reflects the dominant heterosexist ideology of the time and the reassertion of family values.

The first two items suggest that the sampled population is more tolerant than hostile towards gays and lesbians. However, that tolerance is tempered by the views of the sample on gay and lesbian relationships as suitable family environments. While homosexuality was broadly viewed with a degree of tolerance it was still seen as a deviation from the heterosexist norm.

Activity 8.3.12.4
What proportion of people think that gays and lesbians should restrict affectionate acts to the privacy of their own home? Given this additional information, what would be your initial assessment of hypothesis 1?

Hypothesis 1 has been assessed on the basis of frequency tables, making use of percentages and averages. The conclusions have been presented as initial assessments rather than definitive proof or disproof. There are three reasons for this lack of expressed certainty.

First, the analysis has only considered the overall frequencies for each variable and has not considered other factors that might affect the results, such as gender and age of respondent.

Second, the results relate to the sample and care must be taken when generalising about the population from which the sample was drawn because of the problems of sample bias and sampling error.

A third reason for not claiming proof is the problem of asserting a probabilistic outcome as definitive proof.

 

See also DATA ANALYSIS: A BRIEF INTRODUCTION Section 8 (Downloads a Word document into your downloads folder)
