© Lee Harvey 2012–2020

Page updated 29 April, 2020

Citation reference: Harvey, L., 2012–2020, Researching the Real World, available at
All rights belong to author.


A Guide to Methodology

8. Surveys

8.3.11 Coding data
Coding is the process of transferring the responses provided by the respondents into a data file that can be used for analysis.

Completed interview schedules or questionnaires provide a wealth of material but, to make sense of the data, they have to be approached systematically. The first job is to extract the data from the questionnaires and put it into a form that is easier to analyse, viz. a data file. Once the data is compiled into a data file, a variety of software packages are available that will produce results rapidly. (There are very few occasions when the researcher would analyse the data by hand instead of using a computer.)

Coding frame
The easiest way to create a data file is to draw up a coding frame when designing the questionnaire or schedule. The coding frame lists all the alternative values for a given variable and allocates a number for each possible answer (see CASE STUDY Attitudes towards homosexuality: Coding Frame).

It is important to list the values for each variable, bearing in mind that a single question may contain more than one variable.

For example, the following question has a single variable and the newspaper titles are the values:

Which of the following national newspapers do you read most often? (Please select one answer)
Daily Mirror
Daily Express
Daily Telegraph
None of these

However, if the question had been:

Which of the following national newspapers do you normally read? (Please tick all that are appropriate)
Daily Mirror
Daily Express
Daily Telegraph
None of these

Then the question contains as many variables as there are options listed; in effect each option is a separate question, viz.:

Do you read the Daily Mirror? yes/no;
Do you read the Daily Express? yes/no;
and so on,

and each newspaper constitutes a variable and the values are yes/no.
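The difference between the two question types can be sketched in code. The following is a minimal illustration only: the numeric codes, the yes/no values and the function names are assumptions made for the example and are not taken from the case study coding frame.

```python
# Hypothetical coding frames; the numeric codes are illustrative assumptions.

# Single-answer question: one variable, one code per possible answer.
MOST_READ_FRAME = {
    "Daily Mirror": 1,
    "Daily Express": 2,
    "Daily Telegraph": 3,
    "None of these": 4,
}

# Tick-all-that-apply question: one yes/no variable per listed option.
OPTIONS = ["Daily Mirror", "Daily Express", "Daily Telegraph", "None of these"]
YES, NO = 1, 2


def code_single(answer):
    """Return the numeric code for a single-answer response."""
    return MOST_READ_FRAME[answer]


def code_multiple(ticked):
    """Return one yes/no code per option, in the order given by OPTIONS."""
    return [YES if option in ticked else NO for option in OPTIONS]


single = code_single("Daily Express")
multiple = code_multiple({"Daily Mirror", "Daily Telegraph"})
print(single)
print(multiple)
```

The single-answer question yields one code per respondent, whereas the tick-all question yields a list of codes, one for each option.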

The coding frame for closed questions can be taken directly from an interview schedule, as for example in the CASE STUDY Attitudes towards homosexuality: Coding Frame.

Additional coding for open questions needs to be devised. There are three broad alternatives for doing this.

First, pre-coding the open question (although the alternative answers are not seen by the respondent) and then 'forcing' responses into the pre-coded categories.

Second, logging all the open responses and grouping them into categories based on an analysis of the responses.

Third, treating the responses as qualitative data and analysing them separately without trying to quantify them.
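The second alternative (logging the responses and then grouping them) might look something like the sketch below. The responses, the categories, the keyword rules and the catch-all code 9 are all invented for illustration; in practice the categories come from the researcher reading the logged answers, and keyword matching is only a crude stand-in for that judgement.

```python
from collections import Counter

# Hypothetical open responses to an invented question.
responses = [
    "I like the sports pages",
    "the sports coverage",
    "Political news",
    "crossword",
    "Sports",
    "politics and current affairs",
]

# Step 1: log every response in a normalised form to see what was said.
log = Counter(response.strip().lower() for response in responses)

# Step 2: categories devised after reading the log (illustrative only).
CATEGORY_KEYWORDS = {
    1: ["sport"],    # code 1 = sport
    2: ["politic"],  # code 2 = politics
}
OTHER = 9  # catch-all code for answers that fit no category


def categorise(response):
    """Return the first category code whose keyword appears in the response."""
    text = response.lower()
    for code, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return code
    return OTHER


codes = [categorise(response) for response in responses]
print(codes)
```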

To make questionnaires easier to use and less cluttered for the respondent, it is usual not to include codings on the questionnaire.

The coding frame should, though, be drawn up for the questionnaire before attempting to code data.

Top Data file
The next step is to produce a grid on which to record the appropriate codes for each respondent. Normally, each row of the grid represents a respondent and each column represents a variable (see CASE STUDY Attitudes towards homosexuality: Data file).

Sometimes a variable has more than nine values and takes up more than one column of figures. For example, question 30 in the CASE STUDY Attitudes towards homosexuality: Coding Frame asks for age and values range from 18 upwards. This takes up two columns.
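As a rough sketch of how column widths work, the function below lays out one respondent as a line of digits. The layout (three columns for the identity number, one for a newspaper code, two for age) is an assumption made for the example and is not the layout used in the case study.

```python
def fixed_width_record(identity, newspaper_code, age):
    """Lay out one respondent as a fixed-width line of digits.

    Assumed layout: columns 1-3 identity number, column 4 newspaper code,
    columns 5-6 age.
    """
    if not 0 <= identity <= 999:
        raise ValueError("identity number does not fit in three columns")
    if not 0 <= newspaper_code <= 9:
        raise ValueError("newspaper code does not fit in one column")
    if not 0 <= age <= 99:
        raise ValueError("age does not fit in two columns")
    return f"{identity:03d}{newspaper_code:1d}{age:02d}"


record = fixed_width_record(7, 2, 34)
print(record)

# Reading the age back means taking both of its columns together.
age = int(record[4:6])
print(age)
```

A respondent aged 100 or over would need a third column, which is why the width of each variable has to be decided when the coding frame is drawn up.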

To produce the data file the grid has to be completed. Start with the first questionnaire or interview schedule. Write an identity number on the schedule (if there is not one already there) and then enter the identity number in the first column of the grid. Then go through the schedule inserting the correct code for each variable in the appropriate column.
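The procedure above can be sketched as follows, assuming the coded answers from each schedule are already to hand. The variable names, the codes and the choice of a comma-separated layout are illustrative assumptions; any format that the analysis package can read would serve.

```python
import csv
import io

# Hypothetical column names: the identity number comes first.
VARIABLES = ["id", "most_read", "age"]

# Hypothetical coded answers, one dictionary per completed schedule.
schedules = [
    {"most_read": 1, "age": 23},
    {"most_read": 4, "age": 57},
]

# One row per respondent: the identity number, then one code per variable.
rows = []
for identity, schedule in enumerate(schedules, start=1):
    rows.append([identity] + [schedule[name] for name in VARIABLES[1:]])

# Write the grid to an in-memory file; a real file opened with
# open(path, "w", newline="") could be used in the same way.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(VARIABLES)
writer.writerows(rows)
data_file = buffer.getvalue()
print(data_file)
```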

It is important to take care to avoid errors creeping in at this stage. Such miscoding errors are very hard to detect later. One way to reduce errors is for two people (or teams) to code the data independently and then to compare the two data files. This is very effective but increases the workload.
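Comparing two independently coded data files is easy to automate. The sketch below assumes each file has been read into a dictionary keyed by identity number; the function name, variable names and example codes are hypothetical.

```python
def compare_data_files(first, second):
    """List every disagreement between two independently coded data files.

    Each argument maps an identity number to a dictionary of
    variable name -> code. Returns (identity, variable, code_a, code_b)
    tuples; a code missing from one file is reported as None.
    """
    discrepancies = []
    for identity in sorted(set(first) | set(second)):
        a = first.get(identity, {})
        b = second.get(identity, {})
        for variable in sorted(set(a) | set(b)):
            if a.get(variable) != b.get(variable):
                discrepancies.append(
                    (identity, variable, a.get(variable), b.get(variable))
                )
    return discrepancies


# Hypothetical data: the second coder has transposed the digits of one age.
coder_a = {1: {"most_read": 1, "age": 23}, 2: {"most_read": 4, "age": 57}}
coder_b = {1: {"most_read": 1, "age": 32}, 2: {"most_read": 4, "age": 57}}

discrepancies = compare_data_files(coder_a, coder_b)
print(discrepancies)
```

Each reported discrepancy still has to be resolved by going back to the original questionnaire or schedule; the comparison only shows where to look.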

Activity 8.3.11
Code the data you got from your pilot survey, Activity

