© Lee Harvey 2012–2017

Page updated 29 May, 2017

Citation reference: Harvey, L., 2012–2017, Researching the Real World, available at
All rights belong to author.


A Guide to Methodology

7. Secondary data

7.7 Summary and conclusion
Social researchers can use existing data in their research. Using available data offers economies of time, money and personnel as well as limiting the ‘reporting burden placed on the public’. However, seeking out appropriate data still often requires a considerable amount of work.

Secondary data analysis involves more than finding results through a literature search; it involves taking existing data and reworking it in some way to explore an issue that the original research has not addressed.

Secondary data should be approached cautiously. All data rely upon the assumptions, conceptions and priorities of those who collect them and should not, therefore, be treated as neutral or ‘objective’ facts. However, this does not render secondary statistics unsuitable for further examination and analysis; indeed, they can constitute a useful, highly accessible and comprehensive resource if approached critically.

Much existing data takes the form of statistics collected and published by both government and non-government organisations. There is a variety of published statistics, ranging from regularly produced official government statistics to one-off unofficial statistics. Some of this data is available as full data files and some only as already-formed tables, which limits the extent of re-analysis. Other secondary data takes the form of non-statistical archives, such as historical archives, personal material such as letters and diaries, and in-depth interview transcripts.

Clearly, the theoretical position adopted by the user will shape the interpretation of any statistical data, which are themselves subject to a multitude of influences during their production. Are secondary statistics then of any value for social enquiry? This leads to the more fundamental questions about the possibility of measuring social phenomena at all, that is, to questions of epistemology. Positivist, phenomenological and critical perspectives approach statistical data in different ways.

While there is a substantial tradition of survey work within social science, most social scientists think of collecting new data rather than re-analysing existing data sets. Researchers like to collect data in a way that explores their particular concerns and tend to think that extant data will be inadequate, for three reasons. First, the data was collected with another aim in mind and therefore does not fit the research hypotheses closely enough. Second, the researcher lacks control over the collection process and so does not really know how ‘good’ the data is. Third, the data is ‘old’; it has already been used and is ‘out of date’.

There are problems of validity when using statistics that have been collected for other purposes. The researcher's theoretical concerns may not be the same as those that guided the original data collection. There are often problems with the definition of basic concepts. There are also problems over the reliability of some published statistics. Government administrative statistics are often regarded as unreliable, whereas government survey statistics are usually much more reliable: the survey division of the Government Statistical Service (GSS) is scrupulous in its methods of data collection. Unofficial statistics vary enormously and it is often difficult to judge the reliability of the data collection process.

Comparisons over time can be a problem when using government statistics because of the numerous inconsistencies in collection, analysis and presentation.

However, secondary analysis can be a useful addition to new research, especially in initial stages of development, such as using demographic statistics to define sample quotas. It may also complement data collected in other ways by providing a comparison or a national framework.
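As a rough illustration of using demographic statistics to define sample quotas, the sketch below allocates a sample across groups in proportion to their share of the population. The function name `quota_sizes`, the age bands and the percentages are all hypothetical, chosen for illustration only; they are not real census figures, and a real study would take its shares from a published demographic source.

```python
from math import floor


def quota_sizes(population_shares, sample_size):
    """Allocate sample_size across groups in proportion to population_shares.

    Uses largest-remainder rounding so the quotas always sum to sample_size.
    """
    total = sum(population_shares.values())
    # Exact (possibly fractional) allocation for each group
    exact = {g: sample_size * s / total for g, s in population_shares.items()}
    # Start from the whole-number part of each allocation
    quotas = {g: floor(x) for g, x in exact.items()}
    shortfall = sample_size - sum(quotas.values())
    # Hand the remaining places to the groups with the largest fractional parts
    by_remainder = sorted(exact, key=lambda g: exact[g] - quotas[g], reverse=True)
    for g in by_remainder[:shortfall]:
        quotas[g] += 1
    return quotas


# Hypothetical age distribution in percent (illustrative, not real data)
shares = {"16-34": 31, "35-54": 34, "55+": 35}
quotas = quota_sizes(shares, 200)
print(quotas)
```

The quotas are only as good as the published figures behind them, so the cautions about definitions and reliability discussed in this section apply to the shares fed into any such calculation.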

Furthermore, some research can only be done through secondary analysis. Studies using census data are an obvious example. The sociology of labour, health and poverty relies heavily on secondary data. Demographic research is necessarily based on secondary data derived from censuses, surveys and administrative records.

Although secondary analysis has a long history (Marx, 1887; Booth, 1889–97; Durkheim, 1897), the establishment of data archives and the development and accessibility of information technology have provided the impetus for the development of secondary data analysis. This has also been helped by the increasing use of social surveys, rather than administrative records, by the Government Statistical Service and the availability of these via repositories such as the UK Data Archive. Secondary statistical analysis is now used within all the social scientific disciplines: sociology, politics, economics, history, demography, education, geography, planning, social psychology, health, social medicine and labour force studies. Indeed, the same data set is often analysed from different perspectives by more than one discipline. This contributes to a multidisciplinary understanding of social issues. In the end, however, despite the availability of the data and the technology to analyse it, secondary data analysis only occurs if the data is worth reanalysing.

Published statistics are often affected by political pressures, including manipulation of the way they are collected and presented. Using a variety of sources helps to reveal some of the ways that statistics have been manipulated. Published statistics do not ‘speak for themselves’ but must be interpreted; different sociological perspectives can lead to very different interpretations of the same data.

