© Lee Harvey 2012–2020

Page updated 29 April, 2020

Citation reference: Harvey, L., 2012–2020, Researching the Real World, available at
All rights belong to author.


A Guide to Methodology

5. Document analysis and semiology

5.1 Introduction
5.2 Document analysis for what?
5.3 Establishing the nature of documents and categorising them (external analysis)
5.4 Approaches to document analysis
5.5 Evidence of occurrence
5.6 Content analysis
5.7 Qualitative document analysis
5.8 Historical research
5.9 Hermeneutics
5.10 Semiology

5.11 Critical media analysis
5.12 Aesthetics, art criticism, art history

5.1 Introduction
Document analysis is the systematic examination of documents that constitute data in a research enquiry. Document analysis is not a summary or description of the contents of a document. It involves an analysis of the content and, in most cases, an examination of the motivation, intent and purpose of a document within a particular historical or contemporary context.

Document analysis refers to documents that have been produced prior to, and independently of, the research enquiry and have subsequently been used by the researcher. Thus, document analysis as presented in this section does not include analysis of documents compiled by the researcher, such as life histories or in-depth interview transcripts.

Documents, in this context, include any extant published or unpublished written material, including books, articles, newspapers, posters, legal or religious documents, court transcripts, parliamentary proceedings, census records, land deeds, wills, poems, songs, static advertisements, letters, inscriptions on a gravestone, minutes of meetings, notes, memoirs, sermons, political speeches, PowerPoint presentations, social media posts and electronic blogs.

In addition, 'documents' has a wider meaning and includes any other means of documentation, such as video, film, photographs, television programmes, filmed advertisements, paintings, drawings and any other relevant physical object within the research setting.

Documents are sometimes divided into four categories: published, archived, private and personal.

Published means anything that is available to anyone without recourse to a special archive, such as the national archives or public records office.

Archived means those items that are available within an archive, which may or may not be open to any member of the public: the Regenstein Library Special Collections Research Center at the University of Chicago, for example, welcomes 'use of the resources by faculty, students, and staff of the University of Chicago and by visiting researchers' (University of Chicago, nd).

Private documents are organisational documents that derive from businesses, non-governmental organisations, trade unions and educational establishments. They include minutes of meetings, memos, training manuals, invoices, internal policy statements, tax returns and reports other than those required to be published.

Personal documents are documents that belong to a private individual and are not in the public domain, such as personal letters, diaries and photographs.

Another way of classifying documents is to refer to them as official or unofficial. The former usually means documents produced by central and local government or by a government-appointed or government-sanctioned agency. Official documents are usually projected as objective statements, although they are social products with a particular ideological perspective. Unofficial documents would be anything not covered by the official category, although the term is sometimes reserved for policy-related papers that have not been sanctioned by government ministries, such as documents produced by non-governmental organisations (NGOs). In some cases, 'unofficial government papers' refers to documents not in the public domain (such as secret or restricted-access documents).

A further approach is to classify documents as primary or secondary. Primary documents are those in which the author is recounting first-hand experience. Secondary documents are those where events are documented second-hand through gathering accounts from those present at the time (most journalism tends to be second-hand accounts). This classification has very little use when the document is not describing an event.
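For researchers who keep an electronic catalogue of their documents, the three classifications above could be recorded side by side for each item. The following is only a minimal sketch in Python: the names DocumentRecord, Access, Status and Account are hypothetical, and the two example documents and the categories assigned to them are illustrative assumptions rather than cases taken from the guide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Access(Enum):
    """Four-way division: published, archived, private, personal."""
    PUBLISHED = "published"
    ARCHIVED = "archived"
    PRIVATE = "private"      # organisational documents
    PERSONAL = "personal"    # belonging to a private individual


class Status(Enum):
    OFFICIAL = "official"
    UNOFFICIAL = "unofficial"


class Account(Enum):
    PRIMARY = "primary"      # first-hand experience
    SECONDARY = "secondary"  # second-hand account


@dataclass
class DocumentRecord:
    """Hypothetical catalogue entry for one document."""
    title: str
    access: Access
    status: Status
    # None when the document does not describe an event, in which
    # case the primary/secondary distinction has little use.
    account: Optional[Account] = None

    def describe(self) -> str:
        account = self.account.value if self.account else "not applicable"
        return f"{self.title}: {self.access.value}, {self.status.value}, account={account}"


# Illustrative entries only (assumed classifications).
diary = DocumentRecord("Personal diary", Access.PERSONAL, Status.UNOFFICIAL, Account.PRIMARY)
manual = DocumentRecord("Training manual", Access.PRIVATE, Status.UNOFFICIAL)

print(diary.describe())
print(manual.describe())
```

Keeping the three classifications as separate fields reflects the point that they are alternative ways of viewing the same document rather than a single hierarchy.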

Documents can be single-item documents or part of a longitudinal series, such as employment statistics, continuous attitude surveys and morbidity rates. Gaining access to single items can be difficult; identifying and accessing such documents can be extremely time-consuming. Using longitudinal series clearly provides a perspective over time and, as they are normally data collected by government, they are usually readily available and cost nothing. However, the researcher needs to bear in mind that such time series are not 'objective' and are collected for a particular purpose using a specific methodology. It is important to be aware of the methodology in order to be able to identify any limitations in the construction of the longitudinal or time series data: not least any changes in methodology or definition of the statistic being compiled.
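One way to keep changes of definition in view when working with a time series is to record, alongside each figure, the definition under which it was compiled, and then list the points at which that definition changes. The sketch below assumes the researcher has already noted the definition for each year from the published methodology. The Observation type, the definition_breaks function and all the figures are hypothetical and invented purely for illustration; they are not real statistics.

```python
from typing import List, NamedTuple


class Observation(NamedTuple):
    year: int
    value: float
    definition: str  # label for the definition or method in force that year


def definition_breaks(series: List[Observation]) -> List[int]:
    """Return the years whose definition differs from the preceding year's."""
    ordered = sorted(series, key=lambda obs: obs.year)
    return [
        current.year
        for previous, current in zip(ordered, ordered[1:])
        if current.definition != previous.definition
    ]


# Invented figures for illustration only.
series = [
    Observation(2001, 5.1, "definition A"),
    Observation(2002, 5.3, "definition A"),
    Observation(2003, 4.2, "definition B"),
    Observation(2004, 4.0, "definition B"),
]

print(definition_breaks(series))
```

In this invented series the apparent fall in 2003 coincides with a change of definition, so it could not be read as a real change without consulting the methodology.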

