Social Research Glossary
Citation reference: Harvey, L., 2012-17, Social Research Glossary, Quality Research International, http://www.qualityresearchinternational.com/socialresearch/
This is a dynamic glossary and the author would welcome any e-mail suggestions for additions or amendments. Page updated 2 January 2017, © Lee Harvey 2012–2017.
Regression analysis is concerned with identifying the nature of the relationship between two (or more) operationalised concepts (variables).
The idea of regression is to discover the underlying relationship between two concepts on the basis of a set of observations.
Regression procedures begin by presupposing some sort of underlying relationship, then attempt to identify exactly what this relationship is. In social science the presumption is usually a linear relationship.
Regression procedures are normally applicable to interval scale data.
Types of regression analysis
Simple regression is the two-variable (bivariate) case, in which the relationship is specified as a regression line for the dependent variable (Y) on the independent variable (X). It takes the general form of a straight line, Y = a + bX, where a is the intercept and b is the slope. (Bivariate regression may also use a non-straight-line formula, where a clear underlying polynomial relationship is estimated; see polynomial regression below.)
Regression procedures then attempt to locate a specific relationship (in the form of a straight line) that best represents the observed data, a sort of 'average' relationship.
There are various methods of determining this average or best underlying relationship. These range from the 'subjective' assessment of the researcher to more 'objective' measurements, which determine the unique line that minimises variation.
When the data are interval and the variation to be minimised is the sum of the squared vertical distances between the observed values of Y and the line, the result is known as the least squares regression line of best fit. This is the 'paradigm' (exemplary) approach to regression analysis.
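The least squares line for the simple case can be computed directly from the means of X and Y and their deviations. The following is a minimal sketch rather than a definitive implementation; the function name `least_squares_line` and the five data points are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line Y = a + bX that minimises the
    sum of squared vertical distances between the data and the line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-products of deviations, and sum of squared X deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = y_mean - b * x_mean    # intercept: the line passes through the two means
    return a, b


# Illustrative (invented) observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
print(f"Y = {a:.1f} + {b:.1f}X")  # Y = 2.2 + 0.6X
```

The fitted line always passes through the point formed by the mean of X and the mean of Y, which is why the intercept is obtained from the two means once the slope is known.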
Multiple regression is the case in which Y is seen as a function of a number of independent variables (X1, X2,... Xn). This is a procedure used in multivariate analysis.
In the social world it is unlikely that a simple association between one independent variable and a dependent variable can be established. For example, income may depend on occupation, but occupation alone is an insufficient explanation, as age, gender, geographic location and so on also affect income levels.
What the multiple linear regression line shows is the relationship between a single dependent variable and any number of independent variables.
The multiple linear regression equation takes the general form Y = a + b1X1 + b2X2 + b3X3 + ... + bnXn
where there are n independent variables and each has a coefficient (b1, b2, etc.).
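The constant and the coefficients of this equation can be estimated by least squares in the same way as in the simple case. The sketch below is one way of doing so using only the Python standard library; it solves the so-called normal equations by Gauss-Jordan elimination. The function name `fit_multiple_regression` and the data (constructed so that Y = 1 + 2X1 + 3X2 exactly) are assumptions for illustration, and statistical packages use more robust numerical methods.

```python
def fit_multiple_regression(rows, ys):
    """Return [a, b1, ..., bn] minimising the sum of squared residuals.

    rows: one list of n independent-variable values per observation.
    ys:   the dependent-variable value for each observation.
    """
    # A leading 1 in each row carries the constant term a.
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])

    # Normal equations (X'X) beta = X'y, held as an augmented matrix.
    aug = []
    for i in range(k):
        line = [sum(r[i] * r[j] for r in design) for j in range(k)]
        line.append(sum(r[i] * y for r, y in zip(design, ys)))
        aug.append(line)

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("independent variables are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]

    return [aug[i][k] / aug[i][i] for i in range(k)]


# Invented observations of (X1, X2), with Y built as 1 + 2*X1 + 3*X2.
rows = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
ys = [9, 8, 19, 18, 26]
coefficients = fit_multiple_regression(rows, ys)
print([round(c, 6) for c in coefficients])
```

Because the invented data contain no random variation, the estimates recover the constant 1 and the coefficients 2 and 3; with real observations the fit would only approximate the data.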
The size of each coefficient provides an indicator of the relative importance of the individual independent variables (the Xs) in determining the dependent variable (Y), provided the Xs are measured on comparable scales or the coefficients have been standardised.
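A raw coefficient depends on the units in which its X is measured, so sizes are usually compared after standardisation: the coefficient is multiplied by the standard deviation of X and divided by the standard deviation of Y. The sketch below illustrates this with invented figures; the function name `standardised_coefficient` and the coefficient values are assumptions, not results from any study.

```python
from statistics import stdev


def standardised_coefficient(b, xs, ys):
    """Rescale a raw coefficient b into standard-deviation units."""
    return b * stdev(xs) / stdev(ys)


# The same invented variable recorded in years and in months.
age_years = [20, 30, 40, 50, 60]
age_months = [12 * v for v in age_years]
income = [18, 25, 31, 34, 36]

b_years = 0.45            # hypothetical raw coefficient per year of age
b_months = b_years / 12   # the same effect expressed per month of age

beta_years = standardised_coefficient(b_years, age_years, income)
beta_months = standardised_coefficient(b_months, age_months, income)
print(round(beta_years, 4), round(beta_months, 4))
```

The raw coefficient shrinks by a factor of twelve when age is recorded in months, while the standardised value is unchanged, which is why only the latter is a reasonable guide to relative importance.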
Conclusions based on multivariate analysis must be made in light of the following:
1. the equation is (usually) based on a sample and thus the results are prone to sampling variation (sampling error).
2. the equation shows a mathematical relationship between those items chosen as independent variables and the one chosen as dependent. Important variables may have been omitted, and the assumption of a deterministic relationship between Y and the selected Xs may be entirely false. Therefore no causal connection is proven.
3. the equation says nothing about the extent of the association between the dependent and independent variables.
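On the third point, the extent of association has to be measured separately from the equation itself. One common measure is the coefficient of determination (R squared), the proportion of the variation in Y that the fitted equation accounts for. The following is a minimal sketch; the function name `r_squared`, the data, and the line Y = 2.2 + 0.6X are illustrative assumptions.

```python
def r_squared(ys, predicted):
    """Proportion of the variation in Y accounted for by the fitted values."""
    y_mean = sum(ys) / len(ys)
    ss_total = sum((y - y_mean) ** 2 for y in ys)                # variation around the mean
    ss_resid = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # variation left unexplained
    return 1 - ss_resid / ss_total


# Invented observations and an assumed fitted line Y = 2.2 + 0.6X.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
predicted = [2.2 + 0.6 * x for x in xs]
r2 = r_squared(ys, predicted)
print(round(r2, 2))  # 0.6
```

A value near 1 means the equation reproduces the observed values of Y closely; a value near 0 means it does little better than the mean of Y, whatever the size of the coefficients.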
Polynomial regression is used when the underlying relationship is taken to be a curve (such as a quadratic function). However, polynomial models are not widely used in social science.
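A quadratic curve Y = a + b1X + b2X² can be fitted by least squares by treating X and X² as two separate regressors. The sketch below does this with the Python standard library only; the function name `fit_polynomial` and the data (constructed so that Y = 1 + 2X - 0.5X² exactly) are assumptions for illustration.

```python
def fit_polynomial(xs, ys, degree=2):
    """Return [a, b1, ..., bd] for Y = a + b1*X + ... + bd*X**d by least squares."""
    k = degree + 1
    # Each observation contributes the powers 1, X, X**2, ... as its regressors.
    powers = [[float(x) ** p for p in range(k)] for x in xs]
    # Augmented normal equations (X'X | X'y).
    aug = [
        [sum(r[i] * r[j] for r in powers) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(powers, ys))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("too few distinct X values for this degree")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


# Invented data lying exactly on the curve Y = 1 + 2X - 0.5X**2.
xs = [0, 1, 2, 3, 4]
ys = [1.0, 2.5, 3.0, 2.5, 1.0]
curve = fit_polynomial(xs, ys, degree=2)
print([round(c, 6) for c in curve])
```

The model is still linear in its coefficients, so the same least squares machinery applies; only the regressors have been transformed.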
copyright Lee Harvey 2012–2017