8.3.12.7 Measures of dispersion

Sometimes it is necessary to know how spread out (dispersed) the data is around the average. For example, the data in Table 8.3.12.3 covers a wide range of years, which is not apparent from simply knowing the average. It is possible to measure how spread out the data is, and measures that do this are called measures of dispersion. There are three commonly used ones: the range, the interquartile range and the standard deviation.

8.3.12.7.1 Range
The range is the difference between the highest and lowest value for a variable. When the data is interval (or ratio), the range can be expressed as a single number. The range for the ages in Table 8.3.12.3 is 98 - 12 = 86 years.

We can also talk about a range of ordinal values, from 'strongly agree' to 'strongly disagree'. It is not really meaningful to talk about the range of nominal categories, such as 'Design and Printing' to 'Catering and Tourism' (V34 in the CASE STUDY survey), as this gives us no way of knowing whether 'Government and Sociology' students are included in the 'range'.
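As a minimal sketch (in Python, which the text itself does not use), the range can be read straight off a frequency table. The frequencies below are assumed to be the Table 8.3.12.3 data as tabulated in Table 8.3.12.7; the function name `value_range` is illustrative only.

```python
# Frequencies assumed from Table 8.3.12.7 (age in years: number of respondents)
freq = {12: 1, 14: 2, 15: 1, 16: 28, 17: 2, 18: 33, 19: 1, 20: 8,
        21: 26, 25: 2, 30: 2, 60: 1, 78: 1, 80: 1, 98: 3}


def value_range(freq_table):
    """Range = highest observed value minus lowest observed value."""
    observed = [value for value, count in freq_table.items() if count > 0]
    return max(observed) - min(observed)


print(value_range(freq))  # 86
```

Only the two most extreme values affect the result: the other 110 answers could move anywhere between 12 and 98 without changing the range.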

8.3.12.7.2 Interquartile range
The interquartile range is the difference between the third quartile and the first. Finding it means splitting the distribution in half at the median, then finding the median of each half; in other words, splitting the distribution into quarters.

The first quarter point is known as the first quartile (Q1); the second (Q2) is the median of the whole distribution; and the third quarter point is known as the third quartile (Q3). The interquartile range is the difference between Q3 and Q1.

The interquartile range can be applied to ordinal, interval and ratio data.
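The median-of-halves procedure described above can be sketched as follows, again assuming the Table 8.3.12.7 frequencies. When the count is odd, this sketch leaves the middle value out of both halves; that is one convention among several, and software that interpolates between values may report slightly different quartiles. `expand` and `quartiles` are hypothetical helper names.

```python
from statistics import median

# Frequencies assumed from Table 8.3.12.7 (age in years: number of respondents)
freq = {12: 1, 14: 2, 15: 1, 16: 28, 17: 2, 18: 33, 19: 1, 20: 8,
        21: 26, 25: 2, 30: 2, 60: 1, 78: 1, 80: 1, 98: 3}


def expand(freq_table):
    """Turn a value: frequency table into a sorted list of individual values."""
    return [value for value in sorted(freq_table) for _ in range(freq_table[value])]


def quartiles(sorted_values):
    """Return (Q1, Q2, Q3): split at the median, then take the median of each half."""
    n = len(sorted_values)
    lower = sorted_values[: n // 2]
    upper = sorted_values[n // 2 + n % 2 :]  # skip the middle value when n is odd
    return median(lower), median(sorted_values), median(upper)


ages = expand(freq)
q1, q2, q3 = quartiles(ages)
print(f"Q1={q1}, median={q2}, Q3={q3}, IQR={q3 - q1}")
# Q1=16.0, median=18.0, Q3=21.0, IQR=5.0
```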

Why bother with the interquartile range? The range can be very misleading when a distribution has a few very high or very low values: it makes the distribution seem spread out, even though the vast majority of values may be clustered around the median (as in Table 8.3.12.3). The interquartile range shows only the range of the middle half of the distribution, so it is sometimes a better indication of the spread or concentration of a distribution.
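To illustrate the point with the Table 8.3.12.7 frequencies, the sketch below compares the range and the interquartile range with and without the six answers of 60 years or more. Dropping those answers is purely a hypothetical comparison, not a recommendation for cleaning the data; the quartiles use the median-of-halves convention described above, and the helper names are illustrative.

```python
from statistics import median

# Frequencies assumed from Table 8.3.12.7 (age in years: number of respondents)
freq = {12: 1, 14: 2, 15: 1, 16: 28, 17: 2, 18: 33, 19: 1, 20: 8,
        21: 26, 25: 2, 30: 2, 60: 1, 78: 1, 80: 1, 98: 3}


def expand(freq_table):
    """List every individual value, in ascending order."""
    return [x for x in sorted(freq_table) for _ in range(freq_table[x])]


def iqr(sorted_values):
    """Q3 - Q1, where Q1 and Q3 are the medians of the lower and upper halves."""
    n = len(sorted_values)
    lower = sorted_values[: n // 2]
    upper = sorted_values[n // 2 + n % 2 :]  # leave out the middle value if n is odd
    return median(upper) - median(lower)


all_ages = expand(freq)
without_extremes = [x for x in all_ages if x < 60]  # hypothetical: drop the 60+ answers

for label, data in [("all answers", all_ages), ("without 60+", without_extremes)]:
    print(f"{label}: range = {max(data) - min(data)}, IQR = {iqr(data)}")
# The range falls from 86 to 18, while the IQR stays at 5.
```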

8.3.12.7.3 Standard deviation
The standard deviation is the most useful measure of dispersion for several reasons, not all of which are initially apparent. Its most obvious advantage is that, like the arithmetic mean, it takes into account all the data in a distribution. The standard deviation measures dispersion around the arithmetic mean, so interval (or ratio) scale data is required. Like the arithmetic mean, the standard deviation is distorted by highly skewed data.

The standard deviation is, at first glance, a peculiar measure. It is the square root of the variance, and the variance is defined as the mean of the squared deviations of each value of the variable from the mean of the variable. That sounds complex and not at all intuitive, so here are the steps involved in computing the standard deviation.

1. Work out the arithmetic mean of the values in the sample.
2. Calculate the difference between each value in the sample and the mean (that is, subtract the mean from each separate value in the sample).
3. Square all the differences (the negative differences all become positive).
4. Add up all the squared differences. (This gives you the total of the squared differences.)
5. Divide the total by the sample size. (This gives you the mean of the squared differences from the sample mean.) This is the variance.
6. The standard deviation is the square root of the variance.
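The six steps translate almost line for line into code. This Python sketch assumes the ages are held as a plain list (here expanded from the Table 8.3.12.7 frequencies) and, like step 5, divides by the sample size n; the function name `standard_deviation` is illustrative.

```python
import math

# Frequencies assumed from Table 8.3.12.7 (age in years: number of respondents)
freq = {12: 1, 14: 2, 15: 1, 16: 28, 17: 2, 18: 33, 19: 1, 20: 8,
        21: 26, 25: 2, 30: 2, 60: 1, 78: 1, 80: 1, 98: 3}
ages = [x for x in sorted(freq) for _ in range(freq[x])]  # 112 individual answers


def standard_deviation(values):
    n = len(values)
    mean = sum(values) / n                      # step 1: arithmetic mean
    differences = [x - mean for x in values]    # step 2: value minus mean
    squared = [d ** 2 for d in differences]     # step 3: square each difference
    total = sum(squared)                        # step 4: total of squared differences
    variance = total / n                        # step 5: divide by sample size
    return math.sqrt(variance)                  # step 6: square root


print(round(standard_deviation(ages), 3))  # 15.626
```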

This appears to be rather complicated just to measure the spread of a distribution, and conceptually it is a bit cumbersome. However, it is a measure that takes account of all the values in a frequency table, unlike the range, which is only concerned with the extremes. In addition, it is an important measure of dispersion when it comes to making generalisations from samples of interval or ratio scale data (see Section 8.3.12.10).

The following formula provides a short cut for working out the variance from a frequency table:

Variance = (Total of (frequency multiplied by x squared) divided by sample size) - (Mean of x) squared

or, more compactly,

Variance = ∑fx**2/n - (∑fx/n)**2

where
x = a value of the variable
f = frequency
n = sample size
∑ = 'Total'
/ = divide
**2 = squared

Mean of x = ∑fx/n.
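Why does the short cut give the same answer as the six steps? Writing x̄ for the mean of x (x̄ = ∑fx/n, a symbol not used elsewhere in this section) and using ∑f = n, expanding the square in the definition of the variance is enough:

```latex
\begin{aligned}
\text{Variance} &= \frac{\sum f\,(x - \bar{x})^2}{n} \\
&= \frac{\sum f x^2}{n} - 2\bar{x}\,\frac{\sum f x}{n} + \bar{x}^2\,\frac{\sum f}{n} \\
&= \frac{\sum f x^2}{n} - 2\bar{x}^2 + \bar{x}^2 \\
&= \frac{\sum f x^2}{n} - \left(\frac{\sum f x}{n}\right)^2
\end{aligned}
```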

Before you decide that you have seen quite enough formulas and that this one is beyond a joke, don't give up: this is actually much less complicated than it seems.

Using the data in Table 8.3.12.3, this works as shown in Table 8.3.12.7. The standard deviation for the data in Table 8.3.12.3 is 15.626 years. Loosely speaking, this means that the values typically deviate from the arithmetic mean (2478/112 = 22.125 years) by about 15.6 years.

Table 8.3.12.7 Calculation of standard deviation

Value   Frequency   Frequency × value   Frequency × (value squared)
(x)     (f)         (fx)                (fx**2)
12      1           12                  144
14      2           28                  392
15      1           15                  225
16      28          448                 7168
17      2           34                  578
18      33          594                 10692
19      1           19                  361
20      8           160                 3200
21      26          546                 11466
25      2           50                  1250
30      2           60                  1800
60      1           60                  3600
78      1           78                  6084
80      1           80                  6400
98      3           294                 28812
Total   n = 112     ∑fx = 2478          ∑fx**2 = 82172

Variance = 82172/112 - (2478/112)**2
Variance = 733.679 - 489.516
= 244.163
Standard deviation = √variance = √244.163 = 15.6 to one decimal place
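The same short-cut arithmetic can be checked in a few lines of Python. This is a sketch assuming the frequencies in Table 8.3.12.7; it computes ∑fx and ∑fx**2 directly and then, as an independent check, passes the expanded list of ages to the standard library's `statistics.pvariance` and `statistics.pstdev`, which also divide by n.

```python
import math
from statistics import pstdev, pvariance

# Frequencies assumed from Table 8.3.12.7 (age in years: number of respondents)
freq = {12: 1, 14: 2, 15: 1, 16: 28, 17: 2, 18: 33, 19: 1, 20: 8,
        21: 26, 25: 2, 30: 2, 60: 1, 78: 1, 80: 1, 98: 3}

n = sum(freq.values())                              # 112
sum_fx = sum(f * x for x, f in freq.items())        # 2478
sum_fx2 = sum(f * x ** 2 for x, f in freq.items())  # 82172

# Short cut: total fx**2 / n minus (mean of x) squared
variance = sum_fx2 / n - (sum_fx / n) ** 2
sd = math.sqrt(variance)
print(round(variance, 3), round(sd, 1))  # 244.163 15.6

# Cross-check with the standard library's population formulas (divide by n)
ages = [x for x in sorted(freq) for _ in range(freq[x])]
print(round(pvariance(ages), 3), round(pstdev(ages), 1))  # 244.163 15.6
```

Python's `statistics.variance` and `statistics.stdev` (without the p) divide by n - 1 rather than n, so they would give slightly larger figures for the same list.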

While it is easy to grasp the idea and point of an average, it is not so easy to grasp the idea of the standard deviation or the point of it. The standard deviation measures the variation around the mean: the smaller the standard deviation, the more concentrated the data is around the mean. If the standard deviation were small, say two years, this would mean that most people in the sample agree that the age of consent should be within a narrow range either side of the mean.

So in this case, 15.6 years is a large standard deviation, because a few very high values (60, 78, 80 and three answers of 98) make the variation around the mean much larger.
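To see how much those high values contribute, the sketch below (again assuming the Table 8.3.12.7 frequencies) computes the standard deviation from its definition, once for all 112 answers and once with the six answers of 60 years or more set aside. Setting them aside is a hypothetical comparison for illustration only, and `sd_from_freq` is an illustrative name.

```python
import math

# Frequencies assumed from Table 8.3.12.7 (age in years: number of respondents)
freq = {12: 1, 14: 2, 15: 1, 16: 28, 17: 2, 18: 33, 19: 1, 20: 8,
        21: 26, 25: 2, 30: 2, 60: 1, 78: 1, 80: 1, 98: 3}


def sd_from_freq(freq_table):
    """Standard deviation (dividing by n) computed from a value: frequency table."""
    n = sum(freq_table.values())
    mean = sum(f * x for x, f in freq_table.items()) / n
    squared_deviations = sum(f * (x - mean) ** 2 for x, f in freq_table.items())
    return math.sqrt(squared_deviations / n)


# Hypothetical: keep only the answers below 60 years
trimmed = {x: f for x, f in freq.items() if x < 60}

print(round(sd_from_freq(freq), 1))     # 15.6
print(round(sd_from_freq(trimmed), 1))  # 2.8
```

Six answers out of 112 account for most of the spread: without them, the standard deviation is under three years.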

Why is it important to calculate the standard deviation? Because it matters when generalising the sample results to the population. If the sample varies a lot, then it is less likely that one can be precise about the average of the population from which the sample was taken (see Section 8.3.12.8 and Section 8.3.12.9).

Activity 8.3.12.7
Compute the standard deviation and range for the age of consent for women (V21) and compare them with those for men (V22).