Measures of Variability

Statisticians use summary measures to describe the amount of variability or spread in a set of data. The most common measures of variability are the range, the interquartile range (IQR), variance, and standard deviation.

The Range

The range is the difference between the largest and smallest values in a set of values.

For example, consider the following numbers: 1, 3, 4, 5, 5, 6, 7, 11. For this set of numbers, the range would be 11 - 1 or 10.
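
The same calculation can be sketched in a few lines of Python. The function name `value_range` is chosen here for illustration and is not part of any library.

```python
def value_range(values):
    """Return the difference between the largest and smallest values."""
    return max(values) - min(values)

data = [1, 3, 4, 5, 5, 6, 7, 11]
print(value_range(data))  # 10
```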

The Interquartile Range (IQR)

The interquartile range (IQR) is a measure of variability, based on dividing a data set into quartiles.

Quartiles divide a rank-ordered data set into four equal parts. The values that separate the parts are called the first, second, and third quartiles, denoted by Q1, Q2, and Q3, respectively.

Q1 is the "middle" value in the first half of the rank-ordered data set.
Q2 is the median value in the set.
Q3 is the "middle" value in the second half of the rank-ordered data set.

The interquartile range is equal to Q3 minus Q1.

For example, consider the following numbers: 1, 3, 4, 5, 5, 6, 7, 11. Q1 is the middle value in the first half of the data set (1, 3, 4, 5). Since the first half contains an even number of data points, the middle value is the average of the two middle values; that is, Q1 = (3 + 4)/2 = 3.5. Q3 is the middle value in the second half of the data set (5, 6, 7, 11). Again, since the second half contains an even number of observations, the middle value is the average of the two middle values; that is, Q3 = (6 + 7)/2 = 6.5. The interquartile range is Q3 minus Q1, so IQR = 6.5 - 3.5 = 3.
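
The median-of-halves procedure above might be sketched in Python as follows. The function names are illustrative, and the handling of an odd number of values (the overall median is left out of both halves) is an assumption, since the text only covers the even case. Library quantile functions often interpolate differently and may return other values for small data sets.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method.

    Assumption: with an odd number of values, the overall median is
    excluded from both halves. Needs at least two values.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]       # first half of the rank-ordered data
    upper = ordered[n - half:]   # second half of the rank-ordered data
    return median(lower), median(ordered), median(upper)

def iqr_by_halves(values):
    """Return Q3 minus Q1."""
    q1, _, q3 = quartiles_by_halves(values)
    return q3 - q1

data = [1, 3, 4, 5, 5, 6, 7, 11]
print(quartiles_by_halves(data))  # (3.5, 5.0, 6.5)
print(iqr_by_halves(data))        # 3.0
```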

An Alternative Definition for IQR

In some texts, the interquartile range is defined differently. It is defined as the difference between the largest and smallest values in the middle 50% of a set of data.

To compute an interquartile range using this definition, first remove observations from the lower quartile. Then, remove observations from the upper quartile. Then, from the remaining observations, compute the difference between the largest and smallest values.

For example, consider the following numbers: 1, 3, 4, 5, 5, 6, 7, 11. After we remove observations from the lower and upper quartiles, we are left with: 4, 5, 5, 6. The interquartile range (IQR) would be 6 - 4 = 2.

When the data set is large, the two definitions usually produce the same (or very close) results. However, when the data set is small, the definitions can produce different results.
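
A minimal Python sketch of this alternative definition is shown here. The function name `iqr_middle_half` is illustrative, and the sketch assumes the number of observations is a multiple of 4 so that each quarter holds a whole number of values; the text does not say how to trim other sizes.

```python
def iqr_middle_half(values):
    """Return the range of the middle 50% of the data.

    Assumption: len(values) is a non-zero multiple of 4, so the lowest
    and highest quarters can be removed cleanly.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0 or n % 4 != 0:
        raise ValueError("this sketch expects a size that is a multiple of 4")
    quarter = n // 4
    middle = ordered[quarter:n - quarter]  # drop lower and upper quartiles
    return max(middle) - min(middle)

data = [1, 3, 4, 5, 5, 6, 7, 11]
print(iqr_middle_half(data))  # 2
```

For this small data set the result (2) differs from the Q3 minus Q1 result (3), which illustrates the point about small samples.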

The Variance

In a population, variance is the average squared deviation from the population mean, as defined by the following formula:

σ² = Σ ( X_i - μ )² / N

where σ² is the population variance, μ is the population mean, X_i is the ith element from the population, and N is the number of elements in the population.

The variance of a sample is defined by a slightly different formula and uses slightly different notation:

s² = Σ ( x_i - x̄ )² / ( n - 1 )

where s² is the sample variance, x̄ is the sample mean, x_i is the ith element from the sample, and n is the number of elements in the sample. With n - 1 in the denominator, the sample variance is an unbiased estimate of the true population variance. Therefore, if you need to estimate an unknown population variance, based on data from a sample, this is the formula to use.
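
To make the difference between the two formulas concrete, here is a small Python sketch that implements each one directly. The function names are illustrative; the data set is the one used in the earlier examples of this lesson.

```python
def population_variance(values):
    """Average squared deviation from the mean (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def sample_variance(values):
    """Sum of squared deviations divided by n - 1 (needs n >= 2)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

data = [1, 3, 4, 5, 5, 6, 7, 11]
print(population_variance(data))  # 7.6875
print(sample_variance(data))      # about 8.79
```

The only difference is the denominator, so the sample variance is always the larger of the two for the same data.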

The Standard Deviation

The standard deviation is the square root of the variance. Thus, the standard deviation of a population is:

σ = sqrt [ σ² ] = sqrt [ Σ ( X_i - μ )² / N ]

where σ is the population standard deviation, σ² is the population variance, μ is the population mean, X_i is the ith element from the population, and N is the number of elements in the population.

And the standard deviation of a sample is:

s = sqrt [ s² ] = sqrt [ Σ ( x_i - x̄ )² / ( n - 1 ) ]

where s is the sample standard deviation, s² is the sample variance, x̄ is the sample mean, x_i is the ith element from the sample, and n is the number of elements in the sample.
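
As a quick check of the square-root relationship, the sketch below uses Python's standard `statistics` module, where `pvariance` and `pstdev` use the population formulas and `variance` and `stdev` use the sample formulas. The data set is the one from the earlier examples.

```python
from math import isclose, sqrt
from statistics import pstdev, pvariance, stdev, variance

data = [1, 3, 4, 5, 5, 6, 7, 11]

# Population: standard deviation is the square root of the population variance.
population_sd = sqrt(pvariance(data))
print(isclose(population_sd, pstdev(data)))  # True

# Sample: standard deviation is the square root of the sample variance.
sample_sd = sqrt(variance(data))
print(isclose(sample_sd, stdev(data)))       # True

print(round(population_sd, 2), round(sample_sd, 2))  # roughly 2.77 and 2.96
```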

Effect of Changing Units

Sometimes, researchers change units (minutes to hours, feet to meters, etc.). Here is how measures of variability are affected when we change units.

If you add a constant to every value, the distance between values does not change. As a result, all of the measures of variability (range, interquartile range, standard deviation, and variance) remain the same.

On the other hand, suppose you multiply every value by a constant. This has the effect of multiplying the range, interquartile range (IQR), and standard deviation by the absolute value of that constant. The effect on the variance is different: the variance is multiplied by the square of the constant.
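
Both rules can be checked numerically. The sketch below is only an illustration: the helper `spread`, the shift of 10, and the factor `c = 3` are arbitrary choices, and the IQR is left out to keep the example short.

```python
from math import isclose
from statistics import pstdev, pvariance

def spread(values):
    """Return (range, population standard deviation, population variance)."""
    return max(values) - min(values), pstdev(values), pvariance(values)

data = [1, 3, 4, 5, 5, 6, 7, 11]
c = 3  # hypothetical conversion factor

rng, sd, var = spread(data)

# Adding a constant shifts every value but leaves the spread unchanged.
rng_shift, sd_shift, var_shift = spread([x + 10 for x in data])
print(rng_shift == rng, isclose(sd_shift, sd), isclose(var_shift, var))

# Multiplying by c scales range and standard deviation by |c|,
# and the variance by c squared.
rng_scale, sd_scale, var_scale = spread([c * x for x in data])
print(rng_scale == abs(c) * rng,
      isclose(sd_scale, abs(c) * sd),
      isclose(var_scale, c ** 2 * var))
```

Each `print` call should report `True` three times.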

Test Your Understanding of This Lesson

Problem 1
A population consists of four observations: {1, 3, 5, 7}. What is the variance?

(A) 2
(B) 4
(C) 5
(D) 6
(E) None of the above

Solution
The correct answer is (C). First, we need to compute the population mean.

μ = ( 1 + 3 + 5 + 7 ) / 4 = 4

Then we plug all of the known values into the formula for the variance of a population, as shown below:

σ² = Σ ( X_i - μ )² / N
σ² = [ ( 1 - 4 )² + ( 3 - 4 )² + ( 5 - 4 )² + ( 7 - 4 )² ] / 4
σ² = [ ( -3 )² + ( -1 )² + ( 1 )² + ( 3 )² ] / 4
σ² = [ 9 + 1 + 1 + 9 ] / 4 = 20 / 4 = 5

Problem 2
A sample consists of four observations: {1, 3, 5, 7}. What is the standard deviation?

(A) 2
(B) 2.58
(C) 6
(D) 6.67
(E) None of the above

Solution
The correct answer is (B). First, we need to compute the sample mean.

x̄ = ( 1 + 3 + 5 + 7 ) / 4 = 4

Then we plug all of the known values into the formula for the standard deviation of a sample, as shown below:

s = sqrt [ Σ ( x_i - x̄ )² / ( n - 1 ) ]
s = sqrt { [ ( 1 - 4 )² + ( 3 - 4 )² + ( 5 - 4 )² + ( 7 - 4 )² ] / ( 4 - 1 ) }
s = sqrt { [ ( -3 )² + ( -1 )² + ( 1 )² + ( 3 )² ] / 3 }
s = sqrt { [ 9 + 1 + 1 + 9 ] / 3 } = sqrt (20 / 3) = sqrt ( 6.67 ) = 2.58
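
If you want to verify both answers by machine, one option is Python's standard `statistics` module. This is just a check on the hand calculations above; the variable names are illustrative.

```python
from statistics import pvariance, stdev

observations = [1, 3, 5, 7]

# Problem 1: treat the observations as a whole population (divide by N).
problem_1 = pvariance(observations)

# Problem 2: treat the observations as a sample (divide by n - 1).
problem_2 = stdev(observations)

print(problem_1)            # equals 5
print(round(problem_2, 2))  # 2.58
```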