Solution to Quiz 1

The full quiz is here.  The answers appear below.  Comments, which are not part of the answers, are italicized.

Here are the data again:

Standard: 25.8

Measurements (partial listing): 26,23,25,25,21,22,27

1.    Construct a stem-and-leaf diagram of these measurements.

Here are some reasonable solutions:

x 1
 1    21|0
 2  H 22|0
 3    23|0
      24|
(2) M 25|00
 2  H 26|0
 1    27|0

x 10
 1    2*|1
 3  H 2t|23
(2) M 2f|55
 2  H 2s|67
      2.|

x 10
 3   H 2*|123
(4) MH 2.|5567

The second display uses a new technique: divide the range 20-29 into five bins of two values each: 20-21, 22-23, ..., 28-29.  They are labeled "*", "t" (for "two" and "three"), "f" (for "four" and "five"), "s" (for "six" and "seven"), and ".".  (The final "." line is unnecessary for these data, but appears here to show the full technique.)  The first and second displays do a good job of showing the distribution ("shape") of the data.  The third display is not very effective, but it accomplishes more than the following, which merely lists the numbers in a condensed form, an almost pointless exercise.

x 10
(7) HMH 2*|1235567

To get full credit, you needed to:

Produce a stem-and-leaf plot showing the data distribution (that is, something like the first three displays above).
Label the plot with the depths.
Label the plot with the medians and hinges.
Indicate (as shown above each of the solutions) the units of the stems.

It would also be nice to sort each set of leaves, but that is not a requirement for this quiz.
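The five-line technique described above can be sketched in a few lines of Python. This is only an illustration, assuming non-negative integer data with stems in units of 10; the function name `five_line_stem_and_leaf` is hypothetical, and the sketch omits the depths and the M/H labels that the quiz required.

```python
from collections import defaultdict

# Labels for the five lines of each stem:
# "*" holds leaves 0-1, "t" 2-3, "f" 4-5, "s" 6-7, "." 8-9.
LINE_LABELS = "*tfs."


def five_line_stem_and_leaf(values):
    """Return the lines of a five-line stem-and-leaf display (stems x 10)."""
    rows = defaultdict(list)
    for v in sorted(values):              # sorting also sorts each set of leaves
        stem, leaf = divmod(v, 10)
        rows[(stem, leaf // 2)].append(leaf)

    first, last = min(rows), max(rows)
    lines = []
    stem, part = first
    while (stem, part) <= last:
        # Empty lines between the extremes are printed to keep the shape honest.
        leaves = "".join(str(leaf) for leaf in rows.get((stem, part), []))
        lines.append(f"{stem}{LINE_LABELS[part]}|{leaves}")
        stem, part = (stem, part + 1) if part < 4 else (stem + 1, 0)
    return lines


co_ppm = [26, 23, 25, 25, 21, 22, 27]
print("x 10")
for line in five_line_stem_and_leaf(co_ppm):
    print(line)
```

Unlike the hand-drawn display, this sketch stops at the last occupied line, so the empty "2." line is not printed.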

2.    Compute the letter summaries and draw a 5-letter summary table.

Because N=7, the depth of the median is (N+1)/2 = 4 and the depth of the hinges is (4+1)/2 = 2h (meaning halfway between order statistics 2 and 3).  The value at depth 2h from the top is (26+25)/2 = 25.5 and the value at depth 2h from the bottom is (22+23)/2 = 22.5.  Therefore the 5-letter summary table is

Seven CO measurements, ppm:
M       25
H  22.5    25.5    3
1  21      27      6

The commonest mistake was to ignore the "h" terms in the "2h" depths, thereby computing the hinges as 22 and 26 and the H-spread as 4 instead of 3.  It is helpful to indicate the number of measurements clearly, as shown above, but this was not required for full credit.
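The depth rule used above can be checked with a short sketch. It assumes the convention shown in this solution (median depth (N+1)/2, hinge depth computed from the whole-number part of the median depth); the helper names `value_at_depth` and `five_letter_summary` are made up for illustration.

```python
def value_at_depth(sorted_values, depth, from_top=False):
    """Return the value at a (possibly half-integer) depth, counting from 1."""
    vals = sorted_values[::-1] if from_top else sorted_values
    lower = int(depth)                    # depth 2.5 -> order statistic 2
    if depth == lower:
        return vals[lower - 1]
    # A depth such as "2h" means halfway between order statistics 2 and 3.
    return (vals[lower - 1] + vals[lower]) / 2


def five_letter_summary(values):
    xs = sorted(values)
    median_depth = (len(xs) + 1) / 2
    hinge_depth = (int(median_depth) + 1) / 2   # drop any half before halving
    lower = value_at_depth(xs, hinge_depth)
    upper = value_at_depth(xs, hinge_depth, from_top=True)
    return {
        "median": value_at_depth(xs, median_depth),
        "lower_hinge": lower,
        "upper_hinge": upper,
        "h_spread": upper - lower,
        "min": xs[0],
        "max": xs[-1],
        "range": xs[-1] - xs[0],
    }


summary = five_letter_summary([26, 23, 25, 25, 21, 22, 27])
for name, value in summary.items():
    print(f"{name:12s} {value}")
```

Keeping the half-integer depth explicit is what prevents the mistake described above: truncating 2h to 2 would return 22 and 26 instead of 22.5 and 25.5.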

3.    Draw a boxplot.  Show the invisible "fences".  How many outliers are there?

The fences are at 25.5 + 3*1.5 = 30 and 22.5 - 3*1.5 = 18 (but are not shown on this computer-generated plot).  There are no outliers.  This boxplot could be improved by indicating the unit of measurement (ppm) on the axis.
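As a quick check of the fence arithmetic, here is a minimal sketch that starts from the hinges computed in question 2 and uses the usual step of 1.5 H-spreads; the variable names are illustrative only.

```python
co_ppm = [26, 23, 25, 25, 21, 22, 27]
lower_hinge, upper_hinge = 22.5, 25.5     # from the 5-letter summary

h_spread = upper_hinge - lower_hinge      # 3
step = 1.5 * h_spread                     # 4.5
lower_fence = lower_hinge - step
upper_fence = upper_hinge + step

# An outlier is any measurement beyond either fence.
outliers = [x for x in co_ppm if x < lower_fence or x > upper_fence]

print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```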

A plot like this is less effective because, by expanding the range, it reduces the resolution of the graphic:

Sometimes it is helpful to show the zero value, but in this case zero is not relevant: the issue concerns the relationship between the measurements and the standard (25.8), not the absolute concentrations themselves.

A plot like this is not effective because it provides insufficient resolution for reading the box and whiskers:

4.    Compute these summary statistics: mean, H-spread, MAD.  Show your work.

a.    Mean = 20 + (1+2+3+5+5+6+7)/7 = 20 + 29/7 = 24 1/7, or approximately 24.1.

b.    H-spread = Upper hinge - lower hinge = 25.5 - 22.5 = 3.

c.    MAD = Median absolute deviation (from the median: we use medians for robustness, so it makes little sense to compute a median absolute deviation from a mean).  The median is 25, so the deviations are -4, -3, -2, 0, 0, 1, 2 and the absolute deviations (in order) are 0, 0, 1, 2, 2, 3, 4.  N has not changed, so the median is still the fourth order statistic, which is 2.
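The MAD calculation in (c) can be reproduced with the standard library. This is a small sketch, with `mad` as a hypothetical helper name; note that the deviations are taken from the median, not the mean.

```python
from statistics import median


def mad(values):
    """Median absolute deviation from the median."""
    center = median(values)
    return median([abs(v - center) for v in values])


co_ppm = [26, 23, 25, 25, 21, 22, 27]
print("median:", median(co_ppm))
print("sorted absolute deviations:", sorted(abs(v - median(co_ppm)) for v in co_ppm))
print("MAD:", mad(co_ppm))
```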

Learn the technique illustrated in (a) to compute means--at least approximately--quickly and easily.  For example, using mental arithmetic only, compute the mean of {872490, 872491, 872492, ..., 872499}.  You should be able to get this answer faster than you could write down the numbers.  (Answer: 872494.5)
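The shortcut in (a) amounts to subtracting a convenient offset, averaging the small remainders, and adding the offset back. A minimal sketch follows; the function name `mean_with_offset` is hypothetical.

```python
def mean_with_offset(values, offset):
    """Mean computed as offset + mean of (value - offset)."""
    values = list(values)
    return offset + sum(v - offset for v in values) / len(values)


# Quiz data: 20 + (6+3+5+5+1+2+7)/7 = 20 + 29/7
quiz_mean = mean_with_offset([26, 23, 25, 25, 21, 22, 27], 20)

# Mental-arithmetic example: 872490 + (0+1+...+9)/10 = 872490 + 4.5
big_mean = mean_with_offset(range(872490, 872500), 872490)

print(quiz_mean, big_mean)
```

Any offset gives the same mean; choosing one near the data simply keeps the numbers being added small.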

Common mistakes included computing deviations from the mean and failing to compute absolute values.  It is useful to know that the mean deviation from the mean is always zero.  Consequently (using the technique in (a)), the mean deviation from the median will be exactly the difference between the median and mean: that is related to the shape (skewness) of the data, but not to its spread.
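Both facts in the preceding paragraph can be verified exactly on the quiz data. The sketch below uses `fractions.Fraction` to avoid rounding error; the variable names are illustrative.

```python
from fractions import Fraction
from statistics import median

co_ppm = [26, 23, 25, 25, 21, 22, 27]
n = len(co_ppm)
mean = Fraction(sum(co_ppm), n)           # 169/7
med = median(co_ppm)                      # 25

# Signed (not absolute) deviations, averaged.
mean_dev_from_mean = sum(x - mean for x in co_ppm) / n
mean_dev_from_median = sum(Fraction(x - med) for x in co_ppm) / n

print(mean_dev_from_mean)                 # always zero
print(mean_dev_from_median, mean - med)   # these two agree
```

Here the mean deviation from the median is negative because the mean lies below the median.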

5.    Are these measurements consistent with the standard (the correct value)?  List the summary statistics that support your answer.

The standard is 25.8.  By comparison, we have previously computed that the mean and median (which express "central" values of a batch) are 24.1 and 25, respectively.  The standard differs from these values by 1.7 and 0.8, respectively.  Are these large deviations?  We use a measure of spread to see.  The H-spread is 3 and the MAD is 2.  Thus the center of the batch (however measured) deviates from the standard by less than either measure of spread.  This is good (albeit informal) evidence that the measurements are consistent with the standard.
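The informal comparison above can be written out as a few lines of code. This is only a sketch of the reasoning in this answer, not a formal statistical test; the dictionary names are illustrative.

```python
standard = 25.8
centers = {"mean": 169 / 7, "median": 25}   # from question 4 and question 2
spreads = {"H-spread": 3, "MAD": 2}

for name, center in centers.items():
    print(f"standard - {name} = {standard - center:.1f}")

# Informal criterion: every center lies within one spread of the standard,
# whichever measure of spread is used.
consistent = all(
    abs(standard - center) < spread
    for center in centers.values()
    for spread in spreads.values()
)
print("consistent with the standard:", consistent)
```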

6.    State, qualitatively, how your answers to #2, #3, and #4 would change if the value of 21 were changed to 210.

Because the median, hinges, H-spread, and MAD are highly resistant to changes in data, we know they are unlikely to vary much even when a single value is dramatically changed.  However, the extremes in the 5-letter summary will change: the maximum goes from 27 to 210 and, because 21 is no longer present, the minimum goes from 21 to 22.  The value of 210 will appear in the boxplot as an outlier.  The mean will increase by (210-21)/7 = 30 - 3 = 27.
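Although the question asks only for a qualitative answer, the resistance of the median and MAD, and the sensitivity of the mean and maximum, are easy to confirm numerically. The sketch below uses a hypothetical `mad` helper.

```python
from statistics import median


def mad(values):
    """Median absolute deviation from the median."""
    center = median(values)
    return median([abs(v - center) for v in values])


original = [26, 23, 25, 25, 21, 22, 27]
altered = [210 if v == 21 else v for v in original]

mean_shift = (sum(altered) - sum(original)) / len(original)

print("median:", median(original), "->", median(altered))
print("MAD:   ", mad(original), "->", mad(altered))
print("max:   ", max(original), "->", max(altered))
print("mean shift:", mean_shift)
```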

General comments

Note that questions 5 and 6 did not require any additional calculation.  A great deal of the art of thinking about data is to learn to do so with a minimum of calculation: use what you already have and use your knowledge of the properties of statistics (like medians and H-spreads) instead.


This page is copyright (c) 2001 Quantitative Decisions.  Please cite it as