Partial solution to exercise 3.1, Millard & Neerchal, Environmental Statistics with S-Plus

Graphs of nickel concentration by well (Systat output)

Graphs of nickel concentration by month

Stem and leaf plot of all data

Stem and Leaf Plot of variable: NICKEL, N = 20
Minimum:       1.0
Lower hinge:  16.5
Median:       57.4
Upper hinge: 206.5
Maximum:     942

0 H 000111223
0 M 55688
1
1   5
2 H
2   6
3   3
* * * Outside Values * * *
5   7
6   3
9   4

Summary statistics

NICKEL
N of cases   20
Minimum      1.000000000
Maximum      9.42000E+02
Range        9.41000E+02
Median       5.74000E+01
Mean         1.69525E+02
Standard Dev 2.59717E+02
Variance     6.74532E+04
C.V.         1.532030519
Skewness(G1) 1.995673802
Kurtosis(G2) 3.417189905

The mean is about three times the median.
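
The derived entries in the summary table, and the mean-to-median comparison, can be checked by hand. The following is a minimal sketch in Python (not part of the original Systat session); the variable names are illustrative, and the inputs are simply the rounded values printed in the table, so small rounding differences from the Systat figures are expected.

```python
# Values copied from the Systat summary table above (as printed, i.e. rounded).
minimum, maximum = 1.0, 942.0
median, mean, sd = 57.4, 169.525, 259.717

data_range = maximum - minimum   # Range = Maximum - Minimum
variance = sd ** 2               # Variance = (Standard Dev)^2
cv = sd / mean                   # C.V. = Standard Dev / Mean
mean_to_median = mean / median   # the "about three times" comparison

print(f"Range         = {data_range:.1f}")
print(f"Variance      = {variance:.1f}")
print(f"C.V.          = {cv:.4f}")
print(f"Mean / Median = {mean_to_median:.2f}")
```

A mean this far above the median is itself a sign of strong right skew, consistent with the skewness statistic in the table.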

Histogram and density plots

(a)    Raw data

(b)    Natural logarithms

Normal probability plots

(a)    Raw data (uniform scale)

These data do not appear to "come from a Normal population." A crude but very fast test is to examine the ratio of the maximum to the minimum, which in this case is 942:1. Whenever this ratio is large (say, 10 or greater), it is unlikely that a Normal distribution adequately describes all the data. A C.V. of 60% or greater also usually signals a non-Normal shape; the C.V. for these data (see above) is over 150%. Similarly, the skewness of 2.0 indicates a high degree of non-Normality. (The skewness test is a pretty good one, according to Madansky.) The strong curvature of the Normal probability plot demonstrates that the departure from Normality holds throughout most of the batch and is not due, say, to just one high outlying value, which by itself could make the max:min ratio, C.V., and skewness all large.
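
The two quick ratio checks described above can be sketched as a small Python function. This is only an illustration under stated assumptions: the function name and its argument names are hypothetical, the default cutoffs (10 for the max:min ratio, 0.60 for the C.V.) are the rules of thumb quoted in the text, and the inputs are the rounded summary statistics from the Systat table. No cutoff is applied to skewness, because the text does not give one.

```python
def screen_for_nonnormality(minimum, maximum, mean, sd,
                            ratio_cutoff=10.0, cv_cutoff=0.60):
    """Apply the max:min and C.V. rules of thumb to summary statistics.

    Hypothetical helper for illustration; cutoffs default to the values
    quoted in the text. Requires strictly positive data, since the
    max:min ratio is meaningless otherwise.
    """
    if minimum <= 0 or mean <= 0:
        raise ValueError("the ratio checks assume strictly positive data")
    max_min_ratio = maximum / minimum
    cv = sd / mean
    return {
        "max_min_ratio": max_min_ratio,
        "cv": cv,
        "ratio_flag": max_min_ratio >= ratio_cutoff,  # True suggests non-Normal
        "cv_flag": cv >= cv_cutoff,                   # True suggests non-Normal
    }


# Nickel summary statistics from the Systat output above.
result = screen_for_nonnormality(minimum=1.0, maximum=942.0,
                                 mean=169.525, sd=259.717)
for name, value in result.items():
    print(f"{name}: {value}")
```

Both flags are raised for the nickel data, but as the text notes, a single extreme value could raise them too, which is why the probability plot is still needed.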

(b)    Logarithmic scale, showing linear smooth

(c)    Logarithmic scale, discriminating wells by color and fill

Note: It is not evident that these data are real.  They appear in EPA guidance, but their source is not given.  They may have been generated randomly (by a lognormal random number generator) or perhaps they were selected from a larger collection of monitoring data because of their apparent lognormality.  Most groundwater data sets do not look this "nice."
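
To see what output from a lognormal random number generator looks like on a probability plot, one can simulate a batch of 20 values and compute the plot coordinates for the raw and log-transformed data. The sketch below is purely illustrative: the simulated values are NOT the nickel data, the lognormal parameters (4.0 and 1.8 on the log scale) and the seed are arbitrary assumptions, and the plotting positions (i - 0.5)/n are one common convention that may differ from what Systat uses.

```python
import math
import random
from statistics import NormalDist


def probability_plot_points(values):
    """Pair each ordered value with a standard Normal quantile.

    Uses plotting positions (i - 0.5) / n, an assumed convention.
    """
    n = len(values)
    ordered = sorted(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


# Simulated data only; parameters and seed are arbitrary assumptions.
rng = random.Random(2001)
simulated = [rng.lognormvariate(4.0, 1.8) for _ in range(20)]

raw_points = probability_plot_points(simulated)
log_points = probability_plot_points([math.log(x) for x in simulated])

for (q, raw), (_, logged) in zip(raw_points, log_points):
    print(f"{q:7.3f}  {raw:10.2f}  {logged:7.3f}")
```

Plotting the second column against the first gives the raw-scale plot, and the third against the first gives the log-scale plot; for lognormal data the latter should look roughly linear.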


This page is copyright (c) 2001 Quantitative Decisions.  Please cite it as

This page was created 14 January 2001