The full quiz is here. The answers appear below. Comments, which are not part of the answers, are italicized.
Time: 20 minutes. Open book, open notes.
1. A simple random sample of soils in a "background" location yielded lead concentrations of 100, 150, 200, 250, and 300 mg/Kg. A simple random sample of soils at a nearby industrial site yielded lead concentrations of 100, 300, 800, and 1200 mg/Kg.
Test, with 95% confidence, whether these samples may have come from the same underlying Normally distributed population of lead concentrations. Use the t-table below. It shows values of upper percentage points of Student's t distribution for 3, 4, ..., 9 degrees of freedom (df).
| df | Alpha = 90% | Alpha = 95% | Alpha = 97.5% |
| --- | --- | --- | --- |
| 3 | 1.638 | 2.353 | 3.182 |
| 4 | 1.533 | 2.132 | 2.776 |
| 5 | 1.476 | 2.015 | 2.571 |
| 6 | 1.440 | 1.943 | 2.447 |
| 7 | 1.415 | 1.895 | 2.365 |
| 8 | 1.397 | 1.860 | 2.306 |
| 9 | 1.383 | 1.833 | 2.262 |
The background mean is (100 + 150 + 200 + 250 + 300)/5 = 200 mg/Kg. The site mean is (100 + 300 + 800 + 1200)/4 = 600 mg/Kg. The difference in means is 400 mg/Kg.
The background residuals are -100, -50, 0, 50, and 100. Their sum of squares is (-100)^2 + ... + (100)^2 = 25,000. The site residuals are -500, -300, 200, and 600. Their sum of squares is (-500)^2 + ... + (600)^2 = 740,000.
The total sum of squared residuals is therefore 765,000. The pooled estimate of variance (see the text, formula 7.69) is 765,000/7 = 109,286.
The variance of the difference of means is 109,286 * (1/5 + 1/4) = 49,179. Its square root, 222, is the standard error of the difference of means.
The t-statistic is therefore 400 / 222 = 1.80. The degrees of freedom are (5-1) + (4-1) = 7 (text, formula 7.72). To test with 95% confidence for a (two-sided) difference, use the 97.5% column (this splits the possible 5% error evenly between the high and low directions). The critical value for 7 df is 2.365, which is well above 1.80, so the t-statistic does not reach it. There is no "significant" difference at the 95% level.
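The arithmetic above can be checked with a short script. This is only a sketch using the Python standard library; the function names (`mean`, `sum_sq_residuals`, `pooled_t`) are illustrative choices and do not come from the text, and the critical value is copied from the table rather than computed.

```python
from math import sqrt

# Lead concentrations in mg/Kg, as given in problem 1.
background = [100, 150, 200, 250, 300]
site = [100, 300, 800, 1200]

def mean(xs):
    return sum(xs) / len(xs)

def sum_sq_residuals(xs):
    """Sum of squared deviations from the sample mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

def pooled_t(a, b):
    """Equal-variance (pooled) two-sample t statistic for mean(b) - mean(a)."""
    df = (len(a) - 1) + (len(b) - 1)
    pooled_var = (sum_sq_residuals(a) + sum_sq_residuals(b)) / df
    se = sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    t = (mean(b) - mean(a)) / se
    return t, df, se

t_stat, df, se = pooled_t(background, site)
critical = 2.365  # 97.5% column at 7 df, read from the table above

print(f"standard error = {se:.0f}, t = {t_stat:.2f}, df = {df}")
print("significant at 95% (two-sided):", abs(t_stat) > critical)
```

The script should reproduce the hand calculation: a standard error near 222, a t-statistic near 1.80 on 7 df, and no rejection at the two-sided 95% level.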
There is not even a significant difference at the 90% level: the two-sided 90% test uses the 95% column, whose critical value at df=7 is 1.895, still greater than 1.80. The two-sided P-value of this t-statistic is 11.4%.
One could make a case for applying a one-sided test: that is, to look for an increase relative to background, rather than just a change. The one-sided P-value is 5.7%, not considered significant by most scientific journal editors, nor for most regulatory purposes.
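The P-values quoted above can be approximated without a statistics package by integrating Student's t density numerically. The sketch below uses Simpson's rule; the helper names `t_pdf` and `upper_tail` are hypothetical, the unrounded t-statistic 1.8037 is an assumption obtained by carrying more digits through the calculation in problem 1, and the number of integration steps is an arbitrary choice.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, n=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t].

    n must be even. By symmetry P(T > 0) = 0.5, so the tail area is
    0.5 minus the integral of the density from 0 to t.
    """
    h = t / n
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

t_stat, df = 1.8037, 7  # unrounded t-statistic (assumed), 7 df
one_sided = upper_tail(t_stat, df)
two_sided = 2 * one_sided

print(f"one-sided P = {one_sided:.1%}, two-sided P = {two_sided:.1%}")
```

As a sanity check on the integration, `upper_tail(1.895, 7)` should come out close to 0.05, matching the 95% column of the table.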
2. Why are these test results invalid?
The evidence is strong that the site lead concentrations are much more variable than the background concentrations. The unbiased estimator of site standard deviation is sqrt(740,000/3) = about 500, whereas the estimator of the background sd is sqrt(25,000/4) = about 80. The assumption of similar standard deviations is violated. The high onsite variability is masking a potentially large difference in means and makes this difference appear "insignificant." This abuse of the t-test is carried out all the time. There are still RCRA groundwater permits that require this mis-application of the t-test for determining whether downgradient groundwater concentrations "significantly" exceed upgradient concentrations.
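The two standard deviation estimates can be reproduced with Python's `statistics` module, whose `stdev` and `variance` functions use the unbiased (n-1) divisor. This is a minimal sketch; the variance ratio printed at the end is an added illustration of how different the spreads are, not a formal test from the text.

```python
from statistics import stdev, variance

# Lead concentrations in mg/Kg, as given in problem 1.
background = [100, 150, 200, 250, 300]
site = [100, 300, 800, 1200]

sd_background = stdev(background)  # sqrt(25,000 / 4)
sd_site = stdev(site)              # sqrt(740,000 / 3)

# Ratio of sample variances: far from 1 when the spreads differ greatly.
ratio = variance(site) / variance(background)

print(f"background sd = {sd_background:.0f} mg/Kg")
print(f"site sd       = {sd_site:.0f} mg/Kg")
print(f"variance ratio (site / background) = {ratio:.1f}")
```

A site standard deviation roughly six times the background value is what undermines the pooled-variance assumption behind the test in problem 1.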
Scoring: The passing score is 90.
This page is copyright (c) 2001 Quantitative Decisions. Please cite it as
This page was created 28 March 2001 and updated 7 April 2003 to clarify the discussion at the end of problem 1.