The full quiz is here. The answers appear below. Comments, which are not part of the answers, are italicized.
Time: 20 minutes. Open book, open notes.
On March 21, 2001, the Centers for Disease Control (CDC) issued the National Report on Human Exposure to Environmental Chemicals. This report summarizes measurements of 27 environmental chemicals in the blood and urine of thousands of human test subjects.
The summary statistics are the number of subjects (N), the geometric mean (GM), and the 10th, 25th, 50th, 75th, and 90th percentiles. All measurements are in micrograms per liter (ug/L).
1. The percentiles of the lead results are reported as 0.21, 0.42, 0.80, 1.36, and 2.21 ug/L. Their logarithms are -1.56, -0.87, -0.22, 0.31, and 0.79, respectively. Sketch a normal probability plot of the logarithms. [One minute]
Use the Normal scores from the "value" row of the table at the end of the quiz, and exploit the symmetry of the Normal distribution to determine the scores for the 10th, 25th, and 50th percentiles.

It would be a mistake simply to compute the average and standard deviation of the numbers provided. They are not a batch of measurements, but selected summary statistics. Treating them as if they were five independent measurements would be a serious error.
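As a rough check on the plotting positions, the Normal scores can be computed rather than read from the table. This is only a sketch: it uses Python's standard-library `statistics.NormalDist` (not something the quiz requires), and the variable names are illustrative.

```python
from statistics import NormalDist

# Percentile levels reported by the CDC summary and the logarithms from question 1.
percent_levels = [0.10, 0.25, 0.50, 0.75, 0.90]
log_percentiles = [-1.56, -0.87, -0.22, 0.31, 0.79]

# The quiz table lists only upper-tail points (0.67 for 75%, 1.28 for 90%).
# By symmetry the 25% and 10% scores are their negatives and the 50% score is 0.
normal_scores = [NormalDist().inv_cdf(p) for p in percent_levels]

# Each pair (score, log percentile) is one point on the probability plot.
for z, y in zip(normal_scores, log_percentiles):
    print(f"score {z:+.2f}   log {y:+.2f}")
```

Plotting the second column against the first gives the five points of the probability plot.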
2. "Eyeball" the slope and intercept of the normal probability plot in #1 to estimate the standard deviation and mean of the logarithms, respectively. [One minute]
The fitted line rises about 2.4 units for a 2.6 unit change in normal score. The slope therefore is about 0.9. The intercept is about -0.3. Thus the sd is about 0.9 and the mean is about -0.3.
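The eyeballed line can be compared with an ordinary least-squares fit. Note that least squares is a substitute for the eyeball method the question asks for, not part of the quiz; the helper `fit_line` is a hypothetical name introduced here.

```python
# Normal scores from the quiz table (symmetry supplies the negatives and zero).
normal_scores = [-1.28, -0.67, 0.0, 0.67, 1.28]
log_percentiles = [-1.56, -0.87, -0.22, 0.31, 0.79]

def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# On a normal probability plot the slope estimates the sd and the
# intercept (the height at score 0) estimates the mean.
sd_log, mean_log = fit_line(normal_scores, log_percentiles)
print(f"sd of logs ~ {sd_log:.2f}, mean of logs ~ {mean_log:.2f}")
```

The fitted values should land close to the eyeballed 0.9 and -0.3.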
3. Write down formulas you could use to estimate the mean and standard deviation of the lead data using your answers to #2. You don't have to evaluate the formulas, but make sure they contain numbers only--no variables. [Two minutes]
The MLE of the mean is m = exp(-0.3 + (0.9)^2/2) = 1.1. The MLE of the standard deviation is m * sqrt(exp((0.9)^2) - 1) = 1.2. This is a review question: see the solution to quiz 8, problem 1b.
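For anyone who wants to evaluate these formulas, here is a minimal sketch. It assumes the logarithms are natural logs (as the use of exp implies) and takes the eyeballed values -0.3 and 0.9 as given; `lognormal_moments` is an illustrative name.

```python
import math

def lognormal_moments(mu, sigma):
    """Mean and sd of a lognormal variable whose natural log has mean mu and sd sigma."""
    mean = math.exp(mu + sigma ** 2 / 2)
    sd = mean * math.sqrt(math.exp(sigma ** 2) - 1)
    return mean, sd

# Eyeballed mean and sd of the logarithms from question 2.
mean_lead, sd_lead = lognormal_moments(mu=-0.3, sigma=0.9)
print(f"mean ~ {mean_lead:.1f} ug/L, sd ~ {sd_lead:.1f} ug/L")
```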
4. 1007 subjects were tested for lead. Suppose the correct answers to #3 are a mean of 0.9 ug/L and standard deviation of 0.8 ug/L. Write down a formula for a lower 90% confidence limit of the mean. You don't have to evaluate it, but make sure it contains numbers only--no variables. [Two minutes]
LCL = mean - t(1006, 0.90) * sd / sqrt(N). We can closely approximate t by the Normal percentage point, which according to the table below is 1.28. Therefore the formula is
LCL = 0.9 - 1.28 * 0.8 / sqrt(1007) = 0.87.
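The arithmetic can be checked with a few lines of code. This sketch uses the Normal approximation to t described above; the function name `lower_confidence_limit` is an assumption made for illustration.

```python
import math

def lower_confidence_limit(mean, sd, n, z):
    """One-sided lower confidence limit for a mean, using a Normal percentage point z."""
    return mean - z * sd / math.sqrt(n)

# Values supplied in question 4; z = 1.28 is the 90% point from the quiz table.
lcl = lower_confidence_limit(mean=0.9, sd=0.8, n=1007, z=1.28)
print(f"LCL ~ {lcl:.2f} ug/L")
```

Because N is so large, the standard error sd / sqrt(N) is small and the limit sits only slightly below the mean.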
5. Write down (in words, not formulas) what the answer to #4 means in terms of the blood lead concentrations in people. [One minute]
According to these measurements, there is a 90% chance that the average blood lead concentration in all people equals or exceeds 0.87 ppb.
This is a review question, too. Some incorrect answers are:
| "90% have lead greater than 0.87 ppb" -- The LCL is a statement about a mean, not about a proportion. |
| "10% of the lead concentrations will exceed the true mean. 90% of the lead concentrations are below the mean." -- This is ambiguous: which lead concentrations--sample or population? Which mean--sample or population? Regardless, this is incorrect because the LCL is a statement about the mean, not about a proportion. |
| "The mean of samples selected from the study will be greater than the lower confidence limit 90% of the time." -- This is incorrect because the mean of the study samples will always exceed the lower confidence limit. The mean is the sum of the values divided by 1007; the LCL is computed by subtracting t * sd / sqrt(N) from that value, and so is necessarily smaller. |
6. Extra credit (valid only after answering questions 1-5). Compute the answers to #3 and #4. Use your answer to #3 in the computation for #4.
LCL = 1.1 - 1.28 * 1.2 / sqrt(1007) = 1.05.
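The result is mildly sensitive to rounding the inputs. The sketch below, which is an illustration rather than part of the quiz solution, compares the limit computed from the rounded answers (1.1 and 1.2) with the limit obtained by carrying the unrounded values from the formulas in #3 through the calculation.

```python
import math

N = 1007      # number of subjects tested for lead
Z_90 = 1.28   # 90% Normal percentage point from the quiz table

# Unrounded answers to #3, from the eyeballed log mean -0.3 and log sd 0.9.
mean_exact = math.exp(-0.3 + 0.9 ** 2 / 2)
sd_exact = mean_exact * math.sqrt(math.exp(0.9 ** 2) - 1)

# LCL using the rounded answers, as in the worked solution.
lcl_rounded = 1.1 - Z_90 * 1.2 / math.sqrt(N)

# LCL using the unrounded intermediate values.
lcl_unrounded = mean_exact - Z_90 * sd_exact / math.sqrt(N)

print(f"rounded inputs:   {lcl_rounded:.2f}")
print(f"unrounded inputs: {lcl_unrounded:.2f}")
```

The two versions differ by roughly 0.01, which is negligible next to the uncertainty in eyeballing the slope and intercept.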
7. Extra credit. What is the problem with using the original lead results in #1 and #2 to estimate the mean and standard deviation directly (rather than going through the logarithms), thereby avoiding the formulas in #3?
The original results are highly skewed and evidently non-normal. It is therefore difficult to estimate the mean or standard deviation from a probability plot, because it will be strongly curved.
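One way to see the curvature without drawing the plot is to compare the slopes of the segments joining consecutive points. This is a sketch under the assumption that the tabled Normal scores are used as plotting positions; `segment_slopes` is a hypothetical helper.

```python
# Normal scores from the quiz table, the raw percentiles, and their logarithms.
normal_scores = [-1.28, -0.67, 0.0, 0.67, 1.28]
raw_percentiles = [0.21, 0.42, 0.80, 1.36, 2.21]
log_percentiles = [-1.56, -0.87, -0.22, 0.31, 0.79]

def segment_slopes(x, y):
    """Slopes of the line segments joining consecutive plotted points."""
    return [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]

raw_slopes = segment_slopes(normal_scores, raw_percentiles)
log_slopes = segment_slopes(normal_scores, log_percentiles)

# A straight plot has nearly constant slopes, so the ratio of the largest
# to the smallest slope is a crude measure of curvature.
raw_ratio = max(raw_slopes) / min(raw_slopes)
log_ratio = max(log_slopes) / min(log_slopes)
print(f"raw data: slopes vary by a factor of {raw_ratio:.1f}")
print(f"logs:     slopes vary by a factor of {log_ratio:.1f}")
```

On the raw scale the slopes increase steadily from left to right, which is the upward curve typical of right-skewed data; on the log scale they vary much less.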

Scoring: The passing score is 85.
Note: Here are selected percentage points of the standard Normal distribution.
| Percentage point | 75% | 90% | 95% | 97.5% | 99% |
| --- | --- | --- | --- | --- | --- |
| Value | 0.67 | 1.28 | 1.65 | 1.96 | 2.33 |
This page is copyright (c) 2001 Quantitative Decisions. Please cite it as
This page was created 28 March 2001.