Solution to Practice Quiz 9

The full quiz is here.  The answers appear below.  Comments, which are not part of the answers, are italicized.

Time 20 minutes.  This quiz is open book, open notes.

Suppose you have developed a 95% confidence interval (CI) of the mean of some pollutant concentration based on 26 biweekly measurements in the year 2000 obtained at an air monitoring station.  The interval is (8, 17) (in parts per million by volume).

1.    Indicate which of the following statements are correct and which are incorrect.  Provide reasons for each.

  a. There is a 95% probability that the mean measurement in the year 2001 will be between 8 and 17 ppm.
  b. There is a 95% probability that the mean measurement in the year 2001 will be greater than 17 ppm.
  c. 95% of the measurements obtained in the year 2000 were between 8 and 17 ppm.
  d. 5% of the measurements obtained in the year 2000 were between 8 and 17 ppm.
  e. 95% of a statistical distribution that best fits the 26 measurements lies between 8 and 17 ppm.
  f. If someone else had obtained 26 independent measurements of the same quantity during the same period using the same sampling technique at the same location, then their estimate of the mean (which uses the same formula as yours) is 95% likely also to be between 8 and 17 ppm.
  g. You used a procedure to compute an interval that has a 95% chance of covering the true mean.
  h. You used a procedure to compute the interval that has a 100% chance of covering the true mean in 95% of all possible situations.
  i. There is a 95% probability that the next measurement will be between 8 and 17 ppm.
  j. There is a 95% probability that every one of the next 26 measurements will be between 8 and 17 ppm.
  k. At least 95% of the next 26 measurements will be between 8 and 17 ppm.
  l. No more than 95% of the next 26 measurements will be between 8 and 17 ppm.
  m. In thousands of computer simulations of 26 measurements, 95% of them had a mean between 8 and 17 ppm.
  n. The value of 8 equals the mean plus t times the standard error of the data, where t is the 2.5 percentage point of Student's t distribution with 25 degrees of freedom, and the value of 17 equals the mean plus t times the standard error, where t is the 97.5 percentage point of Student's t distribution with 25 degrees of freedom.
  o. Same as (n), but read "standard deviation" in place of "standard error."

a.    Incorrect.  This is a statement of a prediction interval (for the mean of the next 26 measurements), not a confidence interval.

b.    Incorrect.  This is a statement of a lower prediction limit.  It ignores the lower value of 8, too.

c.    Incorrect.  This is a statement about a set of measurements, not about their mean.

d.    Incorrect.  This is a statement about a set of measurements, not about their mean.

e.    Incorrect.  This is a statement about a tolerance interval.

f.    Incorrect.  This looks correct, but is not: the other person's interval would have a 95% probability of covering the true mean (by definition of a CI), but there is no assurance that their estimate of the mean would be covered by your interval.  Indeed, this statement is a prediction about the mean of 26 independent measurements.

g.    Correct.  This is the definition of a confidence interval.

h.    Incorrect.  This statement makes little sense: how much is 95% of "all possible situations" in the (usual) case of infinitely many possible underlying distributions?  And in most cases, it is not worthwhile demanding 100% probability of anything.

i.    Incorrect.  This is a statement of a prediction interval for the next measurement.

j.    Incorrect.  This is a statement of a prediction interval for the minimum and maximum of 26 measurements.

k.    Incorrect.  This is a statement of a k-of-m simultaneous prediction interval, where m = 26 and k = 95% * 26 = 24.7, rounded up to 25.

l.    Incorrect.  This is a statement of a k-of-m simultaneous prediction interval, too, albeit of a different kind than (k).

m.    Incorrect.  How did the computer simulate the measurements?  This statement is too vague to be meaningful.

n.    Incorrect.  This holds only when the underlying distribution is Normal with arbitrary (but unknown) mean and arbitrary (but unknown) standard deviation.

o.    Incorrect.  This is incorrect even for the Normal distribution assumptions.  For a Normal distribution, this computes a tolerance interval.
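The distinction among answers (f), (g), and (i) can be checked numerically.  The following is a minimal sketch, not part of the quiz, that assumes Normally distributed, independent measurements with a hypothetical true mean of 12.5 ppm and a hypothetical standard deviation of 4 ppm; the function names and the hard-coded t value (2.0595, the tabulated 97.5 percentage point for 25 degrees of freedom) are assumptions made for illustration.  It repeatedly draws 26 measurements, computes the interval in (n), and counts how often that interval covers the true mean, the mean of a second independent sample, and a single further measurement.

```python
import math
import random

# 97.5 percentage point of Student's t with 25 degrees of freedom
# (tabulated value; valid only for samples of 26 measurements).
T_975_25 = 2.0595
N = 26  # measurements per sample, as in the quiz


def t_interval(sample):
    """Normal-theory 95% CI for the mean: mean -/+ t * standard error."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    se = math.sqrt(var / n)
    return mean - T_975_25 * se, mean + T_975_25 * se


def simulate(true_mean=12.5, true_sd=4.0, trials=20000, seed=1):
    """Estimate how often your interval covers three different targets.

    true_mean and true_sd are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    hits = {"true_mean": 0, "other_sample_mean": 0, "next_measurement": 0}
    for _ in range(trials):
        yours = [rng.gauss(true_mean, true_sd) for _ in range(N)]
        theirs = [rng.gauss(true_mean, true_sd) for _ in range(N)]
        next_value = rng.gauss(true_mean, true_sd)
        lo, hi = t_interval(yours)
        hits["true_mean"] += lo <= true_mean <= hi                 # statement (g)
        hits["other_sample_mean"] += lo <= sum(theirs) / N <= hi   # statement (f)
        hits["next_measurement"] += lo <= next_value <= hi         # statement (i)
    return {name: count / trials for name, count in hits.items()}


if __name__ == "__main__":
    for name, rate in simulate().items():
        print(f"{name}: {rate:.3f}")
```

Under these assumptions the first rate should come out near 0.95, while the other two should be noticeably lower, which is why only (g) describes a confidence interval.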

2.    List at least two statistical assumptions about the data needed for the preceding CI to be correct.

The data must be adequately modeled using tickets in a box.  Thus, they must be (1) statistically independent and (2) have the same underlying distribution (come from the same box).  In addition, the CI computation most likely made assumptions about the possible underlying distribution (contents of the box) and (3) those assumptions need to be a good representation of reality.
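To see why assumption (1) matters, one can compare the coverage of the same interval formula on independent data and on serially correlated data.  The sketch below is an illustration only: it assumes a stationary first-order autoregressive (AR(1)) model for the correlated case, a hypothetical true mean of 12.5 ppm and standard deviation of 4 ppm, and the tabulated t value 2.0595 for 25 degrees of freedom.  None of these choices comes from the quiz itself.

```python
import math
import random

# 97.5 percentage point of Student's t with 25 degrees of freedom
# (tabulated value; valid only for samples of 26 measurements).
T_975_25 = 2.0595
N = 26


def t_interval(sample):
    """Normal-theory 95% CI for the mean: mean -/+ t * standard error."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    se = math.sqrt(var / n)
    return mean - T_975_25 * se, mean + T_975_25 * se


def ar1_sample(rng, n, phi, mean, sd):
    """Draw n values from a stationary AR(1) process (hypothetical model).

    phi is the lag-one correlation; phi = 0 gives independent values.
    """
    innovation_sd = sd * math.sqrt(1.0 - phi ** 2)
    deviation = rng.gauss(0.0, sd)  # start in the stationary distribution
    values = []
    for _ in range(n):
        values.append(mean + deviation)
        deviation = phi * deviation + rng.gauss(0.0, innovation_sd)
    return values


def coverage(phi, true_mean=12.5, true_sd=4.0, trials=5000, seed=2):
    """Fraction of simulated intervals that cover the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        lo, hi = t_interval(ar1_sample(rng, N, phi, true_mean, true_sd))
        hits += lo <= true_mean <= hi
    return hits / trials


if __name__ == "__main__":
    for phi in (0.0, 0.6):
        print(f"lag-one correlation {phi}: coverage {coverage(phi):.3f}")
```

With independent values the coverage should be close to the nominal 95%; with positive serial correlation it should fall well below that, because the standard error formula understates the variability of the mean.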

Extra credit: list as many more assumptions (statistical, scientific, or practical) as you can in question 2.

See the answer to #2.  Assumption (2) has particularly strong implications, because it requires that the data be obtained in the same way, using the same sampling and analytical technique ("comparability"), and that they be "representative" of all the air for which the CI is meant to apply.  This means in particular that the analytical method should not have introduced any systematic bias in the results (and thereby have given a false indication of true concentrations): this is the "accuracy" concern.  Also, we must assume (4) that a formula for a confidence interval (as opposed to some other interval) was actually applied and (5) that the computation was carried out correctly.  Question 1 should have highlighted how important assumption (4) is; to see that assumption (5) is not trivial, consider that many people apply confidence interval formulas (and other formulas) to nondetect values, which is problematic (see chapter 10).

Scoring: The passing score is 85.

Note:  There will be fewer statements for question #1 on the actual quiz.


This page is copyright (c) 2001 Quantitative Decisions.  Please cite it as

This page was created 19 March 2001.