Homework 12

If you cannot obtain an answer or are unsure of your answer, discuss the problem with each other or with anyone else who might be helpful.

Hints and answers to HW 12

Due Assignment
10 April  Final project

Purpose:  To apply what you have learned about statistical models to quantify the uncertainty in data, select and perform statistical tests, and interpret the results.

Data: Use the same dataset you analyzed for the mid-term project or select a new one.  If you select a new one, you will need to include an exploratory data analysis (EDA) section in your report to show you have closely examined the data.  You will not, however, need to provide the same level of detailed interpretation of the EDA results expected for the mid-term project.

Scope: Appropriately aggregate or segregate the data into portions that can be statistically modeled.  Justify your aggregation or selection procedures.  Identify a problem you will address with the data analysis.  State the decision(s) or action(s) that will result from your solution.  Establish limits on decision errors.  Select an appropriate statistical test or tests.  Establish that those tests are appropriate for the data.  Perform those tests.  Interpret the results in clear, precise language.  Identify any limitations, weaknesses, or pitfalls in your analysis.  Describe how the data could have been improved to better support the decision-making process.

Format:  Write your results as a memorandum or report.  Provide an electronic version of the data in text or spreadsheet format.  Keep the length to twelve double-spaced pages or fewer, plus figures and tables.

Medium:  Deliver your paper as a web page, a Word document, or in hardcopy.  You must deliver the data in electronic format (a reference to a web page where the data can be downloaded is acceptable).

Evaluation:  Generally correct, complete work that demonstrates effective use of the techniques taught in this course will earn at least a B.  Errors in computation or explanation will reduce the grade, as will incomplete or ineffective approaches.  Thoughtful, complete analysis, assessment of assumptions, and appropriate application of statistical techniques to your data will increase the grade.

3 April
  1. Finish reading chapter 3.  Read chapter 9.
  2. Text problem 7.10:
    "Use the aldicarb data [below] to show that the Wilcoxon signed rank test is really the same thing as the one-sample permutation test based on the signed ranks.  That is, for each well, create a vector that contains the signed ranks of the quantities (Aldicarb-7), then perform a one-sample permutation test for the null hypothesis that the population mean of these quantities is 0."
    Month   Well 1   Well 2   Well 3
    Jan       19.9     23.7      5.6
    Feb       29.6     21.9      3.3
    Mar       18.7     26.9      2.3
    Apr       24.2     26.1      6.9
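
As a rough illustration of the equivalence that problem 7.10 asks about, the Python sketch below builds the signed ranks of (Aldicarb - 7) for each well and enumerates all 2^4 equally likely sign assignments. The function names, the one-sided "greater" alternative, and the assumption that there are no ties or zero differences are choices made for this sketch; the text problem does not specify them.

```python
from itertools import product

# Aldicarb concentrations copied from the table above.
ALDICARB = {
    "Well 1": [19.9, 29.6, 18.7, 24.2],
    "Well 2": [23.7, 21.9, 26.9, 26.1],
    "Well 3": [5.6, 3.3, 2.3, 6.9],
}


def signed_ranks(values, reference=7.0):
    """Rank |value - reference| (1 = smallest) and reattach the signs.

    Assumes no ties and no zero differences, which holds for these data.
    """
    diffs = [v - reference for v in values]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0] * len(diffs)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank if diffs[i] > 0 else -rank
    return ranks


def exact_upper_p(ranks, statistic):
    """P(statistic >= observed) when every sign pattern is equally likely."""
    magnitudes = [abs(r) for r in ranks]
    observed = statistic(ranks)
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(magnitudes)):
        flipped = [s * m for s, m in zip(signs, magnitudes)]
        total += 1
        if statistic(flipped) >= observed:
            count += 1
    return count / total


def rank_sum(ranks):
    """Permutation-test statistic: sum of the signed ranks (n times their mean)."""
    return sum(ranks)


def w_plus(ranks):
    """Wilcoxon signed rank statistic: sum of the positive ranks."""
    return sum(r for r in ranks if r > 0)


for well, values in ALDICARB.items():
    ranks = signed_ranks(values)
    print(well, ranks, exact_upper_p(ranks, rank_sum), exact_upper_p(ranks, w_plus))
```

Because the sum of the signed ranks equals 2(W+) - n(n+1)/2, the two statistics order the sign patterns identically, so the two p-values printed for each well should agree.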
  3. Text problem 7.11:
    a.  "Plot the pdf of a t-distribution with 12 degrees of freedom and add a vertical line at x=5.66."
    b. "Explain what part of this plot represents the p-value for the test of the null hypothesis that the average sulfate concentrations at two wells are the same against the alternative hypothesis that the average concentration of sulfate at the downgradient well is larger than the average concentration at the background well."
    (Use the Student's t-distribution spreadsheet if you like.)
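
For anyone who prefers code to the spreadsheet, here is a minimal Python sketch of the same plot. It assumes that the one-sided alternative in part b corresponds to the upper tail beyond x = 5.66. The helper names (`t_pdf`, `upper_tail`, `plot_pdf`) are invented for this sketch, and `plot_pdf` needs matplotlib, which is imported only when the function is called.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def upper_tail(t_value, df, steps=2000):
    """P(T > t_value) for t_value >= 0, using Simpson's rule on [0, t_value].

    steps must be even; the area from 0 to t_value is subtracted from 0.5.
    """
    h = t_value / steps
    total = t_pdf(0.0, df) + t_pdf(t_value, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def plot_pdf(df=12, t_value=5.66):
    """Draw the pdf, a vertical line at t_value, and shade the upper tail."""
    import matplotlib.pyplot as plt  # optional dependency, loaded on demand

    xs = [-7 + 14 * i / 700 for i in range(701)]
    plt.plot(xs, [t_pdf(x, df) for x in xs])
    plt.axvline(t_value, linestyle="--")
    tail_x = [x for x in xs if x >= t_value]
    plt.fill_between(tail_x, [t_pdf(x, df) for x in tail_x])
    plt.xlabel("t")
    plt.ylabel("density")
    plt.show()


print(upper_tail(5.66, 12))
```

Calling `plot_pdf()` displays the figure; the printed tail area is the numerical size of the shaded region under the stated assumption.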
  4. Do the T test criticism exercise.
  5. * (This is part of problem 2.1 of Kiefer.)  X is a B(1, p) random variable.  It is known only that 0 <= p <= 1 and it is desired to guess the value of p on the basis of X.  If the guessed value is d and the true value is p, the loss is (p-d)^2.
    (a)  Specify the sample space, states of nature, decision space, and loss function.
    (b)  Determine and plot on the same graph the risk functions of the procedures t1, t2, t3, t4, t5, and t6 that are defined as follows:

t1(x) = x

t2(x) = (2x + 1)/4

t3(x) = (x + 1)/3

t4(x) = 1/2

t5(x) = 1 - x

t6(x) = 0.

(c)  From these calculations, can you assert that any of these six procedures is inadmissible?  ("Inadmissible" means there exists another procedure that consistently has equal or lower risk, regardless of the true value of p.)

(d)  On the basis of the risk functions, if one of these six procedures must be used, which procedure would you use, and why?  (Note: Do not consult any references in answering this.)

(e) (Kiefer's problem 2.1i)  ** Show that t6 is admissible.  (Hint: If t' is better than t6, show that r_{t'}(0) <= r_{t6}(0) implies t'(0) = 0 and then compare the two risk functions for p near 0 in terms of t'(1).)

(f)  ** Repeat (b), (c), and (d) for the loss function (p-d)^2/[p(1-p)].
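
To check hand calculations for parts (b) and (f), the risk of a procedure t at a given p can be evaluated as the expected loss over the two possible values of X. The Python sketch below is one way to tabulate it. The names `risk`, `PROCEDURES`, and the two loss functions are invented here, and the grid is restricted to interior values of p because the loss in (f) is undefined at p = 0 and p = 1.

```python
def squared_error(p, d):
    """Loss from the main problem: (p - d)^2."""
    return (p - d) ** 2


def weighted_squared_error(p, d):
    """Loss from part (f): (p - d)^2 / [p(1 - p)]; undefined at p = 0 or 1."""
    return (p - d) ** 2 / (p * (1 - p))


# The six procedures defined in part (b).
PROCEDURES = {
    "t1": lambda x: x,
    "t2": lambda x: (2 * x + 1) / 4,
    "t3": lambda x: (x + 1) / 3,
    "t4": lambda x: 1 / 2,
    "t5": lambda x: 1 - x,
    "t6": lambda x: 0,
}


def risk(rule, p, loss=squared_error):
    """Expected loss: X = 0 with probability 1 - p, X = 1 with probability p."""
    return (1 - p) * loss(p, rule(0)) + p * loss(p, rule(1))


grid = [i / 10 for i in range(1, 10)]  # interior points only
for name, rule in PROCEDURES.items():
    print(name, [round(risk(rule, p), 4) for p in grid])
```

Passing `loss=weighted_squared_error` to `risk` gives the corresponding table for part (f). The tables only supply numbers to plot; the admissibility arguments in (c) and (e) still have to be made analytically.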

  6. Do practice quiz 12.
  7. Use t-tests to compare the data shown in problem 2 among all three wells (Well 1 vs. Well 2, Well 1 vs. Well 3, and Well 2 vs. Well 3).  Construct a confidence interval for the difference in means between wells 1 and 2 (text formula 7.81).
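
The arithmetic for these comparisons can be sketched in Python as follows. Text formula 7.81 is not reproduced on this page, so the sketch assumes the usual equal-variance (pooled) two-sample t statistic and interval; if the text uses a different form, adjust accordingly. The critical value 2.447 is the two-sided 95% point of t with 6 degrees of freedom from a standard table, and `pooled_t` is a name invented for this example.

```python
import math
from statistics import mean, variance

# Aldicarb concentrations from problem 2.
WELLS = {
    "Well 1": [19.9, 29.6, 18.7, 24.2],
    "Well 2": [23.7, 21.9, 26.9, 26.1],
    "Well 3": [5.6, 3.3, 2.3, 6.9],
}

# Two-sided 95% critical value of t with 4 + 4 - 2 = 6 degrees of freedom,
# taken from a standard t table (an assumption of this sketch).
T_CRIT_6DF = 2.447


def pooled_t(a, b):
    """Return (difference in means, standard error, t statistic).

    Assumes equal population variances (pooled-variance t-test).
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    diff = mean(a) - mean(b)
    return diff, se, diff / se


for first, second in [("Well 1", "Well 2"), ("Well 1", "Well 3"), ("Well 2", "Well 3")]:
    diff, se, t_stat = pooled_t(WELLS[first], WELLS[second])
    print(first, "vs", second, round(diff, 3), round(t_stat, 3))

diff, se, _ = pooled_t(WELLS["Well 1"], WELLS["Well 2"])
ci = (diff - T_CRIT_6DF * se, diff + T_CRIT_6DF * se)
print("95% CI for Well 1 - Well 2:", ci)
```

With only four observations per well, the normality and equal-variance assumptions behind these numbers deserve as much attention as the numbers themselves.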
  8. [Added 30 March]  The U.S. EPA's Public Review Draft, "Van Duzen River and Yager Creek Sediment Total Maximum Daily Load" (October 1999; available at http://www.epa.gov/region09/water/tmdl/vanduzendraft.pdf), contains the following analysis:

    "The distribution of erosional features sampled during this analysis is not normal, but it is highly skewed, with a wide range of various sized sediment sources.  [1] Consequently, Tchebysheff's theorem indicates that at least 75% of the measurements in any sample population must lie within 2 standard deviations of their mean value.  [2] Thus, a single estimate of the erosion total has at least a 75% probability of lying within 2 standard deviations of its true value, regardless of the sample distribution. ...  [3] Therefore, the standard error ... is 1 standard deviation (in yards) for the total erosion, [4] and the 95% confidence interval is 2 standard deviations (in percent) for the total erosion.  [5] The 95% confidence interval indicates the plus or minus sediment yield volume which could be added to the total to have confidence that the true past sediment yield has been estimated."

    [Section 4.G, page 43.  The numbered references have been added.]

    This paragraph contains five numbered statements.  (The numbers precede the statements.)  Using a generous interpretation, one of them may be true; the others, however one attempts to remove their ambiguities, are false.  Identify the false ones and determine why they are incorrect.
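
    For reference when assessing statement [1], the theorem it cites (usually spelled Chebyshev's inequality) can be written as below for a random variable X with finite mean and standard deviation.  The symbols X, \mu, \sigma, and k are chosen here for illustration and do not appear in the EPA draft.

```latex
P\bigl(|X - \mu| < k\sigma\bigr) \;\ge\; 1 - \frac{1}{k^{2}} \quad (k > 0),
\qquad\text{so for } k = 2:\quad
P\bigl(|X - \mu| < 2\sigma\bigr) \;\ge\; \frac{3}{4}.
```

    Note that the bound describes how X is distributed about its own mean, measured in units of its own standard deviation.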

This page was created 24 March and last updated 28 March.