Collectively, the final project reports exhibited a mastery of all the techniques presented in class. However, each report exhibited some deficiencies or problems. The commonest of these are discussed here.
Whenever possible, you should reproduce the data in a statistical report, at least to the extent of displaying them graphically in detail. This will allow the interested reader to reproduce your results.
This reproducibility criterion is one of the foundations of modern science. It is a recurring theme in statistical reports. For instance, when you report the results of a statistical test, you need to provide details about the test and its calculation so it can be reproduced.
Always perform exploratory data analysis. It does not matter whether EDA is "required" or not. Without exception, the reports that did not perform EDA (or did not do it effectively) made significant errors that would have been made obvious with the simplest form of EDA.
Tables and figures should stand on their own. Do not be afraid to include lengthy captions if necessary. Describe the columns of tables and the axes of graphs. Short headers or computer variable names (such as "Result_1XX") are not usually meaningful. Provide units of measurement. Make sure the titles correctly describe the table or figure and that they distinguish the tables and figures from each other. Omit irrelevant or unnecessary material. (If the computer produced a number or figure, but you do not understand exactly what it means, then do not present it.)
Monitoring data are associated with specific times. A basic question about all such data is whether they are changing over time. In order to conduct the kinds of tests you know, you have to verify that these data behave as if they were tickets obtained from the same box each time and that no draw influences the outcome of any other draw. This is statistical independence. However, monitoring data behave this way only when the sampling times are sufficiently well separated.
The most basic technique for exploring such time series data is to plot the values (vertical axis) against time (horizontal axis). Clustering and regular, smooth fluctuation of the values over time indicate lack of independence. Apparently random "wiggling" about a horizontal value suggests independence.
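A numerical companion to the time plot can be sketched as follows. This is not a technique named in these notes: it computes the lag-1 autocorrelation, which measures how closely each value tracks the one before it. The function name and the series are hypothetical, made up for illustration. It supplements the plot and does not replace it.

```python
from statistics import fmean

def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation of observations listed in time order."""
    if len(values) < 3:
        raise ValueError("need at least three observations")
    mean = fmean(values)
    deviations = [v - mean for v in values]
    total = sum(d * d for d in deviations)
    if total == 0:
        raise ValueError("the series is constant")
    # Products of each deviation with the deviation that follows it in time.
    cross = sum(a * b for a, b in zip(deviations, deviations[1:]))
    return cross / total

# Hypothetical series that rises and falls smoothly over time.
smooth = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0, 2.0, 3.0, 4.0]
r = lag1_autocorrelation(smooth)
print(f"lag-1 autocorrelation: {r:.2f}")
```

A value near zero is consistent with independence. A value far from zero in either direction (smooth drifting gives a positive value, regular alternation a negative one) suggests that successive observations influence one another.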
Most situations are complex. They require small batches of data to be aggregated or large batches of data to be divided into groups. Whenever you are presenting descriptive statistics, making comparisons, or giving the results of a test, state clearly and explicitly exactly which sets of data are being described, compared, or tested.
When using a statistical test of hypothesis, the principle of reproducibility implies you must describe
It is conventional to tabulate test results, especially when performing more than one test.
You have learned many powerful techniques for assessing data. So, for example, if you want to detect outliers, then compute a five-letter summary, construct the fence and outer fence, and classify values as "outside" or "far outside". If you want to compare batches to batches, then construct a Q-Q plot. If you want to assess whether a batch is approximately Normal, then draw its Normal probability plot. Use robust statistics such as H-spreads and the MAD to describe the spreads of data.
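The outlier screening described above might be sketched like this. The multipliers 1.5 and 3 for the inner and outer fences are the conventional choices and are an assumption here, since these notes do not state them. The function names and the batch of values are hypothetical.

```python
from statistics import median

def hinges(values):
    """Lower and upper hinges: medians of the lower and upper halves.

    When the batch size is odd, the middle value belongs to both halves.
    """
    if not values:
        raise ValueError("empty batch")
    ordered = sorted(values)
    half = (len(ordered) + 1) // 2
    return median(ordered[:half]), median(ordered[-half:])

def classify_values(values, inner_step=1.5, outer_step=3.0):
    """Label each value "inside", "outside", or "far outside" the fences."""
    lower, upper = hinges(values)
    h_spread = upper - lower
    labelled = []
    for v in values:
        if v < lower - outer_step * h_spread or v > upper + outer_step * h_spread:
            label = "far outside"
        elif v < lower - inner_step * h_spread or v > upper + inner_step * h_spread:
            label = "outside"
        else:
            label = "inside"
        labelled.append((v, label))
    return labelled

def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    center = median(values)
    return median(abs(v - center) for v in values)

# Hypothetical batch of concentrations, made up for illustration.
batch = [2.1, 2.4, 2.5, 2.7, 2.8, 3.0, 3.1, 3.3, 4.9, 9.6]
for value, label in classify_values(batch):
    if label != "inside":
        print(value, label)
print("H-spread:", hinges(batch)[1] - hinges(batch)[0], "MAD:", mad(batch))
```

Both the H-spread and the MAD are barely affected by the two extreme values, which is the reason to prefer them for describing the spread of such a batch.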
Avoid using software just because it is there. Reports that include redundant, contradictory, and uninterpreted computer output signal that the author is relying on the computer to think. Reports that include the results of sophisticated but obscure tests (such as Grubbs' test for a single outlier, chosen by several people) err by not describing those tests or justifying their use. Choose the test according to your decision-making needs, not according to your computing capabilities.
Question automatic computer output. For example, Excel will calculate and display a "trend line" for any scatterplot. Unless this line closely approximates the data, it probably does not belong in the graphic. Delete it. Statistical packages give you a host of descriptive statistics. For example, a package some of you chose to use automatically reports "Fisher's g1" and "Fisher's g2." Don't know what they mean? Then edit them out (or learn what they mean and decide whether they are useful to you, then explain them to the reader).
The purpose of many tests in environmental statistics is to compare conditions to standards. For example, a regulation may require that a process produce an effluent "not exceeding 10 ppb lead." As it stands, that is an ambiguous criterion. Should the average concentration be less than 10 ppb? If so, averaged over what period of time? Or should all concentrations be less than 10 ppb? If so, all concentrations out of how many observations? Or should the 90th percentile of all concentrations be less than 10 ppb?
The mark of a true criterion is that you can determine unambiguously whether it has been met or not. That means the criterion must explicitly provide a formula, applicable in all cases, that states definitely whether a set of observations meets or does not meet the criterion.
Some examples of adequate (but informally stated) criteria are "the mean onsite soil concentration must not exceed the mean background soil concentration;" "any running seven-day mean lead concentration must be less than 10 ppb;" and "all groundwater concentrations must be less than or equal to the MCL."
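To see what "a formula, applicable in all cases" can look like, here is a minimal sketch of two of the readings of "not exceeding 10 ppb lead." It assumes one observation per day with no gaps. The function names and the data are hypothetical. The sketch treats the observations as exact and so does not yet include a confidence level.

```python
def meets_running_mean_criterion(daily_values, limit=10.0, window=7):
    """True when every running `window`-day mean is strictly below `limit`."""
    if len(daily_values) < window:
        raise ValueError("fewer observations than the averaging window")
    for start in range(len(daily_values) - window + 1):
        window_mean = sum(daily_values[start:start + window]) / window
        if window_mean >= limit:
            return False
    return True

def all_below(daily_values, limit=10.0):
    """True when every single observation is strictly below `limit`."""
    return all(v < limit for v in daily_values)

# Hypothetical daily lead concentrations in ppb, made up for illustration.
lead_ppb = [6.0, 7.0, 9.0, 12.0, 11.0, 8.0, 7.0, 6.0, 5.0]
print("running seven-day mean criterion met:", meets_running_mean_criterion(lead_ppb))
print("every observation below the limit:", all_below(lead_ppb))
```

The same data meet one criterion and fail the other, which is why the regulation's wording alone does not settle the question.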
Because one attempts to meet criteria with observations, and those observations have random components, no criterion is met with certainty. This implies that a confidence (or significance) level is usually needed to make any criterion truly unambiguous.
The conventional 0.01 (1%) and 0.05 (5%) levels derive from habit and the limitation of tables published in the early 20th century. Unless you are preparing a report for publication in a journal that requires such levels, or are preparing a report for a regulatory agency (such as the US EPA) that requires such levels, then you have no reason to use these values.
Remember that test levels and confidence are related to risk. One of the most valuable effects of writing a statistical report is that it forces you to consider the elements of risk, which are losses and their probabilities. Often the mere awareness of these within an organization is a great leap forward. Whenever possible, try to understand the potential losses in a statistical problem and choose test levels appropriate to manage those losses.
Any value that is computed from observations is not "known" and is not a "bright line" or constant number: it is just an estimate, subject to uncertainty. It is usually a mistake to use an estimate in later calculations as if it were a constant value.
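One simple way to keep an estimate from masquerading as a constant is to carry its standard error along with it. The sketch below assumes the observations are independent. The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev

def mean_and_standard_error(values):
    """Return the sample mean together with its estimated standard error."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    return fmean(values), stdev(values) / sqrt(len(values))

# Hypothetical background concentrations, made up for illustration.
background = [8.0, 10.0, 12.0]
estimate, standard_error = mean_and_standard_error(background)
print(f"background mean: {estimate:.1f} (standard error {standard_error:.2f})")
```

Any later comparison against this mean should account for the standard error instead of treating the mean as a fixed threshold.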
The independence assumptions of t tests (and of almost all tests that compare two or more batches of numbers) imply:

- It is not valid to use a t test to compare a subset of a batch to the entire batch.
- It is not valid to use a t test to compare two batches that have some data in common.
It is tempting to insert the words "statistical" and "significant" at every opportunity. They make a report sound statistical and significant, don't they? The problem is that in most cases "statistical" is meaningless and "significant" has a very special meaning.
A result is significant only when you have conducted a test of significance and obtained a sufficiently small P-value. Whenever you insert the word "significant" in a report, make sure you have included the details of the test you performed (see "reporting test results" above) to support the finding.
Other words with special statistical meanings are "independent," "random," "sample," "distribution," and "correlated." Use these with care and precision.
The memorizing you did in this course has a point: namely, to put at your disposal simple criteria for determining whether your calculations are correct. For example, if you compute that the probability of a standard Normal variable falling between -0.2 and +0.2 is 68%, you should know immediately that an error has occurred (because you memorized the fact that the probability of the variable lying between -1.0 and +1.0 is 68%).
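A quick check of this kind takes only a few lines. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name is hypothetical.

```python
from statistics import NormalDist

def standard_normal_probability(lower, upper):
    """Probability that a standard Normal variable lies between lower and upper."""
    z = NormalDist()  # mean 0, standard deviation 1
    return z.cdf(upper) - z.cdf(lower)

print(f"P(-1.0 < Z < 1.0) = {standard_normal_probability(-1.0, 1.0):.3f}")
print(f"P(-0.2 < Z < 0.2) = {standard_normal_probability(-0.2, 0.2):.3f}")
```

The first probability is about 0.68 and the second about 0.16, so a claimed 68% for the narrower interval fails the check at once.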
It is so easy to make mistakes, especially in complex calculations, that you should routinely check your answers as many ways as possible. Constantly ask:

- Are the answers internally consistent?
- Do the descriptive statistics and test results agree with the EDA results? Are the EDA results consistent with one another?
- Do the answers agree with rough approximations that can be done quickly with pencil and paper?
- Do the recommendations (that flow from the statistical calculations) make sense? Are they consistent with what is known or expected?
This page was created 27 April 2001 and last updated 27 April 2001.