Statistical Tests

The Pennsylvania "75%/10X" Rule

The text of this rule is available at http://www.dep.state.pa.us/dep/deputate/airwaste/wm/landrecy/MANUAL/Manual.htm. SectionIV.doc contains the statistical portion of the PA Act 2 ("Land Recycling") regulations, including the "75%/10X" rule:

"The 75%/10X rule is a statistical ad hoc rule that tests whether the true site median concentration is below the cleanup standard. This rule requires that 75% of the samples collected for demonstration attainment be equal to or below the risk-based cleanup standard and that no single sample result exceeds the risk-based standard by more than ten times."

[Section IV B.5.b)i)(a)]

It is offered as an alternative to traditional rules, such as testing to 95% confidence whether the mean concentration is below the same cleanup standard.

The decision maker should be concerned about the risk associated with use of any rule, especially one that has been made up for the occasion. The risk depends on actual site conditions. If the true median (or mean) is below the standard, the risk is the chance of failing the test. If the true median (or mean) is above the standard, the risk is the chance of not failing the test.

To explore this risk, I simulated samples (sizes N=10 and N=20) from lognormal distributions of varying means, medians, and standard deviations. Each simulation consisted of 10,000 sets of N independent samples from the same distribution. Each set was tested by the 75%/10X rule. Lognormal distributions were selected as simple models of sites with a tendency to produce occasional high values compared to the bulk of the values. Results for other positively skewed distributions should be similar.
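
A minimal sketch of such a simulation, written in Python with only the standard library, is given here. It is not the code used to produce the results that follow. The function names, the random seed, and the choice to express concentrations as multiples of the cleanup standard (so the standard is 1.0) are assumptions made for illustration.

```python
import math
import random


def passes_75_10x(sample, standard):
    """Return True if a set of results passes the 75%/10X rule."""
    n = len(sample)
    at_or_below = sum(1 for x in sample if x <= standard)
    # At least 75% of the results at or below the standard. Integer
    # arithmetic avoids floating-point trouble at the 75% boundary.
    enough_clean = 4 * at_or_below >= 3 * n
    # No single result more than ten times the standard.
    no_hot_spot = max(sample) <= 10 * standard
    return enough_clean and no_hot_spot


def failure_rate(median, s, n, trials=10_000, seed=0, standard=1.0):
    """Estimate how often n lognormal results fail the rule.

    median: true median, in multiples of the standard (assumed scale)
    s:      standard deviation of the logarithms
    """
    rng = random.Random(seed)
    mu = math.log(median)  # mean of the logarithms = log of the median
    failures = 0
    for _ in range(trials):
        sample = [rng.lognormvariate(mu, s) for _ in range(n)]
        if not passes_75_10x(sample, standard):
            failures += 1
    return failures / trials


if __name__ == "__main__":
    for median in (0.25, 0.5, 1.0, 2.0):
        rate = failure_rate(median, s=1.0, n=10)
        print(f"median = {median:4.2f} x standard: failure rate ~ {rate:.3f}")
```

For a lognormal distribution the mean equals the median times exp(s^2/2), so the same function can be indexed by the mean by passing median = mean / exp(s*s/2).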

Here are the results, expressed first in terms of medians, then in terms of means.

In the legends, s refers to the standard deviation of the logarithms and N to the sample sizes. The units on the horizontal axes are multiples of the cleanup standard, so in both figures a value of 1.00 on the x-axis corresponds to the standard.

Points on the power curves to the left of 1.00 correspond to situations where the site actually meets the standard (the median or mean is less than the standard). Points to the right of 1.00 correspond to situations where the site does not meet the standard.

We want the curves to be low to the left of 1.00 and high to the right of 1.00. A good curve sweeps sharply up from near zero at the left to near 100% on the right.

Notice that the concentration (x) axis for the mean-based curves is logarithmic. This enhances the visual differentiation of the curves.

These results tell us that it's difficult to distinguish contaminated sites from uncontaminated sites with just 10 or 20 samples when the concentrations are lognormally distributed. There is a good chance that a site with mean concentration below the standard will fail the test; there is a good chance that a site with mean concentration exceeding many times the standard will pass the test (especially when the concentrations are highly variable).

The Pennsylvania regulations specify minimum sample sizes (values of N) depending on volumes of soil tested. The smallest allowable sample size is N=8. Evidently, these minima are required to assure that heavily contaminated sites will fail the test.

Comparison with the Lognormal (Land Method) 95% UCL of the Mean

An alternative test allowed by the Act 2 regulations is the Land 95% UCL of the mean (using the infamous "H-factors"). I simulated this test for samples of size N=10 drawn from lognormal distributions of median 1 and varying standard deviations. The test was performed as described in the regulations [Section IV B.8].

The Land test performs as advertised: The failure rate for any mean exceeding the standard is at least 95%. This is evident as all three Land curves (labeled "UCL" and shown with solid lines) pass through the point (1.00, 95%).

Therefore this test controls the false negative rate at the desired level. However, it has shockingly poor performance for sites that meet the cleanup standard. Using this test, if the variability in concentration is high, one would need to achieve a mean concentration much, much lower than the cleanup standard in order to have a decent chance of not failing this test.

You can see the Land curves are much worse than the 75%/10X curves except for s=0.5: they are much less steeply sloped. This test has a very hard time discriminating between "clean" and "dirty" sites.

The Land test will perform better with larger sample sizes. But so does the 75%/10X rule. Furthermore, the 75%/10X rule is likely to maintain its performance characteristics even for non-lognormal (but skewed) distributions. The Land test probably will not, because its formulation is specific to the lognormal assumption.

It appears the Pennsylvania citizens' science advisory board (which formulated the 75%/10X test) has created a valid alternative to the 95% lognormal UCL.

Some Properties of the 75%/10X Procedure

Analysis of the procedure itself can provide insight beyond what the simulations showed, because the simulations evaluated only one possible underlying distribution of concentrations. The following analysis uses what we have learned about the Binomial distribution and the Normal approximation to it, but makes no explicit assumptions about the shape of the concentration distribution.

Consider the borderline case of a soil concentration distribution F whose median F_0.50 is marginally less than the cleanup standard C. How often would this case pass the 75%/10X test?

We need to evaluate two events: (1) that at least 75% of the N sample results are less than or equal to C and (2) that the maximum result is less than or equal to 10*C.

Consider the first event. A sample is "successful" when its value is less than or equal to C and a "failure" otherwise. Since, by supposition, the true median is close to C, the chance of success (p) is close to 50%. Therefore the probability we seek is a binomial probability with distribution B(N, p). Passing the 75% part of the test means that 0.25N or fewer samples are failures. This probability is therefore

Prob(0.25N or fewer failures)
= Sum over k = 0, 1, ..., Floor(0.25N) of Comb(N, k) * (1-p)^k * p^(N-k),

where Comb(N, k) is the binomial coefficient "N choose k".

For values of N equal to 10 or greater and p near 0.5, the Normal approximation to the binomial should work well to compute central probabilities. The number of failures has the distribution B(N, 0.5), whose mean is 0.5N and whose standard deviation is 0.5*Sqrt(N). The probability of 0.25N or fewer failures is therefore approximately the area under the standard Normal PDF to the left of (0.25N - 0.5N - 1/2)/(0.5 * Sqrt(N)). (The "-1/2" term is the continuity correction.) This endpoint simplifies algebraically to -(Sqrt(N) + 2/Sqrt(N))/2. The area to its left is the probability of passing. This area, by definition, is the value of the Normal CDF.

Therefore,

Prob(Passing the 75% test with N samples | true median = C)
= Normal CDF(-0.5 * [Sqrt(N) + 2/Sqrt(N)]).

(Do not forget that the 10X restriction can cause the test to fail, too.)

This value is less than 3% for all values of N equal to 10 or greater. (The exact value never exceeds 1.1%.)

Evidently, the probability of passing the test when the true median exceeds the cleanup standard is lower yet. This test is very accurate when the site is "dirty."

What if the site is "clean"? This means that the median is less than C. Another way to put it is that the cleanup standard is a higher percentile (p) than the median. The validity of the Normal approximation to the probability depends on how large p is; the larger it is, the higher we need N to be. Nevertheless, using the Normal approximation will inform us of the general behavior of this test, at least for large sample sizes.

Now the number of successes has mean pN, with p > 0.50, and standard deviation Sqrt(p * (1-p)) * Sqrt(N). Passing still means that 0.25N or fewer samples are failures, and the expected number of failures is (1-p)N, so the endpoint becomes (0.25N - (1-p)N - 1/2)/(Sqrt(p * (1-p)) * Sqrt(N)). The same analysis as above gives

Prob(Passing the 75% test with N samples | 100p^th percentile = C)
= Normal CDF([(p - 0.75) * Sqrt(N) - 0.5/Sqrt(N)] / Sqrt(p*(1-p))).

That is, instead of the -0.5 term in the formula, we have a more complex (p - 0.75)/Sqrt(p * (1-p)) coefficient multiplying Sqrt(N). This coefficient is negative for all values of p less than 0.75 and positive for all values greater than 0.75. This implies we will see a diminishing pass rate as the sample size increases, unless more than 75 percent of the site meets the cleanup standard. That is,

The 75%/10X rule is not a test of the median (as claimed by the regulations). It is a test of the 75th percentile.

That was obvious, wasn't it? But our analysis can show more than that. Consider the probability of passing the "75%" portion of the "75%/10X" test when exactly 75% of the site meets the cleanup standard. This is close to 50%. When more than 75% of the site meets the standard, how many samples would it take to be reasonably confident of passing the test? The next figure shows the values based on the Normal approximation:

For comparison, here are the values based on exact computation of the Binomial probabilities:

(The probabilities bounce around because the proportion needed to pass the test bounces around. For example, eight of ten (80%) must be less than C, but only nine of twelve (75%). The approximate curves closely trace the upper envelopes of the exact curves, as we expected.)

Using this figure you can determine sample sizes needed to demonstrate attainment of cleanup standards with the 75% rule. For example, if you believe 80% of the site meets the cleanup standard and want to have a 95% chance of passing the test, then find the point at which the P=0.8 curve crosses a 95% probability value. This point is at 175 samples. If you believe 90% of the site meets the cleanup standards, only 11 samples would be needed to achieve the same success probability. (Remember, these values are estimates based on the Normal approximation and on the assumption that the maximum concentrations rarely exceed ten times the cleanup standard.)
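
Sample sizes like these can also be checked against the exact Binomial tail. The sketch below is one way to do so in Python. It assumes that "75% of the samples" means at least ceil(0.75N) of the N results are at or below the standard, it ignores the 10X part of the rule, and its function names and the search limit n_max are illustrative. Because the exact probabilities bounce around with N, it reports both the first N that reaches the target and the N from which every sample size (up to n_max) reaches it, so the answers need not match values read off a smooth approximate curve.

```python
from math import comb


def exact_pass_prob(n, p):
    """Exact chance that at least ceil(0.75 * n) of n results are at or
    below the standard, when each is independently so with chance p."""
    needed = -(-3 * n // 4)  # ceil(3n/4) using integer arithmetic
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(needed, n + 1))


def first_n_reaching(p, target, n_max=500):
    """Smallest n <= n_max whose pass probability reaches target."""
    for n in range(1, n_max + 1):
        if exact_pass_prob(n, p) >= target:
            return n
    return None  # target not reached within the search limit


def n_always_reaching(p, target, n_max=500):
    """Smallest n such that every size from n to n_max reaches target."""
    last_failure = 0
    for n in range(1, n_max + 1):
        if exact_pass_prob(n, p) < target:
            last_failure = n
    if last_failure == n_max:
        return None  # still failing at the search limit
    return last_failure + 1


if __name__ == "__main__":
    for p in (0.8, 0.9):
        print(f"p = {p}: first N = {first_n_reaching(p, 0.95)}, "
              f"all N from {n_always_reaching(p, 0.95)} onward")
```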

Since we are exploring how the success rates depend on sample size, let's look at various realistic sample sizes. The regulations require one sample for every 250 cubic yards of soil. A one-acre site with three feet of topsoil to sample contains 4,840 cubic yards, which would require 20 samples. Ranges of sample sizes from N=10 to several hundred appear possible. (Larger sites are likely to be divided into smaller investigation units that could be separately sampled and evaluated.)
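
The arithmetic behind this example can be written out as a short Python sketch. The figure of one sample per 250 cubic yards is taken from the text above; rounding up to a whole number of samples and the function name are assumptions.

```python
import math

SQ_FT_PER_ACRE = 43_560
CU_FT_PER_CU_YD = 27


def samples_required(area_acres, depth_ft, cu_yd_per_sample=250):
    """Return (soil volume in cubic yards, number of samples needed)."""
    volume_cu_yd = area_acres * SQ_FT_PER_ACRE * depth_ft / CU_FT_PER_CU_YD
    # Assumed: a partial 250-cubic-yard increment still needs a sample.
    return volume_cu_yd, math.ceil(volume_cu_yd / cu_yd_per_sample)


if __name__ == "__main__":
    print(samples_required(1, 3))  # one acre, three feet deep -> (4840.0, 20)
```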

(This figure uses the exact Binomial calculations rather than the Normal approximation. The approximate values are very close to those shown here except for the N=10 curve, which is low because in fact 80%, rather than 75%, of the values must be less than the cleanup standard.)

As we have already seen, when less than 50% of the site meets the cleanup standard, the chance of passing the test is low, regardless of the sample size.

So far, we have ignored the "10X" part of the rule. This is easy to assess. A set of N samples will pass the 10X test only when every sample is less than 10C, ten times the cleanup standard. If there actually is some proportion q of the site containing soil concentrations above 10C, then a randomly-selected sample has a 1-q chance of being measured below 10C. N such samples, being independently chosen, therefore have a (1-q)^N chance of being measured below 10C. (That is a Binomial probability again, but it needs no approximation.) This is the success rate.
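
Because this success rate is a simple closed form, it is easy to tabulate. The Python sketch below does so for a few values of q and N, which are assumptions chosen for illustration rather than values from the regulations.

```python
def pass_10x_prob(q, n):
    """Chance that none of n independent samples exceeds 10C when a
    proportion q of the site is above 10C."""
    return (1.0 - q) ** n


if __name__ == "__main__":
    for q in (0.001, 0.01, 0.05):
        row = ", ".join(f"N={n}: {pass_10x_prob(q, n):.3f}"
                        for n in (10, 20, 100, 500))
        print(f"q = {q}: {row}")
```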

The most notable thing about this success rate is its dependence on the sample size N.

The figure shows two important properties of the 10X test:

It is possible for a relatively large proportion of the site to contain concentrations greater than ten times the cleanup standard and still pass the 10X test, provided the sample size is relatively small.
However, if any proportion of the soil contains a high concentration, the chance of passing the test diminishes as the sample size increases.

The first property is not good, but is unavoidable in any test that does not make assumptions about the underlying distribution. The second property looks good, but actually is not. Every site, no matter how clean, has some chance of yielding a soil sample with a high concentration of anything. Investigators know this. The 10X portion of this test therefore provides some incentive to keep the number of samples to a minimum.

The two tests (the 75% part and the 10X part) are not independent, but they will act independently to a good approximation. The performance on the 75% part depends on the behavior of the lowest 75% of the concentrations whereas the performance on the 10X part depends on the highest concentration only.

Summary of Theoretical Properties

Much more than 50% of the soils must meet the cleanup standard. To stand a reasonable chance of passing the 75%/10X test, a site should have 80% or more of its soils meeting the cleanup standard.
A part of the soils can contain arbitrarily high concentrations yet still pass the test. Provided the number of samples is low (less than a few hundred), a site can contain appreciable amounts of soil exceeding ten times the standard (one percent or more) and still have a fair chance of passing the test.
A site meeting the nominal requirements of the standard will fail the test. Sites whose median soil concentration barely meets the cleanup standard have almost no chance of passing the test.

To optimize a remediation whose results will be assessed with the 75%/10X test, then, it is wise to reduce concentrations at 80% or more of the site below the cleanup standard, but it is relatively safe to miss a few percent of the soil. A more thorough cleanup that brings all soil concentrations uniformly down to the cleanup standard, but not much below it, is likely to fail the 75%/10X test.

The 75%/10X rule could be improved by basing the proportion (75%) and maximum (10X) on the sample size. To control the median concentration, as the regulations claim, the proportion should decrease toward 50% and the maximum should increase with increasing sample size.

Links to web resources on statistical testing

http://www.dep.state.pa.us/dep/deputate/airwaste/wm/landrecy/MANUAL/Manual.htm -- Section IV of this zipped collection of documents describes, with examples, statistical tests allowed under Pennsylvania's "land recycling" act.

This page was created 25 March 2001 and last updated 31 March 2001.