The full quiz is here. The answers appear below. Comments, which are not part of the answers, are italicized.
1. A batch A has the summary statistics listed below. The values in batch B are obtained by subtracting each value in A from 20. (For example, if 7 is in A, then 20-7 = 13 is in B.) Compute the corresponding summary statistics for B. In cases where it is not possible to deduce the value, or if the value is not defined, then please so indicate.
| Statistic | A | B = 20 - A | C = A^2 (extra credit) |
|---|---|---|---|
| Count | 77 | 77 | 77 |
| Median | 3.2 | 20-3.2 = 16.8 | 3.2^2 = 10.24 |
| H-spread | 1.9 | 1.9 | indeterminate |
| 95th percentile | 12.3 | indeterminate | 12.3^2 ≈ 151.3 |
| Variance | 11.0 | (-1)^2 * 11.0 = 11.0 | indeterminate |
| Third order statistic (X[3]) | 0.8 | indeterminate | 0.8^2 = 0.64 |
| Geometric mean | 2.7 | indeterminate | 2.7^2 = 7.29 |
| 10% Trimmed mean | 3.5 | 20-3.5 = 16.5 | indeterminate |
The 95th percentile and X[3] fooled almost everyone the first time around. When you subtract all values in A from 20, you reverse their order. Therefore the 95th percentile of B is 20 minus the 5th percentile of A, and the 5th percentile of B is 20 minus the 95th percentile of A. We do not know exactly what the 5th percentile of A is (although we will probably not be far off by supposing it is close to X[3], which would imply that the 95th percentile of B is close to 20 - 0.8 = 19.2).
Likewise, X[3] for batch B is its third-lowest value, which is 20 minus the third-highest value of A. We do not know exactly what the third-highest value of A is (although it ought to be close to the 95th percentile of A, since 3/77 ≈ 0.04 is close to 5/100; that would imply X[3] for B is about 20 - 12.3 = 7.7).
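A minimal numerical check of the order-reversal argument, using a small hypothetical batch (the 77 values of A are not given). The hinge convention (Tukey halves that include the median when the count is odd) and the trimmed-mean convention (drop int(0.10 * n) values from each end) are assumptions; other conventions give the same qualitative result.

```python
import math
import statistics

def hinges(xs):
    """Tukey-style hinges: medians of the lower and upper halves,
    each half including the median when the count is odd."""
    s = sorted(xs)
    half = (len(s) + 1) // 2
    return statistics.median(s[:half]), statistics.median(s[-half:])

def trimmed_mean(xs, proportion=0.10):
    """Mean after dropping int(proportion * n) values from each end."""
    s = sorted(xs)
    k = int(proportion * len(s))
    kept = s[k:len(s) - k]
    return sum(kept) / len(kept)

# Hypothetical batch A (not the quiz data) and B = 20 - A
a = [1, 2, 2, 3, 3, 4, 5, 7, 9, 14, 25]
b = [20 - x for x in a]
sa, sb = sorted(a), sorted(b)

# The order reverses: X[3] of B is 20 minus the third-largest value of A
assert sb[2] == 20 - sa[-3]

# Location statistics are reflected through 20 ...
assert statistics.median(b) == 20 - statistics.median(a)
assert math.isclose(trimmed_mean(b), 20 - trimmed_mean(a))

# ... while spread statistics are unchanged
ha, hb = hinges(a), hinges(b)
assert hb[1] - hb[0] == ha[1] - ha[0]
assert math.isclose(statistics.variance(b), statistics.variance(a))
print("B's third-lowest value:", sb[2], "= 20 -", sa[-3])
```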
Would you expect A to be positively skewed, negatively skewed, or have approximately zero skewness? What about B?
A appears to be slightly positively skewed because trimmed mean > median > geometric mean and the 95th percentile lies far above the center of the batch. Therefore B will be negatively skewed (because the subtraction reverses the order) and C will be much more positively skewed (because squaring makes high values more extreme).
Batch A (the original values) was likely positively skewed, as many statistics indicate, including:
- The 95th percentile lies much farther above the center (median, trimmed mean, or geometric mean) than X[3] (approximately the 5th percentile) lies below it: 12.3 - 3.2 = 9.1 versus 3.2 - 0.8 = 2.4. This suggests the right tail of the batch is much "heavier" than the left tail.
- The median (3.2) and trimmed mean (3.5) are both noticeably higher than the geometric mean (2.7).
- The trimmed mean (3.5) is slightly higher than the median (3.2).
- The CV can be approximated as sqrt(variance)/trimmed mean = sqrt(11)/3.5 ≈ 0.95, or roughly 1, indicating a highly positively skewed batch.
- A robust version of the CV is the H-spread divided by the median: 1.9/3.2 ≈ 0.59. This is high enough to indicate a slight positive skewness. Because it measures the shape of the middle 50% of the batch, it gives a better look at the middle of the batch than the CV, trimmed mean, geometric mean, or extreme percentiles do.
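The arithmetic behind these indicators can be reproduced directly from A's summary statistics in the table. This is only a sketch of the calculations; the variable names are illustrative.

```python
import math

# Summary statistics of batch A, taken from the table in problem 1
median, h_spread, p95, x3 = 3.2, 1.9, 12.3, 0.8
variance, geo_mean, trimmed = 11.0, 2.7, 3.5

upper_reach = p95 - median        # distance of the 95th percentile above the median
lower_reach = median - x3         # distance of X[3] (about the 5th percentile) below it
cv = math.sqrt(variance) / trimmed
robust_cv = h_spread / median     # H-spread divided by the median

print(f"upper reach {upper_reach:.1f}, lower reach {lower_reach:.1f}")
print(f"CV about {cv:.2f}, H-spread/median about {robust_cv:.2f}")
print("trimmed mean > median > geometric mean:", trimmed > median > geo_mean)
```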
The skewness of B must be exactly the negative of the skewness of A, because the subtraction reverses the order of the data but does not otherwise change the shape of the batch. You can check this from the formula for skewness, too: the skewness is proportional to the sum of cubes of standardized residuals (with respect to the mean). Adding 20 to each value does not change the residuals, but multiplying each value by -1 multiplies every residual by -1 and multiplies the variance by (-1)^2 = 1, leaving the standard deviation unchanged. Therefore the standardized residuals are all multiplied by -1/1 = -1. A factor of (-1)^3 = -1 comes out of each cubed term, so the entire sum (the skewness) is negated.
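The sign flip can be checked numerically. This sketch assumes the moment definition of skewness (the average cubed standardized residual, using the population standard deviation); textbooks differ in small-sample corrections, but those corrections do not affect the sign argument. The batch is hypothetical.

```python
def skewness(xs):
    """Moment skewness: mean of cubed residuals divided by the
    population standard deviation cubed."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

a = [1, 2, 2, 3, 3, 4, 5, 7, 9, 14, 25]   # hypothetical positively skewed batch
b = [20 - x for x in a]                    # B = 20 - A

# Same magnitude, opposite sign (up to floating-point rounding)
print(skewness(a), skewness(b))
```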
Extra credit: answer the same questions for a batch C that is formed from the squares of all values in A.
We cannot determine the H-spread of C because it is the difference of the two hinges of C, which are the squares of the hinges of A. Let's write these A-hinges as H+ and H-. We know (H+) - (H-) = 1.9, but (H+)^2 - (H-)^2 = ((H+) - (H-)) * ((H+) + (H-)) = 1.9 * ((H+) + (H-)), and we do not know the sum (H+) + (H-).
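To make the indeterminacy concrete, here is a small arithmetic sketch with two hypothetical pairs of A-hinges. Both pairs are 1.9 apart and bracket A's median of 3.2, so either is consistent with the table, yet their squares lie different distances apart.

```python
# Two hypothetical (lower, upper) hinge pairs for A; the true hinges are unknown.
pairs = [(2.0, 3.9), (3.0, 4.9)]

results = []
for lo, hi in pairs:
    spread_a = hi - lo            # H-spread of A: 1.9 in both cases
    spread_c = hi ** 2 - lo ** 2  # hinges of C are the squared hinges of A
    results.append((spread_a, spread_c))
    print(f"A hinges {lo}, {hi}: H-spread of A = {spread_a:.1f}, "
          f"H-spread of C = {spread_c:.2f}")
```

The two H-spreads of C differ because hi^2 - lo^2 = (hi - lo) * (hi + lo) depends on the sum of the hinges, not only on their difference.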
We cannot determine the variance of C because squaring is a non-linear transformation. One can construct examples of two batches that have identical means and variances, but when squared give dramatically different variances.
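One such construction, as a minimal sketch with two small hypothetical batches: both have mean 3 and variance 3, and their squares both have mean 11, yet the variances of the squares are 147 and 75.

```python
import statistics

# Two hypothetical batches with the same mean (3) and variance (3) ...
a1 = [2, 2, 5]
a2 = [1, 4, 4]
# ... whose squares have the same mean (11) but very different variances.
c1 = [x ** 2 for x in a1]
c2 = [x ** 2 for x in a2]

for name, a, c in [("a1", a1, c1), ("a2", a2, c2)]:
    print(name, "mean", statistics.mean(a), "variance", statistics.variance(a),
          "| squared: mean", statistics.mean(c), "variance", statistics.variance(c))
```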
On the other hand, the geometric mean of C can be computed! We know it exists for A, so all values of A are positive. When squared, they remain positive. Therefore the geometric mean of C exists. You can check from the formula that the geometric mean of C is the square of the geometric mean of A.
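The formula check is short: the geometric mean of C is (x_1^2 * ... * x_n^2)^(1/n) = ((x_1 * ... * x_n)^(1/n))^2, the square of the geometric mean of A. A numerical sketch, using a hypothetical positive batch and Python's `statistics.geometric_mean` (available since Python 3.8):

```python
import math
import statistics

a = [0.8, 1.5, 2.4, 3.2, 4.1, 6.0, 12.3]   # hypothetical batch of positive values
c = [x ** 2 for x in a]

gm_a = statistics.geometric_mean(a)
gm_c = statistics.geometric_mean(c)

# GM(C) equals GM(A) squared, up to floating-point rounding
print(gm_a ** 2, gm_c, math.isclose(gm_c, gm_a ** 2))
```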
C will have large positive skewness. All values in A are positive (because its geometric mean exists), so the squaring operation is monotonic. (Squaring is not monotonic when both positive and negative values are involved.) Squaring tends to increase the deviations of the most positive numbers from the mean, thereby increasing the skewness of a batch that is already positively skewed.
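A small illustration of this tendency (not a proof), using a hypothetical positively skewed batch of positive values and the moment definition of skewness (an assumption; other definitions differ by small-sample factors):

```python
def skewness(xs):
    """Moment skewness: mean cubed residual / (population SD) cubed."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

a = [1, 2, 2, 3, 3, 3, 4, 5, 8, 12]   # hypothetical positive, right-skewed batch
c = [x ** 2 for x in a]               # squaring is monotonic for positive values

# Squaring stretches the upper tail, so the skewness grows (about 1.4 to about 2.0 here)
print(round(skewness(a), 2), round(skewness(c), 2))
```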
2. Draw a histogram of the following batch of bis(2-ethylhexyl) phthalate measurements in groundwater (micrograms per liter). Choose bins in a way that helps portray the "shape" of the data.
0.24, 0.58, 0.77, 1.30, 3.23, 3.91, 7.71, 11.57, 13.51, 19.75, 24.57, 26.64, 37.85, 55.00, 60.05, 89.76
(The values have been sorted for you.) Label the histogram appropriately so that it can be read on its own. Use relative frequency on the y-axis.
A good choice of bin cutpoints would be 0.1, 0.3, 1.0, 3.0, 10, 30, and 100. Using evenly spaced cutpoints will cram half the values into the first one or two bins. [The histogram is not shown here.]
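The bin counts behind such a histogram can be tallied with a short sketch. It assumes half-open bins [lower, upper), so a value equal to a cutpoint goes into the bin above it; the evenly spaced cutpoints (width 10) are included only for comparison. The log-spaced bins are roughly equal in width on a logarithmic scale, so the bars are best drawn against a log-scaled x-axis.

```python
from bisect import bisect_right

def bin_counts(values, cutpoints):
    """Count values in the half-open bins [cutpoints[i], cutpoints[i+1])."""
    counts = [0] * (len(cutpoints) - 1)
    for v in values:
        i = bisect_right(cutpoints, v) - 1
        if 0 <= i < len(counts):          # values outside the cutpoints are ignored
            counts[i] += 1
    return counts

data = [0.24, 0.58, 0.77, 1.30, 3.23, 3.91, 7.71, 11.57, 13.51, 19.75,
        24.57, 26.64, 37.85, 55.00, 60.05, 89.76]

log_cuts = [0.1, 0.3, 1.0, 3.0, 10, 30, 100]   # roughly evenly spaced on a log scale
even_cuts = list(range(0, 101, 10))             # evenly spaced, for comparison

for cuts in (log_cuts, even_cuts):
    counts = bin_counts(data, cuts)
    rel = [c / len(data) for c in counts]       # relative frequencies for the y-axis
    print(cuts, counts, [round(r, 3) for r in rel])
```

With the evenly spaced cutpoints, 7 of the 16 values land in the first bin and 10 in the first two.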
3. A batch has 11 values. The five largest values are 33.0, 24.6, 17.0, 16.0, and 15.3. Compute the 80th percentile of this batch. Use Weibull plotting positions (the percentile at rank I is I/(N+1): text, page 96). Round the answer to one decimal place (the same precision as the data).
The Weibull plotting positions of the top ranks among 11 values are 11/12 = 91 2/3%, 10/12 = 83 1/3%, 9/12 = 75%, and so on; these ranks hold the values 33.0, 24.6, and 17.0. Therefore the 80th percentile lies between 17.0 and 24.6. Linear interpolation gives (83 1/3 - 80)/(83 1/3 - 75) * 17.0 + (80 - 75)/(83 1/3 - 75) * 24.6 = 0.4*17.0 + 0.6*24.6 = 21.56, or 21.6 to one decimal place. To get the answer to one decimal place precision, you must carry intermediate results with equal or better precision. Using the value 83 instead of 83 1/3, for example, would give 0.375*17.0 + 0.625*24.6 = 21.75, which rounds to 21.8: a close answer, but one that errs by too much.
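A sketch of this interpolation using exact rational arithmetic (Python's `fractions.Fraction`), so that rounding happens only at the end. The helper name `interpolate` is illustrative. The second line of output repeats the calculation with 83 in place of 83 1/3 to show the resulting error.

```python
from fractions import Fraction as F

def interpolate(p, p_lo, x_lo, p_hi, x_hi):
    """Linearly interpolate the value at percentile p between the
    plotting positions p_lo and p_hi (p_lo <= p <= p_hi)."""
    w_hi = (p - p_lo) / (p_hi - p_lo)   # weight on the upper value
    return (1 - w_hi) * x_lo + w_hi * x_hi

N = 11
# The five largest values, keyed by rank (rank N is the largest).
top = {11: F("33.0"), 10: F("24.6"), 9: F("17.0"), 8: F("16.0"), 7: F("15.3")}
# Weibull plotting positions i/(N+1), expressed exactly in percent.
pos = {i: F(100 * i, N + 1) for i in top}

exact = interpolate(F(80), pos[9], top[9], pos[10], top[10])
rough = interpolate(F(80), pos[9], top[9], F(83), top[10])   # 83 instead of 83 1/3

print(float(exact), float(round(exact, 1)))   # 21.56 21.6
print(float(rough), float(round(rough, 1)))   # 21.75 21.8
```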
This page is copyright (c) 2001 Quantitative Decisions. Please cite it as
This page was created 29 January 2001 and last updated 12 February 2001 (to include additional comments on problem 1 and an extended discussion of the skewness question and the decimal precision issue in #3).