Notes on Chapter 4

Go to Notes on Chapters 1 and 3

Go to Notes on Chapter 5

These notes are intended to amplify the text, point out questions and areas for further thought, and identify and resolve ambiguities and (the rare) mistakes.  You definitely should be aware of the mistakes, so their pages are displayed in bold face.

Chapter.Page Comments
4.139 "Berthoux" should read "Berthouex" in the third paragraph.
4.139 Clearly it is impossible to flip any coin "an infinite number of times."  For the operation that consists of flipping a coin once, the "population" contains only two elements: "heads" and "tails".
4.140 The situation depicted in Figure 4.1 can be viewed in two ways that are subtly different.  It presents a summary of ten separate "experiments," each one consisting of one coin flip.  However, this series of ten flips may also be viewed (as the authors appear to do) as a single "experiment," in which case Figure 4.1 is just a graphical way of showing that the result included six tails and four heads.  I say "included" because summarizing the result like this omits some information about the experiment; namely, the sequence in which the six tails and four heads occurred.  The "population" in this case consists of 2^10 = 1024 possible sequences, of which 210 contain six tails and four heads.
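
The two counts in this note can be checked by brute force.  The following is only a sketch (the variable names are mine, not the book's): it lists every possible sequence of ten flips and counts those with exactly six tails.

```python
from itertools import product
from math import comb

# Enumerate every possible sequence of ten flips (H = heads, T = tails).
sequences = list(product("HT", repeat=10))
total = len(sequences)

# Count the sequences with exactly six tails (and therefore four heads).
six_tails = sum(1 for seq in sequences if seq.count("T") == 6)

print(total)                   # size of the "population" of sequences
print(six_tails, comb(10, 6))  # brute-force count and the binomial coefficient
```

The brute-force count agrees with the binomial coefficient "10 choose 6," which is the usual shortcut for this kind of counting.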
4.140 "Countably infinite" means the values can be put into a sequence so that there is a first, second, third, ... value.  For example, the set of all integers {...-2, -1, 0, 1, 2, ...} is countably infinite because it can be put into the order 0, -1, 1, -2, 2, -3, ... and this order eventually "counts" every integer.  By contrast, the set of all points on a line cannot be put into any such sequence: no matter how we try to do it, there will always be many points that are never counted in this way.  Infinities are not needed for practical applications, because nothing in the physical world is truly infinite, but infinities are very useful for developing the theories that justify practical techniques.
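
The ordering 0, -1, 1, -2, 2, -3, ... can be written as a short generator.  This is just an illustrative sketch; the function names are hypothetical.  The helper position_of() shows that every integer has a definite place in the sequence, which is what "countable" means.

```python
from itertools import count, islice

def enumerate_integers():
    """Yield every integer exactly once, in the order 0, -1, 1, -2, 2, ..."""
    yield 0
    for k in count(1):
        yield -k
        yield k

def position_of(n):
    """Position (counting from 0) at which the integer n appears."""
    if n == 0:
        return 0
    return 2 * n if n > 0 else -2 * n - 1

first_seven = list(islice(enumerate_integers(), 7))
print(first_seven)
print(position_of(-3), position_of(3))
```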
4.141 [First paragraph]  The "will be" in the first paragraph is unnecessary: random variables often describe events or observations that have already taken place.  Consider the shell game: somebody has put a ball under one of three cups and shuffled the cups.  The game is to bet on which cup contains the ball.  The ball definitely is already under one of the cups (unless someone is cheating); the "random" event has already happened.

There are two distinct ways to view soils concentrations as "random."  One is to think of them as resulting from some random process, much like the state of a coin (heads or tails) results from the process of flipping it.  Another is to consider the concentrations as just "being there."  (Analogously, the coin that has already been flipped is definitely either heads or tails: there's no randomness to it from this point of view.)  The randomness occurs in the selection of the samples to observe.  The first kind of randomness is sometimes called "model-based" randomness to distinguish it from the second, which is "design-based" randomness.  The distinction has important implications for how we can make inferences about all the soil (such as its average concentration) and can lead to different answers depending on how the randomness is viewed.

4.141 [Second paragraph]  A discrete random variable has distinct and separate possible outcomes.  There is therefore no need to collect the outcomes into bins.  We don't need the full histogram apparatus to depict the probabilities: all we need is a bar chart.  On a bar chart, the heights are frequencies or probabilities.  On a histogram, the heights are frequencies, proportions, or probabilities per unit interval.  On a histogram, the bar widths matter, because the histogram depicts frequencies, proportions, or probabilities by area; on a bar chart, the bar widths do not matter, because the bar depicts frequencies etc. by height.  (To maintain visual balance, the bar chart therefore must use a constant bar width.)  On a bar chart of probabilities, no bar can be taller than 1.0 (100%).  On a histogram, bars can become arbitrarily tall, depending on their width.  Therefore, it does not matter what the widths of the bars in Figure 4.3 may be.
4.145 We can talk about the frequency with which a continuous random variable's value lies within some interval or "bin," but not (in general) about the frequency with which its value exactly equals some number.  The relative frequency used on the second line is the frequency divided by the bin's width, so it's really a frequency per unit interval.  This is why the relative frequency can exceed 1.0 (100%): see Figure 4.8 for some examples (the non-central beta and the triangular densities).
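
A small numerical sketch may make the "frequency per unit interval" idea concrete.  The data and bin edges below are made up for illustration; they are not from the book.  Each bar's height is its proportion divided by its width, so the heights can exceed 1.0 even though the total area is exactly 1.

```python
# Hypothetical batch of eight observations, all between 0 and 0.5.
data = [0.05, 0.12, 0.18, 0.22, 0.27, 0.33, 0.41, 0.48]
edges = [0.0, 0.25, 0.5]  # two bins, each 0.25 wide

n = len(data)
bins = list(zip(edges[:-1], edges[1:]))

heights = []
for lo, hi in bins:
    in_bin = sum(1 for x in data if lo <= x < hi)
    proportion = in_bin / n                    # what a bar chart would show
    heights.append(proportion / (hi - lo))     # what a density histogram shows

# The histogram represents probability by area, so the areas sum to 1.
total_area = sum(h * (hi - lo) for h, (lo, hi) in zip(heights, bins))

print(heights)
print(total_area)
```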
4.151 Not all random variables have probability density functions.  The "zero-modified lognormal density" does not.  Figure 4.8 shows the 50% probability mass at zero using a bar (as in a bar chart): the pdf at zero, were it defined, would have to be infinitely large (because the width of the bin at zero is infinitely small).  However, all random variables (whose values are numbers) have cumulative distribution functions (cdfs).  The demonstration is very easy: a random variable is determined by probabilities and the same probabilities determine the cdf.  The problem with the pdf is the need to compute relative frequencies, which is not always possible.
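
To see that the cdf exists even when the pdf does not, here is a sketch of a cdf for a zero-modified lognormal variable.  The parameter names and default values (p_zero, meanlog, sdlog) are my assumptions for illustration only; the 50% mass at zero matches the description of Figure 4.8 above.

```python
from math import log
from statistics import NormalDist

def zero_modified_lognormal_cdf(x, p_zero=0.5, meanlog=0.0, sdlog=1.0):
    """P(X <= x) when X equals 0 with probability p_zero and is lognormal otherwise."""
    if x < 0:
        return 0.0
    if x == 0:
        return p_zero  # the cdf jumps from 0 up to p_zero at x = 0
    z = (log(x) - meanlog) / sdlog
    return p_zero + (1.0 - p_zero) * NormalDist().cdf(z)

for x in (-1.0, 0.0, 1.0, 10.0):
    print(x, zero_modified_lognormal_cdf(x))
```

The jump at zero is exactly the probability mass that a pdf cannot express.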
4.159 In words, formula (4.8) states that the pth quantile x_p is that x-value where the cdf crosses height p.  If the cdf is horizontal with value p over some interval, then any point in that interval can be considered a crossing point.  In such cases, we often take the middle of the interval to be the quantile.  The text notes that S-Plus uses the left endpoint of the interval instead.
4.159 [Bottom]  A monotonic transformation preserves inequalities: that is, g() is monotonic if in every case that x0 < x1, g(x0) < g(x1).  g() is also monotonic if in every case that x0 < x1, g(x0) > g(x1).  The first kind is order preserving and the second kind is order reversing.  The graph of a monotonic function never has any peaks or valleys: although it may briefly level off, either it is always rising or always falling from left to right.  For example, exp(), ln(), and reciprocal (1/x) are monotonic functions.  The square function is not (it has a valley at zero).  However, if we are focusing on non-negative numbers only (as we might do for concentrations), then the square function is monotonic.  This points out the importance of defining the domain of a transformation, which is the set of values to which it might be applied.

A good way to think of a monotonic transformation is to visualize a variable stretching and expanding along the x-axis, accompanied possibly by a complete flip of the axis (for order reversing transformations).  The same operation will stretch and expand the graph of the cdf in the horizontal direction, but will not affect the heights of the cdf.  Therefore, even after the distortion, you can accurately read the quantiles from the graph.  That's all formula (4.10) is saying.  Specifically, the first line says to take a quantile p and find the point x_p where the cdf crosses height p.  Applying the transformation g moves x_p over to a new location that we will call y_p, but since the cdf stretches horizontally along with the transformation, it still crosses height p exactly at y_p.  That's what the second line of the formula says.  This should be glaringly obvious, but a moment's additional thought will suggest that it's wrong in some cases.  The exceptions occur where the cdf may be horizontal for a while: the midpoint of one interval often does not correspond to the midpoint of the transformed interval.  So which cdfs have horizontal portions?  Step functions, for one: the cdf of any discrete distribution is mostly horizontal.  Therefore formula (4.10) is valid only for certain kinds of continuous distributions.  Use caution when reasoning about percentiles of discrete distributions!
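
The midpoint problem is easy to see with a tiny batch of numbers.  In this sketch (made-up data, with the median standing in for a general quantile), the log of the median equals the median of the logs when the median is an actual data value, but not when it is the midpoint of an interval between two data values.

```python
from math import log10
from statistics import median

odd_batch = [1, 10, 100]          # the median is a data value (10)
even_batch = [1, 10, 100, 1000]   # the median is a midpoint (55)

# Odd-sized batch: transforming the median gives the median of the transformed data.
log_of_median_odd = log10(median(odd_batch))
median_of_logs_odd = median([log10(x) for x in odd_batch])

# Even-sized batch: the midpoint of (10, 100) does not map to the midpoint of (1, 2).
log_of_median_even = log10(median(even_batch))
median_of_logs_even = median([log10(x) for x in even_batch])

print(log_of_median_odd, median_of_logs_odd)
print(log_of_median_even, median_of_logs_even)
```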

4.163 Formulas (4.11) and (4.12) for the expected value are the same: the integral is just a special kind of summation.  The summation in (4.12), although it does not look it, is almost the same as formula (3.1) (page 60) for the mean of a batch.  Rewrite (3.1) as
\[ \bar{x} = \sum_{i=1}^{n} x_i \cdot \frac{1}{n} \]
Evidently the term 1/n is playing the role of the frequency f(x) in equation (4.12).  But that's exactly right: the correct frequency for any single value in a batch of n values is 1/n.
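
A quick check with a made-up batch shows the two ways of computing the mean side by side: the usual "sum divided by n," and the expected-value form in which each value is weighted by its frequency.  The names here are mine, chosen for illustration.

```python
from collections import Counter

batch = [2.0, 4.0, 4.0, 10.0]  # hypothetical batch of n = 4 values
n = len(batch)

# The usual batch mean.
mean_usual = sum(batch) / n

# The same mean as an expected value: each observation gets frequency 1/n.
mean_as_expectation = sum(x * (1 / n) for x in batch)

# Grouping repeated values: a value occurring k times gets frequency k/n.
frequencies = {x: k / n for x, k in Counter(batch).items()}
mean_from_frequencies = sum(x * f for x, f in frequencies.items())

print(mean_usual, mean_as_expectation, mean_from_frequencies)
```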
4.165 Similarly, formulas  (4.13) and (4.14) for the variance of a random variable are the same and are essentially the same as formula (3.7) (page 63) for the variance (mm estimator).
4.167 The meaning of "distribution" suddenly changes in this section.  Previously, "distribution" was defined as "what a density histogram of outcomes would look like if you could keep taking more and more samples..." (p. 143, bottom).  Now, "distribution" is meant in the sense of "parameterized family of distributions."  For example, "the [sic] normal distribution" described on this page refers to a family of distributions varying according to parameters mu and sigma.  You can usually, but not always, tell from the context which meaning of "distribution" is intended.  Watch out!
4.167 You should recognize the standardized version of x in formula (4.19).  Writing z for (x - mean)/sd, and ignoring the constant multipliers (they are needed to make the total area of the pdf equal to 100%: think of them as assigning a unit of measurement to the y-axis in the pdf graph), we see the normal pdf is simple and easily remembered: it is exp(-z^2/2).
4.171 This statement of the central limit theorem is too broad.  There are important theoretical and practical restrictions on the underlying random variables.  (The average of lots of Cauchy variables will never be anywhere near normal.)  Likewise, the average of lots of variables whose variances differ widely among one another may be badly approximated by a normal distribution.

For example, if you begin with a lognormally distributed variable (such as a concentration of an environmental contaminant in some medium) and then add zillions of small error components to model the measurement process, you still wind up with a distribution that's almost lognormal and can be highly skewed.  Be aware, too, that it can take lots of variables to participate in an average before the central limit theorem's conclusion even approximately holds.  Therefore you should never hastily assume that sums or averages of variables look normal; this has to be carefully checked in all cases.
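
A small simulation illustrates the Cauchy remark.  This is a rough sketch under assumptions of my own (the seed, the 2000 replications, and the use of the interquartile range as a measure of spread are all arbitrary choices).  Averaging 100 uniform variables shrinks the spread dramatically, as the central limit theorem leads us to expect; averaging 100 Cauchy variables does not shrink it at all.

```python
import math
import random
from statistics import quantiles

random.seed(12345)  # arbitrary seed, only so the run is repeatable

def iqr(values):
    """Interquartile range: a measure of spread that exists even for Cauchy data."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

def draw_cauchy():
    # Standard Cauchy value obtained by inverting its cdf.
    return math.tan(math.pi * (random.random() - 0.5))

def draw_uniform():
    # Uniform on (-0.5, 0.5): a well-behaved variable for comparison.
    return random.random() - 0.5

def iqr_of_means(draw, n_per_mean, n_means=2000):
    """Spread of n_means averages, each computed from n_per_mean draws."""
    means = [sum(draw() for _ in range(n_per_mean)) / n_per_mean
             for _ in range(n_means)]
    return iqr(means)

results = {}
for name, draw in [("uniform", draw_uniform), ("Cauchy", draw_cauchy)]:
    results[name] = (iqr_of_means(draw, 1), iqr_of_means(draw, 100))
    print(name, results[name])
```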

4.173 An example of the haste we caution you against appears at the bottom of this page.  In many practical situations the standardized sample mean is well described by a normal distribution, as the book says, but sometimes it is not: you always have to check.
4.175 Formulas like the lognormal pdf  (4.31) (which is the same as (4.2) on page 145) look tantalizing and mysterious: the mystery is apparent to anyone who is not a statistician, but the tantalizing part is the strong similarity to formula (4.19) for the normal pdf.  But where did the 1/x term at the front come from?

The mystery is removed by changing relative frequencies (see the note for page 145) to actual frequencies.  This is done by multiplying the pdf's height (as given by the formula) by the bin widths.  This works because the actual frequencies must stay the same even when the values are re-expressed.  After all, changing how we represent numbers can have no effect on the phenomena we are studying.

The bin widths are very tiny values of standard size dx.  When you transform a variable, such as when you write y = ln(x), you also change the bin widths.  For monotonic transformations, the change is given by the derivative: if we write dy/dx = g'(x) (literally, the ratio of new bin width dy to the old bin width dx), then dy = g'(x)dx.  (When the transformation is order reversing, we need to take the negative of this expression to keep the areas positive.) 

In the present case, to say a variable x is lognormally distributed is to say its logarithm y = ln(x) is normally distributed.  That is, by re-expressing the original values x in terms of their logarithms y, we will get a normal distribution.  Up to a constant, the standard normal pdf for y is f(y) = exp(-y^2/2)dy (note the appearance of the dy!).  From y = ln(x) we compute dy/dx = 1/x, so dy = dx/x.  Because ln() is monotonic, simply substitute ln(x) for y and dx/x for dy, giving the pdf for x as exp(-(ln(x))^2/2)dx/x.  To produce formula (4.31), put back in the standardization equation (y is replaced by (y - mean)/sd, which becomes (ln(x)  - mean)/sd) and put in the constants needed to guarantee the total probability is 100%.

Whenever you read a formula like (4.31) you should parse it into pieces you can understand.  To summarize, here are the pieces of the lognormal pdf and their meanings:

1/x: comes from the derivative of y = ln(x)
1/sd: comes from the derivative of (y - mean)/sd (the standardization of y).
1/sqrt(2*pi): needed to make the total probability 100%
exp(-1/2 * something^2): the normal pdf
ln(x): because this is the lognormal pdf
(ln(x) - mean)/sd: the standardization after substituting ln(x) for y
x > 0: you can't take logarithms of non-positive numbers.

Ultimately, there are just three basic ingredients in this recipe: the part that is unique to the normal distribution (exp(-1/2 * something^2)), the standardization, and the ln() re-expression.  When you understand these three pieces, you will understand the entire formula.
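
The pieces can be assembled in code, one factor per ingredient.  This is a sketch with hypothetical names (meanlog and sdlog stand for the mean and sd of ln(x)); it is not taken from the book.  The check at the end follows the argument above: the pdf height times a tiny bin width should match the actual probability of the bin, computed here from the normal cdf of ln(x).

```python
from math import exp, log, pi, sqrt
from statistics import NormalDist

def lognormal_pdf(x, meanlog=0.0, sdlog=1.0):
    """Lognormal pdf built from the pieces listed above."""
    if x <= 0:
        return 0.0                        # logarithms need positive numbers
    z = (log(x) - meanlog) / sdlog        # standardization of ln(x)
    normal_core = exp(-0.5 * z ** 2)      # the part unique to the normal pdf
    constant = 1.0 / sqrt(2.0 * pi)       # makes the total probability 100%
    return (1.0 / x) * (1.0 / sdlog) * constant * normal_core

def lognormal_cdf(x, meanlog=0.0, sdlog=1.0):
    """P(X <= x), using the fact that ln(X) is normally distributed."""
    return NormalDist(meanlog, sdlog).cdf(log(x))

# Height times a tiny bin width should equal the probability in the bin.
x, dx = 2.0, 1e-4
prob_in_bin = lognormal_cdf(x + dx / 2, 0.5, 0.8) - lognormal_cdf(x - dx / 2, 0.5, 0.8)
height_times_width = lognormal_pdf(x, 0.5, 0.8) * dx

print(prob_in_bin, height_times_width)
```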

4.182 A common and important environmental application of the hypergeometric distribution is in managing fisheries.  Within a pond, stream, or even part of the ocean, fish are captured, marked, and later recaptured.  The initial capture and the final recapture are samples from the fish population without replacement.  Especially in small streams and lakes, an appreciable proportion of the total population of adult fish may be caught and counted, so it is indeed important to maintain the distinction between sampling with and without replacement.
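
To see how much the distinction can matter, here is a sketch comparing the hypergeometric distribution (sampling without replacement) with the binomial (sampling with replacement).  The pond is made up: 50 adult fish, of which 20 are marked, with 25 recaptured.  The function names are mine.

```python
from math import comb

def hypergeom_pmf(k, marked, unmarked, sample_size):
    """P(exactly k marked fish in a sample drawn without replacement)."""
    total = marked + unmarked
    return comb(marked, k) * comb(unmarked, sample_size - k) / comb(total, sample_size)

def binom_pmf(k, n, p):
    """P(exactly k marked fish in a sample drawn with replacement)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def mean_and_variance(pmf):
    mean = sum(k * p for k, p in enumerate(pmf))
    variance = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, variance

# Hypothetical pond: 20 marked and 30 unmarked fish; 25 are recaptured.
marked, unmarked, sample_size = 20, 30, 25

hyper = [hypergeom_pmf(k, marked, unmarked, sample_size) for k in range(marked + 1)]
binom = [binom_pmf(k, sample_size, marked / (marked + unmarked))
         for k in range(sample_size + 1)]

hyper_mean, hyper_var = mean_and_variance(hyper)
binom_mean, binom_var = mean_and_variance(binom)

print(hyper_mean, hyper_var)
print(binom_mean, binom_var)
```

Both distributions have the same mean, but because half the pond is recaptured, the hypergeometric variance is only about half the binomial variance.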
4.185 The use of the Poisson distribution to model chemical concentrations was always suspect and has largely been discredited.  Its original motivation was to count rare random "hits" on a spectrophotometer to help assess extremely low-level concentrations of many related chemicals in a water sample.  This use of the Poisson is insightful and statistically valid, but the further attempt to take concentrations, rather than ion counts, as the basis of the statistical analysis is not so justified (one part per billion is far different than a single ion) and appears (on an empirical basis alone) not to work very well in general.
4.188 The sense in which there is a "limiting distribution" of the largest or smallest value among n independent random variables is technically delicate.  If we just track the distribution as n increases, then for many distributions--such as the lognormal--we find there is no limit; the maximum tends to get bigger and bigger, the more data there are.  Instead, you have to look at how the maximum values are distributed around their expected value.
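
The drift of the maximum can be computed without any simulation.  If each of n independent variables has cdf F, their maximum has cdf F^n, so the median of the maximum is the point where F equals 0.5^(1/n).  The sketch below does this for a standard lognormal variable (mean 0 and sd 1 on the log scale, an assumption made only for illustration).

```python
from math import exp
from statistics import NormalDist

def median_of_max_lognormal(n):
    """Median of the largest of n independent standard lognormal values."""
    # Solve F(x)**n = 0.5, where F(x) is the normal cdf evaluated at ln(x).
    z = NormalDist().inv_cdf(0.5 ** (1.0 / n))
    return exp(z)

sample_sizes = (1, 10, 100, 1000, 10000)
medians = [median_of_max_lognormal(n) for n in sample_sizes]

for n, m in zip(sample_sizes, medians):
    print(n, m)
```

The medians keep growing with n, which is why the maximum has to be recentered before any limiting distribution can appear.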
4.189 The exponential distribution has a pdf proportional to exp(-y).  Remembering (see the note for page 175) that this is shorthand for exp(-y)dy, let y = exp(-x) so that dy = -exp(-x)dx.  Since y = exp(-x) is monotonic but reverses order, we take the negative of dy and plug that into the pdf.  After accounting for standardization, we get equation (4.48).  In short, the two-parameter extreme value distribution is an "exponential exponential": the exponential of -x is exponentially distributed.
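
The substitution can be checked numerically.  This sketch assumes the standardized form (location 0, scale 1) and uses names of my own choosing.  If y is exponentially distributed and x = -ln(y), then P(x <= t) = P(y >= exp(-t)) = exp(-exp(-t)), and the pdf obtained by the substitution above, exp(-x) * exp(-exp(-x)), should be the slope of that cdf.

```python
from math import exp

def exponential_survival(y):
    """P(Y > y) for a standard exponential variable Y."""
    return exp(-y)

def extreme_value_cdf(x):
    """P(X <= x) for X = -ln(Y): the same event as Y >= exp(-x)."""
    return exponential_survival(exp(-x))

def extreme_value_pdf(x):
    """exp(-y) with y = exp(-x), times the bin-width factor |dy/dx| = exp(-x)."""
    return exp(-exp(-x)) * exp(-x)

# Compare the pdf with the numerical slope of the cdf at a few points.
h = 1e-5
checks = []
for x in (-1.0, 0.0, 0.5, 2.0):
    slope = (extreme_value_cdf(x + h) - extreme_value_cdf(x - h)) / (2 * h)
    checks.append((x, slope, extreme_value_pdf(x)))
    print(x, slope, extreme_value_pdf(x))
```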
4.191 [First line]  The proportion in a mixture distribution must be a constant: it cannot be a random value.
4.192 The equation for h() in formula (4.52) is not a bona fide pdf.  The problem is that the delta distribution does not have a pdf.  A pdf cannot describe the "mass" of p at 0.  If you interpret (4.52) literally and integrate h() over all possible values of x, you will come up with only (1-p), rather than 1.  The meaning of the mass term (the first equation) is that any integral covering the value of 0 must have a value of p added to it.
4.193 Likewise, you must interpret the bar at zero in Figure 4.32 differently than the rest of the graph.  The bar represents probability by means of its height, whereas the rest of the graph represents probability by means of its area.
4.195 "denotes" should read "denote" in problem 4.7.


Return to the Environmental Statistics home page

This page was created 26 February.