Parameter estimation

Links to web resources on parameter estimation

http://risk.lsd.ornl.gov/homepage/bjc_or416.pdf  Detailed discussion of parameter and interval estimators for normal and lognormal distributions, including how to cope with censoring (nondetects).

Terminology

(statistical) sample: A collection of observations obtained in a way that can be accurately modeled by draws of tickets from one or more boxes.
stochastic simulation: The process of replacing numerical inputs to a mathematical model by probability distributions, drawing random values from those distributions in repeated independent runs of the model, and reporting on the distributions of model outputs.

Discussion

Building your own distribution

In order to de-mystify probability distributions, we constructed several of our own.

Building a discrete distribution

Take a bunch of numbers.  You can use infinitely many, but keep them separated from one another (each number isolated from its neighbors).  Draw a horizontal axis with these numbers marked.  Choose one of these numbers: it is your start number.

Now draw a vertical axis.  Mark it off from 0 to 100%.  Choose any height between 0 and 100% (but not equal to 0 or 100).  Associate that height with your start number.  This will be the value of the CDF for the start number.

You now have a bunch of numbers less than the start and another bunch greater than the start.  (One or both of these bunches might be missing, for example if your start number was the smallest.  That's ok.)  Let's call these the "smaller bunch" and the "greater bunch," respectively.

Repeat the process for the smaller bunch and the greater bunch.  The only difference is that for the smaller bunch, you will select a height between 0 and the CDF of the start number.  For the greater bunch, you will select a height between the CDF of the start number and 100.

Do this procedure recursively until you have assigned a height to every one of your chosen numbers.  There are only two simple rules: (1) you have to make the heights approach 0 as you approach the smallest (leftmost) horizontal value and (2) you have to make the heights approach 100 as you approach the largest (rightmost) horizontal value.

You can now fill in the rest of the CDF so that it leaps upwards only at the horizontal values you originally chose.  It defines a discrete probability distribution because (a) it is monotonic, (b) it rises from 0 to 100%, and (c) it rises only in leaps at distinct horizontal points.
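
If you like, you can carry this recipe out on a computer.  Here is a minimal sketch in Python (the function name and the example numbers are our own inventions, not part of the recipe) that picks a start number, assigns it a height, and then recurses on the two bunches:

    import random

    def assign_cdf(values, low=0.0, high=1.0, cdf=None):
        # Recursively assign each value a height between `low` and `high`,
        # following the start-number procedure described above.
        if cdf is None:
            cdf = {}
        if not values:
            return cdf
        values = sorted(values)
        i = random.randrange(len(values))            # choose a "start number"
        cdf[values[i]] = random.uniform(low, high)   # its CDF height, between the bounds
        assign_cdf(values[:i], low, cdf[values[i]], cdf)        # the smaller bunch
        assign_cdf(values[i + 1:], cdf[values[i]], high, cdf)   # the greater bunch
        return cdf

    values = [1, 2, 5, 7, 10]
    heights = assign_cdf(values)
    heights[max(values)] = 1.0   # with finitely many numbers, the largest must reach 100%

    # The probability of each value is the size of the CDF's leap there.
    previous = 0.0
    for v in sorted(values):
        print(v, heights[v] - previous)
        previous = heights[v]

Every run of this sketch produces a different discrete distribution on the same five numbers, which is the point: there is nothing special about any one of them.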

Building a continuous distribution

Building a continuous distribution is even easier.  We will build its PDF.  To do this, draw any kind of continuous curve in the plane.  Well, almost.  There are a few simple rules.  First, a vertical slice through any point on the curve must not intersect any other point.  Second, the curve's points must all be above some minimum height.  Third, the area between the curve and a line of constant minimum height must be finite.  (This is an issue if you extend your curve infinitely far to the right or left.)

To finish the process, draw the X-axis at the line of minimum height.  Change the scale on the Y-axis to make the curve's area equal to 1 (100%).  You now have a PDF.  It defines a continuous distribution.
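
Here is an equally small Python sketch of the normalization step (the wiggly example curve is our own arbitrary choice; any curve obeying the three rules would do):

    import numpy as np

    # An arbitrary single-valued curve that stays above some minimum height
    # and encloses a finite area.
    x = np.linspace(-5.0, 5.0, 2001)
    curve = np.exp(-np.abs(x)) * (2.0 + np.sin(3.0 * x))

    baseline = curve.min()          # the line of minimum height
    shifted = curve - baseline      # draw the X-axis at that line
    dx = x[1] - x[0]
    area = shifted.sum() * dx       # approximate area between the curve and the X-axis
    pdf = shifted / area            # rescale the Y-axis so the area equals 1

    print(pdf.sum() * dx)           # approximately 1.0: the curve is now a PDF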

Note:  There exist continuous distributions whose PDFs are not constructed in this fashion.  PDFs do not have to be continuous and indeed they can have vertical asymptotes ("singularities").

So what is so special about the distributions discussed in statistics books?

The distributions you find there typically have at least one of two nice features.  First, they have convenient mathematical properties.  Second, they arise through consideration of some physical process, much as the Normal distribution arises in the theory of errors or the Binomial distribution arises from studying sequences of experimental "successes" and "failures."  This makes them useful tools for modeling many phenomena.  A distribution you make up yourself might or might not be so useful.

Mean, mean, and mean: what do they mean?

We have to stay sharp when we read statistics books because they use nouns like "mean" in many distinct ways.  Some of the uses we have encountered include:

  1. Any batch of numbers has a mean: it is the summary statistic computed as the sum of the numbers divided by their count.  There is no place in this definition for probability, sampling ("ticket-in-a-box model"), or randomness.  This "mean" is just one of many one-number descriptors of the batch.
  2. Most (but not all) random variables have means.  The mean, when it exists, is the expected value of the variable.  It is one characteristic of the random variable.  There is no place in this purely mathematical definition for sampling or for batches of numbers.
  3. A statistical sample is a set of values that we suppose occurred as if they had been written on tickets drawn independently from a box (or collection of prescribed boxes) with replacement.  The sample is a batch for which we have a probability model (box of tickets) to describe its genesis.  Any mathematical function of the sample values that is used to estimate the mean of the box is an estimator of the mean.

To illustrate these distinctions, suppose you have a sample (x1, x2, ..., xN) from a box (N>=2).  That box has a mean (sense #2), but you do not know its value.  The sample, considered as a batch, has a mean (sense #1): it is equal to (x1 + x2 + ... + xN)/N.  There are many possible estimators (sense #3) of the box's mean (sense #2) that can be constructed from the sample.  Some of the better ones among these are listed below (a few are written out as formulas in the sketch that follows the list):

The midrange (x[1] + x[N])/2, where x[1] and x[N] denote the smallest and largest sample values
The midmean (the mean of the middle half of the sample, also called the interquartile mean)
The mean of the hinges
The trimean
The mean of the 16th and 84th percentiles of the sample
The median of the sample
And, of course, the mean (sense #1) of the sample.
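
Each of these is nothing more than a formula applied to the sample values.  Here is a small Python sketch (our own; the helper names are made up for illustration) that writes a few of them out:

    import numpy as np

    def midrange(x):
        # Mean of the smallest and largest values.
        return (np.min(x) + np.max(x)) / 2

    def midhinge(x):
        # Mean of the hinges (approximately the lower and upper quartiles).
        q1, q3 = np.percentile(x, [25, 75])
        return (q1 + q3) / 2

    def trimean(x):
        # Tukey's trimean: (lower quartile + 2 * median + upper quartile) / 4.
        q1, q2, q3 = np.percentile(x, [25, 50, 75])
        return (q1 + 2 * q2 + q3) / 4

    def mid_16_84(x):
        # Mean of the 16th and 84th percentiles of the sample.
        p16, p84 = np.percentile(x, [16, 84])
        return (p16 + p84) / 2

    sample = np.array([1.2, 0.4, -0.7, 2.5, 0.9, -1.1, 0.3, 1.8])
    for estimator in (midrange, midhinge, trimean, mid_16_84, np.median, np.mean):
        print(estimator.__name__, estimator(sample))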

Estimators: it's just a bunch of formulas

You can already see from the illustration above that many, many estimators of any distribution's properties can and do exist.  Each estimator is just a formula intended to be applied to the values in a statistical sample.  To use an estimator, you look up and apply the formula.  That's all there is to it!  

You can even make up your own estimator without any knowledge of probability and statistics whatsoever.  (We will see some examples later.  Creativity is not limited to people: even government agencies have gotten in on the act.)  Nobody will stop you.  You only need to specify two things:

  1. What it is you are estimating.  Common choices are a mean, standard deviation, variance, median, percentile, or higher moment of a distribution.  You can be creative though, and claim you are estimating a maximum, minimum, range, or anything else you can think of.
  2. A formula.  The formula, to be generally useful, should be flexible and apply to samples of any size.  The simpler the formula, the more likely it is you can persuade someone to use it.  There is no limit, however, on how complex and nasty the formula can get: use your imagination!  If you're really stuck for a formula, borrow somebody else's.  They probably won't mind.

No, there does not have to be any connection between #1 and #2.  It helps if there is, though, at least intuitively.  This makes it easier to persuade people of your prowess as a statistical expert.

The challenge lies in choosing the estimator.  If there are so many, are they all equally good?  (No!)  So then how do you measure how good an estimator is?  How do you compare estimators?  How do you select the best one if you have a bunch to choose from?  How do you even determine whether an estimator is adequate for your needs?  These are questions we will take up shortly.

Sampling can be modeled with a single box

Statistical sampling can be complex and often looks it.  The trick to dealing with complexity is to hide it.  We saw an example of this in class.

Our example concerns a sample of four values.  We assumed each of those values was independently drawn from an N(0, 2) distribution (Normal with mean 0 and standard deviation 2).  From that sample we can compute the standard deviation statistic.

The procedure just described ultimately produces one number: a standard deviation.  One way to model the entire process is to systematically create a very large number of four-value samples.  Write each sample's standard deviation statistic on a ticket and put that ticket into a box.  This new box approximates the sampling distribution of the standard deviation.
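
Here is a small Python sketch of that box-filling process (our own stand-in for the spreadsheet simulations described below; the exact numbers will vary from run to run):

    import numpy as np

    rng = np.random.default_rng()

    # Fill the box: many four-value samples from N(0, 2) -- here read as a Normal
    # distribution with mean 0 and standard deviation 2 -- with one sample
    # standard deviation written on each ticket.
    n_tickets = 10_000
    samples = rng.normal(loc=0.0, scale=2.0, size=(n_tickets, 4))
    tickets = samples.std(axis=1, ddof=1)   # the usual sample SD, with divisor n - 1

    print(tickets.mean())   # typically about 1.84, noticeably less than 2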

Another useful way to contemplate the sampling distribution is to think of drawing tickets out of a box as a process.  We don't actually need to know what is inside the box.  We just need to know that the process acts like draws of tickets, with replacement, from some box of definite, unchanging composition.

The process of obtaining four independent N(0, 2) values and computing their standard deviation defines a single new probability distribution, the sampling distribution of the standard deviation.

We used Crystal Ball software in class to see how this process works.  The next figure, however, was produced with  @Risk software, which is also an Excel add-in that performs essentially the same tasks as Crystal Ball.

Both these software products enable you to replace spreadsheet input cells with random variables--tickets in boxes.  They monitor other cells, usually containing calculated values.  With the push of a button these products independently draw tickets from all the boxes, recalculate the spreadsheet, and make a record of the output cells.  They repeat this process many, many times.  This creates a large statistical sample of the output cells.

The figure above, produced by @Risk, is similar to the one Crystal Ball produced in class.  It displays a histogram of 10,000 standard deviations, each one computed from four independent values drawn from an N(0, 2) distribution.  In this figure, the vertical axis evidently shows probability per unit interval, as a histogram should.

The histogram contains some interesting information.  In particular, its mean of 1.83 is noticeably lower than 2, the standard deviation of the "underlying" N(0, 2) distribution.

The @Risk software lets us treat this entire simulation process as if it were the drawing of a single ticket--the mean--from one box.  To speed things up, I reduced the simulation size from 10,000 to 1,000.  The software then drew 100 tickets from this box.  That is, it ran the 1,000-iteration simulation (using different pseudo-random numbers each time) 100 times, collecting the mean from each run.  The following histogram summarizes the results.

Evidently, almost any statistical sample of 1,000 standard deviations will itself have a mean noticeably less than 2.  This is strong evidence that the standard deviation formula tends to underestimate the true standard deviation, at least where samples of four values from a Normal distribution are concerned.
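
For readers without @Risk, the same nested simulation can be sketched in a few lines of Python (again our own code; the exact values will differ from run to run):

    import numpy as np

    rng = np.random.default_rng()

    def mean_of_sds(n_iterations=1_000, sample_size=4, sigma=2.0):
        # One "ticket": run the whole 1,000-iteration simulation once and
        # return the mean of its standard deviations.
        samples = rng.normal(0.0, sigma, size=(n_iterations, sample_size))
        return samples.std(axis=1, ddof=1).mean()

    # Draw 100 tickets from this new box.
    means = np.array([mean_of_sds() for _ in range(100)])
    print(means.min(), means.max())   # every one of these tends to fall well below 2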

(Theoretical calculations show the expected value of the standard deviation of four independent N(0, 2) values is 2 * sqrt(2/3) * Gamma(2) / Gamma(3/2) = 1.843, approximately.  The general formula for n independent N(mu, sigma) values is sigma * sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2).  This is always less than sigma, but approaches sigma as n gets large.)
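
That expectation is easy to check numerically.  Here is a quick Python snippet (ours) implementing the formula:

    from math import gamma, sqrt

    def expected_sd(n, sigma):
        # Expected value of the sample SD of n independent N(mu, sigma) values.
        return sigma * sqrt(2.0 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2)

    print(expected_sd(4, 2.0))     # about 1.843
    print(expected_sd(100, 2.0))   # about 1.995; it approaches 2 as n grows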

Rules and shortcuts, tips and tricks

Here are some things we learned:

There's nothing mysterious about a probability distribution.  It's easy to create your own.
To understand a word like "mean" or "standard deviation," you need to pay special attention to the context.  First determine whether a probability model is involved.  Then identify the purpose of the word: is it to characterize a batch of numbers?  Characterize a distribution?  Estimate a distribution's properties?  Or something else?
A statistical estimator is just a mathematical formula.  To use it, you look it up or have a computer program calculate it.
Well-defined mathematical procedures let us create new probability distributions--in some very complex ways--from old.  Simulation software helps us understand the new distributions.

 


This page is copyright (c) 2001 Quantitative Decisions.

This page was created 3 March 2001 and last updated 1 April 2001 (to state the theoretical bias in the standard deviation estimate).