http://www.nstl.gov/personnel/sy/parkin/softw-saldmanual.html -- Statistical Analysis of Lognormal Data: Parameter Estimation and Confidence Interval Calculation, USDA-ARS Technology Transfer Document No. NSTL91-3. (Basic program includes tabulated factors for Land's confidence intervals.)
Binomial confidence interval calculations -- an Excel spreadsheet
Land's H-factors (Gilbert tables A.12-13) -- an Excel spreadsheet (contains data only)
(Added in response to a question on 25 March 2001.)
Can you please explain what Student's t describes?
Take n independent random variables distributed according to N(mu, sigma), where sigma denotes the standard deviation. Their mean is distributed according to N(mu, sigma/sqrt(n)), as we have seen. Equivalently, (mean - mu) is distributed as sigma/sqrt(n) times an N(0, 1) variable. Hold that thought for a couple of paragraphs.
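As a quick check of the sigma/sqrt(n) claim, here is a minimal simulation sketch using only the Python standard library. The values of `MU`, `SIGMA`, `N`, the seed, and the number of trials are arbitrary choices for illustration, not values from this page.

```python
import math
import random
import statistics

MU, SIGMA, N = 10.0, 2.0, 16   # illustrative parameters (assumptions)
TRIALS = 20000

rng = random.Random(0)

# Draw many samples of size N and record the mean of each one.
means = []
for _ in range(TRIALS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    means.append(statistics.fmean(sample))

observed_center = statistics.fmean(means)   # should be near MU
observed_sd = statistics.stdev(means)       # should be near SIGMA / sqrt(N)
predicted_sd = SIGMA / math.sqrt(N)

print(f"center of the sample means: {observed_center:.3f} (mu = {MU})")
print(f"sd of the sample means:     {observed_sd:.3f} (predicted {predicted_sd:.3f})")
```

With these settings the spread of the sample means should land close to 2/sqrt(16) = 0.5, up to simulation noise.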
The sample standard deviation has a distribution, too, which depends only on sigma and the amount of data. It is equal to sigma times a Chi(n-1) variable. This isn't saying much: it's just giving a name to the distribution. [I am fudging a factor of 1/sqrt(n-1) here because it does not matter for this discussion and can be absorbed into the definition of Chi(n-1).]
The mean and standard deviation are independent random variables. (This is not obvious, because mathematically the mean and sd are computed from the same n random variables, so we would not expect them to be independent at all. Their independence is a special property of normal distributions.)
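The independence claim can be probed, though not proved, by simulation. The sketch below estimates the correlation between the sample mean and the sample sd over many small samples, once for Normal data and once for exponential data as a contrast. The helper names, sample size, seed, and the choice of the exponential distribution are assumptions made for illustration. Note that a near-zero correlation is only consistent with independence; it does not establish it.

```python
import math
import random
import statistics

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def mean_sd_correlation(draw, n=5, trials=20000, seed=1):
    """Correlation between sample mean and sample sd over many samples of size n."""
    rng = random.Random(seed)
    means, sds = [], []
    for _ in range(trials):
        sample = [draw(rng) for _ in range(n)]
        means.append(statistics.fmean(sample))
        sds.append(statistics.stdev(sample))
    return pearson(means, sds)

# Normal data: mean and sd should look uncorrelated.
normal_corr = mean_sd_correlation(lambda rng: rng.gauss(10.0, 2.0))
# Skewed (exponential) data: large means tend to come with large sds.
expo_corr = mean_sd_correlation(lambda rng: rng.expovariate(1.0))

print(f"Normal data:      corr(mean, sd) = {normal_corr:+.3f}")
print(f"Exponential data: corr(mean, sd) = {expo_corr:+.3f}")
```

For the Normal samples the estimated correlation should hover near zero, while for the skewed samples it should be clearly positive.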
The t-statistic is an attempt to standardize the residual of the mean (relative to the true mean mu) using the sample standard deviation (which we can compute from the data) rather than using the true standard deviation sigma, which we do not know. Therefore,
| t = (mean - mu) / (sd/sqrt(n)) | {by definition} |
| ~ sigma/sqrt(n) * N(0, 1) / [sigma/sqrt(n) * Chi(n-1)] | {as explained above: mean and sd are random variables} |
| = N(0, 1) / Chi(n-1) | {because the constant factors sigma/sqrt(n) cancel} |
All the parameters have disappeared, leaving a statistic whose distribution can be calculated because the numerator and denominator are independent. The distribution of this ratio is the t distribution with n-1 degrees of freedom.
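To see the parameters drop out in practice, the following sketch simulates the t statistic for samples of size 5 under two very different choices of mu and sigma. It then counts how often |t| exceeds 2.776, the two-sided 95% point of t with 4 degrees of freedom taken from standard tables. The function names, parameter values, and seed are illustrative assumptions.

```python
import math
import random
import statistics

def t_statistic(sample, mu):
    """(mean - mu) / (sd / sqrt(n)), using the sample standard deviation."""
    n = len(sample)
    return (statistics.fmean(sample) - mu) / (statistics.stdev(sample) / math.sqrt(n))

def exceed_fraction(mu, sigma, cutoff, n=5, trials=40000, seed=2):
    """Fraction of simulated samples whose |t| exceeds the cutoff."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if abs(t_statistic(sample, mu)) > cutoff:
            hits += 1
    return hits / trials

T_CUTOFF = 2.776   # two-sided 95% point of t with 4 degrees of freedom (table value)
Z_CUTOFF = 1.960   # two-sided 95% point of N(0, 1)

frac_a = exceed_fraction(0.0, 1.0, T_CUTOFF)     # mu = 0,  sigma = 1
frac_b = exceed_fraction(50.0, 12.0, T_CUTOFF)   # mu = 50, sigma = 12
frac_z = exceed_fraction(0.0, 1.0, Z_CUTOFF)     # wrong cutoff for n = 5

print(f"P(|t| > {T_CUTOFF}) with mu=0,  sigma=1:  {frac_a:.3f}")
print(f"P(|t| > {T_CUTOFF}) with mu=50, sigma=12: {frac_b:.3f}")
print(f"P(|t| > {Z_CUTOFF}) with mu=0,  sigma=1:  {frac_z:.3f}")
```

Both parameter settings should give a fraction near 0.05, whereas using the Normal cutoff of 1.96 with so small a sample should give a noticeably larger fraction.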
Therefore, what Student's t does is allow us to perform probability calculations even when we do not know the true mean or standard deviation of the underlying data. We just need to know the underlying data are described by a common Normal distribution and that they are statistically independent.
This last statement is perhaps the most important thing to learn and remember. Anybody with a calculator or computer can follow instructions for applying a t-test in any of a thousand statistical recipe books or software programs. However, the results, no matter how nicely formatted by the computer, will be garbage unless you have established that these two statistical assumptions are valid (Normal distribution and independence).
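One way to see what goes wrong is to break the independence assumption deliberately. The sketch below builds nominal 95% t-intervals from samples of size 10 and compares coverage of the true mean for independent data against serially correlated data. The first-order autoregressive model, its correlation of 0.7, the function names, and the seed are assumptions chosen purely for illustration. The cutoff 2.262 is the tabulated two-sided 95% point of t with 9 degrees of freedom.

```python
import math
import random
import statistics

T_CUTOFF = 2.262   # two-sided 95% point of t with 9 degrees of freedom (table value)

def ar1_sample(rng, n, rho):
    """n values with N(0, 1) marginals and lag-1 correlation rho (rho = 0: independent)."""
    x = rng.gauss(0.0, 1.0)
    out = [x]
    for _ in range(n - 1):
        # Scale the innovation so every value keeps unit variance.
        x = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        out.append(x)
    return out

def coverage(rho, n=10, trials=20000, seed=4):
    """Fraction of nominal 95% t-intervals that contain the true mean, which is 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = ar1_sample(rng, n, rho)
        half_width = T_CUTOFF * statistics.stdev(s) / math.sqrt(n)
        if abs(statistics.fmean(s)) <= half_width:
            hits += 1
    return hits / trials

cover_independent = coverage(rho=0.0)
cover_dependent = coverage(rho=0.7)

print(f"coverage, independent data: {cover_independent:.3f}")
print(f"coverage, correlated data:  {cover_dependent:.3f}")
```

The independent case should cover close to 95% of the time. The correlated case should fall well short, even though every individual value is still Normal.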
The computation of t (that is, of its pdf or cdf) is a matter of integral calculus and was first carried out in 1908 (not very rigorously, but correctly) by the statistician W. S. Gosset, writing pseudonymously as "Student". (Gosset was concerned that his employer, Guinness Brewing, would frown on publication of what amounted to proprietary business practice information--the t-test is useful in many kinds of experimental and quality control settings.)
When you have a large sample, the sample standard deviation should closely approximate sigma. In that case t should be close to a truly standardized normal variate with high probability, which means that t(n-1) should be close to N(0, 1) when n is large. You can verify that with the Student's t-distribution spreadsheet. You can also verify that by comparing tables of the t cdf with tables of the N(0,1) cdf.
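If neither the spreadsheet nor printed tables are at hand, the comparison can be sketched numerically. The code below evaluates the t cdf by integrating the standard t density with Simpson's rule and compares it with the N(0, 1) cdf at 1.96 for several degrees of freedom. The function names, the evaluation point, and the step count are illustrative assumptions. This is a rough numerical sketch, not a production-quality cdf routine.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """cdf via Simpson's rule on [0, x]; by symmetry the cdf at 0 is 0.5."""
    h = x / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

X = 1.96
normal_value = NormalDist().cdf(X)

# Here df plays the role of n-1 in the text.
gaps = {}
for df in (4, 30, 1000):
    gaps[df] = normal_value - t_cdf(X, df)
    print(f"df = {df:4d}: t cdf = {t_cdf(X, df):.4f}, "
          f"N(0,1) cdf = {normal_value:.4f}, gap = {gaps[df]:.4f}")
```

The gap between the two cdfs should shrink steadily as the degrees of freedom grow.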
This page is copyright (c) 2001 Quantitative Decisions. Please cite it as
This page was created 6 March 2001 and last updated 9 May 2001.