Geometric Means, Exp(), Ln(), and All That

"The formula in our text for geometric mean goes something like exp[(1/n)Sum(log(x))].  This differs from the formula you used that contains a square root and what appears to be the product of both values in the set (or would those be the "min" and "max"?).  Can you please explain further your calculation for question 8?" -- Question, January 24, 2001

The formula quoted above is number 3.4 on page 62 of the text.  The expression I used for the geometric mean in question 8 of the sample quiz, for N=2, was sqrt(X1 * X2).  This introduces a long line of interesting questions about what powers, roots, exp(), and log() really mean.  I will quickly review a standard method of defining these because it's a good opportunity to prepare you for material later in this course.  If you just want the short answer to the question, skip to paragraphs 4 and 5 below.  However, you do need to know all the facts listed here.

1.      The natural logarithm ("ln" here; "log" in the quotation above) of any number z > 0 is the area under the curve y = 1/x from x=1 to x=z, using the convention that if z is to the left of 1 (that is, 0 < z < 1), then the area is considered to be negative.  Using basic properties of areas you can prove these fundamental facts, which you should commit to memory:

*       ln(1) = 0 (draw a picture: there's no area).
*       As x becomes arbitrarily large, so does ln(x) (the area gets bigger, obviously, but the point is that it becomes arbitrarily large, without approaching a limit).
*       As x approaches zero, ln(x) becomes arbitrarily negative (approaches -infinity).
*       ln(u*v) = ln(u) + ln(v) for any positive numbers u and v.  (This is the only property listed here where calculus is helpful in the demonstration.  A sophisticated understanding of Euclidean geometry is sufficient, however.)
*       If ln(u) = ln(v), then u = v.
*       ln() is monotonically increasing.
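
The area definition can be tried out numerically.  The following is only a rough sketch, not anything from the text: the function name ln_area and the choice of the midpoint rule with 100,000 strips are my own assumptions.  It approximates the signed area under y = 1/x and compares it with the bulleted facts above.

```python
import math

def ln_area(z, steps=100_000):
    """Approximate ln(z) as the signed area under y = 1/x between 1 and z.

    Uses the midpoint rule; `steps` is an arbitrary accuracy setting.
    """
    if z <= 0:
        raise ValueError("ln is defined only for z > 0")
    # width is negative when 0 < z < 1, which makes the area negative
    width = (z - 1.0) / steps
    return sum(width / (1.0 + (i + 0.5) * width) for i in range(steps))

print(ln_area(1.0))                                # no area at all
print(ln_area(6.0), ln_area(2.0) + ln_area(3.0))   # ln(u*v) versus ln(u) + ln(v)
print(ln_area(0.5), math.log(0.5))                 # negative; compare with the library value
```

The two numbers on the second line of output should agree to many decimal places, and the third line should show a negative area close to the library's math.log(0.5).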

2.      The exponential function exp() is the inverse of the logarithm.  That is, for any number x, to say exp(x) = y is exactly the same as saying ln(y) = x.  Therefore exp() has these basic properties, which you also should know:

*       exp(ln(x)) = x for all x > 0.
*       ln(exp(x)) = x for all x.
*       exp(x) > 0 for all x.
*       exp(0) = 1.
*       As x becomes arbitrarily large, so does exp(x).
*       As x becomes arbitrarily negative (approaches -infinity), exp(x) approaches zero.
*       exp(u + v) = exp(u)*exp(v) for any numbers u and v.
*       if exp(u) = exp(v), then u = v.
*       exp() is monotonically increasing.
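
To see that "inverse of the logarithm" really is enough to pin down exp(), here is a minimal sketch that finds y with ln(y) = x by bisection, using nothing about ln() except that it is increasing and unbounded in both directions.  The name exp_by_inversion and the tolerance are assumptions of mine; the library's math.log stands in for ln(), and the sketch is meant only for moderate values of x.

```python
import math

def exp_by_inversion(x, tol=1e-12):
    """Solve ln(y) = x for y by bisection (intended for moderate x only)."""
    lo, hi = 1.0, 1.0
    while math.log(lo) > x:    # ln(lo) heads toward -infinity as lo shrinks
        lo /= 2.0
    while math.log(hi) < x:    # ln(hi) grows without bound as hi grows
        hi *= 2.0
    # ln is increasing, so the solution stays trapped between lo and hi
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2.0
        if math.log(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(exp_by_inversion(0.0))                                   # exp(0) = 1
print(exp_by_inversion(1.0), math.e)                           # compare with the library constant
print(exp_by_inversion(2.2),
      exp_by_inversion(1.5) * exp_by_inversion(0.7))           # exp(u + v) versus exp(u)*exp(v)
```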

3.      For any number y and any x > 0, the value x^y (x to the power y) is *defined* to be exp(y*ln(x)).  The requirement x > 0 is needed for ln(x) to be defined; it is the only restriction.  Using the properties of ln() and exp() above it is straightforward to demonstrate that powers have their "usual" properties, including:

*       a^0 = 1
*       a^1 = a
*       a^(b+c) = a^b * a^c
*       (a^b)^c = a^(b*c)
for any numbers a, b, and c where these expressions are defined.  From these you can derive the "definitions" used in elementary texts, such as

*       a^n = a * a * ... * a (n factors) for integers n >= 2
*       a^(-n) = 1/(a^n)
*       (a^(1/b))^b = a, so a^(1/b) is a "bth root" of a.
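
A quick numerical spot-check of these properties, taking the definition x^y = exp(y*ln(x)) literally.  This is only an illustration: the function name power and the particular values of a, b, and c are arbitrary choices of mine.

```python
import math

def power(x, y):
    """x to the power y, computed directly from the definition exp(y * ln(x))."""
    if x <= 0:
        raise ValueError("this definition requires x > 0")
    return math.exp(y * math.log(x))

a, b, c = 2.5, 1.3, -0.7   # arbitrary test values

print(power(a, 0), power(a, 1))                       # a^0 and a^1
print(power(a, b + c), power(a, b) * power(a, c))     # a^(b+c) versus a^b * a^c
print(power(power(a, b), c), power(a, b * c))         # (a^b)^c versus a^(b*c)
print(power(a, 3), a * a * a)                         # a^n versus repeated multiplication
print(power(power(a, 1 / 3), 3))                      # cubing the cube root recovers a
```

Each pair should agree up to rounding error in the last few digits.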

4.      Let N=2, so the batch in question in formula (3.4) can be written {X1, X2}.  Writing out the summation gives

        geometric mean = exp(1/2 * (ln(X1) + ln(X2)))

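By *definition* (#3), the 1/2 power ("square root") of any Y > 0 is exp(1/2 * ln(Y)).  Take Y = exp(ln(X1) + ln(X2)): since ln(exp(x)) = x, we have ln(Y) = ln(X1) + ln(X2), so the geometric mean above is exactly the square root of Y.  By a basic property of exp(), Y = exp(ln(X1)) * exp(ln(X2)), which by *definition* of exp() is simply X1*X2.  Whence the formula reduces to the more familiar looking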

        geometric mean = square root of (X1 * X2).
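
As a concrete check, take the made-up batch {4, 9} (my example, not one from the quiz).  Both forms of the formula should give 6:

```python
import math

x1, x2 = 4.0, 9.0   # hypothetical batch {X1, X2}

via_logs = math.exp(0.5 * (math.log(x1) + math.log(x2)))   # formula (3.4) with N = 2
via_root = math.sqrt(x1 * x2)                              # square root of the product

print(via_logs, via_root)   # the two values should agree, up to rounding in the first
```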


5.      For N >= 1, you can similarly demonstrate that formula (3.4) implies

        geometric mean = Nth root of (X1 * X2 * ... * XN)

This formula looks a little nicer, but (3.4) is a little easier to implement on a calculator or computer and is safer to compute: the product of many values can overflow or underflow the range a machine can represent even when the geometric mean itself is an ordinary number.  Formula (3.4) also makes it clear how the "geometric mean" really is the *mean* of something; namely, first you take logarithms, then compute their mean, then "undo" the logarithm (that's the exponential).  It also makes it easy to see that geometric means are defined only for batches of strictly positive data.  (The Nth root formula above can actually be evaluated for some additional batches, such as {-1, -4}: the square root of (-1 * -4) is 2.  But 2 is evidently not any reasonable kind of "mean" of {-1, -4}, so the result does not make much sense.)
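
Here is a minimal sketch of both formulas side by side, under my own naming (geometric_mean_logs for formula (3.4), geometric_mean_root for the Nth root form).  The batch with huge values is contrived, chosen only to show where the Nth root form can get into trouble with ordinary floating-point arithmetic while the logarithmic form does not.

```python
import math

def geometric_mean_logs(batch):
    """Formula (3.4): take logs, average them, then exponentiate."""
    if not batch or any(x <= 0 for x in batch):
        raise ValueError("geometric mean needs a non-empty batch of positive values")
    return math.exp(sum(math.log(x) for x in batch) / len(batch))

def geometric_mean_root(batch):
    """Nth root of the product, with no checks on the data."""
    product = 1.0
    for x in batch:
        product *= x
    return product ** (1.0 / len(batch))

small = [2.0, 8.0]
print(geometric_mean_logs(small), geometric_mean_root(small))   # the two forms agree

huge = [1e200, 1e200, 1e-100]      # contrived: the running product overflows
print(geometric_mean_logs(huge))   # close to 1e100
print(geometric_mean_root(huge))   # inf, because 1e200 * 1e200 is too large for a float

print(geometric_mean_root([-1.0, -4.0]))   # 2.0, the "mean" that makes little sense
```

Calling geometric_mean_logs([-1.0, -4.0]) raises an error instead, which matches the remark that the geometric mean is defined only for strictly positive data.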


6.      Another basic, but slightly subtler, property of ln() (namely, its "concavity") implies that the geometric mean of a batch can never be greater than its arithmetic mean.  This is called the "geometric-arithmetic inequality" (also known as the arithmetic mean-geometric mean, or AM-GM, inequality).  It is quite a powerful tool, used for example to solve a large class of optimization problems.  More to the point, we will use this inequality later to demonstrate that geometric means are "biased": on the average, they underestimate the true average concentrations or masses of chemicals in an environmental medium, for instance.  This is key to understanding the regulatory concern about so-called "lognormal" distributions.

By the way, the "concavity" of the logarithm amounts to the fact that y = 1/x, the height of the curve whose area defines ln(), is always decreasing for x > 0.  That brings us full circle back to the definition of logarithm.  The whole theory is simple and elegant.
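
The inequality is easy to watch in action.  The sketch below is illustrative only: the batches are invented, the third is simulated from a lognormal distribution with arbitrarily chosen parameters, and the function names are mine.

```python
import math
import random

def arithmetic_mean(batch):
    return sum(batch) / len(batch)

def geometric_mean(batch):
    # formula (3.4): the exponential of the mean of the logarithms
    return math.exp(sum(math.log(x) for x in batch) / len(batch))

random.seed(0)   # fixed seed so the simulated batch is repeatable
batches = [
    [1.0, 10.0, 100.0],                                       # widely spread values
    [5.0, 5.0, 5.0],                                          # all equal: the two means coincide
    [random.lognormvariate(0.0, 1.0) for _ in range(1000)],   # simulated lognormal data
]

for batch in batches:
    gm, am = geometric_mean(batch), arithmetic_mean(batch)
    print(f"geometric mean = {gm:.4f}   arithmetic mean = {am:.4f}")
```

In every line of output the geometric mean should be no larger than the arithmetic mean, with equality (up to rounding) only for the batch whose values are all the same.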

In class we will do some exercises to become more familiar with ln() and exp(): we will learn how to compute their values (well, to one or two significant figures, anyway) quickly enough to do most problems involving ln(), exp(), powers, and roots in our heads.  (See Answers to HW 5.)  For these exercises I will assume you know all the bulleted properties listed above.  In addition, we will need the Taylor Series approximations for exp() and ln():

        exp(x) = 1 + x/1! + x^2/2! + x^3/3! + x^4/4! + ...
        ln(1+x) = x - x^2/2 + x^3/3 - x^4/4 + ... provided -1 < x < 1.

(Recall 0! = 1, 1! = 1*0! = 1, 2! = 2*1! = 2, 3! = 3*2! = 6, and in general n! = n*(n-1)!.  These series can be derived directly from the definition of ln() above, but the derivations require subtler properties of integration, more involved algebraic equations, and some consideration of limits: after all, they are infinite series.)
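
To get a feel for how few terms are needed when x is small, here is a rough sketch that sums the first few terms of each series and compares the result with the library functions.  The function names and the default of ten terms are my own choices.

```python
import math

def exp_series(x, terms=10):
    """Sum the first `terms` terms of 1 + x/1! + x^2/2! + ..."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= x / (n + 1)    # turns x^n/n! into x^(n+1)/(n+1)!
    return total

def ln1p_series(x, terms=10):
    """Sum the first `terms` terms of x - x^2/2 + x^3/3 - ..."""
    if not -1 < x < 1:
        raise ValueError("the series is used here only for -1 < x < 1")
    total, x_to_n = 0.0, 1.0
    for n in range(1, terms + 1):
        x_to_n *= x
        sign = 1.0 if n % 2 == 1 else -1.0   # signs alternate, starting with +
        total += sign * x_to_n / n
    return total

print(exp_series(0.1, terms=4), math.exp(0.1))    # four terms: 1 + x + x^2/2 + x^3/6
print(ln1p_series(0.1, terms=3), math.log(1.1))   # three terms: x - x^2/2 + x^3/3
```

For x = 0.1 a handful of terms should already agree with the library values to several decimal places, which is what makes mental estimates practical.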

The patterns in the series are simple and should be memorized.

For more on logarithms and exponentials see the Logarithms page.

This page is copyright (c) 2001 Quantitative Decisions.  Please cite it as

This page was created 16 February and last updated 3 May 2001.