Density Calculations

A "density" is an amount of something per unit area.  Population, for example, is defined only at the sampling unit level--household, block, census tract, county, state, or other region--but is frequently reported as a density: people per square mile, per square kilometer.

The density is well-defined at all points provided one specifies what neighborhood, or region around a point, is used to summarize the data.  The population per square mile at any point could be computed by summing all discrete population values within a circle of radius Sqrt(1/Pi) miles centered at the point.  It could also be computed by summing all discrete population values within a circle of radius 100 miles and dividing by the circle's area (10,000 * Pi square miles).  Of course other circle sizes are possible.

Since densities depend on areas and distances, you must be careful to represent your data in a projection that distorts areas and distances as little as possible.  In particular, using no projection--so called "geographic" projection--introduces enormous errors in densities at nonzero latitudes.

One needn't restrict the neighborhood to circles--neighborhoods may have arbitrary shapes and sizes.  Furthermore, to obtain more accurate values you could try a distance-weighted sum of nearby population data.

To create comparable densities at different locations, one usually chooses comparable distance weighting functions around each point.  Mathematically, these weights are functions K(s,t) of the vector displacement (s,t) of any point relative to the location (x,y) where the density is computed.  A population (or other value) f(x+s, y+t) located at (x+s, y+t) will receive weight K(s,t) and thereby contribute a term f(x+s, y+t) * K(s,t) to the density estimate at (x,y).  The density at (x,y) is the sum of all such terms.  (The dimensions of K() therefore are 1/area.)  For this to be a true density, the integral of K(s,t) over all possible (s,t) must be unity.

Translating this description into a method to compute densities is usually not a good idea, however.  Normally, one has a relatively small number of data points (x,y) where values f(x,y) are defined and a very large number of grid points at which densities need to be computed.  If the nonzero values of K(s,t) extend for K cells horizontally and L cells vertically, and if the density grid will have M rows and N columns, then the number of computations required is proportional to K*L*M*N, which quickly grows very large.  For a fixed size grid and fixed K(), each time the cell size is halved, the number of computations increases 16-fold.  This is truly terrible performance.

Instead, it is straightforward to observe that the term f(x+s, y+t) * K(s,t) contributing to the density at (x,y) can also be expressed as a term f(X,Y)*K(-s,-t) contributing to the density at (X+s, Y+t) simply by substituting X for x+s and Y for y+t.  This leads to a much faster algorithm:

Begin by setting the density d(x,y) := 0 for every point on the grid.

For each point (X,Y) where f() has a value

For each (s,t) where K(-s,-t) is nonzero and (X+s, Y+t) is on the grid,

add f(X,Y) * K(-s,-t) to d(X+s,Y+t).

The number of computations now is proportional to K*L*A, where A is the number of points where f() has a nonzero value.

Typically K(s,t) is symmetric--K(-s,-t) = K(s,t)--so that we do not need to worry about the negative signs.  The function K(s,t) (or sometimes K(-s,-t)) is called the Kernel.  We have shown that the density is the convolution of f() with the kernel.  For more on convolutions see http://www.directionsmag.com/features.asp?featureid=19.

Spatial Analyst Density Calculations

Spatial Analyst supports only two kernels.  One is the unweighted circular kernel K(s,t) = 1/(Pi * R^2) for values (s,t) inside the circle of radius R (and equal to zero otherwise).  The other is a quartic approximation to a Gaussian kernel.

Ideally, the Gaussian kernel is

K(s,t) = exp(-(s2 + t2)/(2C2))/(2*Pi*C2)

In polar coordinates s = r*cos(theta), t = r*sin(theta), this expression is

K(r, theta) = exp(-r2/2C2)/(2*Pi*C2)

The value of C determines the effective range of K; points (s,t) beyond a distance of about 3*C barely contribute to K.  K() is essentially flat beyond 3*C.  The quartic approximation capitalizes on this.  The only symmetric quartic polynomial equal to 1 at r=0, to 0 at r=3, and with derivative 0 at r=3 is

q(r) = (1-(r/3)2)2

Therefore Spatial Analyst's "Gaussian" kernel is a multiple of

K(r, theta) = (1-(r/3C)2)2

suitably normalized to have an area of 1.  These values are very quick to calculate and so the density is quick to compute.

This table compares the two density kernels offered by Spatial Analyst.

  "Kernel" density "Simple" density
Map of the kernel

(Contours of the kernel density start at 0.0 and increase by 0.1 units to 0.9.)

Radius 1.00 1.00
Maximum value 0.955 (3/Pi) 0.318 (1/Pi)
Area 1.00 1.00
Smallest nonzero value 0.001 0.318
Example of "kernel" density:

Maximum value attained is 4,336.

Example of "simple" density:

Maximum value attained is 2,841.

It is evident in the pictures, and supported by theory, that

The (approximate) Gaussian kernel on the left produces smooth-looking density grids
The simple kernel on the right produces discontinuous-looking density grids
For any given radius, the Gaussian kernel produces more localized estimates (typically, a Gaussian kernel with radius 2R is about as localized as a simple kernel with radius R).

Additional discussion of Spatial Analyst's quartic approximation to the Gaussian kernel appears in a discussion forum at http://forums.esri.com/Thread.asp?c=93&f=995&t=98735 (July 2003).

Computing densities with Spatial Analyst

This is simple.  Activate a point theme with the data (called f() above) and select the Analysis|Calculate density menu item.  Fill out the form.  If you are not offered a choice of area units, that is because you have not specified the map units in the View Properties dialog.

Unfortunately, Spatial Analyst does not directly compute densities for data supported on polylines or polygons.

Case study: Ecological Risk Assessment

(To be continued)