Introduction to Map Algebra

"Map Algebra" is an informal language for manipulating representations of continuous variables defined over a common domain.  As such it does not introduce any new capabilities within a GIS.  However, by providing a systematic, succinct way to articulate processes performed on GIS datasets, map algebra enhances how we think about and implement these processes.

Data types

This section focuses on map algebra operations available for gridded datasets, such as those implemented in ESRI's Spatial Analyst extension for ArcView.  Similar operations are available in other grid-based GISes, such as GRASS.

The data associated with any grid cell can be of any type whatsoever.  It is conceptually useful to divide data types into several classes, however.  These include:

Categorical data.  These are non-numerical data.  Grids that classify land use or land cover exemplify this category.  Other examples are proximity grids (values identify the nearest object) and feature grids (only two values are possible: one value for cells where features occur, another value--typically zero or NoData--where features do not occur).

Integral data.  These data may be relative ranks or preferences, or they may be counts of occurrences or observations, for example.  What they measure is inherently integral.

Floating-point ("real" data).  These typically represent a real  surface, such as elevation, or the values of a scalar function (a "conceptual surface," if you will).  Examples of such functions would be temperature, slope, amount of sunlight received per year, distance to the nearest feature, population density.

Vector data.  These are ordered tuples of real values that represent fields of directions.  For example, hydraulic gradients (for two-dimensional groundwater models), wind velocities (again for two-dimensional models), and ocean currents are two-dimensional vector fields.  Vector data may have more than two dimensions, even though they are defined over a strictly two-dimensional domain.  For example, models using astronomical data, such as climate models, may make use of information about the three-dimensional location (on the earth's surface) of each grid point.

(Scientific visualization systems usually have built-in support for vector data, whereas most GISes require the modeler to represent vector data as an ordered collection of floating-point grids.)

Representation details

An important issue in GIS concerns unifying the representation of features.  Many of the techniques presented in this course bear witness to this issue: reprojection, changing datum, changing scale, and changing units of measurement are all methods of "georeferencing" feature data so they can be accurately compared.  We have seen how converting vector features to a raster representation is necessary for carrying out many grid-based operations.

For raster data, the unification issue is made more complex by the choice of grid extent and size.  Two identically georeferenced raster datasets can (and usually do) have different extents and different cell sizes.  Most grid algorithms, however, treat the grids involved as matrices of data--arrays of numbers.  To accomplish this, all input grids must have identical extents and cell sizes.

Consider one of the simplest operations possible: determining whether two grids are equal.  Here we mean "equal" in the sense of representing the same geographic feature, rather than having strictly identical computer representations.  Thus, having different cell sizes does not automatically imply two grids are unequal.  A comparison of their values is still possible.

Look at the two grids illustrated at the top.  The yellow grid on the left has a cell size of 1.6 meters.  The blue grid on the right has a cell size of 1.3 meters.  Some of their values are posted in the illustration.  Their overlay appears beneath the two.  To show both grids, the blue grid was made translucent.

Consider the region covered by the upper left yellow cell, which has a value of 594.  It is overlapped by portions of four blue cells having values 133, 283, 556, and 846.


Conceivably, these two grids represent the same spatially-varying quantity in this region, even though none of the blue values agree with the yellow value.  This is possible because the blue values (shown) bracket the yellow value of 594, so that some appropriately weighted average of the blue values really will equal the yellow value.
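For instance, here is one convex combination that reproduces the yellow value exactly.  The weights are hypothetical, chosen for illustration rather than computed from the actual cell overlaps (a short Python sketch, not ArcView code):

blue = [133, 283, 556, 846]
w = [0.0, 0.0, 252.0/290, 38.0/290]    # hypothetical weights; they sum to 1
print(sum(v * p for v, p in zip(blue, w)))    # 594.0, the yellow value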

If this is the case, then when comparing the blue grid to the yellow grid, we will first want to perform an appropriate interpolation of the blue values at the center of each yellow grid cell.  This process is called resampling.  There are three methods in wide use:

Nearest-neighbor interpolation

Bilinear interpolation

Cubic convolution (or bicubic interpolation)

An example of each, as applied to an image, appears at http://us.ceo.org:8080/ams/tutorials/geomcorr/geomcorr.html#irm.

There is a trade-off between computational complexity and accuracy.  The methods are listed in order of increasing complexity and accuracy.

Nearest-neighbor, the simplest method, assigns to each yellow cell the value of the blue cell nearest its center.  Because it requires no numerical computation, this method works for categorical data.  Because it does not truly interpolate values, however, it generally performs poorly for numerical data.
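For readers who want to see the bookkeeping, here is a minimal Python sketch of the idea (an illustration only, not Spatial Analyst's actual algorithm), assuming both grids share the same upper-left origin and ignoring NoData handling:

import numpy as np

def nearest_neighbor_resample(src, src_cell, dst_cell, dst_shape):
    # Copy into each destination cell the value of the source cell
    # whose center lies nearest the destination cell's center.
    rows, cols = dst_shape
    out = np.empty(dst_shape, dtype=src.dtype)
    for i in range(rows):
        for j in range(cols):
            x = (j + 0.5) * dst_cell    # destination cell center
            y = (i + 0.5) * dst_cell
            si = min(int(y / src_cell), src.shape[0] - 1)
            sj = min(int(x / src_cell), src.shape[1] - 1)
            out[i, j] = src[si, sj]
    return out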

Bilinear interpolation uses the four blue values surrounding the center of each yellow cell.  By translating and rescaling the coordinates, which will not change the interpolation, we may suppose the yellow cell's center is at (x, y), with 0 <= x, y <= 1, and that the centers of the surrounding blue cells are located at (0,0), (1,0), (0,1), and (1,1), where they have values Z00, Z10, Z01, and Z11, respectively.

Linear interpolation on the bottom row of neighbors, between (0,0) and (1,0), estimates the value Z0 at (x,0) as x * Z10 + (1-x) * Z00.  Likewise, linear interpolation on the top row of neighbors, between (0,1) and (1,1), estimates the value Z1 at (x,1) as x * Z11 + (1-x) * Z01.  Finally, linear interpolation between Z0 and Z1 estimates the value Z at (x,y) as y * Z1 + (1-y) * Z0.

By substituting the expressions for Z0 and Z1 into that last formula you can see that the formula for Z is a polynomial involving powers of x and y no greater than 1, so it has four coefficients: Z = a + b*x + c*y + d*x*y.  Because these four coefficients were determined by four values (Z00, ..., Z11), they are in general uniquely determined by the data.  This immediately implies that the comparable procedure of first interpolating along columns (in the y-direction) and then interpolating the results in the x-direction will give the same result, because it, too, will have a similar formula with a unique solution.  

Note that the term "bilinear" derives from the process of linear interpolation (twice in one direction, then once in the perpendicular direction), not from the formula for Z.  The formula involves a term with x * y, which is not linear.

Calculating a bilinear interpolation on a grid takes about five times as long as a nearest-neighbor interpolation.

Cubic convolution generalizes bilinear interpolation.  The sixteen blue cells surrounding each yellow cell determine the interpolated value.  First, four cubic "convolutions" are performed in one direction (horizontally or vertically) and then one more cubic convolution is performed in the perpendicular direction.  A "convolution" replaces each neighboring value by a standard curve centered at the neighbor's location and scaled according to the neighbor's value.  The graph below shows an example of one standard curve, which consists of pieces of four separate cubic curves joined at points X=-1, 0, 1 (the branch from -2 to -1 is not shown):

The graph shows how a value of 1.0 at the location X=0 is replaced by the curve, which is a symmetric function of X.  It also shows the nearest-neighbor interpolator for comparison.

In these figures, interpolation occurs only between the values X=0.0 and X=1.0 because this is the range of X values whose neighbors are located at points -1, 0, 1, and 2.  Outside this range there are different neighbors.

Interestingly, because this standard curve takes on values outside the interval from 0.0 to 1.0 (in particular, it dips below zero), the "interpolator" can extrapolate values:

In this graph, the neighboring values for any point X between 0 and 1 are 1, 0, 0, and 0.  All "interpolated" values, however, are negative.  In a sense, the high value of 1.0 at X=-1 "overshoots" as it descends to the value of 0.0 at X=0 and then "recovers" to the value of 0.0 at X=1.

Here is a generic picture of a cubic convolution:

This figure was constructed by centering standard curves at X=-1, 0, 1, and 2 and rescaling them by the four values attained at those points.  The convolution is the sum of the four curves.
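The standard curve itself is simple to write down.  The following Python sketch uses the kernel formula from the reference cited below, with the "-5a" correction applied and the value a = -1; the final line reproduces the overshoot example above:

def cubic_curve(s, a=-1.0):
    # Standard curve: pieces of cubics on |s| <= 1 and 1 < |s| < 2.
    s = abs(s)
    if s <= 1:
        return (a + 2) * s**3 - (a + 3) * s**2 + 1
    if s < 2:
        return a * s**3 - 5 * a * s**2 + 8 * a * s - 4 * a
    return 0.0

def cubic_convolve(z, x, a=-1.0):
    # One-dimensional convolution: center a scaled standard curve at
    # each of the four neighbors (located at -1, 0, 1, 2 with values
    # z[0]..z[3]) and sum the curves at the point 0 <= x <= 1.
    return sum(z[k] * cubic_curve(x - (k - 1), a) for k in range(4))

print(cubic_convolve([1, 0, 0, 0], 0.5))    # -0.125: overshoot below zero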

Evidently cubic convolution produces a smoother result than either nearest-neighbor or bilinear interpolation.  It also does a better job of reproducing the statistical and spectral properties of the original grid, because it does not merely smooth (average down) all values.

Cubic convolution takes about seven times as long as bilinear interpolation.

For formulas and a more detailed discussion of all three interpolation methods (plus some others), see ftp://earth1.esrin.esa.it/pub/stb_ftp/asd_26-27.pdf, Section 2.6.7.  There is a typographic error in the formula for the cubic convolution curve; the coefficient "-5" should be "-5a".  The Spatial Analyst cubic convolution uses a value a = -1, which is the value used for the illustrations in this section.  Spatial Analyst uses an undocumented technique to adjust for edge effects along the borders of a grid.

Resampling with Spatial Analyst

Spatial Analyst will automatically perform nearest-neighbor resampling when it needs to.  For example, suppose you have a grid theme named "blue" with a cell size of 1.3 meters, but the analysis properties are currently set to a 1.6 meter cell size.  The following Map Calculator expression, which simply creates a copy of [blue], will also resample it to the required cell size:

[blue]

You can control the resampling by using the Resample request.  Read about this in the ArcView help.  The following examples illustrate its use.

  1. Use bilinear interpolation to resample to a 1.6 meter grid

[blue].Resample(1.6, #GRID_RESTYPE_BILINEAR)

  2. Use cubic convolution to resample to a 1.6 meter grid

[blue].Resample(1.6, #GRID_RESTYPE_CUBIC)

  3. Use nearest-neighbor interpolation to resample to a 5 meter grid, even though the analysis properties are set to another size

[blue].Resample(5, #GRID_RESTYPE_NEAREST)

Things to watch out for

Spatial Analyst will maintain the type of the data: integer grids will be resampled to integer grids by rounding the interpolated results, if necessary.  This causes a loss of precision for grids that contain numeric values rather than categorical data.  Avoid this problem by first converting the grid to a floating-point representation, as in

[blue].Float.Resample(1.6, #GRID_RESTYPE_CUBIC)

The Float request, which converts a grid to floating point, is applied before the Resample request because ArcView executes requests from left to right.

Make sure you use only nearest-neighbor interpolation for grids with categorical data.  For these grids, interpolation has little or no meaning.  For example, if your grid uses a value of 0 for wetlands, 1 for desert, and 2 for urban land cover, then interpolating between wetlands (0) and urban land (2) can easily produce values near 1 (desert), which is ridiculous.
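The bilinear sketch from earlier makes this concrete:

# Bilinear interpolation halfway between wetlands (0) and urban (2)
# cells manufactures "desert" where none exists:
print(bilinear(0, 2, 0, 2, 0.5, 0.0))    # 1.0, the code for desert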

Remember, any image using a color palette is categorical.  USGS topographic maps (so-called digital raster graphics, or DRGs) are typically represented this way.

Resampling usually changes the statistical and spectral qualities of a grid.  For example, both bilinear and bicubic interpolation, being true interpolation methods, rarely exactly reproduce the maximum or minimum values in a grid.  Nearest-neighbor techniques will lose data when resampling to a larger cell size.  Therefore, perform statistical and spectral analyses before any resampling, if possible.  If that is not possible, consider resampling to a smaller cell size than needed.  This will improve accuracy at the cost of creating much larger grids, thereby slowing down all the computations.
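For instance, the bilinear sketch from earlier returns a weighted average of its four inputs, so it can never exceed them:

print(bilinear(0, 0, 0, 10, 0.5, 0.5))    # 2.5, well below the maximum of 10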