Exercise 29c (continued)--More Uses of Distance Grids

Let's return to the situation of Exercise 18b.  You have been exploring the relationships between middle school locations and population within four counties in the Atlanta metropolitan area.  Using vector analysis techniques--finding tracts with low population (densities), finding middle schools within them, restricting to middle schools in Cobb County--you have been able to locate a few areas to focus your search for a place to live.

Suppose that upon calling schools in those areas you discover that your children will have a choice.  They need not attend the middle school within their tract, but they will need to attend the nearest school.  This changes the analysis.

Let us begin by creating a map of distances to the nearest school.  Because you are familiar with ArcView now, we will merely sketch the steps to follow.

  1. Create a view containing the three shapefiles from GTKAV/Data/Ch18: midschol.shp, county.shp, centract.shp.  (Remember that this can be done in a single operation by selecting all shapefiles simultaneously in the Add Theme dialog.)  Provide the themes with meaningful names, such as "Middle schools", "Counties", and "Census tracts", respectively.

    We have symbolized the middle schools and the counties very simply because more themes--the results of our analyses--will be added later.  The counties are shown in outline only.

  2. Notice that the coordinates are in degrees.  Because we will be working with distances, we must select a suitable projection whose units are linear.  The figures here use Georgia State Plane-West, 1983 datum.

  3. Set reasonable values in the Analysis Properties dialog.  We used the extent of the counties theme, rounding the left and bottom values down to the nearest 100 meters and the right and top values up to the nearest 100, thereby just including all four counties.  For accurate work a cell size of 50 or even 25 meters might be needed, but for now choose a size of 100 meters or greater, to keep the grids under one million cells.

  4. Computing distances beyond the county boundaries will be distracting and wasteful of time.  Therefore, we shall begin by creating a mask grid from the counties.  This is simple: activate the [Counties] theme and choose the Theme|Convert to Grid option.  The dialog asks you to name the grid dataset.  Watch out here: you must use a DOS-like name (eight characters at the most, no special characters or spaces).

  5. Go back to the Analysis|Properties dialog.  At the bottom, specify that your new counties grid will be the mask grid for analyses.  This will cause all grid operations to compute values only where the counties grid has data.  Where the counties grid has NoData values, every new grid will also have NoData values.

  6. Activate the [Middle schools] theme and select Analysis|Find distance.  Move the resulting grid beneath [Middle schools] and [Counties] in the Table of Contents.

    The default legend is too coarse to visualize fine differences in distance.  In the legend editor, under the Classification item, we requested 256 classes (by typing in the value--the largest value offered in the dialog is only 64).  We created a custom color ramp with the ColorRamp extension to emphasize the short distances.

  7. You can use this new [Distance to Middle schools] grid theme to explore distances interactively.  Use the identify tool with the grid theme active.  Here, we have made the [Census tracts] theme visible for reference, visualizing it in outline only so the distances will also be visible, and we have zoomed in.

    The distance contours form "bubbles" around each middle school location.  Bubbles meet at points equidistant from the two centers--that is, along the perpendicular bisectors of the lines joining two schools.  The bubbles divide the counties into "nearest-neighbor" polygons, also known as polygons of influence, Thiessen polygons, Voronoi polygons, and Dirichlet cells.  (This situation has occurred in analyses of many phenomena in the past century, whence the multiple independent discoveries and large variety of names.)

  8. Often, you do not need the distances, but simply want to know the name of the nearest point at any location.  Compute this grid by activating the [Middle schools] theme and selecting Analysis|Assign proximity.  The dialog will ask you for a field to use for identifying schools.  (Use [school_nam]; the obvious and innocent-looking [schools_id], although correct, triggers a Spatial Analyst bug: "Integer lookup table range exceed [sic] 1000000".)  Notice the close relationship between the proximity grid and the distance grid above.  (In fact, both are computed using the same grid request in Avenue.  See the ArcView help for "EucDistance".)

    The colors are random.  They were lightened by 25% using the desaturate legends script.

    The analysis mask has limited the polygons of influence to the four counties.

  9. You can simultaneously activate both the [Proximity to Middle schools] and [Distance to Middle schools] grid themes.  (Use Theme|Hide legend to help manage the activation and positioning of themes in the Table of Contents.)  The identify tool will then report both the distance and name of the nearest middle school.
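To see what the distance and proximity grids in steps 4 through 9 contain, here is a minimal Python sketch of the same idea on a toy grid.  It is not how Spatial Analyst computes its grids; it simply measures, for every cell inside a mask, the straight-line distance to each source and keeps the smallest.  The grid dimensions, school names, school positions, and mask are all hypothetical.

```python
import math

# Hypothetical 6 x 8 grid with 100 m cells; the school names and cell
# positions are invented for illustration.
CELL_SIZE = 100.0
N_ROWS, N_COLS = 6, 8
schools = {"North MS": (1, 1), "South MS": (4, 6)}   # name -> (row, col)

# Mask grid: True where the (hypothetical) counties grid has data.
mask = [[not (r == 0 and c >= 6) for c in range(N_COLS)] for r in range(N_ROWS)]

def nearest_school(row, col):
    """Return (distance in meters, name) of the school closest to a cell.

    Ties are broken alphabetically by name, an arbitrary choice for this sketch.
    """
    return min(
        (math.hypot(row - r, col - c) * CELL_SIZE, name)
        for name, (r, c) in schools.items()
    )

distance = [[None] * N_COLS for _ in range(N_ROWS)]
proximity = [[None] * N_COLS for _ in range(N_ROWS)]
for row in range(N_ROWS):
    for col in range(N_COLS):
        if mask[row][col]:   # cells outside the mask stay NoData (None)
            distance[row][col], proximity[row][col] = nearest_school(row, col)

def identify(row, col):
    """Mimic the identify tool with both grid themes active."""
    return distance[row][col], proximity[row][col]
```

Calling identify(1, 4) returns the distance and the name together, much as the identify tool does in step 9, while identify(0, 7) returns NoData for both because that cell lies outside the mask.  Note that one pass over the sources yields both grids, which is why the two are so closely related.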

Here's an interesting follow-up analysis you can perform, once you have gotten this far.  A natural question to ask concerns the population served by each middle school.  That is, if students always attend the nearest school (in terms of straight-line distance on the map), then how large is the population that sends children to each school?  The answer to this question, when compared to actual school enrollment, is very useful for planning new school construction and setting school district boundaries.

The steps you might follow are these:

  1. Convert [Census tracts] to a grid.  Ask Spatial Analyst to use the tract identifiers [tracts_id].  Join the other attributes to the grid table as well, and name the resulting grid theme "Census tracts".

  2. The total population is not interesting.  What we need is the population density.  Unfortunately, there are two serious problems with the [Area] field in the [Census tracts] data set: it is in square decimal degrees, which is practically meaningless, and insufficient space was created in the attribute table, so many of the areas were rounded to zero.  As a fortunate byproduct of the conversion to a grid, however, we have a [Count] field in the grid theme of [Census tracts].  This is the number of cells used to represent each tract.  It is therefore directly proportional to area.  For example, if you use a 100 meter cell size, each cell covers one hectare, so the count is simply the number of hectares occupied by each tract.  (This is an approximation, but a good one when each count is large.)  Therefore, to compute population density, create a new field ("Density") in the attribute table for the [Census tracts] grid theme, and compute it as [Pop_90] / [Count].  This estimates population per grid cell.

  3. Because of a limitation in the ArcView interface, you will need to create a new grid theme that displays the population densities before you can use them in a calculation.  Within the Map Calculator dialog (Analysis|Map calculator), simply double-click on the [Census tracts.Density] grid.  This will create a new grid whose values are the densities.  Rename the resulting grid theme [Density].

  4. Now summarize the population densities by [Proximity to Middle schools].  Activate the [Proximity to Middle schools] theme and select Analysis|Summarize zones.  Specify [Density] when asked what you want to summarize.

  5. The [Sum] field in the resulting table is the sum of the per-cell population densities within each proximity polygon.  Because you computed the densities as a population per cell in step 2, this sum is the estimated total population served by each school.

  6. You can join the summary table to the [Proximity to Middle schools] theme using the common [Value] field.  Visualize the result using the [Sum] field.

In this map, darker values correspond to higher populations served.  Values range from 11,000 to 93,000.  The counties are dimly outlined and the middle school locations are still shown as gray circles.

Some of the urban schools (the cluster in the middle) serve relatively small populations.  Those serving the largest appear to be clustered in the east.
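The reasoning behind steps 1 through 5 can be checked on a toy example.  The following Python sketch is only an illustration of the arithmetic, not of what Summarize zones does internally; the two small grids and the tract populations are hypothetical.  It divides each tract's population by its cell count and then sums the per-cell densities within each proximity zone.

```python
from collections import Counter, defaultdict

# Hypothetical inputs: a tract grid, a proximity grid of the same shape,
# and a 1990 population per tract.  None marks NoData cells.
tract_grid = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [3, 3, 3, None],
]
proximity_grid = [
    ["A", "A", "A", "B"],
    ["A", "A", "B", "B"],
    ["A", "B", "B", None],
]
pop_90 = {1: 4000, 2: 2000, 3: 900}

# Step 2: [Count] is the number of cells per tract, so the density is
# expressed in people per cell.
count = Counter(t for row in tract_grid for t in row if t is not None)
density = {t: pop_90[t] / count[t] for t in count}

# Steps 4 and 5: sum the per-cell densities within each proximity zone.
served = defaultdict(float)
for tract_row, zone_row in zip(tract_grid, proximity_grid):
    for tract, zone in zip(tract_row, zone_row):
        if tract is not None and zone is not None:
            served[zone] += density[tract]

total_served = sum(served.values())
```

Because every cell carries its tract's population divided by the tract's cell count, the zone sums in served add up to the total of pop_90.  A tract split between two zones contributes to each in proportion to the number of its cells in that zone, which is where the estimate comes from.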

As a double-check, use the Field|statistics item to compute total populations based on (a) the original [Census tracts] theme and (b) the final summary table.

You will find an error in the original theme: fourteen tracts are listed twice, evidently because they span counties.  This means their populations are counted twice.  You can find these double-counted tracts by summarizing on the [tracts_id] field and looking for [count] values larger than 1.  Link the [Census tracts] theme to the summary and select [count] greater than 1.  This will cause the double-counted tracts to be selected in the [Census tracts] theme.  Using the Field|statistics item, compute the total of [pop_90] for the selection (answer: 204,432).  Subtract half of this (102,216) from the total 1990 population (the value for (a) above) to adjust for the error (answer: 1,824,540).  This should agree with the total from your analysis (the value for (b) above) to within five significant figures.
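The adjustment for double-counted tracts is easy to get wrong, so here is a small Python sketch of the same bookkeeping.  The records are hypothetical, and the sketch assumes, as in the exercise, that each duplicated tract appears exactly twice with the same population.

```python
from collections import Counter

# Hypothetical tract records as (tracts_id, pop_90); tract 102 is listed
# twice because it spans two counties.
records = [(101, 3000), (102, 5000), (102, 5000), (103, 2500)]

# Summarize on the tract identifier and keep those listed more than once.
listings = Counter(tract_id for tract_id, _ in records)
duplicated = {tract_id for tract_id, n in listings.items() if n > 1}

# (a) The raw total, which counts the duplicated tracts twice.
raw_total = sum(pop for _, pop in records)

# Total population of the selected (duplicated) records.
duplicated_total = sum(pop for tract_id, pop in records if tract_id in duplicated)

# Each duplicated tract is listed exactly twice, so half of the duplicated
# total is the excess to remove.
adjusted_total = raw_total - duplicated_total // 2
```

Here adjusted_total equals the sum over the three distinct tracts, which is the figure the grid analysis should reproduce.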

Conclusions

This last calculation is difficult (but not impossible) to do without representing the data in grid format.  With the grid format, it is relatively easy and the computations are fast, provided the grids are not too large.  In general,

  1. When the features are many and complex, and the analysis is complex, calculations with grids often are easier and faster.

  2. When high accuracy is needed, grid sizes become very large and calculation times grow proportionally.  In these cases, vector-based analysis is preferable.
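To get a feel for how quickly grids grow, the following Python sketch applies the extent rounding described for the Analysis Properties dialog and counts the cells at several cell sizes.  The extent coordinates are hypothetical (they are not the actual extent of the four counties), and the function names are our own.

```python
import math

def snap_extent(left, bottom, right, top, step=100.0):
    """Round left and bottom down, and right and top up, to a multiple of step."""
    return (
        math.floor(left / step) * step,
        math.floor(bottom / step) * step,
        math.ceil(right / step) * step,
        math.ceil(top / step) * step,
    )

def cell_count(extent, cell_size):
    """Number of cells needed to cover the extent at the given cell size."""
    left, bottom, right, top = extent
    cols = math.ceil((right - left) / cell_size)
    rows = math.ceil((top - bottom) / cell_size)
    return rows * cols

# Hypothetical projected extent in meters.
extent = snap_extent(612_345.6, 401_210.2, 689_871.9, 478_455.0)

for size in (100, 50, 25):
    print(f"{size:>3} m cells: {cell_count(extent, size):,}")
```

Halving the cell size quadruples the number of cells, so an extent that stays under one million cells at 100 meters needs several million at 50 meters and roughly sixteen times as many at 25 meters.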