Chapters 17-20: Analyzing Spatial Relationships

Chapters 17, 18, and 19 all address the same thing: using features in one theme to select features in another theme.  You perform this "theme-on-theme selection" using a standard ArcView dialog.

This section contains several subsections:

A "recipe" for theme-on-theme selection
A discussion of spatial joins
A list of ArcView limitations with respect to spatial analysis capabilities
Exercises (these will take more time than most of the previous ones)

The trick with the "Select by Theme" dialog is to fill it in bottom to top.  Here's the recipe:

Recipe: Theme-on-theme selection

You want to select features in the target theme using certain features in a selector theme.

  1. Begin by selecting the features in the selector theme you will use for this process.
  2. Activate the target theme.  Choose Theme|Select By Theme.
  3. Drop down the list at the bottom of the dialog and select the name of the selector theme.
  4. Drop down the list at the top of the dialog and choose the selection method.
  5. (If necessary, specify a selection distance.)
  6. Review the dialog by reading it aloud, substituting the actual theme names for "active themes" (see below for examples).
  7. Press the appropriate button (New Set, Add to Set, or Select from Set).

Here are some examples of what we mean by reading the dialog aloud, following the exercises in GTKAV:

Exercise 17a

Find gas stations (points in the [Stations] theme) within 1,000 feet of the I-40 freeway (a selected polyline in the [Streets] theme).

Because you are selecting stations, that is the active (target) theme.  Read the dialog aloud as "Select gas stations that are within 1,000 feet of the I-40 Freeway."  The substitutions are:

Original phrase | Substituted words | Source of substituted words
features of active themes | gas stations | description of target theme (things to be selected)
Distance | 1000 feet | selection distance value
selected features of Streets | I-40 Freeway | description of the features used to make the selection (selected features in the selector theme)
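
The "within distance" selection read aloud above can be sketched outside ArcView.  The following Python is a minimal illustration, not ArcView's or Avenue's own code: the function names, the station names, and all coordinates (in feet) are hypothetical, and the freeway is simplified to a two-segment polyline.

```python
import math

def point_segment_distance(p, a, b):
    """Planar distance from point p to the line segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, then clamp to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def select_within_distance(targets, selector_polyline, distance):
    """Names of target points lying within `distance` of the polyline."""
    segments = list(zip(selector_polyline, selector_polyline[1:]))
    return [
        name
        for name, p in targets.items()
        if min(point_segment_distance(p, a, b) for a, b in segments) <= distance
    ]

# Hypothetical data: the selected freeway and four gas stations (feet).
freeway = [(0, 0), (5000, 0), (5000, 5000)]
stations = {"A": (1000, 800), "B": (2500, 1500),
            "C": (5900, 4000), "D": (3000, 3000)}

# "Select gas stations that are within 1000 feet of the freeway."
selected = select_within_distance(stations, freeway, 1000)
print(selected)
```

The call on the last lines mirrors the sentence you read aloud: the first argument is the target theme, the second is the selected feature of the selector theme, and the third is the selection distance.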

Exercise 17b

Find gas stations within a quarter mile of Ann's Mart Station #1963.

Selector theme:    [Business]

Target theme:    [Business]

Dialog:    "Select gas stations that are within 1,320 feet of Ann's Mart Station #1963."
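
The distance in this dialog is just a quarter of 5,280 feet.  The sketch below checks that arithmetic and shows what it means for the selector and target to be the same theme.  The station names other than Ann's Mart and all coordinates (in feet) are made up for illustration; in this sketch the selector feature itself also ends up in the result, since it lies at distance zero from itself.

```python
import math

FEET_PER_MILE = 5280
quarter_mile_ft = FEET_PER_MILE / 4  # selection distance in feet

# Hypothetical theme contents: name -> (x, y) in feet.
businesses = {
    "Ann's Mart Station #1963": (0, 0),
    "Station X": (1000, 800),
    "Station Y": (1200, 900),
    "Station Z": (-1320, 0),
}

# The selector feature and the target features come from the same theme.
sx, sy = businesses["Ann's Mart Station #1963"]
selected = [
    name
    for name, (x, y) in businesses.items()
    if math.hypot(x - sx, y - sy) <= quarter_mile_ft
]
print(quarter_mile_ft, selected)
```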

Spatial Joins

There is a much more general definition of a database join that extends the concept introduced in Chapter 16.  It still creates a new table out of source and destination tables, and it still involves one field in each table (we will call these the "key" fields).  What it generalizes is the idea of what it means for a source value to "match" the value in a destination record.  This generalization changes how a successful lookup is determined.

Part of the formal definition of a field is its domain.  This is the set of all allowable values for the field.  Typical domains include sets of numbers, sets of strings up to a given length, boolean values (just true or false are allowed), and ranges of dates.  A GIS may also allow domains to include certain kinds of features, such as the set of all points on the earth's surface: that would be the natural domain for the [shape] field of an ArcView point theme.

In short, a domain is a set.  We will think of this set abstractly and refer to its members as "points."  Why introduce this potential for confusion?  Because you can draw pictures of points.  We can draw points individually; when we want to consider more points than we can draw, we will create a line of points to represent the domain, like this:

Figure 1: A domain

To talk about joins, we need to show unique combinations of points from two domains.  We will emulate the procedure René Descartes used a long time ago and draw the second domain as a vertical line segment:

Figure 2: The product of two domains

The rectangle is the Cartesian product of domain X and domain Y and is written X × Y ("X cross Y").

Now we are ready to describe generalized joins.  What we want to do is to specify exactly which combinations of field values constitute "matches."  For the usual join, only combinations where the two values are exactly equal are allowed.  In this case, the two domains are the same and the set of allowable matches is described by the equation X = Y:

Figure 3: The (usual) join

You describe a generalized join very simply: just draw a set of points in [Destination domain] × [Source domain].  That's it.  Each point in the set describes an allowable combination for the join; points not in the set are not allowed:

Figure 4: A generalized join (shown in green)

By the way, the mathematical term for a set of points within a Cartesian product is a relation.  This is the same "relation" as in the more familiar term "relational database."  For more information see Date, C., An Introduction to Database Systems.
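
One way to make the idea of a relation concrete is to write it as a true/false test on a pair of key values and to keep only the pairs from the Cartesian product that pass the test.  The Python below is a sketch under that assumption; the function name generalized_join, the field names, and the records are all hypothetical.  Unlike a table lookup, it returns every matching pair rather than a single source record per destination record.

```python
from itertools import product

def generalized_join(destination, source, key, relation):
    """Pair each destination record with every source record related to it.

    `relation(dest_value, src_value)` returns True when the combination of
    key values is an allowable match, i.e. a point in the relation.
    """
    return [
        (d, s)
        for d, s in product(destination, source)  # the Cartesian product
        if relation(d[key], s[key])
    ]

# Hypothetical tables sharing a numeric key field called "id".
destination = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
source = [{"id": 1, "value": 10}, {"id": 3, "value": 30}, {"id": 4, "value": 40}]

# The usual join: the relation is the diagonal X = Y.
usual = generalized_join(destination, source, "id", lambda x, y: x == y)

# A different relation on the same domains: destination key below source key.
less_than = generalized_join(destination, source, "id", lambda x, y: x < y)

print([(d["id"], s["id"]) for d, s in usual])
print([(d["id"], s["id"]) for d, s in less_than])
```

Changing the join means changing only the relation argument; the tables and the key field stay the same.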

No doubt you would like some concrete examples now!

Approximate matches: join source records whose key value (a number) approximately equals the destination record's key value (also a number, perhaps rounded).  (ArcView cannot do this.)

Partial matches: join source records whose key value (a string) matches some portion of the destination record's key value (also a string).  (ArcView cannot do this.)

Containment matches: join source records whose key value (a shape) is contained wholly within the destination record's key value (also a shape).  ArcView does this.

Proximity matches: join source records whose key value (a shape) is the nearest (among all source records) to the destination record's key value (also a shape).  ArcView does this.

Intersection matches: join source records whose key value (a shape) intersects the destination record's key value (also a shape).  (ArcView cannot do this.)
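
The first two kinds of match involve only numbers and strings, so they are easy to sketch as relations in ordinary code.  The Python below is an illustration only: the function names, the tolerance of 0.5, and the sample values are assumptions, and "matches some portion" is interpreted here as a simple substring test.

```python
def approximately_equal(dest_value, src_value, tolerance=0.5):
    """Approximate match: the two numbers differ by at most `tolerance`."""
    return abs(dest_value - src_value) <= tolerance

def partial_match(dest_value, src_value):
    """Partial match: the source string occurs somewhere in the destination string."""
    return src_value in dest_value

def lookup(dest_keys, src_keys, relation):
    """For each destination key, list the source keys related to it."""
    return {d: [s for s in src_keys if relation(d, s)] for d in dest_keys}

# Hypothetical key values.
approx = lookup([10, 20], [9.6, 10.4, 19.0], approximately_equal)
partial = lookup(["123 Main Street", "45 Oak Avenue"],
                 ["Main", "Oak", "Elm"], partial_match)

print(approx)
print(partial)
```

Note that a destination key may match several source keys, or none at all, once the relation is no longer strict equality.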

In ArcView you do not specify the relation.  Instead, it is implied by the dimensionality of the shapes.  Points and multipoints have dimension zero, polylines have dimension one, and polygons have dimension two.  This gives nine distinct combinations of dimension for source and destination themes.  The following table, taken from the ArcView help for "Spatial Join," describes how the combination of dimensions determines the relation used in the join.  In the table, "point" also includes "multipoint" shape types:

Destination shape type | Source: Point | Source: Polyline | Source: Polygon
Point | Nearest | Nearest | Inside
Polyline | Nearest | Part of (that is, inside) | Inside
Polygon | (empty) | (empty) | Inside

This is ambiguous, so we state it in words: when performing a spatial join, ArcView will:

Join a containing polygon (source) to any shape inside it (destination)
Join a containing polyline (source) to any piece of that polyline (destination)
Join the nearest point or polyline (source) to any point (destination)
Join the nearest point (source) to any polyline (destination)
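
The rule just stated amounts to a lookup keyed on two dimensions.  The following Python restates the table as a dictionary; it is only a restatement of the table above, with hypothetical names, and is not how ArcView is implemented.  None marks the combinations for which the table gives no join.

```python
# Dimension of each shape type, as given in the text.
DIMENSION = {"point": 0, "multipoint": 0, "polyline": 1, "polygon": 2}

# (destination dimension, source dimension) -> relation used in the join.
RELATION = {
    (0, 0): "nearest", (0, 1): "nearest", (0, 2): "inside",
    (1, 0): "nearest", (1, 1): "part of", (1, 2): "inside",
    (2, 0): None,      (2, 1): None,      (2, 2): "inside",
}

def join_relation(destination_type, source_type):
    """Relation implied by the shape types of the two themes."""
    return RELATION[(DIMENSION[destination_type], DIMENSION[source_type])]

print(join_relation("point", "polyline"))     # point destination, polyline source
print(join_relation("multipoint", "polygon")) # multipoints behave like points
print(join_relation("polygon", "point"))      # no join for this combination
```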

Some Limitations of ArcView's Spatial Analysis

Here are some popular analyses that ArcView does not support (at least not efficiently and not without a lot of Avenue scripting).  Some other GISes do perform these analyses.

Find all distances between pairs of points (either in the same or different themes).  The usual application is to compute a distance table for points on a map, but this information is also a useful starting point for many kinds of statistical analyses of spatial point patterns.  See Ripley, Brian D., Spatial Statistics.
Within one theme, find the nearest neighbor to each point.  (A spatial join of the theme to itself will simply match each point with itself, which is not very useful.)
Associate with each (destination) polygon the first (source) point or polyline found lying within that polygon.  (These analyses correspond to the empty dark region of the table above.)
Perform a "nearest" spatial join, but also report the locations (along polylines or polygon boundaries) at which the closest distances are actually attained.
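
To show what the first two analyses in this list compute, here is a small brute-force sketch in Python.  It is not ArcView or Avenue code; the function names and the four sample points are hypothetical, the coordinates are treated as planar, and math.dist requires Python 3.8 or later.  The nearest-neighbor function excludes each point from its own search, which is exactly what a self-join fails to do.

```python
import math

def distance_table(points):
    """Distances between all ordered pairs of distinct points."""
    return {
        (a, b): math.dist(points[a], points[b])
        for a in points
        for b in points
        if a != b
    }

def nearest_neighbors(points):
    """For each point, the name of the closest other point in the same set."""
    table = distance_table(points)
    result = {}
    for a in points:
        others = [b for b in points if b != a]  # never match a point with itself
        result[a] = min(others, key=lambda b: table[(a, b)])
    return result

# Hypothetical point theme: name -> (x, y).
points = {"P": (0, 0), "Q": (3, 4), "R": (10, 0), "S": (3, 5)}

table = distance_table(points)
nearest = nearest_neighbors(points)
print(table[("P", "Q")])
print(nearest)
```

This approach compares every pair, so its cost grows with the square of the number of points; it is meant to clarify the definitions, not to be efficient.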

Exercises

  1. Write down the English equivalent of the theme-on-theme selection dialogs shown in GTKAV exercises 17c, 18a, 18b, 19a, and 19b.  In each case, as above, indicate which theme is the selector theme and which is the target theme.
  2. How would a multipoint-to-multipoint join differ from a point-to-point join involving exactly the same locations?  Can you think of any situation that would call for such a join?  Create a multipoint theme (use the summarize button on a point theme and ask for a merged shape) and experiment with joins involving multipoints to verify that they behave like joins involving points.

Answer the following questions using theme-on-theme selection in ArcView.   Use the U.S. data installed with ArcView (by default in C:/ESRI/ESRIData/USA).  Go in order--the first six get more difficult as you proceed.  Have fun!

  1. Through which states does the Mississippi River pass?   Save the result as a shapefile.  Display it as a theme in a view.
  2. How many (five-digit) zip code areas are within 100 miles of Pennsylvania (or New York, or your favorite state)?  What is their total area?  Save the result as a shapefile.   Display it as a theme in a view.
  3. How many non-Pennsylvania counties share a border with Pennsylvania?  What is their total area compared to the area of Pennsylvania?   Give the answer to the nearest five percent.
  4. Approximately what proportion of the U.S. population lives in a city located within 20 miles of a major river?  (Let "city" be defined as a region contained in the ESRI/ESRIData/USA/Cities.shp shapefile.)  Give the answer to the nearest five percent.
  5. (This is a follow-up to 4.)  Show, on a map, the states whose capitals are located more than 20 miles from a major river.
  6. Make a map of the major U.S. highways within the 48 conterminous states that clearly distinguishes those crossing major rivers from those that do not.
  7. Which of the preceding questions can be answered easily without using a GIS?  Which are answered much more quickly with a GIS?  Which ones would you not even want to attempt without a GIS?  Assume you have easy access to any printed atlases, census data, or whatever other raw information would be needed for the GIS-free approach.
  8. Which of questions 1-6 required using an appropriate projection?  Which ones would get the correct answer regardless of the projection used?  Why?

If you really get stuck, follow the links page to the Class Notes for 15 July 1999--it provides step-by-step solutions.