Chapters 23 and 24: Converting, creating, and editing shapes

Perhaps the most interesting thing about ArcView's editing tools is what is not there.  If you stay around GIS for very long you will eventually hear whispers of "topology," "digitizing," "cleaning," "slivers," "gaps," and other monsters from the deep.  This section explains what these things are and how they arise naturally in the process of creating and editing shapes.  This should provide a framework for a deeper appreciation of the shape-editing tools ArcView offers.

The contents of this section are

Topology
Digitizing and Conflation
Exercises
Resources

Topology

Topology is the mathematical description and study of continuity--kind of the Platonic version of "the footbone's connected to the anklebone, ..."  It is especially powerful and interesting when applied to infinite sets and sets of objects in three or more dimensions, neither of which applies to GIS (currently).  So we're talking about really straightforward stuff here.

Representing and, more importantly, performing spatial analyses with vector features requires some representation of their connectivity: which points get connected in which order to form polylines, which polylines form the boundaries of polygons, which polygons are immediately adjacent to each other, how the polylines are connected to form a network.  The explicit representation of connectivity information, in the form of additional data, is what GIS users mean by topology.  But topological information does not have to be explicit: it can mostly be figured out, especially when the software is provided some hints.

For example, ArcView's shapefile format represents a polyline as the collection of its connected components.  This polyline has three connected components, so it has three parts (but just one attribute record) in the shapefile.

 

 

The shapefile represents each component simply as an ordered sequence of its nodes, which are the points where the straight line segments join.  Here, we have labeled the nodes in the order they appear, beginning with node 0.  The arrows visualize the sequence of the points in the shapefile.

 

The DCEL

Everything the GIS needs to know about this polyline is apparent in the picture.  However, for many analytical procedures, such as determining proximity, computing intersections, overlay analysis, or assessing traffic flows across a network, it is helpful for the GIS to maintain specific information about continuity.  One such structure is the doubly-connected edge list, or DCEL (Preparata and Shamos, Computational Geometry).  This is worth mentioning here mainly because some popular GISes (such as ESRI's ArcInfo) use a DCEL, or something like it, and datasets they produce ("coverages") are littered with bits and pieces of these topological data.

The [Fnode_], [Tnode_], [Lpoly_], and [Rpoly_] fields identify continuous portions of arcs.  The [Lpoly_] and [Rpoly_] values 2, 3, and 4 correspond to the polygons these arcs bound, as suggested in the lower table.

ArcView apparently makes no use of this information, so you can safely ignore it, too.

Basically, a DCEL is a list of the edges (straight segments) in a polyline network or collection of polygons.  It describes each edge by giving the names of its vertices.  For example, the edges in the polyline above are 0-->1, 2-->3, 4-->5, 5-->6, and 6-->7.  This naming imposes a direction, or orientation, on each edge.  The orientation is useful for network analyses and it is useful for describing the relationships between the edges and any polygons they bound.
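The step from a shapefile's parts to a list of directed edges can be sketched in a few lines of Python.  This is only an illustration: the names parts and edges_from_parts are made up for this sketch, and the grouping of the nodes into three parts is inferred from the edge list above rather than read from an actual shapefile.

```python
# Hypothetical node sequences for the three parts of the polyline,
# inferred from the edge list in the text (not read from a real shapefile).
parts = [[0, 1], [2, 3], [4, 5, 6, 7]]


def edges_from_parts(parts):
    """Return the directed edges (from_node, to_node) of a multipart polyline."""
    edges = []
    for part in parts:
        # Consecutive nodes within one part are joined by a straight segment.
        for from_node, to_node in zip(part, part[1:]):
            edges.append((from_node, to_node))
    return edges


for from_node, to_node in edges_from_parts(parts):
    print(f"{from_node}-->{to_node}")
```

No edge joins the last node of one part to the first node of the next, which is how the three components stay disconnected.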

This configuration shows two adjacent polygons, Green and Blue.  Green has a "hole" in it.  The DCEL also describes the relationships between the edges and the polygons, or faces.  It will specify that Green is to the right of edges 1-->2, 2-->3, 3-->1 (the inner loop) and to the right of edges 4-->5, 5-->6, 6-->7, 7-->8, 8-->9, and 9-->4 (the outer loop).  It will also indicate that Blue is to the left of edges 5-->6, 6-->7, 7-->10, 10-->11, and 11-->5.

Finally, the DCEL provides information about the sequence of edges around each node.  In the preceding picture it will specify that 4-->5, 5-->6, and 11-->5 appear in counterclockwise order around node 5, and that 7-->8, 7-->10, and 6-->7 appear in counterclockwise order around node 7.  This information enables analysis algorithms to traverse the polyline boundaries in a systematic, regular fashion.
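A much-simplified version of these edge records can be sketched in Python.  It keeps only the from-node, to-node, left face, and right face of each edge (compare the [Fnode_], [Tnode_], [Lpoly_], and [Rpoly_] fields mentioned earlier) and omits the ordering of edges around each node, which would require coordinates.  The labels "Outside" and "Hole" for the faces that are neither Green nor Blue are assumptions of this sketch, as are the names Edge and boundary_between.

```python
from collections import namedtuple

# One record per directed edge: from-node, to-node, face on the left, face on the right.
Edge = namedtuple("Edge", ["fnode", "tnode", "left", "right"])

edges = [
    # Inner loop of Green (the hole); "Hole" is an assumed label.
    Edge(1, 2, "Hole", "Green"),
    Edge(2, 3, "Hole", "Green"),
    Edge(3, 1, "Hole", "Green"),
    # Outer loop of Green; "Outside" is an assumed label.
    Edge(4, 5, "Outside", "Green"),
    Edge(5, 6, "Blue", "Green"),
    Edge(6, 7, "Blue", "Green"),
    Edge(7, 8, "Outside", "Green"),
    Edge(8, 9, "Outside", "Green"),
    Edge(9, 4, "Outside", "Green"),
    # Remaining boundary of Blue.
    Edge(7, 10, "Blue", "Outside"),
    Edge(10, 11, "Blue", "Outside"),
    Edge(11, 5, "Blue", "Outside"),
]


def boundary_between(edges, face_a, face_b):
    """Return the edges that have face_a on one side and face_b on the other."""
    return [e for e in edges if {e.left, e.right} == {face_a, face_b}]


shared = boundary_between(edges, "Green", "Blue")
print([(e.fnode, e.tnode) for e in shared])
```

With the faces stored explicitly, finding the common boundary of two polygons is a lookup; no coordinates need to be compared.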

"Implicit" Topology

The ArcView shapefile format represents the Green and Blue polygons above as two separate, apparently unrelated features.

Green and Blue are shown slightly separated simply to demonstrate they and their nodes and edges are completely separate.

Green's nodes are numbers 1 through 3 and 4 through 9; Blue's nodes are numbers 10 through 14.  Although edges 5-->6 and 14-->10 coincide (for example), they are separately represented in an ArcView shapefile.

The shapefile contains no explicit information stating that arcs 5-->6-->7 and 13-->14-->10 run along the common boundary between Green and Blue.

The only way ArcView can know these polygons are adjacent is through a computation.  That computation would have to recognize that nodes 5 and 10, 6 and 14, 7 and 13 have the same coordinates and thereby form a common arc.  As a general rule, if you can see a relationship among features on the map, then--using the shapefile information alone--ArcView can figure out the topology, too.
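The kind of computation involved might look like the following Python sketch.  It is not ArcView's actual algorithm, which these notes do not describe.  The coordinates are hypothetical; only the coincidences of nodes 5 and 10, 6 and 14, and 7 and 13 come from the text.  Green's hole is left out because it plays no part in the adjacency.

```python
# Hypothetical coordinates; only the coincident nodes follow the text.
green_outer = [(0, 4), (4, 4), (5, 2), (4, 0), (2, -1), (0, 0)]  # nodes 4..9
blue = [(4, 4), (8, 4), (8, 0), (4, 0), (5, 2)]                  # nodes 10..14


def ring_edges(ring):
    """Return the undirected edges of a closed ring as a set of endpoint pairs."""
    closed = ring + ring[:1]  # repeat the first node to close the ring
    return {frozenset(pair) for pair in zip(closed, closed[1:])}


def shared_edges(ring_a, ring_b):
    """Edges whose two endpoints have identical coordinates in both rings."""
    return ring_edges(ring_a) & ring_edges(ring_b)


common = shared_edges(green_outer, blue)
print(len(common), "shared edges")
```

The comparison is exact, so it finds a common arc only when both polygons use identical coordinates for the shared nodes.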

The major exception to this rule occurs when features overlap.  Consider a simple network whose features are polylines.  This network might be part of a road map, for example.

The network contains two features, a blue line and a green line.  They intersect--or do they?  Suppose the horizontal blue line represents a major highway and the green line represents a small local road.  Perhaps there is no direct access to the highway from the road--the road may tunnel under, have a bridge over, or simply terminate on one side of the highway and continue on the other.  The picture provides no information to distinguish these possibilities.

This picture shows the same network with the nodes revealed.  The lines cross, but they do not have a connection.

This example correctly suggests that topological information goes beyond what can simply be seen in a map.
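The difference between crossing and connecting can be sketched in Python.  The coordinates are hypothetical, and the sketch assumes one common convention: two lines are connected only where they share a node.  The crossing test handles only clean crossings and ignores collinear or touching segments.

```python
def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)


def segments_cross(a, b, c, d):
    """True if segments ab and cd cross at a single interior point."""
    return (orientation(a, b, c) * orientation(a, b, d) < 0
            and orientation(c, d, a) * orientation(c, d, b) < 0)


def lines_cross(line_a, line_b):
    """True if any segment of line_a crosses any segment of line_b."""
    return any(segments_cross(a, b, c, d)
               for a, b in zip(line_a, line_a[1:])
               for c, d in zip(line_b, line_b[1:]))


def share_node(line_a, line_b):
    """True if the two polylines have a node with identical coordinates."""
    return bool(set(line_a) & set(line_b))


# Hypothetical coordinates for the highway (blue) and the local road (green).
highway = [(0, 0), (10, 0)]
road = [(5, -5), (5, 5)]

print("cross on the map:", lines_cross(highway, road))
print("share a node:", share_node(highway, road))
```

Under this convention, an interchange would be recorded by giving both lines a node at the crossing point; a bridge or tunnel would simply leave it out.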

Ultimately, with very few exceptions, all the relevant topological information can be computed from the sequences of nodes used for describing polylines and polygon boundaries.  This is why shapefiles are an effective format for recording features.

Although you may hear that "ArcView (or the shapefile format) does not have topology" (often spoken in a condescending or patronizing way), you should now understand that any software capable of performing spatial analyses must be representing topological information.  The only question is whether that information is explicit in the data files or implicitly computed from the data.  In most situations, it does not matter.

Digitizing and Conflation

Spatial analyses frequently relate features obtained from two or more different sources.  The problems that can arise appear clearly when two sources represent parts of the same feature, because this affords a direct visual comparison of the two features.

A portion of the east coast of the United States

The overlay of the two maps to the right

As represented as part of a world map (that is, as a single country polygon)

As represented as a collection of individual states from a different feature source

The orange areas show portions of states that do not overlap the country polygon, and of course the gray areas peeking out beneath the overlay are areas within the country polygon not overlapped by the states.  It is apparent that the difference is not merely a matter of one map (on the far right) being more detailed than the other: they are also systematically displaced with respect to each other.

The imperfect matching of polygons that should be exactly adjacent produces splinters or sliver polygons.  The California coastline (from comparing the same two maps) produces a nice example.

This is the area showing points within the United States (as represented by the U. S. states map in dark blue) that are not contained within the United States (as represented by the U. S. map in gray).

Sliver polygons can also be created between features within the same map.  Typically, map features are separately digitized.  For example, the GIS (or CAD) operator will trace the boundary of each state with a digitizing tablet or mouse.  Boundaries common to pairs of states will therefore be traced twice.  The human hand is not going to digitize exactly the same points along common boundaries of two polygons.  There will be places where the two polygons overlap (but should not) and there will be gaps between the states that should not occur.

The next figure shows slivers and gaps created by digitizing North Dakota (blue) and Minnesota (green).  The region within the yellow box is magnified on the left to show the gaps and slivers that occurred.

(Such gaps and slivers did not appear in the original data set because it had been "cleaned."  We had to hand-digitize these boundaries in order to produce the gaps and slivers shown.)
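The effect of tracing a common boundary twice can be sketched in Python.  The two polygons below are hypothetical: their shared boundary should be the line x = 10, but each trace wobbles differently.  The sketch counts how many polygons contain a test point, using a standard even-odd ray-casting test; the function names are made up for this illustration.

```python
def point_in_polygon(point, ring):
    """Even-odd ray-casting test: is point inside the closed ring?"""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_containing(point, polygons):
    """Number of polygons that contain the point."""
    return sum(point_in_polygon(point, ring) for ring in polygons)


# Hypothetical hand-digitized polygons; the common boundary was traced twice.
west = [(0, 0), (0, 10), (10, 10), (9.7, 7), (10.3, 3), (10, 0)]
east = [(10, 0), (9.8, 3), (10.2, 7), (10, 10), (20, 10), (20, 0)]

for point in [(5, 5), (10, 2.5), (10, 7.5)]:
    print(point, "lies in", count_containing(point, [west, east]), "polygon(s)")
```

A point that lies in two polygons falls in a sliver of overlap, and a point near the boundary that lies in none falls in a gap.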

These gaps and slivers create many problems:

They represent obvious inaccuracies.  The maps look bad.

They can cause errors in spatial analyses: polygons that should be adjacent might not be, points that should lie within some polygon might lie within no polygon, points may lie in more than one polygon, and distances may be inconsistent.

A GIS that computes and stores topological information will likely want to represent each gap and sliver individually.  This can add many undesired features to the dataset and complicate its management.

Editing features becomes very difficult.  Features with small splinters, separate connected components, and holes are much more difficult to modify than are connected, simply-connected features without sharp angles.

Some GISes resolve these problems by providing "cleaning" capabilities to identify and help eliminate the gaps and slivers, while adjusting the neighboring polygons so their boundaries exactly match.  (This conflation is "the procedure of reconciling the positions of corresponding features in ... data layers."--Aronoff, Stanley, Geographic Information Systems: A Management Perspective.)  Others, such as ArcView, ignore them altogether.  To an extent, ArcView can get away with that, because its internal topological computations tolerate overlapping polygons and it does not have to separately represent each little piece of the overlap.  However, this implies the user--you--must be sophisticated enough to anticipate, identify, and correct potential problems, especially when you are using ArcView to create features or to analyze spatial relationships among features from different sources.
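One small ingredient of such cleaning, snapping nearby vertices onto a reference boundary, can be sketched in Python.  This is only an illustration with hypothetical coordinates and a made-up function name; it is not the procedure used by any particular GIS, and real cleaning tools do considerably more (for example, they may insert new vertices and rebuild topology).

```python
import math


def snap_ring(ring, reference, tolerance):
    """Move each vertex of ring onto the nearest reference vertex within tolerance."""
    snapped = []
    for vertex in ring:
        nearest = min(reference, key=lambda ref: math.dist(vertex, ref))
        if math.dist(vertex, nearest) <= tolerance:
            snapped.append(nearest)
        else:
            snapped.append(vertex)
    return snapped


# Hypothetical data: one trace of the common boundary is taken as the reference,
# and the neighboring polygon is adjusted to match it.
reference_boundary = [(10, 0), (10.3, 3), (9.7, 7), (10, 10)]
east = [(10, 0), (9.8, 3), (10.2, 7), (10, 10), (20, 10), (20, 0)]

print(snap_ring(east, reference_boundary, tolerance=0.6))
```

The choice of tolerance matters: too small and the slivers remain, too large and distinct vertices collapse together.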

Exercises and Things to Think About

  1. This is an in-class exercise best performed on networked computers with access to a shared disk.  Each class member will complete Exercise 23b in GTKAV.  The result is a shapefile with two features for the cinder cone and lava flow.  Save the resulting theme as a shapefile on the shared disk.  Give it a unique name.  (If you share a machine with someone, let each of you do this exercise, creating two separate shapefiles.)  When everyone is finished, add all these themes to your view.  Use ArcView's tools (zoom and measure in particular) to quantify the deviations among the features.  Multiply by the view scale to adjust map units to actual distances on the monitor.  In terms of actual distances on the monitor, what is the typical imprecision in the shape outline?

  2. Exercise 24b shows you how to "union" features.  Experiment with the other three operations on the Edit menu--"combine," "subtract," and "intersect."  Describe exactly what they do.  Can you describe them in terms of boolean operators (see the notes for Chapter 13)?  Which operation produces two results (rather than one)?  Match the operations with the figures below, which result from applying each of these four operations to two overlapping rectangles:

Which of these operations was used to produce the sliver polygons in "A portion of the east coast" figure?

  3. Suppose you needed to digitize a map of the 48 conterminous States or of the counties in New York.  One approach is to digitize each state or each county separately, then conflate and clean the results to eliminate slivers and gaps.  Describe a strategy, using ArcView's feature editing tools, that completely avoids this by never digitizing any shared boundary arc more than once.  Try out your strategy by tracing over a theme of the states or counties.

  4. In ArcView, how would you go about creating disconnected shapes (like the state of Michigan)?  Shapes with holes in them (like the country of South Africa)?  Shapes that span both sides of the 180 degree meridian?  (The problems occur when you try to project such shapes.)  On your computer, create examples of each kind of shape.

  5. A simple polygon is one that does not intersect itself in its interior (that is, we will allow the boundary of a simple polygon to contact itself; for example polygon 1 in the figure above can be represented as a two-part simple polygon contacting itself at just two points).  What happens in ArcView when you attempt to create a polygon that is not simple?

Resources

ArcView produced all the figures on this page.  It needed help, though, in the form of scripts and extensions to enhance its data processing capabilities.

ArcView comes with scripts to convert polygon features to polylines and polyline features to polygons.  (Look in the help system under Contents|Sample Scripts and Extensions|Sample Scripts|Views|Data Conversion/Alteration.)  These scripts have some artificial limitations that cause them to fail on "interesting" features (such as polygons that have holes).  Here is a modification (polygon_to_polyline.ave) that successfully converts all valid polygon features to polylines.

To show the nodes of polylines, we needed a special extension.  Here is one (poly2pts.avx) that converts polylines to points and labels the points in a meaningful way.