| The feature/symbol dichotomy: A GIS divides the functions of representing data on a map into at least four conceptually different parts: the attributes are classified (classification) and associated with geometric figures (projection), the classifications are associated with graphic symbols, and those symbols determine how the geometric figures will appear (symbolization of the graphic shapes). | |
| Emphasize substance over form: When working with complex software, like a GIS, deal with the important things first and take care of the details as late in the process as possible. | |
| Look for the familiar within the strange terminology: Many disciplines have, or could have, contributed to the development and evolution of GIS. Many techniques may already be familiar to you under different names. Do not let the terminology daunt you! | |
| Maps are statistics: Symbolizing features on a map to reflect underlying data is a statistical procedure. You can use statistical techniques to help determine appropriate symbolization methods. | |
| You are in charge: Always have a reason for your choice of data symbolization: do not let the computer choose. |
The Legend Editor is a complex dialog providing access to almost all of ArcView’s capabilities for classifying features and modifying their display. Every feature has a geographic location. The view erects a graphic shape at each feature’s location in order to show it to a person. A graphic shape is a combination of a geometric shape and a symbol; the symbol records the visual properties of the shape, such as its colors, line styles, and so on. The shape is determined by the geographic data, but the means of displaying the shape are not. Determining how the shape will display is the role of the Legend Editor. The Legend Editor can make this determination based on the values of one or more attributes of each feature. In this manner ArcView can show data on a map.
![]() |
The Legend Editor sets up a sequence of processes. The first is classification. This assigns features to groups according to the values of one or more of the feature attributes. The second is symbolization. This associates a symbol with each class. The figure sketches this process.
By glancing at the figure you can work out exactly what must be specified in the Legend Editor:
| The attributes whose values will determine the symbols | |
| The method by which attributes will be classified | |
| What each classification will look like (colors, line styles, hatch patterns, point symbols, text fonts) |
In addition, the Legend Editor controls how the classifications will be named or labeled in the View’s table of contents and provides capabilities to save and reload a legend.
GIS software differs in how it implements the Legend Editor capabilities, but it must implement these capabilities at some level.
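The two steps, classification followed by symbolization, can be sketched in a few lines of Python. This is only an illustration of the idea, not ArcView's implementation: the feature dictionaries, the `pop` attribute, the class upper bounds, and the symbol records are all invented for the example.

```python
from bisect import bisect_left

def classify(value, upper_bounds):
    """Return the index of the class containing value.

    upper_bounds is an ascending list holding the largest value allowed in
    each class (an assumed convention). Values above the last bound are
    placed in the last class.
    """
    return min(bisect_left(upper_bounds, value), len(upper_bounds) - 1)

def symbolize(features, attribute, upper_bounds, symbols):
    """Pair each feature's geometric shape with the symbol of its class."""
    return [
        (f["shape"], symbols[classify(f[attribute], upper_bounds)])
        for f in features
    ]

# Hypothetical features: a shape name stands in for real geometry.
features = [
    {"shape": "polygon A", "pop": 120},
    {"shape": "polygon B", "pop": 4500},
    {"shape": "polygon C", "pop": 980},
]
upper_bounds = [1000, 5000]                      # two classes
symbols = [{"fill": "yellow"}, {"fill": "red"}]  # one symbol per class

for shape, symbol in symbolize(features, "pop", upper_bounds, symbols):
    print(shape, symbol["fill"])
```

Note that the shapes never change: only the lookup from attribute value to class, and from class to symbol, decides how each one is drawn.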
Invoke the Legend Editor through a menu item (Theme|Edit Legend), a button, or most simply by double-clicking on a theme’s legend in the view.
Things to watch out for:
| You have to press the Apply button before any changes will take effect. | |
| The Legend Editor dialog is in a separate window that can be moved outside the ArcView window. However, it will never disappear behind ArcView. It can remain open while you do other ArcView operations (it is “modeless,” not a child window). | |
| The Legend Editor will act differently on different kinds of themes: feature themes, image themes, and grid themes. | |
| The Legend Editor dialog has long been in need of improvement. Typically, changes to the legend type, values field, or classification type will cause unwanted side effects, often destroying any colors or symbols you have already specified. The trick lies in specifying this fundamental information first. Only when you are sure you have chosen the right kind of legend, the correct attribute(s) to display, and the correct number of classifications should you bother with the details of specifying the symbols themselves. |
This last rule applies to all complex software: deal with the important things first and take care of the details as late in the process as possible (or never, if you can get away with it).
Find the answers by guessing and experimenting.
Every map is to some extent a distortion of reality. Through this distortion some maps reveal patterns and other maps lie [see Monmonier or Tufte, for instance]. One of the subtlest forms of lying with maps is associated with the method of classifying numerical attributes.
We will experiment with different classification methods. The material here is background. For more information see http://www.colorado.edu/geography/gcraft/notes/cartocom/cartocom_f.html (section 6).
For each classification type below, N numerical values are to be divided into K classifications (“classes”) according to the rules to be described. Suppose the ordered numerical values are X1 ≤ X2 ≤ … ≤ XN and that they are associated with features F1, F2, …, FN respectively. In every case the classifications consist of non-overlapping intervals of numbers. The endpoints of the classes are the “breaks” or “cut points.”
| Quantile | The K classes will be constructed to contain as near to N/K values each as practicable. For example, if N is 18 and K is 5, then N/K = 3.6, so the classes will contain either 3 or 4 values each. Each cut point will always coincide with one of the original data values. |
| Equal interval | The range from the smallest value X1 to the largest XN is divided into K intervals of equal length. The length is therefore L = (XN − X1)/K and the endpoints are X1, X1 + L, X1 + 2L, …, XN − L, and XN. Some intervals may contain none of the original values (can you think of a simple example?). |
| Standard deviation | This is almost another kind of equal interval classification. However, the interval length is set to be some multiple of the standard deviation of the data. Multiples of 1 and 0.5 are common; smaller multiples may be used with lots of data. The starting value for laying off multiples is the mean of the data. Chebysheff’s Theorem states that no more than 1/L² of the data values can lie beyond ±L standard deviations of the mean, so typically the intervals beyond L = ±3 or L = ±4 are merged, since they will contain very few data values. |
| Equal area | The breaks between the classes are set so that the total area of the features in each class is as close as possible to 1/K times the total area of all features. (This of course makes sense only for features with areas, that is, for polygonal features.) The result usually is a map that has about the same amount of every symbol on it: a kind of visual balance. (Thought question: could you describe, in detail, an algorithm for determining the equal area cut points?) Equal length and equal number legends are conceivable for polyline and multipoint themes, but ArcView does not implement these. (The features of a "multipoint" theme consist of zero, one, or more points.) |
| “Natural breaks” | Given a desired number of classes, K, the Natural Breaks method partitions the data into K subsets that minimize the sum of the "spreads" within each subset. ("Spread" is an informal term employed solely for this description.) When the data represent a random sample from a population consisting of two or more distinctly different subpopulations, and you know (or can accurately guess) how many different subpopulations there are, then Natural Breaks can do a good job of choosing classes which reflect the subpopulation groupings. |
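The first three rules above are simple enough to sketch directly. The Python functions below are one possible reading of those descriptions, not ArcView's own algorithms. Several choices are assumptions made for the sketch: the function names, the rounding rule used to split N values into K nearly equal groups, the use of the population (rather than sample) standard deviation, and the `max_sd` cutoff beyond which the outer classes are treated as open-ended. Ties in the data are ignored.

```python
import statistics

def quantile_breaks(values, k):
    """Upper cut point of each of k classes holding about len(values)/k values."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of values")
    # Class i ends just before index i*n/k rounded to the nearest integer;
    # integer arithmetic avoids floating-point surprises.
    ends = [(i * n + k // 2) // k for i in range(1, k + 1)]
    return [xs[e - 1] for e in ends]

def equal_interval_breaks(values, k):
    """The k + 1 endpoints of k equal-length intervals spanning the data."""
    lo, hi = min(values), max(values)
    length = (hi - lo) / k
    return [lo + i * length for i in range(k)] + [hi]

def std_dev_breaks(values, multiple=1.0, max_sd=3.0):
    """Cut points laid off from the mean in steps of multiple * sd.

    Everything below the first cut point or above the last one falls into
    a single open-ended class (the "merged" outer intervals).
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD: an assumption
    m = int(max_sd / multiple)      # number of steps on each side of the mean
    return [mean + j * multiple * sd for j in range(-m, m + 1)]

data = list(range(1, 19))           # N = 18 made-up values: 1, 2, ..., 18
print(quantile_breaks(data, 5))     # → [4, 7, 11, 14, 18]
print(equal_interval_breaks(data, 5))
print(std_dev_breaks(data, multiple=1.0, max_sd=2.0))
```

With N = 18 and K = 5 the quantile classes hold 4, 3, 4, 3, and 4 values, matching the "either 3 or 4 values each" example in the table.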
There are many other types of classification, but (for data sets with no “ties”) all can be reduced to equal intervals after applying a preliminary transformation of the data, Y = f(X). For example, dividing Y = log(X) into equal intervals is equivalent to dividing the range of the X’s into intervals of equal ratios, such as 1-2, 2-4, 4-8, and so on. Therefore classification methods for numerical data are usually determined by their effects on the resulting map rather than by some statistical method based purely on the data.
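The logarithm example can be checked numerically. The sketch below (the function name `equal_ratio_breaks` is made up for illustration) divides Y = log(X) into equal intervals and transforms the endpoints back to the X scale.

```python
import math

def equal_ratio_breaks(lo, hi, k):
    """Endpoints of k classes with equal ratios between lo and hi (both > 0)."""
    if lo <= 0 or hi <= lo:
        raise ValueError("need 0 < lo < hi")
    y_lo, y_hi = math.log(lo), math.log(hi)
    step = (y_hi - y_lo) / k  # equal interval length on the Y = log(X) scale
    # Transform each equally spaced Y endpoint back to the X scale.
    return [math.exp(y_lo + i * step) for i in range(k + 1)]

breaks = equal_ratio_breaks(1, 8, 3)
print([round(b, 6) for b in breaks])
```

For the range 1 to 8 with three classes, the endpoints come out at approximately 1, 2, 4, and 8 (up to floating-point rounding), which are the equal-ratio classes 1-2, 2-4, 4-8 mentioned in the text.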
The classification method is an easy target for criticism when you make a map. Therefore, always have a reason for selecting your method. It is not sufficient to say, “the computer chose it.” Who is in charge, you or the machine?
Experiment with the data provided in GTKAV chapter 9 (counties.shp). Apply all five classification methods and vary the numbers of classes. Each team (computer) will select one of the several dozen numerical attributes to study.
This page was last updated 11 March 2004. It was reformatted. The "Summary of principles" section was added. Minor editorial changes were made to clarify portions of the text.
This page was updated 20 October 2002 to expand on the Natural Breaks description. Thanks to Daniel Karnes of Dartmouth College for pointing out the need for improvement.