Chapter 15: Tables

Now, more than ever before, you will need to keep in mind the distinction between an ArcView document as a way of viewing data and the actual data.  When you "edit a Table" you are really editing one part of the underlying data file.

Exercise 15a--Displaying Tables

In ArcView you typically will be editing files in dBase format--dbf files.  Understanding this format will help you appreciate some of the limitations on data modification and help you find work-arounds for those limitations.  It is worthwhile to understand the details of at least one database format, because doing so provides insight into database formats in general.

dBase Files

Data in dBase format are maintained in three or more related places in the computer's file system.  Only two of those places, both of them actual files, contain the data.  The others, whose structures may vary according to the database software that uses them, are index files.  Index files provide information for rapidly finding or summarizing the data, but they do not contain the actual data.

One of the two data files is an old kluge for maintaining variable-length data, or "memo" fields.  ArcView (like most other systems) does not use the memo file (dbt extension, usually).

That leaves one file holding the core data.  This is the file with a dbf extension.  It is structured into three parts:

The header is a fixed set of 32 bytes that contains basic information about the dBase file: how many physical records it has, the date it was last changed, its format (there are several versions of dbf formats), and--most importantly--how many fields it has.  This information is encoded in binary form: you cannot directly read it with a word processor or directly print it out.
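The header bytes described above can be decoded mechanically.  Here is a sketch in Python (not part of ArcView; the byte offsets follow the standard dBase III layout) that unpacks the first twelve bytes of the header.  Note that the field count is not stored directly: it is implied by the header length, since each field descriptor occupies 32 bytes.

```python
import struct

def read_dbf_header(raw):
    """Parse the fixed 32-byte header at the start of a .dbf file."""
    version, yy, mm, dd, nrecords, header_len, record_len = struct.unpack(
        "<4BIHH", raw[:12])
    return {
        "version": version,            # e.g. 0x03 for dBase III
        "last_update": (1900 + yy, mm, dd),
        "records": nrecords,
        "header_length": header_len,   # header + field descriptors + terminator
        "record_length": record_len,   # includes the leading deletion byte
    }

# Values below match the Clayton.dbf schematic shown later in this chapter:
# 24 records of 102 bytes each, header length 32 + 15*32 + 1 = 513.
example = bytes([0x03, 96, 6, 6]) + struct.pack("<IHH", 24, 513, 102) + bytes(20)
header = read_dbf_header(example)
```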

The field descriptor is a series of 32-byte blocks (the whole series, not each block, is followed by a newline character) containing information about the fields.  This information consists of field type (numeric, character, date, logical, or memo), width, and precision (applicable of course to numeric fields only).  Only ten bytes are reserved for the field name itself; there is simply no place for longer names.  All this information, except the field names and type codes, is also in binary format.
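Because the offsets within each block are fixed, a descriptor is easy to decode.  A sketch in Python, assuming the standard dBase III layout (which actually reserves eleven bytes for the name: ten usable characters plus a null terminator):

```python
def read_field_descriptor(block):
    """Parse one 32-byte field descriptor block (dBase III layout)."""
    name = block[:11].split(b"\x00", 1)[0].decode("ascii")
    ftype = chr(block[11])     # C, N, D, L, or M
    width = block[16]
    decimals = block[17]       # meaningful only for numeric fields
    return name, ftype, width, decimals

# The [Area] descriptor from the schematic below: numeric, width 12.3.
area = b"AREA" + b"\x00" * 7 + b"N" + bytes(4) + bytes([12, 3]) + bytes(14)
```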

The data records are a series of fixed-length blocks containing the data.  The data are in ASCII format: you can easily read them with a text editor if you can separate them from the header and field descriptor that precede them.  However, (a) there is nothing separating one block from another and (b) each block is preceded by a single character (think of it as a hidden field if you like) used by the original dBase software to mark blocks as "deleted."  An end-of-file character follows the last data record.
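Because every block is fixed-length, pulling values out of a record is pure arithmetic--there are no delimiters to scan for.  A Python sketch, with field widths supplied from the descriptors:

```python
def split_record(record, fields):
    """Split one fixed-length data record into per-field strings.

    `record` is the raw bytes of one record, starting with the deletion
    byte; `fields` is a list of (name, width) pairs taken from the
    field descriptors.
    """
    deleted = record[:1] == b"*"   # "*" marks deletion, " " a live record
    values, pos = {}, 1            # skip the deletion byte
    for name, width in fields:
        values[name] = record[pos:pos + width].decode("ascii").strip()
        pos += width
    return deleted, values

# First two fields of the first Clayton record shown in the schematic below:
deleted, row = split_record(b" " + b"       0.001" + b"       0.182",
                            [("AREA", 12), ("PERIMETER", 12)])
```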

(For all the details, consult the dBase file formats web page.)

The table used in Exercise 15a, [Attributes of Clayton County], resides on disk (or CD) as GTKAV/Data/Ch15/Clayton.dbf.  The following is a schematic of its physical structure; to present it, the binary values have been translated into readable form.  [If you like, download a small DOS executable that prints a dbf header and field descriptor.]

Byte offset   Function                  Contents

Header
0-31          header                    Signature: 03 [that is, this is in dBase III format]
                                        Date: 6/6/96
                                        # Records: 24
                                        Reclen: 102 [this is the number of characters in each data record]

Field descriptor
32-63         first field               AREA N 12.3 (this means the field [Area] is numeric (N) of width 12 and 3 decimal places)
64-95         second field              PERIMETER N 12.3
96-127        third field               CLAYTON_ID N 16.0
128-159       fourth field              TRACT C 12.0 [character fields have no decimal places]
160-511       11 more fields            HH_SEG1, ..., HH_SEG10, TOWNSHIP descriptors
512           end of field descriptor   Newline character

Data
513-614       first data record                0.001       0.182               213063040100 0  0  0  0  0   0 000  0College Park
615-716       record 2                         0.001       0.098               313063040301 0  0  0  0  0   0 000  0Forest Park-Morrow
717-2858      records 3 through 23      ...
2859-2960     record 24                        0.004       0.438              2513063040704 0  0  0  0  0   31201  0Lovejoy
2961          end of file               EOF character

ArcView presents these data to you in a Table like this:

If you closely compare the first two records in the schematic and Table views, you will see how ArcView picks apart each record into its constituent parts and places each value into an appropriate column.

The most important aspect of this format is that field descriptor records, data records, and even the places for fields in the data records are fixed in size.  If you wanted to change a field's width, for example, then you would have to change almost the entire file (to push subsequent characters away or to squeeze out the unwanted characters).

Here are some additional consequences:

The restriction to ten-character field names is inherent in the dbf structure.  It is not imposed by ArcView.

If you want to change a field's width, you will have to create a new field, copy values from the old field to the new, and then delete the old field.  Again this is not an ArcView limitation; it is a physical one.

The deletion bytes preceding each record may cause confusion when exchanging data using dbf files.  Some software (like dBase itself) will pretend that records marked for deletion do not exist--they are effectively hidden.  Other software, such as ArcView, simply ignores the deletion byte and shows all records.

Every value for every field has its space reserved for it.  Something has to go into that space.  Thus, there is no such thing as a truly "empty" or "null" value.
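This point is easy to demonstrate.  A "blank" numeric field is just a run of spaces, and the software reading it must choose a convention for what that means.  A Python sketch (the choice of 0.0 here is arbitrary, not something the format dictates):

```python
raw = b"            "                  # a "blank" 12-byte numeric field
text = raw.decode("ascii").strip()     # -> "": nothing distinguishes
value = float(text) if text else 0.0   # "no value" from an explicit zero,
                                       # so the reader must pick a convention
```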

Adding a record to the end of the file is quick.  Inserting a record into the file requires moving all subsequent records, so insertions can require a lot of disk input/output.

Likewise, deleting records, especially those early in the file, can be time-consuming.

Literally anything can go into the data records: characters in numeric fields, for example.  The validity of the data depends on software enforcement of the field types. 

How a field is to be interpreted depends entirely on its field descriptor.  For example, the string "00.0" in a field may be interpreted as a number or as a string of text depending on the field type.  In short, the format does not identify field types through any intrinsic properties of the data values.
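For instance, the same twelve bytes yield entirely different values depending on the declared type (a Python sketch):

```python
raw = b"        00.0"                      # twelve bytes from a data record
as_number = float(raw.decode("ascii"))     # declared type N -> the number 0.0
as_text = raw.decode("ascii").strip()      # declared type C -> the text "00.0"
```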

There is no information on how data should appear in the software: nothing about fonts, colors, justification, and so on.

There is no special information restricting the kind of data that may be in a field: for example, there is no way to specify that a numeric field must contain only positive values.

Using dBase files for exchanging data

Most database, spreadsheet, and statistical programs can read dBase files; many of these programs can also create dBase files.  We will take Microsoft Excel as an example to illustrate the issues that arise in exchanging data with ArcView. 

Consider this spreadsheet, shown here in its entirety as it appears in Excel 2000 (earlier versions of Excel, at least back to 3.0, act similarly):

The bordered area was highlighted and saved as a dBase file using Excel's File|Save As command.  It issued several harmless warnings--it can save only one sheet, saving to dBase III files may lose some Excel features (as discussed above)--but otherwise indicated a successful save.  Indeed, the screen shot above was taken after the file was converted to dBase format; the Excel title bar says  "Microsoft Excel - sample.dbf".

Here is what the same file looks like when added to ArcView as a Table document:

ArcView is indeed showing the data correctly.  A lot of bad things have happened, some of which are due to dBase format limitations, but most of which are not:

Most of the [Name] values have been truncated.

The [Conc] values have been rounded to integers (3.2 became 3, 1.9 became 2).

The [X] and [Y] fields are string fields (note that they are not right-justified in ArcView, which hints at this).  This means you cannot query, sort, or calculate with them as you would expect.

The [Conc] value of "<100.0" has disappeared.

"Numeric Identifier" has been shortened and changed to "Numeric_id".

The dBase file contains a blank record (record 5).

The dBase file contains a new numeric field, [N7], all of whose values are blank.

The errors are listed from most to least serious.  Evidently, Excel's conversion of spreadsheet data to dBase format causes unexpected and inconsistent data loss and modification.  (Earlier versions of Excel were worse: Excel 5.0 would change the value of 1,000,010 to 10!)

The problem has two root causes:

  1. Excel can make the data look like something they are not.
  2. Excel makes assumptions about field structures based largely on the first record saved.

For example, Excel correctly converted the data in the "Numeric Identifier" column to numbers because the value in the first row was explicitly numeric (0).  However, it incorrectly converted data in the "X" and "Y" columns to strings, because the empty cells in the first row were interpreted as strings, even though these columns contained only numbers (apparently: have you spotted the error in the "Y" column yet?).  Finally, Excel estimated the width of the [Name] field based on the width of its column, which was too small to hold the values actually in it.
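The first-record rule can be sketched as follows.  This is hypothetical logic for illustration only--Excel's actual rules are not documented here--but it shows why a blank cell in the first row poisons an otherwise numeric column:

```python
def infer_column_types(first_row):
    """Guess each column's dBase type from the first data row alone,
    as a naive converter might (hypothetical logic, for illustration)."""
    types = []
    for cell in first_row:
        if isinstance(cell, (int, float)):
            types.append("N")   # numeric field
        else:
            types.append("C")   # anything else, blanks included, becomes text
    return types

# First row of the sample sheet: a numeric id but blank X and Y cells.
guessed = infer_column_types([0, "", ""])
```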

Therefore, if you are going to convert spreadsheet or other non-database data to dBase (or similar) format for exchange with other software, you need to learn how to avoid such problems.  For Excel, good results are obtained with the following procedure.  The steps are designed to provide as many hints as possible to Excel about the nature of the data you will convert.

Recipe for using Excel to create dBase files

  1. Reorder the data rows if necessary so that the first one has valid, non-blank values in all fields.  If there is no such record, create a "fake" first record and fill it with valid non-blank values.  You can delete it after the conversion has occurred.

  2. Format each numeric and date column explicitly.  Make sure the entire column is formatted, not just part of it.  Specify the number of decimal places you want.

  3. Widen all columns to contain all the data in them.  One way is to increase the font to 14 points and auto-size all columns; this seems to work well.  You can increase column widths beyond this minimum to accommodate future values that might be even longer.

  4. Type short meaningful field names in the single header row.  Remember, the dBase format limits you to 10 characters.

  5. Delete any blank rows placed in the spreadsheet merely for formatting purposes.

  6. Check that all values in all numeric columns really are numbers.  Fix any errors.

  7. Remove any hidden rows or columns within the data.  (Excel will not export data in hidden rows or columns, anyway, so unhide any data you want to export.)

  8. Verify that there is no named array called "Database."  If there is, remove it or set it equal to the array of data (including the header row) you want to save in dBase format.

  9. Save the spreadsheet at this point in its native Excel format.  This preserves everything you see, including all formatting, giving you a point you can return to if the next steps reveal errors.

  10. Convert the spreadsheet to dBase format using Excel's File|Save As command.  dBase III and IV formats are compatible with ArcView.

  11. Close the spreadsheet immediately.  (Minimizing does not work: it has to be closed.)  Now open the .dbf version you just created.

  12. Check the results carefully.  If they are unacceptable, close the .dbf version and reopen the Excel version.  Fix the problem(s) and resume at step 9.

  13. Clean up: if the results are ok, delete your initial "fake" record if you used one and re-save the file in dBase format.
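The check in step 12 can be partly automated.  In the spirit of the small header-printing utility mentioned earlier, here is a Python sketch (assuming the dBase III layout described in this chapter) that summarizes a converted file's structure so you can verify field names, types, and widths:

```python
import struct

def dbf_structure(raw):
    """Summarize a .dbf file's header and field list from its raw bytes
    (dBase III layout assumed, as described in this chapter)."""
    nrecords, header_len, record_len = struct.unpack("<IHH", raw[4:12])
    fields = []
    for off in range(32, header_len - 1, 32):   # one 32-byte block per field
        block = raw[off:off + 32]
        name = block[:11].split(b"\x00", 1)[0].decode("ascii")
        fields.append((name, chr(block[11]), block[16], block[17]))
    return {"version": raw[0], "records": nrecords,
            "record_length": record_len, "fields": fields}

# A minimal one-field file built by hand, standing in for a converted sheet:
head = bytes([3, 100, 5, 29]) + struct.pack("<IHH", 1, 65, 13) + bytes(20)
conc = b"CONC" + b"\x00" * 7 + b"N" + bytes(4) + bytes([12, 1]) + bytes(14)
info = dbf_structure(head + conc + b"\x0d")
```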

We performed these operations while ArcView was open.  Here is our sample spreadsheet after its conversion to dBase format in step 10:

And here is what it looked like in ArcView:

What happened?  The structure and contents of the file changed while ArcView was open.  It is still trying to parse the data according to the old file structure.  The solution is to have ArcView re-read the entire file so it is informed of the changes in data structure.  Use the Table|Refresh menu item.  Now the table looks like this in ArcView:

You can see from the right-justification of [Id], [X], [Y], and [Conc] that they are indeed numeric fields.  Evidently, we omitted step 13 (the clean-up step) which should have deleted the fake record.  That's no problem--start editing the table, select the fake record, delete it, and save the edits.

Last updated 29 May 2000