|
|
Evaluating Encryption Methods Using ArcView's Spatial Analyst ExtensionIntroductionArcView's Spatial Analyst extension is useful for visualizing and analyzing more than geographic images and data. This page provides an example of its versatility in evaluating some simple encryption schemes: alphabet substitution, Vigenere, and a modified alphabet substitution method. The evaluation demonstrates that a Vigenere code with short period and localized displacement is the second-worst conceivable choice for general purpose encryption.BackgroundEncryption is the process of converting a message (the plaintext) into a string of characters (the ciphertext) that is not readily readable An encryption method is commonly called a "code" or a "cipher". The alphabet substitution is the basis for the ciphers you find in the comics section of the newspapers. To each of the 26 letters (in English) is assigned a unique symbol (often just a different letter). To encrypt the plaintext, each letter is replaced according to the assignment.. The ciphertext is decrypted by reversing the assignments and following the same character-by-character substitution procedure. This method is similar to that used by the Cap'n Crunch decoder rings found in cereal boxes decades ago. These rings consisted of an outer annulus that could be rotated about the circular center of the ring. Around the perimeter of the center are the 26 letters of the English alphabet. The same 26 letters are written at corresponding locations on the outer annulus. Any rotation of the annulus would create a one-to-one matching between inner and outer letters that could be used for a substitution cipher. For example, if the outer ring were offset three characters relative to the inner ring, then an A would be replaced by a D, a B by an E, a C by an F, and so on. Since the mechanism is circular, at the end of the alphabet an X would be replaced by an A, a Y by a B, and a Z by a C. A nice way to describe this substitution is by enumerating the letters: let A be 0, B be 1, ..., Z be 25. In this particular instance, the substitution amounts to adding 3 to each letter's value. The result is "wrapped around" to zero if it exceeds 25. (This is called addition modulo 26.) The Cap'n Crunch substitution scheme has a formula if we think of the letters in terms of their numerical equivalents: Encrypt(#) := # + 3 modulo 26, where "#" stands for any letter. This simplest of all substitution methods has the general formula Encrypt(#) := # + k modulo 26, where k is anything from 1 through 25. (The value k=0 doesn't do anything and so does not qualify as an honest encryption method.) It is perhaps more revealing to write this formula in a different way:
When this constant is small, each letter is replaced by one close to it in the alphabet, making the nature of the encryption even more apparent to the casual eye. Without needing to be more precise, I will refer to an encryption method that replaces letters by those "close" to them as a localized method. The Vigenere (vee zhen air') method can be defined in terms of a sequence of pre-set Cap'n Crunch rings. To encrypt a message, you would use the first ring for the first letter, the second ring for the second, and so on, returning to the first after running through all the rings. The sequence of ring settings is needed for easy decryption of the ciphertext. This sixteenth century method was considered very strong until perhaps the mid-nineteenth century. Strong encryption methods, like the DES standard (U.S.), produce random looking output, regardless of any regularities or patterns in the input (plaintext). Encryption is sometimes used by software vendors to protect the source code of scripts, macros, or programs written using their products. Thus, anyone can encrypt as much plaintext as they like, provided they have control over the keys (such as the settings of the Cap'n Crunch rings). "Breaking" these encryption schemes becomes particularly easy because there is then limitless data to evaluate. Indeed, the main trick lies in determining what encryption scheme is being used. Visualizing an Encryption MethodSuppose you can feed a lot of text into an unknown encryption program (a "black box"). Out comes a lot of ciphertext. Visualization is a powerful way to compare the ciphertext to the plaintext. Patterns in an appropriate visualization give vital clues to the workings of the encryption program and suggest appropriate decryption methods. Since, as noted above, letters in an alphabet can be considered numbers, it becomes possible to map these letters into a space for visualization. Even better, as suggested by equation [1], you can visualize the differences between ciphertexts and their plaintexts. Here we will consider only the simplest kinds of visualizations, but remark that the procedure can be carried out in much more sophisticated ways. The standard way to represent letters (and other characters, such as digits, punctuation, and spaces) is the ASCII method, but any systematic method will do. All the characters appearing on the usual keyboard are assigned values between 32 (a space: ) and 126 (a tilde: ~). Other values between 0 and 255 represent other kinds of characters, such as tabs, newlines, and accented characters. This is, in effect, an alphabet of 256 characters, not 26, but the ideas remain the same. Differences will be computed modulo 256, that's all. Taking a tip from equation [1], the kind of visualization presented below looks at how the ciphertext differs from the plaintext in each character position. The differences will be shown as a series of square dots in a two-dimensional space (see the images below, where each pixel or cell in the image is one of the dots). The horizontal position (X coordinate) is the position of the character within the text. The vertical position (Y coordinate) for the moment is arbitrary. It will be used for displacing one plaintext/ciphertext difference above or below another, allowing quick review of lots of encrypted text within a compact space. The dots are colored according to the numeric values of the differences at their positions. For example, no matter what the plaintext is, with a Cap'n Crunch substitution, all differences will be a constant. The visualization will be a single color. This fact alone tells you that your black box is a Cap'n crunch ring. The color tells you the ring's setting. Because this is as simple as the picture can get, I characterize it as the worst possible encryption method of this type available. The simpler the encryption, the simpler its visualization. You will see examples below. Where the visualization is particularly simple, you can bet that a casual inspection of some ciphertext will turn up patterns and provide important clues to the decryption.
The colors graduate from yellows through green to blues as the differences between encrypted and original characters range from -128 (the smallest possible value) to 127 (the largest possible value). You're actually looking at only a part of the plot: the bottom full stripe shows what happened to a line of "1" (ASCII 49): the value added to each is bright yellow, representing -116. Thus the encrypted version of a "1" is a value of 49 - 116 = 189 (mod 256). The horizontal stripes--which are just connected sequences of the square dots that make up the visualization-- show that the encryption does not depend on the position of the characters. The relative simplicity of this image (its apparent nonrandomness) belies the simplicity of the encryption method. Indeed, with no clues, no computer, and just a small amount of ciphertext, most people can break this method in a few minutes. With actual plaintext and ciphertext to compare, decryption is a simple and mindless exercise.
The verticality of the stripes shows that in any given position a Cap'n Crunch method is being used: the differences are constant. That the stripes vary in color shows that several ring settings are used, depending on position. The repetition in the pattern--it cycles every eight positions--proves it to be a Vigenere code. It ought to surprise you that the various colors of these stripes represent a very tight range of values: between -8 and 5. This is very limited compared to the range of -128..127 that is possible. This makes it a highly localized code. By comparing this image to the preceding one, you can see that this particular code is inherently simpler than a random alphabetic substitution (which does not show a repeating pattern in its stripes)!
Compare the picture at right (the third one) to the first picture. Note the vertical repetition. (This picture goes approximately from ASCII 45 to ASCII 81, so you see slightly over four cycles.) It looks just like the Vigenere code, but turned sideways. This is the sense in which the two schemes have the same "inherent" non-randomness. Thus, a localized Vigenere code with short cycle is actually simpler than an alphabetic substitution. In terms of apparent non-randomness, this is the second worst encryption method (after the pure Cap'n Crunch method).
The colors range from gray through red this time, just for variety. There should be no apparent pattern within any narrow vertical strip, since the tables were randomly generated, but the table reuse should be apparent in the horizontal repetition. (100 positions are shown here; the characters shown range from ASCII 32 to ASCII 102, approximately.) Nevertheless, your visual impression is correct: this code is inherently more complex than the previous ones and should produce more random-looking ciphertext.
This one shows the values in grayscale: black is -128, white is 127, and various levels of gray depict intermediate values. These values were generated at random using ArcView 3.1's Number.MakeRandom request. There should be no trends in colors or textures within this pattern at all. ConclusionsThe simplest encryption methods appear particularly simple using the two-dimensional visualization method described here. More complex encryption methods will appear much more random, but often succumb to higher-order analyses (such as correlation analyses of the images). Visualizing an encryption method is a powerful way quickly to evaluate its rigor. The commercial encryption visualized above (in the second image) proves to be particularly poor. It is virtually impossible to produce a break-proof encryption method to hide algorithms and program source files that are widely distributed. (For a nice case study see how the Netscape SSL encryption was broken.) However, there is much that a software vendor can do to increase the quality of its encryption methods and make it less tempting for the casual attacker. Standard techniques for encrypting program code, simple to implement and rapid in performance, include "uglification" (removing all comments and nonessential whitespace, changing all local symbols into random names, and so on) and pre-compilation (such as is done for Visual Basic programs). Neither is completely reversible, but both retain the information needed to execute the program code correctly. It is easy to XOR a string of pseudorandom numbers with the plaintext before or after encrypting it, thereby producing ciphertext that is highly random. None of these approaches would violate U.S. federal export restrictions on strong encryption. Therefore there is no excuse ever to use an algorithm like those studied here. How the Visualization Was PerformedAll the visualizations above are variants of the same basic procedure, which can be outlined thus: Create an output grid data structure, organized by rows. The starred (*) line is implemented by calling the black box. This is a procedure that is provided a string of ASCII characters, the plaintext, and returns another string, the ciphertext. Actually, for deeper study, this procedure was modified to produce two grids: one whose rows are the original plaintext and another whose rows are the ciphertext. Spatial Analyst's map calculator then readily subtracts one grid from the other to produce the grids shown above. Link here to view the Avenue code implementing the basic procedure. --William A. Huber, Ph.D., 2 July 1999. Minor editorial changes made 12 August 1999. |
ColorRamp, Memorized Calculations, Rotate, Sample, XSect, and Tissot are trademarks of Quantitative Decisions. All other products mentioned are registered trademarks or trademarks of their respective companies.
Questions or problems regarding this web site should be directed to information
(@quantdec.com).
|