|
Summary and ConclusionsThe ArcView Number.MakeRandom request does not work correctly. It has three major flaws:
However, these limitations can be overcome. Avenue scripts are provided below to experiment with the MakeRandom request, to replace the MakeRandom request, and to mimic Excel's RAND() function. ESRI (the manufacturer of ArcView) would be wise to fix these problems in future releases of ArcView and to verify that any other random number generation in its software is not subject to the same problems. Note added 9 July 1999: Excel 95's RAND() function is much worse than Number.MakeRandom (after compensating for the problems noted in #1 and #2 above). 25 October 2001: Dilbert has an apt comment about random number generators. IntroductionThe ArcView GIS software produces random numbers in many ways:
The last technique is useful for testing, simulation, and other analytical pursuits. Therefore it should produce reasonably random-looking numbers. It does not, but it can be modified to do a pretty good job. BackgroundThe syntax for the Number.MakeRandom request is
All that the ArcView help says about this request is "Returns a random number between min and max inclusive. UNIX systems base MakeRandom on the system service rand() which is not a great random number generator (see man pages for details)." This seems to imply that MakeRandom is a "great" random number generator on Windows systems. There is evidence, presented below, that it may be based on a reasonably good random number generator, but its implementation (as exposed through the MakeRandom request) is awful. ProcedureA little experimentation on a Windows machine running ArcView3.1 reveals the true behavior of MakeRandom. 1. MakeRandom returns only integer values.Compile and repeatedly run this little script.
It displays a "random number" between 0 and 1 each time, to six decimal places. You will see that only "0.00000" and "1.00000" are ever returned. Play with the constants 0 and 1 in this script: even when they are replaced by non-integral values (such as 0 and 3.1416), MakeRandom returns only integers. 2. MakeRandom does not return uniformly distributed values.Since only integer values are returned, one would expect MakeRandom to return them with equal frequencies (in the long run). The next script generates a lot of random values between fixed endpoints and displays the frequencies with which they were produced.
You will see that there are serious problems when NMin or Nmax are negative or exceed 2^31 -1. You will also discover that the frequencies with which MakeRandom(NMin, NMax) returns NMin, 0 (if NMin <=0 and NMax >= 0), and NMax are strange. After many runs with different values of NMin and NMax, including non-integral values and negative values, you should be able to verify that MakeRandom behaves as if its implementation were as follows:
The two truncations, the reductions modulo 2^32, and the additions of 0.5 (in step 3) all contribute to the anomalous behavior. The following table illustrates the long-run frequencies of MakeRandom using various starting values and compares them to the expected uniform frequencies.
These non-uniform frequencies have to be considered extremely serious flaws in Number.MakeRandom. As described below, however, they can be overcome by appropriate programming, so they are not critical flaws. 3. MakeRandom does not produce independent values.We applied G. Marsaglia's Diehard battery of tests to the Number.MakeRandom output. This is a collection of tests of random number generators. As of 1996 there exist many simple, fast pseudo-random number generators that pass all these tests. Passing the tests is not assurance of true "randomness" but failing one or more provides evidence of non-randomness. These tests were motivated by simulation needs. Thus, simulation results produced using a random number generator (RNG) that fails Diehard should be viewed with caution. There is a complication. Diehard requires about 80 million random bits. A 32-bit RNG can be iterated 2.5 million times to produce this many random bits. From the study in the previous selection we suspect the underlying procedure for Number.MakeRandom is a 32-bit generator, but it is not exactly clear how a 32 bit result is converted into a random value. There are many, many ways to produce the required 80 million bits (10 million bytes) of random values using Number.MakeRandom. We tried three natural ones, after compensating for the non-uniform frequency problem.
(The frequency anomalies noted above are inconsequential when using Methods 1 or 2. Method 3 compensates for the frequency anomalies; if it did not, it would fail most of the tests.) Here is the Avenue script for Method 3.
(In this script the conversion to hexadecimal is computed explicitly rather than using ArcView's Number.AsHexString request, which does not produce the required kind of output.) The script will take about a half hour to execute on a 400 MHz Pentium II class machine. The output of this script is a file, random3.txt, which is post-processed by Marsaglia's asc2bin.exe program to produce input for Diehard. Diehard consists of 15 tests. All 15 were run on all three output files. For reference, they were also run using a file of 2,560,000+ values of RAND() produced by Excel 95. Here is a summary of the results:
* Method 1 is guaranteed to fail some of the tests simply because it produced 31-bit values rather than 32 bit values. Such a failure is not an adequate test of the generator in these cases. The first two methods failed miserably, but the nature of their failures suggested the third method. The third method performed pretty well. The OPSO, OQSO, and DNA tests are stringent tests of correlation among sequences of pseudorandom values. They have the power to explore correlations at individual bit positions. The DNA test indicates that bits 6 and 7 of each byte generated by Method 3 are the causes of the failure. 4. A note on Excel's RAND() function (9 July 1999)The failure of Excel's RAND() function is particularly dramatic. To do the tests, a spreadsheet was filled in all 256 columns through row 5024 with the formula INT(2^32 * RAND()). This was calculated and saved as tab-delimited ASCII. It was recalculated and saved again as tab-delimited ASCII to a separate file. These files were concatenated to produce a file with 256 * 5024 * 2 random values in the range 0..2^32-1. The following AWK program converted this output into the packed hexadecimal format needed for Diehard to process: # The nature of the test failure is particularly awful: the frequencies of "1"s in bits 0-2 and 29-31 of the output were significantly different than the frequencies of "0"s, so the output is non-uniform! One possible fix is to use the fractional part of 8*RAND() instead of RAND() itself, which effectively rejects bits 0-2 and starts the values with bit 3. I have not tested this, in part because the algorithm used for RAND() (which Microsoft has published) is poor, so there's really not much hope. Discussion1. How to produce uniformly distributed values.The flaws in Number.MakeRandom are readily overcome without too much compromise in performance (which is already poor, since Avenue is an interpreted language). Here are some rules to follow:
The following Avenue script generates uniform pseudo-random integers in the range provided by its two arguments.
2. How to produce values with reasonably good randomness using ArcView.Generate random bytes (values in the range 0..255) using the procedure just given above. This is essentially Method 3 (the byte-by-byte method), which passed most of the Diehard tests. Create larger ranges of values by combining bytes. For example, to produce uniform values in the range 0..(2^16-1) use av.Run("Number.MakeRandom", {0,255}) * 256 + av.Run("Number.MakeRandom", {0,255}). To produce values in ranges that are not powers of two, you will have to rescale and round the results. A good way to start is with a script that returns uniform random floating point values between 0 and 1 (not including 1). Here is one based on Method 3 that should give good results. Since it behaves like Excel's RAND() function, name it "Rand":
Since the values are based on 32 random bits, you get about 10 decimal places of precision. For more precision yet (but slower execution), replace the range 0..3 in the script by 0..4, 0..5, or 0..6 (wider ranges will do no good). This script is easily used to produce uniform integers within any desired range (within the precision available in ArcView). For example, (av.Run("Rand", NIL)*1001).Floor is an Avenue expression that will return uniformly distributed integers between 0 and 1000 inclusive (each will be produced with a long-run frequency of 1/1001). ReferenceKnuth, Donald E. The Art of Computer Programming. Volume 2, Seminumerical Algorithms, Third Edition. Addison-Wesley, 1998. Chapter 3. (William A. Huber, 5 July 1999. Updated 9 July 1999.) |
ColorRamp, Memorized Calculations, Rotate, Sample, XSect, and Tissot are trademarks of Quantitative Decisions. All other products mentioned are registered trademarks or trademarks of their respective companies.
Questions or problems regarding this web site should be directed to information
(@quantdec.com).
|