ArcView Random Numbers

Home
Up

 

 

Summary and Conclusions

The ArcView Number.MakeRandom request does not work correctly.  It has three major flaws:

  1. It does not produce uniformly distributed pseudo-random numbers (except in special situations).
  2. It fails for values less than -2^31 or greater than 2^31-1 but provides no indication of error.
  3. Depending on how it is used, it can fail many tests of randomness, even after compensating for flaws #1 and #2.

However, these limitations can be overcome.  Avenue scripts are provided below to experiment with the MakeRandom request, to replace the MakeRandom request, and to mimic Excel's RAND() function.   ESRI (the manufacturer of ArcView) would be wise to fix these problems in future releases of ArcView and to verify that any other random number generation in its software is not subject to the same problems.

Note added 9 July 1999: Excel 95's RAND() function is much worse than Number.MakeRandom (after compensating for the problems noted in #1 and #2 above).

25 October 2001: Dilbert has an apt comment about  random number generators.

Introduction

The ArcView GIS software produces random numbers in many ways:

  • Colors in new unique-value and chart legends are selected randomly.
  • Dot legends place dots randomly within polygonal features.
  • Color ramps for new grids (using the Spatial Analyst extension) are selected randomly.
  • The Grid.MakeRandom and related requests produce grids of random values in the Spatial Analyst extension.
  • The Number.MakeRandom request produces "random numbers".

The last technique is useful for testing, simulation, and other analytical pursuits.   Therefore it should produce reasonably random-looking numbers.  It does not, but it can be modified to do a pretty good job.

Background

The syntax for the Number.MakeRandom request is

Number.MakeRandom(min, max)

All that the ArcView help says about this request is "Returns a random number between min and max inclusive. UNIX systems base MakeRandom on the system service rand() which is not a great random number generator (see man pages for details)."  This seems to imply that MakeRandom is a "great" random number generator on Windows systems.  There is evidence, presented below, that it may be based on a reasonably good random number generator, but its implementation (as exposed through the MakeRandom request) is awful.

Procedure

A little experimentation on a Windows machine running ArcView3.1 reveals the true behavior of MakeRandom.

1.    MakeRandom returns only integer values.

Compile and repeatedly run this little script.

MsgBox.Info(Number.MakeRandom(0,1).SetFormatPrecision(6).AsString, "")

It displays a "random number" between 0 and 1 each time, to six decimal places.  You will see that only "0.00000" and "1.00000" are ever returned.  Play with the constants 0 and 1 in this script: even when they are replaced by non-integral values (such as 0 and 3.1416), MakeRandom returns only integers.

2.    MakeRandom does not return uniformly distributed values.

Since only integer values are returned, one would expect MakeRandom to return them with equal frequencies (in the long run).  The next script generates a lot of random values between fixed endpoints and displays the frequencies with which they were produced.

NMax = 10 ' Upper limit of range of random values
NMin = 0 ' Lower limit of range of random values
NIter = 10000 ' Number of iterations

K = (NMax + 1 - NMin).Ceiling ' Number of random values possible
dctF = Dictionary.Make(K Min NIter)

for each i in 1..NIter
    a = Number.MakeRandom(NMin,NMax)
    count = dctF.Get(a)
    if (count = NIL) then count = 0 end
    dctF.Set(a, count+1)
end

s = ""
lstKeys = dctF.ReturnKeys
lstKeys.Sort(true)
for each i in lstKeys
    a = dctF.Get(i) ' Count
    f = a/NIter ' Frequency
    sd = (f*(1-f)/Niter).Sqrt ' Approximate std. dev.

    s = s + i.AsString + ": " +
    a.SetFormatPrecision(0).AsString +
        " (" +
        (100*f).SetFormatPrecision(1).AsString +
        " +-" + (200*sd).SetFormatPrecision(1).AsString +
        "%)" + NL
end

MsgBox.Report(s, "Frequencies for" ++ NIter.AsString ++ "Iterations")
' end of script

You will see that there are serious problems when NMin or Nmax are negative or exceed 2^31 -1.  You will also discover that the frequencies with which MakeRandom(NMin, NMax) returns NMin, 0 (if NMin <=0 and NMax >= 0), and NMax are strange.

After many runs with different values of NMin and NMax, including non-integral values and negative values, you should be able to verify that MakeRandom behaves as if its implementation were as follows:

  1. Truncate the decimal parts of NMin and Nmax so that they become integers.
  2. Reduce NMin and NMax modulo 2^32 so that they are in the range -2^31 .. (2^31-1).  If in so doing NMax < NMin, then reverse these values to assure that NMax >= NMin.
  3. Generate a 32-bit pseudorandom value.  Using floating point arithmetic, scale this value so it lies in the range from NMin + 0.5 to NMax + 0.5 inclusive.
  4. Truncate the decimal part of the result so it becomes an integer.   Return this value.

The two truncations, the reductions modulo 2^32, and the additions of 0.5 (in step 3) all contribute to the anomalous behavior.

The following table illustrates the long-run frequencies of MakeRandom using various starting values and compares them to the expected uniform frequencies.

Value Frequency Expected Uniform Frequency Difference

Min=0, Max=2

     
0 25.00% 33.33% -8.33%
1 50.00% 33.33% 16.67%
2 25.00% 33.33% -8.33%

Min=-3, Max=0

     
-3 0.00% 25% -25.00%
-2 16.67% 25% -8.33%
-1 33.33% 25% 8.33%
0 50.00% 25% 25.00%

Min=-1, Max=1

     
-1 0.00% 33.33% -33.33%
0 25.00% 33.33% -8.33%
1 75.00% 33.33% 41.67%

These non-uniform frequencies have to be considered extremely serious flaws in Number.MakeRandom.  As described below, however, they can be overcome by appropriate programming, so they are not critical flaws.

3.    MakeRandom does not produce independent values.

We applied G. Marsaglia's Diehard battery of tests to the Number.MakeRandom output.  This is a collection of tests of random number generators.  As of 1996 there exist many simple, fast pseudo-random number generators that pass all these tests.  Passing the tests is not assurance of true "randomness" but failing one or more provides evidence of non-randomness.   These tests were motivated by simulation needs.  Thus, simulation results produced using a random number generator (RNG) that fails Diehard should be viewed with caution.

There is a complication.  Diehard requires about 80 million random bits.   A 32-bit RNG can be iterated 2.5 million times to produce this many random bits.   From the study in the previous selection we suspect the underlying procedure for Number.MakeRandom is a 32-bit generator, but it is not exactly clear how a 32 bit result is converted into a random value.  There are many, many ways to produce the required 80 million bits (10 million bytes) of random values using Number.MakeRandom.   We tried three natural ones, after compensating for the  non-uniform frequency problem.

  • Method 1: Number.MakeRandom(0, 2^31 -1) was called 2.5 million times in succession.   Each time the result was doubled and then output in hexadecimal to a file, as required by Diehard.
  • Method 2: Number.MakeRandom(0, 2^16 - 1) was called 5 million times.  Each time the result was output in hexadecimal to a file.
  • Method 3: Number.MakeRandom(0,2^8) was called 10 million times.  Any result of 2^8 (=256) was converted to 0.  All results were output in hexadecimal to a file.

(The frequency anomalies noted above are inconsequential when using Methods 1 or 2.  Method 3 compensates for the frequency anomalies; if it did not, it would fail most of the tests.)

Here is the Avenue script for Method 3.

strOut = "Random3.txt"
fnOut = strOut.AsFilename

lfOut = LineFile.Make(fnOut, #FILE_PERM_WRITE)
if (lfOut = NIL) then
    MsgBox.Error("Unable to open file for writing", "")
    exit
end

chars = "0123456789ABCDEF"
char2 = ""
for each i in 0..15
    for each j in 0..15
        char2 = char2 + chars.middle(i,1) + chars.middle(j,1)
    end
end
char2 = char2 + "00" ' For the case y = 256

N = 2^8-1 ' AV can't handle anything above 2^31 - 1.
M = 2^18*1.1' Will need to be 2^18, approximately

av.ShowStopButton
av.ShowMsg("Computing...")
for each i in 1..M
    doMore = av.SetStatus(100*(i-0.5)/M)
    if (doMore.Not) then break end

    s = ""
    for each ii in 1..40
        y = Number.MakeRandom(0, N+1) ' Cases 0 and N+1 have half the expected frequency.
        s = char2.Middle(2*y, 2) + s ' Case N+1 converts to "00".
    end

    lfOut.WriteElt(s)
end

lfOut.Close

av.SetStatus(100)
av.ShowMsg("")
' end of script

(In this script the conversion to hexadecimal is computed explicitly rather than using ArcView's Number.AsHexString request, which does not produce the required kind of output.)   The script will take about a half hour to execute on a 400 MHz Pentium II class machine.

The output of this script is a file, random3.txt, which is post-processed by Marsaglia's asc2bin.exe program to produce input for Diehard.   Diehard consists of 15 tests.  All 15 were run on all three output files.   For reference, they were also run using a file of 2,560,000+ values of RAND() produced by Excel 95.  Here is a summary of the results:

Test Method 1 Method 2 Method 3 Excel 95's RAND()
Birthday spacings FAIL FAIL Pass Pass
OPERM5 FAIL Pass Pass Pass
Binary rank, 31 X 31 FAIL FAIL Pass Pass
Binary rank, 32 X 32 * FAIL Pass Pass
Binary rank, 6 X 8 Pass Pass Pass FAIL
Bitstream * FAIL Pass Pass
OPSO, OQSO, and DNA FAIL FAIL FAIL FAIL
Count the ones * Pass Pass FAIL
Parking lot FAIL Pass Pass FAIL
Minimum distance Pass Pass Pass Pass
3-D spheres Pass Pass Pass Pass
Squeeze Pass Pass Pass FAIL
Overlapping sums Pass Pass Pass Pass
Runs Pass Pass Pass Pass
Craps Pass Pass Pass FAIL

* Method 1 is guaranteed to fail some of the tests simply because it produced 31-bit values rather than 32 bit values.  Such a failure is not an adequate test of the generator in these cases.

The first two methods failed miserably, but the nature of their failures suggested the third method.  The third method performed pretty well.  The OPSO, OQSO, and DNA tests are stringent tests of correlation among sequences of pseudorandom values.   They have the power to explore correlations at individual bit positions.  The DNA test indicates that bits 6 and 7 of each byte generated by Method 3 are the causes of the failure.

4.    A note on Excel's RAND() function (9 July 1999)

The failure of Excel's RAND() function is particularly dramatic.  To do the tests, a spreadsheet was filled in all 256 columns through row 5024 with the formula INT(2^32 * RAND()).  This was calculated and saved as tab-delimited ASCII.  It was recalculated and saved again as tab-delimited ASCII to a separate file.  These files were concatenated to produce a file with 256 * 5024 * 2 random values in the range 0..2^32-1.  The following AWK program converted this output into the packed hexadecimal format needed for Diehard to process:

#
#    Input: lines of 256 random 32-bit integers in base 10
#    Output: lines of 80 hex characters.
#    NB: Might omit up to the last 9 values.
#
BEGIN {
    T31 = 2^31 # Saves a little computation time
}
{
    for (i=1; i<=NF; i++) {
        s = s sprintf("%-8.8x", $i-T31) # Flipping the high bit prevents overflow during conversion to hex.
        if (++j >= 10) {
            print s
            s = ""
            j = 0
        }
    }
}

The nature of the test failure is particularly awful: the frequencies of "1"s in bits 0-2 and 29-31 of the output were significantly different than the frequencies of "0"s, so the output is non-uniform!  One possible fix is to use the fractional part of 8*RAND() instead of RAND() itself, which effectively rejects bits 0-2 and starts the values with bit 3.  I have not tested this, in part because the algorithm used for RAND() (which Microsoft has published) is poor, so there's really not much hope.

Discussion

1.    How to produce uniformly distributed values.

The flaws in Number.MakeRandom are readily overcome without too much compromise in performance (which is already poor, since Avenue is an interpreted language).  Here are some rules to follow:

  1. Use MakeRandom only to generate pseudorandom integers.  If you want floating point values, generate integers within a wide range and then rescale.
  2. Do not attempt to generate random values in a range wider than 2^31 -1.  Use multiple MakeRandom requests if wider ranges are needed.  For example, to generate random double-precision floating point values (about 50 bits of precision), consider generating two independent values in the range 0..(2^25 - 1).  Call these x and y.   Then (x/(2^25) + y)/(2^25) should produce a uniform value between 0 and 1 (not including 1).
  3. Do not use MakeRandom directly to generate negative random values.  Instead, subtract a uniform result from a random positive value.
  4. Compensate for the low frequencies at the endpoint of MakeRandom's range.

The following Avenue script generates uniform pseudo-random integers in the range provided by its two arguments.

'
' Name this script Number.MakeRandom.
'
' Example of use:
' av.Run("Number.MakeRandom", {0, 10})
'
NMin = (SELF.Get(0) min SELF.Get(1)).Truncate
NMax = (SELF.Get(0) max SELF.Get(1)).Truncate

N = NMax - NMin + 1

if (2^32 <= N) then return NIL end

x = Number.MakeRandom(0, N)
if (x = N) then x = 0 end

return x + NMin
' end of script

2.    How to produce values with reasonably good randomness using ArcView.

Generate random bytes (values in the range 0..255) using the procedure just given above.  This is essentially Method 3  (the byte-by-byte method), which passed most of the Diehard tests.  Create larger ranges of values by combining bytes.  For example, to produce uniform values in the range 0..(2^16-1) use av.Run("Number.MakeRandom", {0,255}) * 256 + av.Run("Number.MakeRandom", {0,255}).  To produce values in ranges that are not powers of two, you will have to rescale and round the results.  A good way to start is with a script that returns uniform random floating point values between 0 and 1 (not including 1).  Here is one based on Method 3 that should give good results.  Since it behaves like Excel's RAND() function, name it "Rand":

k = 0

for each i in 0..3
    a = Number.MakeRandom(0, 256)
    if (a = 256) then a = 0 end
    k = k/256 + a
end

return k/256
' end of script

Since the values are based on 32 random bits, you get about 10 decimal places of precision.  For more precision yet (but slower execution), replace the range 0..3 in the script by 0..4, 0..5, or 0..6 (wider ranges will do no good). 

This script is easily used to produce uniform integers within any desired range (within the precision available in ArcView).  For example, (av.Run("Rand", NIL)*1001).Floor is an Avenue expression that will return uniformly distributed integers between 0 and 1000 inclusive (each will be produced with a long-run frequency of 1/1001).

Reference

Knuth, Donald E.  The Art of Computer Programming.  Volume 2, Seminumerical Algorithms, Third Edition.   Addison-Wesley, 1998.  Chapter 3.

(William A. Huber, 5 July 1999.  Updated 9 July 1999.)


Google
ColorRamp, Memorized Calculations, Rotate, Sample, XSect, and Tissot  are  trademarks of Quantitative Decisions.  All other products mentioned are registered trademarks or trademarks of their respective companies.
Questions or problems regarding this web site should be directed to information (@quantdec.com).
Copyright © 2000-2005 Quantitative Decisions.  All rights reserved.
Last modified: Monday November 10, 2003.