Doc/Articles/Play144

From J Wiki
Jump to navigation Jump to search

At Play With J ... Table of Contents ... Previous Chapter ... Next Chapter

16. A Newer Random Link Generator

. By Eugene McDonnell. First published in Vector, 14, 4, (April 1998), 122-127.

The random link generator used in many J\APL systems is an example of the linear congruential kind introduced in 1948 by the mathematician D. H. Lehmer, of the University of California in Berkeley, California. It depends on two numbers, 16,807 and 2,147,483,647. The first is 7^5 , a primitive root of the second, which is Euler's prime, <:2^31 . The basis of the algorithms for the roll and deal primitives is the expression for computing the next link rl in the chain of random links:

    rl =: 2147483647 | 16807 * rl

The chain is 2,147,483,646 links long, and contains all of the integers between 1 and 2147483646, but not in any easily discernible order. The first link in the chain is 16807, and the next few links are

    282475249 1622650073 984943658 1144108930 470211272

The values seem to be suitably random. What's more, they satisfy many of the tests mathematicians and computer scientists have devised for judging randomness.

This column discusses a generator of a very different kind, that has its roots in a method devised in 1958 by G. J. Mitchell and D. P. Moore. Donald Knuth's book Seminumerical Algorithms had its first edition in 1969 (the year I bought my copy) and discusses random numbers extensively, but doesn't mention this method. It appears in the book's second edition (1981), but is treated warily. Knuth writes, "The only reason it is difficult to recommend [it] wholeheartedly is that there is still very little theory to prove that it does or does not have desirable randomness properties". By the time of the third edition (1997) it is obvious that a great deal of effort has been expended on it, and it is almost part of the canon. I first saw the technique described in Knuth's book, The Stanford Graph Base, Addison Wesley, Reading, Massachusetts, 1993. He describes it in the section called GB_FLIP, the name of the program used to generate random links for the other programs described in the book. The generator described here comes from this book.

The basis of the generator is a bit of mathematics the gist of which Knuth gives as follows:

The subtractive method. If m is any even number and if the numbers a,,0,,, a,,1,,, ... , a,,54,, are not all even, then the numbers generated by the recurrence

a,,n,, = (a,,n-24,, - a,,n-55,,) mod m        (1)

have a period of at least 2^55^ - 1, because the residues a,,n,, mod 2 have a period of this length. Furthermore, the numbers 24 and 55 in this recurrence are sufficiently large that deficiencies in randomness due to the simplicity of the recurrence are negligible in most applications.

If something has a period of n, it means that it can't be made up as an integral number of repetitions of a smaller sequence. Knuth says that 2^55^ - 1 is the smallest the period can be, but it is plausible that it is 2^85^ - 2^30^. Furthermore, the low-order bits of the generated numbers are just as random as the high-order bits. These are both very large numbers.

    <:2^55x
36028797018963967
    (2^85x)-2^30x
38685626227668132516855808

There is a usage problem in giving a name to them. In the UK the names of numbers go up in powers of a million. Thus billion, trillion, and quadrillion are a million to the second, third, and fourth power. In the US they go up in powers of a thousand, so million, billion, trillion, quadrillion are a thousand to the second, third, fourth, and fifth powers. The names are more appropriate in the UK usage, but the size of the denominating numbers are smaller and usually more convenient in the US usage. The first large number would be about 36029 billion in UK usage, and 36 quadrillion in US usage.

   36028,797018,963967     (UK)
   36,028,797,018,963,967     (US)

The larger number would be about 39 quadrillion in the UK and 39 septillion in the US. Whichever way you call it, it's big.

   38,685626,227668,132516,855808     (UK)
   38,685,626,227,668,132,516,855,808     (US)

Since there are only <:2^31 , or 2,147,483,646 positive integers, it is clear that each integer will appear many times in the chain, but the period of the chain is indeed enormously long.

Here's how the generator works: If a random link is needed, select the one indexed by gb_fptr , from a list A of 55 random numbers, and subtract one from gb_fptr . Thus getting a random link requires trivial computation, an indexing and a subtract, simply

   gb_fptr { A
   gb_fptr =: <: gb_fptr

There must be more to it than that, you say, and you are right. The tricky problems are how to get more random links when we've used all that are in the list, and, ultimately, where did the initial list come from? The processes involve a fair amount of crabwise sidling and snakewise slithering. I'll make it as clear as I can.

What does one do when all 55 links in the list have been selected? A program gb_flip_cycle is used to provide a new list of 55 links, guided by recurrence (1). Knuth's C program uses two successive for loops; the first subtracts in sequence the last 24 items of A from the first 24; the second subtracts the last 31 from the first 31 in sequence, depending on the explicit sequencing in the for loop specification to make sure that the overlap of 7 items in the middle of the two has the proper values each time it is traversed. We could do that, too, using the shiny new control words available in J, but prefer instead to emphasize the potential for parallelization by using the following scheme: Subtract the last 24 from the first 24, then the new first 24 from the second 24, and lastly the new 7 following the first 24 from the last 7.

All these subtractions are mod m, and this is easily computed at the machine level by anding the result of subtraction with 16b7fffffff . In the function below it is shown at the J language level by taking the residue mod 16b80000000 , yielding the same result.

The function gb_flip_cycle takes as argument a list of 55 random numbers, and yields a new such list. In terms of recursion (1) its first use replaces a,,0,, through a,,54,, by a,,55,, through a,,109,,. Each of a , b , and c is a two row matrix of indices to the list. For example, c is the matrix

   48 49 50 51 52 53 54
   24 25 26 27 28 29 30

The first row gives the indices of the minuend, which are also the indices of the result. The second row gives the indices of the subtrahend. For example, the amend c g } subtracts a,,79,, through a,,85,, from a,,48,, through a,,54,,, yielding a,,103,, through a,,109,,, and these replace a,,48,, through a,,54,,.

gb_flip_cycle=: monad define
a=. 0 31 +/ i. 24        NB. 1st 24 get (1st 24) - (last 24)
b=. 24 0 +/ i. 24        NB. 2nd 24 get (2nd 24) - (new 1st 24)
c=. 48 24 +/ i. 7        NB. last 7 get (last 7) - (7 after 1st 24)
NB. v0 is difference mod m of selected parts of argument
v0=. 16b80000000 | [: -/ {
NB. v1 is top row of 2-row left argument -- result address
v1=. [: {. [
NB. v2 is right argument: a 55 item list
v2=. ]
g=. v0`v1`v2              NB. gerund for amend
c g } b g } a g } y      NB. amend in three overlapping steps
)

If you look closely at the numbers, you'll notice that recurrence (1) isn't faithfully followed. Instead we have

a,,n,, = (a,,n-55,, - a,,n-24,,) mod m        (2)

Knuth claims that this doesn't make the results any less random.

You are probably asking, "Yes, but where did you get the original list of random links?" This is the part that happens only once, and that's good, because it is the most complicated and time-consuming part of the whole method.

In outline, a next random link is formed from three values: the current value next, the previous value prev, and the seed. The current value is subtracted from the previous value, and the modified seed is subtracted from this, mod m.

The random links are stored as they are developed in list A, but not sequentially. The first one is stored in the 21st position, index 20. The next number found will be stored 21 to the right, in position 41. And the next one 21 further to the right, in position 62. Oops. That addition is made mod 55, so it will actually be stored in position 7. And so on. There's a trifle of Fibonacci magic here: there are 34 positions to the right of the first one, and 55 in all, and the numbers 21, 34, and 55 are consecutive Fibonacci numbers. Also, 21 is relatively prime to 55, which means that the index will take on all values from 0 to 54, inclusive, once each. Knuth comments that the successive values are thus stored as far as possible from each other, and that the initializing would be rather poor if this dispersal were not done.

The value of seed is not constant; it can be any one of 31 possible values. For each of the initial list of random links, a new value of seed is obtained by rotating its 31 rightmost bits one position to the right.

   bwr=: [: #. _1 |. (31$2) #: ]  NB. rotate bits 1-31 to the right

   seed=. bwr seed

and then performing

   temp=. next
   next=. m | prev - next + seed
   prev=. temp

The seed used by Knuth is _314159 mod m, or 2147169489. This is also the initial value of prev. The initial value of next is 1, chosen because the method requires that there be at least one odd number in the first set of links.

gb_init_randx=: monad define   NB. initial list of random numbers
A=. 55 # 0
m=. 2^31
seed=. m | y
next=. 1
prev=. seed
k=. 20
whilst. k ~: 20 do.
  A=. next k } A
  seed=. bwr seed
  temp=. next
  next=. m | prev - next + seed
  prev=. temp
  k=. 55 | k + 21
end.
A
)

To verify that A has been properly formed, you can look at the first few links formed. These should be:

   A=: gb_init_randx _314159
   20 41 7 28 { A
1 2147326568 1073977445 536517481

Knuth explains that the first set isn't random enough to be used. Once the sequence gets far enough from its beginning, however, the initial transients become less perceptible. Thus after the initial A is formed, in order to ensure the necessary degree of randomness, a new list is formed from it by using gb_flip_cycle five times. When this is done, the list is ready to be used. The complete initializing procedure is thus

gb_init_rand=: monad define
gb_fptr=: _2
gb_flip_cycle ^:5 gb_init_randx y
)

The initial value of A is obtained by

A=: gb_init_rand _314159

A link in the random link chain is obtained using the gb_next_rand function.

gb_next_rand=: monad define
if. gb_fptr > _56 do.
  z=. gb_fptr { A
  gb_fptr=: <: gb_fptr
else.
  A=: gb_flip_cycle A
  z=. _1 { A
  gb_fptr=: _2
end.
z
)

The links are accessed using the index value gb_fptr , and in reverse order. This doesn't affect the randomness of the links. Thus the first link taken would be the one at index position 54, which may be accessed (using contracurrent indexing) with gb_fptr set to _1, and then subtracting 1 from gb_fptr . The links are taken from the current list of links in the if clause until gb_fptr reaches _56, at which time the else clause is used to form a new list of links. Item _1 of this new list is returned, and gb_fptr is set to _2.

To verify that initializing has been done correctly, check the result of gb_next_rand :

   x: gb_next_rand ''
119318998

Unlike the linear congruential generators, subtractive generators are much less choosy about the seed value used. I believe that any value with a generous balance of 1s and 0s will do. Thus this one method gives rise to an enormous number of random link chains, each enormously long, obtainable simply by varying the seed.

Here is a script with the code in this article.

gb_flip_cycle=: monad define
a=. 0 31 +/ i. 24        NB. 1st 24 get (1st 24) - (last 24)
b=. 24 0 +/ i. 24        NB. 2nd 24 get (2nd 24) - (new 1st 24)
c=. 48 24 +/ i. 7        NB. last 7 get (last 7) - (7 after 1st 24)
NB. v0 is difference mod m of selected parts of argument
v0=. 16b80000000 | [: -/ {
NB. v1 is top row of 2-row left argument -- result address
v1=. [: {. [
NB. v2 is right argument: a 55 item list
v2=. ]
g=. v0`v1`v2              NB. gerund for amend
c g } b g } a g } y      NB. amend in three overlapping steps
)

bwr=: [: #. _1 |. (31$2) #: ]  NB. rotate bits 1-31 to the right

gb_init_randx=: monad define   NB. initial list of random numbers
A=. 55 # 0
m=. 2^31
seed=. m | y
next=. 1
prev=. seed
k=. 20
whilst. k ~: 20 do.
  A=. next k } A
  seed=. bwr seed
  temp=. next
  next=. m | prev - next + seed
  prev=. temp
  k=. 55 | k + 21
end.
A
)

gb_init_rand=: monad define
gb_fptr=: _2
gb_flip_cycle ^:5 gb_init_randx y
)

A=: gb_init_rand _314159

gb_next_rand=: monad define
if. gb_fptr > _56 do.
  z=. gb_fptr { A
  gb_fptr=: <: gb_fptr
else.
  A=: gb_flip_cycle A
  z=. _1 { A
  gb_fptr=: _2
end.
z
)

Vector Vol.14 No.4

Editor's note. Knuth announces in Seminumerical Algorithms (1997) a proof by R. P. Brent that the period of the subtractive method is exactly 2^85^ - 2^30^ for m = 2^31^. See exercises 23 and 30 in Section 3.2.2.