# Doc/Articles/Play104

**At Play With J**
... Table of Contents ... Previous Chapter ... Next Chapter

# 3. The 10,000,000,000th Prime Number

. *By* Eugene McDonnell. *First published in* Vector, 10, 4, *(April 1994), 110-113*.

What is the 10,000,000,000th prime number?

This column tells a story, and it has a moral. It does not concern itself directly with J, the ostensible reason for these columns, but it can be justified because one of the direct antecedents of J is the language A, developed by Arthur Whitney while he worked at the investment banking firm of Morgan Stanley. It was one page of C code for an A-like interpreter, written one afternoon by Whitney at Ken Iverson's Kiln Farm in Ontario that gave Roger Hui the direction he needed to start work on what was to become J. Roger exhibits this code in Appendix A of "An Implementation of J", published by Iverson Software Inc. Whitney no longer works at Morgan Stanley: he has set out as a free lance and is developing a language called K, which has some affinity with A and J. Hui now works at Morgan Stanley, and it is his adventure hunting down a large prime that the story is about, and Hui used A as the weapon with which he targeted the large prime.

The story begins when Hui's boss at Morgan Stanley challenged him by saying, "You think you're smart, but you don't even know what the 10-to-the-10-th prime is." Hui's immediate response was, "Do you start counting from 0 or 1?" The boss was so taken aback that for a minute or so he didn't understand Roger's question.

The boss's challenge seems to have been meant as an example of a theoretically attainable but practically impossible computational task. This article tells how Roger went about achieving the impossible. I look on it as a triumph of the client-server technology. This column is not so much an article about programming as it is about computer logistics. The programming aspects, while important, are secondary to the story of how Roger went about organizing the solution.

The germ of Hui's solution was to envision a Boolean vector` p `of length` k `such that the` i`th element of` p `is 1 if` i `is prime, and 0 otherwise. Just sum-scan this very long vector and look for the index of` 1e10 `in it. How long should such a vector be? The Prime Number Theorem says that the number of primes less than` k `is roughly` k%(log k)` .` `Solving for` k `in the equation` 1e10=k%(log k) `gives a value for` k `about` 2.63e11` .` `Roger, out of prudence, used the value` 2.7e11` .` `A vector of` 2.7e11 `elements is unrealizable in the present state of computer memories, especially since A doesn't have a Boolean type: Boolean vectors require the same space as integer vectors. A vector of` 4*2.7e11` ,` `or` 1e12 `bytes long is simply not in the cards. Even a Boolean vector taking just 1 bit per element would have to be more than` 3e10 `bytes long, so it was clear that the problem had to be partitioned.

Hui's central program computes the primes between` m `and` n` ,` `using the sieve method, eliminating multiples of 2, 3, 5, 7, 11, 13, 17, etc. This can be done independently in parallel on many small intervals that make up the larger interval of interest, and, if portioned out to computers that can communicate with a common central file, will permit the problem to be solved in shorter time than if only one computer was to tackle it.

The method of partitioning was suggested by the presence of more than 150 workstations on the floor in Hui's part of Morgan Stanley. They are all interconnected Unix machines, and any machine can be used from any other machine. With relatively small effort a multi-processor solution could be set up, using these machines in parallel. The machines are not heavily used on the weekends, and it was on a weekend that the experiment took place.

The strategy was to let the machines tackle the problem a billion integers at a time. Hui's colleague at Morgan Stanley, Seth Breitbart, suggested creating 271 files, named 0, 1, 2, ..., 270, each denoting the named interval of` 1e9 `consecutive numbers, and each one empty.

What a machine would do once it was set going was to look at the list of files, pick one at random (named` m`), erase it, work on the interval` (1e9*m)+i.1e9` ,` `a million numbers at a time, and after finishing, write a file containing a record giving the number of primes in each of the million-number intervals within that` 1e9 `(there are a thousand of them). After finishing, it repeats that process, stopping only when the list of files/intervals are empty. The machines were set up to process a million numbers at a time since the smallest machine available had enough memory to handle that many numbers at once. Roger notes that there's no great harm if two machines accidentally happen to pick the same file/interval. In the flexible Unix universe additional machines could be brought on stream at any time.

If one is a Unix system superuser it is possible to take all sorts of liberties with these machines, like finding the ids of all other machines, but Hui prefers (wisely, I think) not to be tainted by such capabilities, so to get the machine's names he went about the floor reading the names of the machines from strips of paper affixed to each one, then sat down at his machine and made inquiries about the state of each machine on his list. If it was idle, he set it going on the problem.

As he was doing this, there was a nice dilemma to resolve. Should his time be spent improving the algorithm before launching more machines, or should he spend time looking for additional machines? He favored the latter approach for the novelty of it, and ended up using about 15 IBM RS/6000s and 60 Sun Sparc 2s and Sparc 10s.

After 20 hours, he had 271 files, each with 1,000 records. From these he made a 271,000 element vector of the number of primes in the intervals` 1e6*i.271000` .` `By sum-scanning this he knew the interval containing the` 10^10`-th prime. His function` psieve `returns a Boolean list selecting the primes between m and n. Applying this to the magic interval gets the actual` 10^10`-th prime.

Some details about his program "sieve": If a number q is not divisible by any number less than or equal to` sqrt(q)` ,` `then` q `is prime. Therefore, to test a number less than` 2.7e11 `for primality one need only use trial divisors less than` sqrt(2.7e11) `or roughly` 6e5` .` `In practice, Hui precomputed a list of all the 78,498 primes less than 1e6 recursively, bootstrapping up from 2 3 5 7. (This only takes a few seconds.) It was then a routine matter to determine for any number less than 1e12 whether or not it was prime: just see whether its residues with respect to each of the primes less than a million was nonzero; if so, it was a prime.

For the curious, here is a condensed list of the first 10,000,000,002 primes, with their 0-origin ordinal numbers.

0 2 1 3 2 5 3 7 4 11 ... 1e10-1 252,097,800,623 1e10 252,097,800,629 1e10+1 252,097,800,637

After doing this, Hui found that there is a table in William Judson LeVeque's "Fundamentals of Number Theory", section 1.1, giving the number of primes less than` 10^3+i.8` .` `Hui's table agrees with LeVeque's for` 10^3+i.7` .` `For` 10^10` ,` `however, LeVeque says 455,052,512 and Hui says 455,052,511. It turns out that LeVeque is wrong, Hui having checked his results with some help from Lee Dickey at Waterloo University. Dickey tells Hui that his colleagues speculate that LeVeque may have gotten his numbers from lists that D.N. Lehmer compiled, which included 1 as a prime, and LeVeque may have slipped in not subtracting 1 from that particular count. (1 isn't a prime since it doesn't satisfy the definition of a prime: a positive integer n with exactly two distinct factors, 1 and` n` .)

Now for the moral of the story: Hui tells me he has also since found some work that would have made it much easier to discover the nth prime, for any n. E.D.F. Meissel, a German astronomer, found in the 1870s a method for computing individual values of` pi(x)` ,` `the counting function for the number of primes` <:x` .` `His method was based on recurrences for partial sieving functions, and he used it to compute` pi(1e9)` ,` `where` pi `is a function that computes the number of primes less than or equal to its argument. D.H. Lehmer simplified and extended Meissel's method. Recently, further refinements of the Meissel-Lehmer method which incorporate some new sieving techniques have been reported by Lagaria, Miller, and Odlyzko, in "Computing pi(x): Meissel- Lehmer Method", Mathematics of Computation, Volume 44, Number 170, 1985 4, pages 537 to 560. In this article the authors give an asymptotic running time analysis of the resulting algorithm, showing that for every` e>0 `it computes` pi(x) `using at most` O(x^(2r3)+e) `arithmetic operations and using at most` O(x^(1r3)+e) `storage locations on a computer using words of length` 1+<.2^.x `bits. The algorithm can be further speeded up using parallel processors. They show that there is an algorithm which, when given` M `parallel processors, computes` pi(x) `in time at most` O((%M)*x^(2r3)+e) `using at most` O(x^(1r3)+e) `storage locations on each parallel processor, provided` M <: x^1r3` .` `A variant of the algorithm was implemented and used to compute` pi(4e16)` .

They report that` pi(4e16) `took them 1730 minutes on an IBM 3081K;` pi(2e12) `took 3 minutes;` pi(3e12) `took 4 minutes. Had he known about this method, Hui could have used a binary search technique to find the` 1e10`th prime, using an` O(2 log n) `technique, after narrowing the interval to be searched in by a reasonably generous use of the Prime Number Theorem. The value Hui computed was` pi(2.52e11) `and it took him 20 hours on 60-70 workstations. Brains win again over brawn: a well-designed, mathematically knowledgeable algorithm beats brute force!