Essays/235Problem

From J Wiki
Jump to navigation Jump to search

Here's a statement of the "2-3-5 Problem", a brief outline of its solution, and how to check the solution, in J.

The original problem statement from a blog entry titled "(Re)introducing the 2-3-5 problem" by Andrew Koenig on the Dr. Dobb's Codetalk Forum (at http://dobbscodetalk.com/index.php?option=com_myblog&show=-Re-introducing-the-2-3-5-problem.html&Itemid=29):

In 1971, Edgar Dijkstra published a book named A Discipline of Programming, with the goal of presenting elegant solutions to several programming problems.  One of those problems, which he attributed to Richard Hamming, has stuck with me ever since.

Recently I had the occasion to give a talk to a collection of college students and faculty members, so I pulled this problem out of mothballs to revisit it.  It turns out to be even more interesting than I had remembered, so I'd like to talk about it here.

The problem statement is surprisingly simple: Write a program that produces, in ascending order, the sequence of positive integers that have only 2, 3, and 5 as their prime factors.  The first such integer is 1 (which has no prime factors at all; hence all of its prime factors are either 2, 3, or 5); the sequence continues with 2, 3, 4 (which has only 2 as a prime factor), 5, and 6.  It skips 7, which is prime, continues with 8, 9, and 10, skips 11, includes 12, and so on.

As I said, the problem is simple; what's interesting is to write a good solution.  For that matter, it's even interesting to discuss what good means in this context.  Rather than prejudice you, I'll let you think about the problem for a while and then discuss solutions in future postings.  As you think about solutions, I'd like to suggest that you also think about how you might test your solutions.

There was some confusion over this definition as evidenced in the subsequent responses, particularly over the inclusion of "1" as a member of the series. Subsequently, Mr. Koenig attempted to clarify the problem by re-stating it as follows.

[from http://dobbscodetalk.com/index.php?option=com_myblog&show=The-2-3-5-problem----a-first-look-at-some-solutions.html&Itemid=29]

When I described the 2-3-5 problem in my last post, I did not expect as many comments as I got.  The first wave of comments was about the precise meaning of the words in the problem description, so I eventually re-described the problem using Dijkstra's original terms: Write a program that produces the elements of a sequence such that

    * 1 is an element of the sequence.
    * If n is an element of the sequence, then so are 2n, 3n, and 5n.
    * All elements of the sequence are determined in this way.

An alternative way to state the problem is that it is to find positive integers such that if n is a prime factor of one of these integers, then n is 2, 3, or 5.  So, for example, 12 is an element of the sequence, because its only prime factors are 2 and 3 (even though 2 appears twice), but 14 is not an element because its prime factors include 7.  Moreover, 1 is an element of the sequence under both definitions, because the fact that it has no prime factors at all means that it satisfies the condition "if n is a prime factor" vacuously for all values of n.

[My initial response to this was in error. Here is my next response with a J solution. In it, I refer to some of the other responses on the forum by people who had posted solutions or examples from the series. Unfortunately, the message response software on the blog mangled my reply, so I'm putting it here for clarity.]

As Mr. Klauder pointed out, my J solution was incorrect. I didn't read the problem statement completely. If I may test my understanding, believing that generative solutions are probably superior to modulus-based ones, here's another way to state it:

Generate all products (2^m)*(3^n)*(5^p) where m, n, and p are non-negative integers.

This clearly includes 1 in the case where the exponents all are zero, and excludes the erroneous cases in my first solution which includes numbers with prime factors other than 2, 3, or 5. Remember too that we all seem to have been ignoring the second part of the problem which asks how to check your solution.

I quickly came up with an answer based on my re-statement of the problem. However, given my initial mis-step, I took some time with the verification. My own verification seems to bear out my solution, but I had some disagreement with some of the numbers posted by Mr. Klauder.

First, when I count the number of terms less than or equal to 2^30, I get 1545; since this is only off by one from his number of 1544, I suspect that maybe he's counting only the terms less than this and doesn't include the end-point itself. The second highest number I get agrees with his "1,062,882,000", so I think we're getting the same result here.

However, for larger terms, I (at first) had a more substantial disagreements. So, for his value of 18,443,628,785,295,251,988,480,000,000,000,000 as the 71,194th element, I got this to be the 52887th element (starting from the zeroth element, 52888th starting from one). However, I subsequently realized this has to do with the way I generate the sequence as seen in the following. (Subsequent, more extensive generation, of sequences found me in agreement with the place of this number.)

To speak for a moment on the validation question, for the big number in the preceding paragraph, I get its factors, in vector notation as 2 3 5 ^ 25 37 13 (or 2^25 , 3^37 , and 5^13 ). These evaluate to 33554432 450283905890997363 1220703125. J has extended integers built in, so I figured out these exponents with this expression:

   +/|:2 3 5=/q: 18443628785295251988480000000000000x
25 37 13

(the indented line is my input, followed by the non-indented response from the interpreter). J has built-in factorization using the "q:" symbol.

In any case, here's how I solved this in J: first, I picked an arbitrary starting point of 2^64 since Mr. Klauder was already using this. Then, I figured out how many times each of the prime factors divides into this evenly:

   2 3 5^. 2x^64
64 40.379504 27.5633

The "x" after the "2" casts it to an extended-precision integer; "^." takes the log of the right argument using the base specified by the left argument which is the vector "2 3 5" in this case. So we see the expected answer for the base 2 log, and smaller, non-integral answers for the other two.

So, rounding each of these down gives us the integer exponents.

   <.2 3 5^. 2x^64
64 40 27

Now it gets a little tricky and harder to display, so I'm going to step back and use a maximum of 2^10 just to demonstrate the method. This gives me the integral exponents

   <.2 3 5^. 2x^10
10 6 4

The tricky thing is to generate each of the possible exponents (see m, n, p above) from zero to each of these numbers:

   i.&.>>:<.2 3 5^. 2x^10
+----------------------+-------------+---------+
|0 1 2 3 4 5 6 7 8 9 10|0 1 2 3 4 5 6|0 1 2 3 4|
+----------------------+-------------+---------+

The result here is a vector of three "boxed" vectors which are my possible exponents for each of the respective three prime factors.

J has a catalog function "{" which gives all the combinations from a set of boxed vectors, e.g. for three vectors of letters:

   { 'pts';'ao';'pt'
+---+---+
|pap|pat|
+---+---+
|pop|pot|
+---+---+

+---+---+
|tap|tat|
+---+---+
|top|tot|
+---+---+

+---+---+
|sap|sat|
+---+---+
|sop|sot|
+---+---+

Since all the combinations of even my smaller numeric vector is too big to display easily, we'll just look at the shape (using "$") of the catalog ("{"):

   $ { i.&.>>:<.2 3 5^. 2x^10
11 7 5

which makes sense as this is each one more than the "10 6 4" vector we got for the logs of 2^10 above; it's one more because we're counting from zero through the maximum value in each case. Lets assign this to the variable "exps" for "exponents"

   exps=. { i.&.>>:<.2 3 5^. 2x^10

Looking at the first few elements to assure ourselves that they look like sensible exponents:

   2 2 3{.exps
+-----+-----+-----+
|0 0 0|0 0 1|0 0 2|
+-----+-----+-----+
|0 1 0|0 1 1|0 1 2|
+-----+-----+-----+

+-----+-----+-----+
|1 0 0|1 0 1|1 0 2|
+-----+-----+-----+
|1 1 0|1 1 1|1 1 2|
+-----+-----+-----+

Now we raise the prime factors to each boxed set of exponents. Again because the result is too big, just look at a few of the results:

   2 2 3{.(<2 3 5)^&.>exps
+-----+-----+------+
|1 1 1|1 1 5|1 1 25|
+-----+-----+------+
|1 3 1|1 3 5|1 3 25|
+-----+-----+------+

+-----+-----+------+
|2 1 1|2 1 5|2 1 25|
+-----+-----+------+
|2 3 1|2 3 5|2 3 25|
+-----+-----+------+

Now, take the product within each block, unbox the results, and assign it to variable "ans10":

   ans10=. */&>(<x: 2 3 5)^&.>exps

The "x:" in front of the prime factors again casts them to extended integer even though we don't need it for results this small. However, when we look at a bigger version of the problem, we'll want to remember this or we'll only get floating-point approximations of the results.

Again looking at only the first few to re-assure ourselves they make sense:

   2 2 3{.ans10
1  5  25
3 15  75

2 10  50
6 30 150

Finally, for this little example, turn it into a vector and sort it ascending:

   ans10=. /:~,ans10
   $ans10
385

Using "$" to check the shape shows us we have 385 results. However, most of these are greater than 2^10:

   +/ans10>2^10
298

In fact, 298 of them are. This is because we're including combinations like 2 3 5^10 1 1 and so on. To reduce this to only elements less than or equal to 2^10:

   ans10=. ans10#~ans10<:2^10
   $ans10
87

Now we have 87, few enough to display them all:

   3 29$ans10
  1   2   3   4   5   6   8   9  10  12  15  16  18  20  24  25  27  30  32  36  40  45  48  50  54  60  64   72   75
 80  81  90  96 100 108 120 125 128 135 144 150 160 162 180 192 200 216 225 240 243 250 256 270 288 300 320  324  360
375 384 400 405 432 450 480 486 500 512 540 576 600 625 640 648 675 720 729 750 768 800 810 864 900 960 972 1000 1024

Shown as a 3 by 29 table just for neatness.

Now, to check our result, factor each element using "q:". First, just look at the factorization for the last three as a basic sanity check:

   q:&.>_3{.ans10
+-------------+-----------+-------------------+
|2 2 3 3 3 3 3|2 2 2 5 5 5|2 2 2 2 2 2 2 2 2 2|
+-------------+-----------+-------------------+

Looks OK - now do them all but include the check that the results only consist of 2s, 3s, and 5s:

   *./&>(<2 3 5) e.~&.> q:&.>ans10
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

The ones indicate "True". To ensure that these are all ones, we could add them up and compare them to the number of elements:

   +/*./&>(<2 3 5) e.~&.> q:&.>ans10
87
   $ans10
87

Or logically "and" them all together:

   *./*./&>(<2 3 5) e.~&.> q:&.>ans10
1

So, they're all true. This check will be more useful when we have many more values to examine.

Re-doing this same exercise for a larger universe:

   6!:2 'pr64=. /:~~.,*/&>(<x: 2 3 5)^&.>{i.&.>>:<.(x: 2 3 5)^.2x^64'
0.98847284
   $pr64
74620
   >./pr64
1670936817355466758479855747072000000000000000000000000000
   +/pr64<:2x^64
13283

The "6!:2" times the evaluation and assignment to the variable "pr64" - it takes about a second. We have over 74,000 results, the largest of which is the big number shown above but there are 13,283 less than or equal to 2^64.

Checking the factoring of the highest value:

   q: >./pr64
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 ...

gives too long a result to display easily, so let's count the number of each factor:

   +/|:2 3 5=/q: >./pr64
64 40 27

Verify first the prime factors to each respective exponent, then their product:

   (x:2 3 5)^64 40 27
18446744073709551616 12157665459056928801 7450580596923828125
   */(x:2 3 5)^64 40 27
1670936817355466758479855747072000000000000000000000000000

One problem with this method of generation is that it gives some very high values which are in the sequence but lack values prior to this highest value. So, when I at first disagreed with Mr. Klauder's value for the 71194th element of the sequence (based on the above generation), I found agreement if I used a higher limit:

   pr120=. /:~~.,*/&>(<x: 2 3 5)^&.>{i.&.>>:<.(x: 2 3 5)^.2x^120
   pr120 i. 18443628785295251988480000000000000x
71193

(the "i." in "pr120 i. 18443628785295251988480000000000000x" looks up the index into the left argument of the right argument) which agrees as I'm counting from zero.