17. To Summarise

By Eugene McDonnell. First published in Vector, 15, 1, (August 1998), 132-137.

The key adverb in J, represented by slashdot (/.), is defined in the Dictionary as:

x u/.y  is  (=x) u@# y

that is, items of x specify keys for corresponding items of y and u is applied to each collection of y having identical keys. For example:

	
   1 2 3 1 3 2 1 </. 'abcdefg'
+---+--+--+
|adg|bf|ce|
+---+--+--+

This may be clearer if we look at the separate parts, taking x to be the left argument 1 2 3 1 3 2 1 and y to be the right argument 'abcdefg'.

   =x
1 0 0 1 0 0 1
0 1 0 0 0 1 0
0 0 1 0 1 0 0

The first row of this has 1s in the first, fourth, and seventh positions, so when it is used as the left argument to copy (the dyad of #), with y as the right argument, it yields the first, fourth, and seventh items of y, or 'adg'; similarly the second row yields 'bf' and the third row yields 'ce'. Each of these is then boxed and the three are catenated together, yielding

+---+--+--+
|adg|bf|ce|
+---+--+--+
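The definition can be checked directly. The following is only a sketch, using the arguments of the example above and the standard library's echo to print each result; the expected values are given in the comments.

```j
NB. Arguments from the example above.
x=: 1 2 3 1 3 2 1
y=: 'abcdefg'

echo ~. x                       NB. 1 2 3: the distinct keys, in order of first appearance
echo (=x) # y                   NB. a table with rows adg, bf, ce (short rows padded with blanks)
echo (x </. y) -: (=x) <@# y    NB. 1: boxing each copied group matches the key result
```

The match (-:) in the last line avoids depending on how boxes are drawn in a particular session.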

The basic idea remains the same when u changes from box to a different monad. For example, if we replace box by tally (the monad of #) we get:

   x #/. y
3 2 2

The same three groupings are selected, but instead of being boxed they are tallied, or counted, yielding the count of each group; three in the first, and two in the second and third.
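As a small sketch with the same x and y, the counts can also be read off as the row sums of =x, and the nub (~.) of x labels each count with its key:

```j
x=: 1 2 3 1 3 2 1
y=: 'abcdefg'

echo x #/. y              NB. 3 2 2
echo +/"1 = x             NB. 3 2 2: the number of 1s in each row of =x
echo (~. x) ,. x #/. y    NB. rows 1 3, 2 2, 3 2: each key beside the size of its group
```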

The key adverb was not in the initial version of J. It came in later at the request of the J user community, notably Joey Tuttle. Joey's interests were not merely theoretical; he had practical ends in view. He was in the business of analyzing huge amounts of data and summarizing time and amount fields by accounts. We'll use an abbreviated version so we can fit the data onto a small page. Suppose we have three accounts, 1001, 1002, and 1003, and suppose further that we have a table whose rows give an account number and an amount:

   acct=:1001 1002 1003

   table=:((?10#3){acct),.?10#100

   table
1001 51
1003 83
1002  3
1002  5
1001 52
1001 67
1003  0
1003 38
1003  6
1002 41

To summarize this table by account, we transpose (|:) it, so that the accounts are in the first row, and the amounts in the second, then insert (/) sum key (+//.) between the account row and the amount row:

   +//./|:table
170 127 49

If you check the first amount in the sum, 170, you can verify that it is indeed the sum of the three amounts associated with the first occurring account, 1001, that is, it is the sum of 51, 52, and 67. Similarly the second amount 127, is the sum of the four amounts associated with the second occurring account, 1003, that is, it is the sum of 83, 0, 38, and 6. Lastly, the third amount, 49, is the sum of the three amounts associated with the third occurring account, 1002, that is, 3, 5, and 41.
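To see what the insert is doing, we can separate the two columns ourselves. This is only a sketch: the table is entered literally (the original was built with random numbers, so re-running that line would give different data), and the names accts and amts are introduced here for illustration.

```j
NB. The table as printed above, entered literally so the result is repeatable.
table=: 10 2 $ 1001 51 1003 83 1002 3 1002 5 1001 52 1001 67 1003 0 1003 38 1003 6 1002 41

accts=: {."1 table    NB. first column: the account of each row (illustrative name)
amts=: {:"1 table     NB. second column: the amount of each row (illustrative name)

echo ~. accts                               NB. 1001 1003 1002: order of first occurrence
echo accts +//. amts                        NB. 170 127 49
echo (accts +//. amts) -: +//./ |: table    NB. 1: the same as inserting between the two rows
```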

This result may not be completely satisfactory, since the amounts are not in the order of the accounts: they are in the order in which they fortuitously occur in the table. One way to remedy this is to place some dummy rows at the beginning of the table, one for each account, with the accounts in the desired order, and with the amounts set to zero (acct,.0).

   (acct,.0),table
1001  0
1002  0
1003  0
1001 51
1003 83
1002  3
1002  5
1001 52
1001 67
1003  0
1003 38
1003  6
1002 41

Now, when we summarize, the amounts will be in account order.

   +//./|:(acct,.0),table
170 49 127

We can produce a summary of accounts and amounts by prefacing the above with the list of account numbers and stitching (,.) the lists together.

   acct,.+//./|:(acct,.0),table
1001 170
1002  49
1003 127
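The dummy rows have a second effect: an account that never occurs in the table still gets a line in the summary, with a total of zero. A sketch, using the table above entered literally and a hypothetical fourth account 1004 (the name acct4 is ours, not part of the original):

```j
NB. The table as printed above, entered literally.
table=: 10 2 $ 1001 51 1003 83 1002 3 1002 5 1001 52 1001 67 1003 0 1003 38 1003 6 1002 41

acct4=: 1001 1002 1003 1004    NB. 1004 is a made-up account with no rows in table

echo acct4 ,. +//./ |: (acct4,.0) , table
NB. rows: 1001 170, 1002 49, 1003 127, 1004 0
```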

This gives you the theory and the practice of the key adverb, so it's time to play, and incidentally to learn another way to use key.

How are the digits of pi distributed? If the digits were distributed evenly, then each digit would occur with a frequency of about 10%. J enables you to compute as many digits of pi as you have room and time for. A convenient way to obtain n digits of pi is to subtract 1 (<:) from n, make this an extended integer (x:), use this as an exponent of 10 (10^), apply floor atop pi times (<.@o.), and take the format (":) of this:

   dp=: monad def '":<.@o.10^x:<:y'   NB. digits of pi

Try this on a small integer:

   q10=:dp 10

   q10
3141592653

This is correct. Try it on a somewhat larger integer:

   q30=:dp 30

   q30
314159265358979323846264338327

Checking this against the value of pi to many places in a table such as may be found in a volume of Knuth's The Art of Computer Programming shows that q30 is accurate, too.
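The steps of dp can be looked at one at a time. This sketch assumes, as the definition of dp itself does, that floor atop pi times (<.@o.) gives an exact extended integer when its argument is extended.

```j
dp=: monad def '":<.@o.10^x:<:y'   NB. digits of pi, as defined above

echo <: 10                     NB. 9
echo 10 ^ x: 9                 NB. 1000000000, an extended integer
echo <.@o. 10 ^ x: 9           NB. 3141592653
echo # dp 30                   NB. 30: the formatted result has one character per digit
echo (dp 10) -: 10 {. dp 30    NB. 1: the shorter list is a prefix of the longer one
```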

Let's compute some more (q3000 may take several minutes):

   q100=:dp 100

   q300=:dp 300

   q1000=:dp 1000

   q3000=:dp 3000

Now let's see how the digits are distributed in each of these, in order. We need a digit distribution function, and this is where a new use of key comes in. In order to make the result of the digit distribution function be in the right order, we'll preface the argument with d, a list of the decimal digits in order.

   d=:'0123456789'

To get the distribution we preface the formatted digits of pi with d (d,y), then apply count (#) key (/.) reflexive (~) to this, giving us the distribution of d,y, then subtract 1 (<:) to adjust the count for the presence of d.

   dd=: monad def '<:#/.~d,y'
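To see why d is needed, compare the counts with and without it. This is a sketch only, with q10 entered literally as the ten digits shown earlier:

```j
d=: '0123456789'
q10=: '3141592653'    NB. the value of dp 10 shown above, entered literally

echo ~. q10           NB. 3145926: without d, keys appear in order of first occurrence
echo #/.~ q10         NB. 2 2 1 2 1 1 1: seven counts, and digits that never occur are missing
echo ~. d,q10         NB. 0123456789: with d in front, the keys are in digit order
echo #/.~ d,q10       NB. 1 3 2 3 2 3 2 1 1 2: ten counts, each one too high
```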

Try this out on q10, which is easy to verify by eye once its digits are sorted (/:~):

   /:~q10
1123345569
   dd q10
0 2 1 2 1 2 1 0 0 1

No zeros, two ones, one two, two threes, one four, two fives, one six, no sevens or eights, and one nine. Now let's see the digit distribution of each of the other lists of pi digits.

   dd q30
0 2 4 7 3 3 3 2 3 3

   dd q100
8 8 12 12 10 8 9 8 12 13

   dd q300
26 30 35 31 37 27 31 19 34 30

   dd q1000
93 116 103 103 93 97 94 95 101 105

   dd q3000
259 308 303 266 318 315 302 287 310 332

These look somewhat reasonable, but it would be better to see how closely each comes to having 10% of each digit. For this we use a function pd, which takes a distribution as argument and yields the percentage of each value, rounded to the nearest one per cent. It divides the values by the sum of the values (y%+/y), multiplies this by 100 (100*) to get percentages, and rounds to the nearest percentage by adding a half (0.5+) and taking the floor (<.).

   pd=: monad def '<.0.5+100*y%+/y'
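A small made-up distribution shows the rounding at work. The arguments 1 7 and 1 1 2 are ours, chosen so that the arithmetic is exact:

```j
pd=: monad def '<.0.5+100*y%+/y'   NB. as defined above

echo 100 * 1 7 % +/ 1 7    NB. 12.5 87.5: the unrounded percentages
echo pd 1 7                NB. 13 88: a half rounds up
echo pd 1 1 2              NB. 25 25 50
```

Because each value is rounded separately, the rounded percentages need not add up to exactly 100, as pd 1 7 shows.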

   pd dd q10
0 20 10 20 10 20 10 0 0 10

We can compare this to dd q10 and see that it is simply the same values multiplied by 10 to give percentages, as desired. There are so few digits to take into account that it is difficult to say whether the distribution is even or not. Trying the next distribution, of thirty values, still leaves us uncertain.

   pd dd q30
0 7 13 23 10 10 10 7 10 10

There are no zeros among the first thirty digits, and a lot of threes. Probably still not enough digits.

   pd dd q100
8 8 12 12 10 8 9 8 12 13

Except for the large number of nines, this is beginning to look quite even.

   pd dd q300
9 10 12 10 12 9 10 6 11 10

Here, the number of sevens seems too low. Let's keep looking.

   pd dd q1000
9 12 10 10 9 10 9 10 10 11

Ones seem a bit high, but I'd say this distribution is even enough.

   pd dd q3000
9 10 10 9 11 11 10 10 10 11

With 3000 digits to distribute, we can say with some satisfaction that this represents an even distribution. Before we part, let's look at a consecutive portion of these digits:

   (762+i.6){q1000
999999

Hmmm. Well, yes, that's not too unusual. * In fact, if such strings didn’t occur every now and then, it would argue against randomness.

FOOTNOTE:

* Eugene has selected the so-called Feynman Point, a six-digit sequence 999999 in the decimal expansion of π to which Richard Feynman (1918-1988) used to draw attention in his lectures to make an instructive joke about what is and what is not perceived as random. (Ed.)