NYCJUG/2022-01-11

From J Wiki
Jump to navigation Jump to search

Beginner's regatta

In the "Show and Tell" section following, we use something like this expression from Skip Cave to note which numeric vectors contain a single pair of duplicated values:

   singlePair=: 13 : '(<:#&>y)=#&>~.&.>y'   NB. 0 if no pair in item of (vec of vecs) y, 1 if one pair of identical values.
   singlePair
([: <: #&>) = [: #&> ~.&.>
   singlePair 1 1 2;1 1 1;2 2 1 1;1 2 3 4;1 2 3 4 4  NB. Single pair, triplet, 2-pair, no pair, single pair
1 0 0 0 1

How does this work?

The spacing provided by the tacit version provides us a convenient breakdown of the function if we work from right to left. In this direction, the first sub-expression is ~.&.> which returns the unique items in each element of y.

The next part #&> counts the number of unique items, returning a simple vector. These numbers are each compared to one less than the total number of items in each element of y by ([: <: #&>).

The reason this tells us there is a single pair in a sequence is because the only way the number of unique items in a vector is one less than the total number of items is if there is a single pair of identical values. This works well for a single pair but is not obviously extendable for detecting other combinations.

Simplifying the Code by Refactoring

When we look at this code, we might notice that we ultimately want to compare two simple (unenclosed) vectors of counts, so keeping them enclosed for the nub (~.), then disclosing the result by counting those, can be simplified by combining tally and nub using atop (@:) so we have

   singlePair=: 13 : '(<:#&>y)=(#@:~.)&>y'   NB. 0 if no pair in item of (vec of vecs) y, 1 if one pair of identical values.
   singlePair
([: <: #&>) = #@:~.&>

Of course, we want to get the same answer as before, so we check it:

   singlePair 1 1 2;1 1 1;2 2 1 1;1 2 3 4;1 2 3 4 4
1 0 0 0 1

We might notice that the left-hand sub-expression <:#&>y is really decrement atop tally, or <:@:# applied to each element of y, giving us

   singlePair=: 13 : '((<:@:#)&>y)=(#@:~.)&>y'   NB. 0 if no pair in item of (vec of vecs) y, 1 if one pair of identical values.
   singlePair 1 1 2;1 1 1;2 2 1 1;1 2 3 4;1 2 3 4 4
1 0 0 0 1

Now that we see these two parallel constructions next to each other, it's clear that the &> is repeated so it can be moved outside the parentheses like this:

   singlePair=: 13 : '((<:@:#)=(#@:~.))&>y'   NB. 0 if no pair in item of (vec of vecs) y, 1 if one pair of identical values.
   singlePair 1 1 2;1 1 1;2 2 1 1;1 2 3 4;1 2 3 4 4
1 0 0 0 1

It's now apparent that the expression in the outer parentheses is a fork, so we can remove excess parentheses, giving us

   singlePair=: 13 : '(<:@:# = #@:~.)&>y'   NB. 0 if no pair in item of (vec of vecs) y, 1 if one pair of identical values.
   singlePair
(<:@:# = #@:~.)&>
   singlePair 1 1 2;1 1 1;2 2 1 1;1 2 3 4;1 2 3 4 4
1 0 0 0 1

Timing Improvement

This simplification of the code also improves its performance.

To quantify this improvement, let's define the original and refactored versions of the code this way:

   singlePair0=: ([: <: #&>) = [: #&> ~.&.>    NB. Original version
   singlePair=: (<:@:# = #@:~.)&>              NB. Refactored version

We can define a million element test vector with about 25% single pairs like this:

   +/singlePair testSP0=. (?4$~#testSP0) (] , ] { ~ [ ? [: # ]) &.> testSP0=. ~.&.>20?@$&.>1e6$20
251216

Check that the results are the same between the two versions, then time each one for 100 iterations to get stable timings:

   (singlePair-:singlePair0) testSP0
1
   (100) 6!:2 'singlePair testSP0'
0.213768
   (100) 6!:2 'singlePair0 testSP0'
0.388398

So the refactored version is roughly twice as fast.

Testing with larger vectors that have about 1/5th single pairs, we see similar results:

   +/singlePair testSP0=. (?5$~#testSP0) (] , ] { ~ [ ? [: # ]) &.> testSP0=. ~.&.>40?@$&.>1e6$40
200316
   (singlePair-:singlePair0) testSP0
1
   (100) 6!:2 'singlePair0 testSP0'
0.55259
   (100) 6!:2 'singlePair testSP0'
0.237357

Show-and-tell

Revisiting the "Percentage of Flops with a Pair" Problem

We had two responses to a problem presented in our meeting this past October, one from Thomas McGuire and one from Skip Cave. Skip's answer is a brute force solution but one that is reasonably fast for pertinent values and which looks correct by inspection. Thomas's answers were more analytical (he came up with two different ways of stating it) but lacked sufficient generality as he posed them which makes it more difficult to test by applying to similar problems. However, he made use of something in J called a "Stopes function" with which I was unfamiliar so it was interesting to learn about this.

The Stopes Function

In fact, when I searched for more information on this function, I found far more relevant references to it on the J wiki than I did with a more broadly-based search. Many of the hits I got were typos for "slope function" and the non-typo results referred specifically to something in mining called a "stope":

  1. An excavation in the form of steps made by the mining of ore from steeply inclined or vertical veins.
  2. Middle English forms of stapen, past participle of step.
  3. An excavation made in a mine to remove the ore which has been rendered accessible by the shafts and drifts.

[From The American Heritage® Dictionary of the English Language, 5th Edition by way of Wordnik.]

So, this is basically a step function in the form S^!.I ] N where S is the starting point, I is the increment (step size), and N is the number of steps.

Thomas's Response - lightly edited

​From: Thomas McGuire <tmcguirefl@gmail.com>
Date: Thu, Oct 14, 2021 at 1:10 AM
Subject: [Jprogramming] NYCJUG: Counting pairs in a 3 card poker hand
To: <programming@jsoftware.com>

Devon presented more of his Poker simulations now using Jd (the J database) at the most recent NYCJUG meeting (https://code.jsoftware.com/wiki/NYCJUG/2021-10-12 <https://code.jsoftware.com/wiki/NYCJUG/2021-10-12>)

He came up with an interesting problem of calculating the expected number of pairs in the initial flop in a game of Omaha.

There are 3 cards dealt into the flop. Consider those with only pairs in them, there would be 3 ways that the pairs could be dealt out.

X X Y, X Y X, Y X X

Taking the first configuration the number of different hands you could make would be:

   */52 3 48  NB. this matches part of Devon’s calculation
7488

Any one of 52 cards can be dealt, once the card is dealt then only 1 of 3 can be dealt to make the pair, then we have to exclude the remaining 2 cards that match and therefore the third card will be any one of 48. Since there are 3 configurations the cards can be dealt in, the other 2 would be calculated:

  */52 48 3
7488
  */48 52 3
7488

This makes

   3*7488.    NB. Different 3 card hands with only pairs in them
22464

To calculate the percentage of the total number of hands Devon made the calculation using only one of the pairs hand configurations, then used 3!52 for combinations of 52 things taken 3 at a time.

This was off by a factor of 2 from his simulation, where he enumerated all the possibilities.

I finally realized (it took me 2 days of intermittent thought) that the order in which the cards are dealt matter. Not so much in the scoring of a hand but it does matter for the number of ways the same hand can be dealt out.

So the total number of 3 card hands from 52 cards is:

    */52 51 50
132600

which is permutations of 52 things taken 3 at a time, which, like the combinations function in J, has a representation called the Stope function:

    52 ^!._1 (3)
132600

Devon’s original calculation was:

   (*/52 3 48)%3!52
0.338824

However, based on the analysis above it should be:

   (3 * */52 3 48)%52 ^!._1 (3)
0.169412

which is very close to his simulated percentage: 0.169418.

Devon was more concerned with scoring a hand rather than the order they were dealt in, when dealing with his calculation. So the best I could come up with to follow what I think was Devon’s thought process was the following:

  • there are 13 different card values
  • there are 2!4 combinations of pairs due to the different suits
  • there will be 48 cards left over to make the flop since we exclude 3 of a kind (that’s a different type of poker hand than a pair) So:
       (*/13,(2!4),48)%3!52
    0.169412
    

    So the question is: did I calculate these both correctly or did I just come up with 2 methods that match the simulation and just think my logic is correct?

  • Skip's Response

    Skip Cave gave us a good brute force way of solving this problem:

    ​From: Skip Cave <skip@caveconsulting.com>
    Date: Fri, Oct 15, 2021 at 11:48 AM
    Subject: Re: [Jprogramming] NYCJUG: Counting pairs in a 3 card poker hand
    To: Programming@jsoftware.com <programming@jsoftware.com>
    

    Brute Force probability of a pair in 5 cards:

       (+/%#)4=;#&.>~.&.>{(5 comb 52){4#1 to 13
    0.422569
    

    This is good because it's easy to generalize and the code for it appears to make sense, especially if you take a look at Beginner's Regatta above.

    So, for the portion of single pairs in 4 cards, and simplifying 1 to 13 to i.13, we get

       (+/%#)3=;#&.>~.&.>{(4 comb 52){4#i.13
    0.30425
    

    For the of portion of single pairs in 5 cards:

       (+/%#)4=;#&.>~.&.>{(5 comb 52){4#i.13
    0.422569
    

    Generalizing this slightly, the number of pairs in three through six cards is:

       numPairsIn=: 3 : '(+/%#)(<:y)=;#&.>~.&.>{(y comb 52){4#i.13'"0
       numPairsIn 3 4 5 6
    0.169412 0.30425 0.422569 0.485505
    

    The use of "y comb 52" means that this takes significantly more time for larger y:

       6!:2 'numPairsIn 2'
    0.0006821
       6!:2 'numPairsIn 3'
    0.0082328
       6!:2 'numPairsIn 4'
    0.0939739
       6!:2 'numPairsIn 5'
    1.40777
       6!:2 'numPairsIn 6'
    45.9122
    

    Looking at by what factor the time increases, using the values above, we see a factor greater than 10 for each:

       2%~/\0.0006821 0.0082328 0.0939739 1.40777 45.9122
    12.0698 11.4146 14.9804 32.6134
    

    This emphasizes that an analytic solution like Thomas's would be superior for large values of y, assuming we can generalize it properly. Fortunately, having correct answers for small values of y can help us determine the correctness of any generalization.

    Timings for Improved Version

    If we incorporate our refactored singlePairs from the earlier Beginner's Regatta section, we see significant improvement in the brute force method over the domain in which we are interested:

       numPairsIn=: 3 : '(+/%#)(<:@:# = #@:~.)&>{(y comb 52){4#i.13'"0
       numPairsIn 2 3 4
    0.0588235 0.169412 0.30425
       6!:2 'numPairsIn 2'
    0.0007094
       6!:2 'numPairsIn 3'
    0.0081591
       6!:2 'numPairsIn 4'
    0.0836212
       6!:2 'numPairsIn 5'
    1.04363
       6!:2 'numPairsIn 6'
    17.8209
       2%~/\0.0006821 0.0082328 0.0939739 1.40777 45.9122  NB. Differences between timings of earlier version
    12.0698 11.4146 14.9804 32.6134
       2%~/\0.0007094 0.0081591 0.0836212 1.04363 17.8209  NB. Differences between timings of refactored version
    11.5014 10.2488 12.4804 17.0759 
    

    However, even the refactored version grows rapidly enough that the time for calculating for large values quickly becomes prohibitive, though not for values relevant to this problem.

    Verifying and Generalizing Thomas's Calculations

    As I mentioned in my reply to Thomas's email, we could verify his calculation by applying it to other variants of the problem.

    The first problem with generalizing this is that there are several "magic numbers" in this expression:

       (3**/52 3 48)%52^!._1 (3)   NB. Probability of 1 pair in 3 cards
    0.169412
    

    One of these is obvious: 52 is the number of cards in a deck but we do not need to generalize this for this problem. The numerous values of "3" pose more of a problem because it's clear that at least one of them, the one in the vector, is a different thing than the other two. The terminal "3" is evidently the number of cards in which we want to find a pair, so perhaps we can replace this with "y" in a more general expression.

    This leaves the initial "3" - what does it represent? Fortunately, since we already know the correct answer for some values of this problem, we can easily verify that the first "3" does not represent the same thing as the last one:

       (4**/52 3 48 44)%52^!._1 (4)   NB. Probability of 1 pair in 4 cards?
    0.202833                          NB. Incorrect....
    

    Looking back at the original explanation, we see that the first "3" is the number of possible ways a pair can occur in a set of three, which Thomas illustrated by enumerating the possibilities: X X Y, X Y X, Y X X.

    If we similarly enumerate the ways a pair can show up in four cards, we see that this is XXYZ, XYXZ, XYZX, YXXZ, YXZX, YZXX - six arrangements assuming we realize that XXYZ is the same as XXZY and so on because Y and Z do not represent specific values, just ones that are different from each other and from X.

    With a bit of thought, we can figure that these are the values of 2!N - N things taken two at a time:

       2!2 3 4 5
    1 3 6 10
    

    So, the initial "3" generalizes to 2!y , if y is the number of cards, for 3 cards.

    Stopes Again

    The vector in the middle which we reduce by "*" is a little trickier but the pattern is clear if we move that "3" to the front it becomes 3,(52-0),(52-4)....

    Oh, look: it's the Stopes function again! We need to use 52^!._4 where "52" is the number of cards in a deck and "4" is the number of suits, which we negate for the decrement value. Also, the initial "3" is one less than the number of suits. So, if we account for needing y-1 steps for the new use of Stopes, we get this:

       numPairsInTM=: 13 : '((2!y) * */ 3,52^!._4 ] <:y) % 52^!._1 ] y'"0   NB. # pairs in y cards, thanks to Thomas McGuire
       numPairsInTM
    (((2 ! ]) * [: */ 3 , 52 ^!._4 [: ] <:) % 52 ^!._1 ])"0
    

    Testing this for values where we already know the answers:

       numPairsInTM 2 3 4 5 6
    0.0588235 0.169412 0.30425 0.422569 0.485505
    

    Even better, this is analytic, not brute force, so the time spent is constant regardless of the magnitude of the argument:

       6!:2 'numPairsInTM 2'
    1.49e_5
       6!:2 'numPairsInTM 3'
    1.43e_5
       6!:2 'numPairsInTM 4'
    1.41e_5
       6!:2 'numPairsInTM 5'
    1.36e_5
       6!:2 'numPairsInTM 6'
    1.4e_5
    

    This O(0) (?) performance lets us play with many more values without being constrained by the time it takes:

       numPairsInTM 2+i.11
    0.0588235 0.169412 0.30425 0.422569 0.485505 0.472839 0.392282 0.275107 0.159946 0.0744721 0.026156
       numPairsInTM 14
    0.000739767
    

    Interestingly enough, the percentage of single pairs hits a maximum at six cards and decreases after that.

    We also get reasonable behavior for edge conditions and extreme values:

       numPairsInTM 1
    0
       numPairsInTM 0
    |domain error: numPairsInTM
    |numPairsInTM[0]
          dbr 1 [ dbr 0
       numPairsInTM >:i.26
    0 0.0588235 0.169412 0.30425 0.422569 0.485505 0.472839 0.392282 0.275107 0.159946 0.0744721 0.026156 0.00618234 0.000739767 0 0 0 0 0 0 0 0 0 0 0 0
    

    This tapers off to zero for values greater than 14 because we must have more than one pair if we pick two or more than the 13 distinct ranks. It's a low number for 14 because there's still a small chance of getting only 1 pair.

    Advanced topics

    Language Use Related to Tool Use

    Here's some interesting research that lends weight to the idea of notation as a tool of thought. The article pointing out the research is here - https://www.newscientist.com/article/2297259-using-tools-helps-you-understand-language-and-vice-versa/ - but behind a paywall. However, the original research can be seen here: https://www.science.org/doi/10.1126/science.abe0874.

    The TLDR; is that language and tool use seem to be governed by the same brain region and that practicing tool-using helps people do better in a test of complex language understanding, and vice-versa.

    According to the New Scientist article

    The crossover [between tool-use and language understanding] may happen because some of the same parts of the brain are involved in tool use and language, says Claudio Brozzoli at the National Institute of Health and Medical Research in Lyon, France.
    One idea is that language evolved by co-opting some of the brain networks involved in tool use. Both abilities involve sequences of precise physical movements – whether of the hands or of the lips, jaws, tongue and voice box – which must be done in the right order to be effective.
    Brozzoli’s team asked volunteers to lie in a brain scanner while carrying out tasks involving either tool use or understanding complex sentences. The tool-based task involved placing small, key-shaped pegs into a tray of holes using a pair of long pliers, while viewing the hands through an arrangement of mirrors. The language test involved understanding sentences such as: “The writer that the poet admires writes the paper.”
    During both tasks, the fMRI scanner showed higher activity in small structures deep in the brain called the basal ganglia. The pattern of activity was similar during both tasks.

    In the study, 52 people alternated complex language tasks with tool-using tasks of differing complexity. The complex tool-using tasks correlated with 30% higher scores on the subsequent language task, compared to one prior to tool use, for people who had complex tool-using tasks but scores improved only 15% for a group that had a simpler tool-use task between the language tests. The benefit was similar for a pair of tool-use tasks interspersed with language tasks of varying complexity.

    Learning and Teaching J

    Terse or Short?

    Ten years ago, we were talking about one-word descriptions of J - bringing up the words "feral", "argute", "razor", and others. This discussion morphed into ideas about an elevator pitch or a 144-character tweet for J.

    The late, great Roger suggested this for a tweet:

       #jtweet1=: 0 : 0
    J is a programming language that works with arrays, verbs, and adverbs.  For example, +/x sums array x and /:~x sorts it.
    )
    122
    

    He started it, so I'll continue it, perhaps a little more succinctly:

       #'J is a programming language that works with arrays, verbs, and adverbs.  For example, +/x sums array x and /:~x sorts it.'
    121
    

    Hmm, "is" is a weak verb. My 1st five tries:

          #'The J programming language works with arrays using verbs and adverbs.  For example, +/x sums array x and /:~x sorts it.'
    119
          #'The J programming language works with arrays using verbs and adverbs.  For example, +/x sums array x, /:~x sorts it, and |.x flips it'
    133
       #'The J programming language works with arrays using verbs and adverbs.  For example, +/x sums array x, /:~x sorts it, and |.x flips it on a vertical axis'
    152
    

    Oops, a bit long. Try shortening it by changing a noun phrase to an adjective, as Ken might suggest.

       #'The J programming language works with arrays using verbs and adverbs.  For example, +/x sums array x, /:~x sorts it, and |.x flips it vertically.'
    145
       #'The J programming language works with arrays using verbs and adverbs.  For example, +/x sums array x, /:~x sorts it, and |.x flips it vertically'
    144
    

    But by "vertically", I really intend to mean "on a vertical axis" but's that's another noun phrase and confuses in one dimension and is arguably incorrect in more than one, so that's not exactly right but what is that will fit? I tried a few.

       #'The J programming language works with arrays using verbs and adverbs.  For example, +/x sums array x, /:~x sorts it, and |.x flips its 1st dimension'
    148
       #'The J programming language works with arrays using verbs and adverbs.  For example, +/x sums array x, /:~x sorts it, and |.x flips it 1-dimension'
    145
       #'The J programming language works with arrays using verbs and adverbs.  For example, +/x sums array x, /:~x sorts it, and |.x flips 1st-dimensionally'
    148
       #'The J programming language works with arrays using verbs and adverbs.  For example, +/x sums array x, /:~x sorts it, and |.x flips 1st-dimension'
    144
       #'The J programming language works with arrays using verbs and adverbs.  For example, +/x sums array x, /:~x sorts it, and |.x flops 1st-dimension'
    144
    

    I was going to go with the final version above where I change "flips" to "flops" to incorporate the "on" I wanted to insert but could not, at least not until I thought about it a little more and decided to start pushing the boundaries of online idiom and punctuation, if such a thing is possible.

       #'The J programming language works on arrays using verbs and adverbs. EG +/x sums array x, /:~x sorts it, and |.x flips, all on x''s 1st-dimension'
    143
    

    That's probably good enough for now but leaves room for improvement.

    What do you think? What's your contribution?

    Here's one more:

      #'J plays with multi-dimensional arrays using a symbol-heavy notation based on functions and modifiers melded together in a simple, logical system'
    144