NYCJUG/2021-08-10

From J Wiki
Jump to: navigation, search

Meeting Agenda for NYCJUG 20210810

Beginner's regatta

A Question About Local Versus Global Variables

The following question was posted on the J-Programming Forum:

​From: 'Firstname Lastname' via Programming <programming@jsoftware.com>
Date: Thu, Jul 15, 2021 at 5:01 AM
Subject: [Jprogramming] local variable in a script
To: <programming@jsoftware.com>

Dear list,

having a script file A.ijs with the contents

a =. 1
f1 =. a&+
f2 =. 3 : 0
a&+y
)
f1 0
f2 0

and loading it with

a =. 2
load 'A.ijs'

I see that the value 1 of 'a' is used for f1, but a value of a *global* variable 'a' is used for f2. This bothers me a lot. Hence just by changing from a tacit to explicit form can lead to a surprise.

I eventually found a mention of a similar effect in 'Learning J', but I do not understand why it has been designed this way... So what is the way to have local variables (ie, with the scope of a script file) that are visible (as if globals) to the functions within the script? Does one have to create a locale to this end?

Thanks for any comments and explanations.

Best regards
Firstname

The first thing to do when investigating a question like this is to replicate the result. Unfortunately, this proves to be difficult with the information given.

First, we read the file to ensure that it matches what is above:

   fread 'A.ijs'
a =. 1
f1 =. a&+
f2 =. 3 : 0
a&+y
)
f1 0
f2 0

Next, we repeat the steps mentioned in the email:

   a=. 2
   load 'A.ijs'
   f1
|value error: f1
   f2
|value error: f2

So, it's clear that we are not replicating the issue as stated.

Local Definitions in a Script

Since all the definitions in the example script are local, it is unsurprising that "f1" and "f2" are not defined. In fact all local definitions in a script are local to that script. That is, the local values are not defined in the session when a script is loaded into the session.

This is by design. It's often handy to use local variables but we don't want a lot of names cluttering our session, so local names in a script are not created in a session that loads the script. Also, we don't want random, temporary values from a script clobbering names in our session.

In fact, it is customary in J to define verbs as globals within their namespace. So, if we change the script file to define the verbs globally:

   fread 'A1.ijs'
NB.* A1.ijs: define verbs globally.
a =. 1
f1=: a&+
f2=: 3 : 0
a&+y
)
f1 0
f2 0

   load 'A1.ijs'
   f1
1&+
   f2
3 : 'a&+y'
   a=. 2
   load 'A1.ijs'
   f1
1&+
   f2
3 : 'a&+y'

Now we start to see the basis of the complaint "...just by changing from a tacit to explicit form can lead to a surprise." The difference between tacit and explicit is not the issue: the difference is caused by the explicit version referencing an identifier external to the script. So, changing the value of "a" will change the definition of the explicit verb which references that name but does not change the tacit verb which was defined using the value available to it at definition time.

We now also can see that the answer to the question "[s]o what is the way to have local variables (ie, with the scope of a script file) that are visible (as if globals) to the functions within the script?" is to define the functions as globals, as illustrated by our "A1.ijs" example above.

When to use Globals Versus Locals

In general, one should define verbs as globals using "=:". Most nouns should also be defined as local using "=." except where we explicitly want a global value. If we define a noun as a global, the convention is to use all capital letters to make this transgression more obvious.

Avoiding global nouns reduces the risk of inadvertently using wrong values. This principle is well-illustrated by this example since, I suspect, the only way the scenario could have happened as described is if the verbs were defined in the session prior to loading the script. Since these definitions were local to the session, they were not over-written by the locals defined in the script.

It is a commonly-accepted good coding practice to avoid global names as much as possible, particularly for passing data between functions as there is no way to guard against inadvertent re-assignment of the values. The sloppy practice of using global names to pass data between functions greatly complicates debugging as well.

Show-and-tell

I recently posted the following on the J-Programming forum.

Filtering Possible Hands

I'm stuck on getting decent performance for the following problem.

Say we have a list of all 5-card hands from a deck of 52 cards:

   load '~addons/stats/base/combinatorial.ijs'
   $allHands=. 5 comb 52
2598960 5

We also have a list of all 4-card combinations:

   $HCs=. 4 comb 52
270725 4

I want to find all hands possible using any two cards from a given row of HCs but excluding those with either of the other two cards from the same row.

For example, for this randomly-chosen row of HCs, which hands include the first two but exclude the second two?

   (]{~[:?#) HCs
15 22 33 44
   (#,+/) include=. 2=+/"1 allHands e."1 ] 15 22  NB. Size of result and number of hits
2598960 19600
   (#,+/) exclude=. 0=+/"1 allHands e."1 ] 33 44
2598960 480200

So there are 17,296 hands we want:

   +/include*.exclude
17296

Check five at random to verify that these look correct:

   ]rcix=. (]{~[:5&?#) I. include*.exclude
1002358 1165504 2176134 1960355 56685
   rcix{allHands
 4 15 22 37 39
 5 15 22 34 45
15 18 20 22 39
12 15 22 27 39
 0  4  6 15 22

These look correct because all include (15,22) but exclude any of (33,44).

However, this is only one possibility for this single row of HCs but we have six variations for this row based on choosing all possible pairs of indexes into the row of HCs to include,

   |:inclIx=. 2 comb 4
0 0 0 1 1 2
1 2 3 2 3 3

And six corresponding pairs to exclude (indexes into the row of HCs):

   |:exclIx=. (i.4)-."1 ] 2 comb 4
2 1 1 0 0 0
3 3 2 3 2 1

In the above example, we did the one of the six corresponding to inclIx of (0,1) and exclIx of (2,3). Is there a way to do all six sets for each row of HCs compared to each row of 'allHands?

We might start by reducing the sizes of our arrays by making them character instead of integer:

   allHands=. allHands{a. [ HCs=. HCs{a.

However, we quickly see that we run out of memory for the expression we would like to write:

   include=. 2=+/"1 allHands e."1 / inclIx{"1 / HCs
|out of memory
|   include=.2=+/"1 allHands     e."1/inclIx{"1/HCs

A little arithmetic shows us the extent of the problem:

   (],*/) #&>(,/inclIx{"1/HCs);<allHands
1624350 2598960 4221620676000
   10^.1624350*2598960
12.6255

So we have a 4-trillion-element intermediate result which is a bit much.

Our immediate thought might be to do this in pieces by selecting successive portions of the right-hand part (HCs) of this expression. We would like to work on integral pieces for neatness, so we look at the prime factors of the size of HCs and select the product of a couple of them to be the "chunk size":

   q:#HCs
5 5 7 7 13 17
   ixs=. (ctr*chsz)+i.chsz [ ctr=. 0 [ chsz=. */5 5

[The expression for the next chunk would be

   ixs=. (ctr*chsz)+i.chsz [ ctr=. >:ctr

and so on.]

Timing how long one piece takes and scoring the results by simply adding up their indexes into allHands, we get this:

   6!:2 'sel=. (2=+/"1 allHands e."1 / inclIx{"1 / ixs{HCs) *. 0=+/"1 allHands e."1 / exclIx{"1 / ixs{HCs'
11.5105
   scores=. 0$~#HCs    NB. Initialize scores vector
   6!:2 'theseScores=. +/&>I.&.><"1 |:+./"1 ] 0 2 1|:sel'
0.687056
   6!:2 'scores=. (theseScores) ixs}scores'
6.4e_6
   +/11.5105 0.687056 6.4e_6      NB. Total time per chunk
12.1976
   12.1976*(#ixs)%~#HCs           NB. Total estimated time
132088
   0 60 60#:12.1976*(#ixs)%~#HCs  NB. Estimated time: h m s
36 41 27.8104

So we should be able to do it this way in about 36 hours. Can someone think of a faster method to accomplish this? The memory requirements seem modest enough that I should be able to run five to ten processes simultaneously on this to reduce the overall time but is there a way to speed up the basic selection?

Some Alternatives

As is often the case on the J forums, I received a couple of good answers within a day.

The first of these, from Raul Miller, was much better than I gave it credit for initially. I thought his expression was returning an incorrect result when, in fact, it was so much more efficient than my original attempt that I failed to understand that the result combined what I was solving as six variations into a single result. So, while I was expecting 17,296 results for one of the six variations, Raul's method returns all 103,776 (=6*17296) solutions at once.

The proposal from Pascal Jasmin was quite elegant compared to the messy way I had proposed but was also slower, as I noted in my reply:

I modified selhands to return the indexes into allHands rather than the explicit results as this is more easily turned into a score.

selhands=: 4 : 0
   'in ex' =. y
   I. ((# in) = (ex +/@:e."1 ]) -~ in +/@:e."1 ]) x
)

I like the elegance of the expression; however, the performance does not appear to be better:

   ixs=. i.100                   NB. Do only 100 cases
   6!:2 'tst100=. (<allHands) selhands&.><"1 (<"1 ixs{,/inclIx{"1/HCs),.<"1 ixs{,/exclIx{"1/HCs'
49.7256
   #,/inclIx{"1/HCs              NB. How many altogether?
1624350
   49.7256*100%~1624350          NB. Scale time up by total #%100 -> estimate of total time
807718
   0 60 60 #: 49.7256*100%~1624350   NB. Estimated total time in h m s
224 21 57.7836

An Excellent Solution

Checking Raul's version for performance, I timed a run for 100 cases, then estimated the time for the entire set from that:

   6!:2 'tst3=. (<"1]100{.HCs) selthem &.> <allHands'
3.03651
   100%~#HCs
2707.25
   3.03651*2707.25            NB. Estimated time in seconds
8220.59 
   0 60 60#:3.03651*2707.25   NB. Estimated time h m s
2 17 0.591698
   selthem=: [: +/ [: I. 2 = [: +/"1 [: +./ =/          NB. Return the score (=sum of positions) directly.
   6!:2 'tstAll=. (<"1 HCs) selthem &> <allHands'
8376.24
   0 60 60#:8376.24
2 19 36.24

So the time to solve this for all cases was only about two hours and twenty minutes, much better than my original estimate of more than 36 hours.

Advanced topics

We will continue on the previous thread by breaking down Raul's solution given above and looking at another solution from R. E. Boss, the method of which hearkens back to a recent meeting where we looked at the enormous performance advantages of generative solutions.

All Selected Hands - Solution 1

Here is what Raul's simple, elegant solution looks like, slightly modified to return a scalar score rather than the long vector of results.

   selthem=: [: +/ [: I. 2 = [: +/"1 [: +./ =/
   6!:2 'tstAll=. (<"1 HCs) selthem &> <allHands'
8376.24
   0 60 60#:8376.24
2 19 36.24

We see from the arguments supplied that this function takes a vector of a 4-card combination on the left and the large table of all 5-card combinations on the right. In this invocation, we supply all the 4-card vectors en-masse by enclosing each row and applying the function using the bond conjunction &> to the entire 5-card table.

Breaking apart selthem by reading it right-to-left, we see that it compares each element of the right-hand argument to each element of the left-hand argument with =/, then or-reduces +./ the result to produce a Boolean matrix, with the shape of the right argument, containing a 1 for each match with a left-hand element. Summing this across the columns +/"1 returns the number of matches.

The next part, 2 =, is a brilliant simplification of my much more complicated logic where I treated inclusions and exclusions separately as it ensures we only pick the cases where exactly two of the 4-cards are in the 5-cards.

The final part, where we sum the indexes, +/ [: I., is my simple scoring method. This works fairly well if we have ensured that allHands is in ascending order of poker hands. In other words, the last four entries in allHands are the four royal flushes which are the highest hands in non-wildcard games. Ordering the 5-card combinations this way bakes in a simple weighting scheme where the hands that are better follow those that are worse.

Checking Results

A simple check of the results shows that the highest-rated 4-card hands are four-of-a-kinds with aces at the top, then kings, and so on down the card rankings.

   4{.\:tstAll
185795 176317 166071 155017
   showCards a. i. 185795 176317 166071 155017{HCs
+--+--+--+--+
|A♣|A♦|A♥|A♠|
+--+--+--+--+
|K♣|K♦|K♥|K♠|
+--+--+--+--+
|Q♣|Q♦|Q♥|Q♠|
+--+--+--+--+
|J♣|J♦|J♥|J♠|
+--+--+--+--+

This does not seem unlikely but may not be the best answer because our simple scoring system may be too simple. However, it looks as correct as the simple scoring allows.

All Selected Hands - Solution 2

This solution, from R. E. Boss, reminds us of an insight we had last meeting when looking at the "three consecutive identical digits" problem: generative solutions are much faster than filtering solutions. Generative solutions build up the results from small sets to larger ones whereas filtering solutions whittle down very large datasets to smaller ones.

There was an initial hiccup when I found that the solution as stated still runs out of memory when we try to solve for the entire set at once. The initial test on a subset looked promising as running on 100 4-card combinations was quite fast. Based on timing this run, I estimated the entire problem could be solved in about 20 minutes. However, the memory issue forced me into an iterative approach as seen below.

   6!:2 '$tstREB=. /:~"1 ,/^:2 (2{."1 t),"1 "3 c3_48{"1"1 _ (t=.c4_4{"(_ 1) 100{.c4_52)-.~ "1 i.52'
0.454629
   $tstREB
10377600 5
   5{.tstREB
0 1 4 5 6
0 2 4 5 6
0 3 4 5 6
1 2 4 5 6
1 3 4 5 6
   $c4_52
270725 4
   100%~#c4_52
2707.25
   0.454629*2707.25              NB. Estimated total time in seconds
1230.79
   0 60 60#:0.454629*2707.25     NB. Estimated total time in h m s
0 20 30.7944

The final sorting step is unnecessary since I will reduce the long vector into a score by summing it. However, we run into this by removing the 100{. above.

   6!:2 '$REBall=. +/"1 ,/^:2 (2{."1 t),"1 "3 c3_48{"1"1 _ (t=.c4_4{"(_ 1) c4_52)-.~ "1 i.52'
|out of memory
|   $REBall=.+/"1,/^:2(2{."1 t),"1"3 c3_48    {"1"1 _(t=.c4_4{"(_ 1)c4_52)-.~"1 i.52

How Solution 2 Works

As R. E. Boss explains his method:

The idea is you make the 6 variants of (some) all 4 comb 52 possibilities and remove these 4 numbers from i.52. Then you construct all 3 comb 48 from the remaining numbers. Then you prepend the two first numbers which were omitted and sort each row.

Yes - a generative solution! Why didn't I think of that? We explored this very thing in our last meeting in solutions to the "three consecutive digits" problem.

Solution 2 (Almost) Final Version

Unfortunately, breaking down a problem into pieces that fit into memory necessarily uglifies the original solution. Also, the proposed method introduces an issue with scoring hands by their positions in a list of all 5-card combinations in ascending order by value of the poker hand. However, the start of the solution is contained here though it will take some modifications to finalize it.

Here is my adaptation of the original code to process the 4-card table in pieces.

ashpREB=: 3 : 0
   c4_4=. (,.|.)2 comb 4 [ c3_48=. 3 comb 48 [ c4_52=. 4 comb 52
   'chsz iters'=. */ &> (3&{.;3&}.) q:#c4_52  NB. Chunk size, number of iterations
   scores=. 0$~#c4_52
   for_ix. i.iters do. ixs=. (ix*chsz)+i.chsz
      sc0=. (2{."1 t),"1 "3 c3_48{"1"1 _ (t=.c4_4{"(_ 1) ixs{c4_52)-.~ "1 i.52
      scores=. (+/,/^:2 ] 0 2 3 1|:sc0) ixs}scores
   end.
   scores
)

This modification also incorporates the aggregate scoring to keep down the size of the intermediate result.

This version gives us an answer in about the estimated 20 minutes:

   6!:2 'tstREB0=. ashpREB '''''
1301.3
   0 60 60#:1301.3
0 21 41.3

Learning and Teaching J

Arraycast Podcasts

The array languages podcasts, https://www.arraycast.com/episodes, are up to seven episodes now. They are well worth listening to and I hope to mine them in the future for some NYCJUG topics.

These are not so much about learning the languages as stirring up interest and providing incentive to learn one or more of them.

I find it especially useful that the podcasts are supplemented by transcripts which should help engage people with different learning styles. Also, the transcripts have links to the "show notes" which are more detailed notes on the code and such being discussed.

Learning APL with Neural Networks

Learn APL with Neural Networks: here is a set of YouTube tutorials, by Rodrigo Girão Serrão, that aims to teach APL by implementing a neural network with the language.

There are 42 videos ranging in length from a little less than three minutes to nearly 16 minutes. In total, the videos will take about six hours to watch without interruption, not counting the ads between them.

The author covers basics about installing Dyalog APL and what neural nets are. He offers a fairly detailed explanation of the underpinnings of how neurons in a NN work at an easy pace to follow. He also lets you know at the beginning of each lesson what he is going to cover and if you can skip it if you are already familiar with the APL or the neural net explanation.