NYCJUG/2021-10-12


Beginner's regatta

Look here for a 59-second video about nub and sieve.

Nub of an Array

Reducing an array to its unique items has traditionally been called nub in APL. In J, we have a simple way to do this with the monadic verb ~. (nub).

For example,

      ~. 1 1 2 2 3 2 1 3
1 2 3
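Nub keeps items in the order of their first occurrence and works on arguments of any type and rank; for example:

```j
   ~. 'mississippi'     NB. first occurrence of each letter, in order
misp
   ~. 2 3 $ 1 2 3       NB. items (here, rows of a table) are deduplicated too
1 2 3
```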

The implementation of nub is quite efficient for simple arrays. Here we compare timings on million-element vectors of integers and floating point numbers.

   rn=. 1e6?@$100                 NB. 1e6 random integers from 0 to 99.
   (100) 6!:2 '~.rn'
0.00310083
   10{.rnflt=. rn{100?@$0         NB. 1e6 floats drawn from 100 random values in (0,1).
0.411754 0.415987 0.0316415 0.479662 0.264928 0.952819 0.0951409 0.295497 0.84377 0.363628
   #rnflt
1000000
   (100) 6!:2 '~.rnflt'
0.0102748

The (100) preceding the timing foreign 6!:2 runs the sentence on the right 100 times and returns the average time; this gives us more precise timings. We see that it takes longer to nub the floating-point vector, likely because it takes up more memory than the integer one does.

Nub Sieve

Sometimes we are more concerned with finding where duplicates are located than with what the unique elements are. In this case, we may want to use nub sieve, which is ~:. This returns a Boolean vector with a one marking the first occurrence of each item and a zero marking any later occurrence of an item already seen (counting from left to right), like this:

   ~: 1 1 2 2 3 2 1 3
1 0 1 0 1 0 0 0

So, the location of the duplicates is simply the not of this:

   -.~: 1 1 2 2 3 2 1 3
0 1 0 1 0 1 1 1
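Combining the negated sieve with copy (#) pulls out the duplicated occurrences themselves:

```j
   y=. 1 1 2 2 3 2 1 3
   (-. ~: y) # y        NB. every occurrence after the first
1 2 2 1 3
```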

Show-and-tell

Large Arrays of Poker Simulation Results

As you are all probably tired of hearing if you have paid attention to any of the last five or six meetings, I have been working on a simulation of the poker game called Omaha.

What I have mostly done so far is to simulate playing many randomly-generated hands for varying numbers of players, from two to eleven, and ranking the best resulting hands for each player.

Each player starts with four hole cards which are dealt face down. Then five cards are dealt face up in groups with betting between each group. Each player may combine two of their hole cards with three of the five common cards to make the best hand. My simulation is too simple at this point to account for betting - which is the ultimate goal - and we ignore the possibility of low hands which sometimes split the pot.

The result of each simulation is a 5-element vector: the four hole cards for each player, the best final hand for each player, a numerical designation of the type of that best hand (nothing, pair...straight flush), a ranking of each hand from best to worst with zero as the best, and the five community cards. There may be ties, so there may be more than one zero in the ranking.

For reasons of performance and saving space, the cards are represented as the first 52 characters of a., so the card columns display as unprintable glyphs (rendered here as dots). Simulation results look something like this one million by five table representing one million simulations:

   $sim6_01
1000000 5
   {.sim6_01                NB. Results of first simulation
+----+-----+------+-----------+-----+
|....|.....|1 2857|5 1 4 3 2 0|.....|
|....|.....|3  856|           |     |
|....|.....|2  792|           |     |
|....|.....|2  835|           |     |
|....|.....|2  857|           |     |
|....|.....|4   10|           |     |
+----+-----+------+-----------+-----+

Converting the characters representing the hole cards of this 6-player simulation to numbers, then to card representations, we see this.

   a. i. >0{{.sim6_01
32 10 39 42
17 34 45 25
47  5 44 40
 6 48  0 49
22  4 37  7
20 28 23  8
   showCards a. i. >0{{.sim6_01
+--+---+---+--+
|Q♣|8♥ |5♠ |2♠|
+--+---+---+--+
|A♦|10♥|8♠ |6♦|
+--+---+---+--+
|7♣|7♠ |10♠|3♠|
+--+---+---+--+
|Q♠|J♠ |8♣ |2♣|
+--+---+---+--+
|K♥|J♦ |9♣ |6♣|
+--+---+---+--+
|Q♦|10♣|9♦ |4♥|
+--+---+---+--+
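The verb showCards itself is not shown here. A minimal sketch of such a display verb, assuming card n encodes face as 13|n and suit as <.n%13 (the orderings of faces and suits in the real showCards may well differ), might look like:

```j
NB. Hypothetical showCards-style decoder; the card mapping is an assumption.
showCardsSketch=: 3 : 0
 faces=. 'A';'2';'3';'4';'5';'6';'7';'8';'9';'10';'J';'Q';'K'
 suits=. u: 9829 9830 9827 9824      NB. heart diamond club spade glyphs
 ((13 | y) { faces) ,&.> <"0 (<. y % 13) { suits
)
```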

Similarly, we see the community cards and the best final hand for each player.

   showCards a. i. >4{{.sim6_01  NB. Community cards
+--+--+--+--+--+
|A♠|A♣|K♦|J♣|3♦|
+--+--+--+--+--+
   showCards a. i. >1{{.sim6_01  NB. Final best hand for each player
+--+--+--+--+---+
|A♣|A♠|K♦|Q♣|8♥ |
+--+--+--+--+---+
|A♣|A♦|A♠|K♦|10♥|
+--+--+--+--+---+
|A♣|A♠|7♣|7♠|K♦ |
+--+--+--+--+---+
|A♣|A♠|J♣|J♠|Q♠ |
+--+--+--+--+---+
|A♣|A♠|K♦|K♥|J♦ |
+--+--+--+--+---+
|A♠|K♦|Q♦|J♣|10♣|
+--+--+--+--+---+

From the hand types and hand rankings shown here, we see that the winner of this simulation is the last player, who has a straight. The straight is indicated by the 4 in the first column.

   2 3{{.sim6_01        NB. Type of each hand, rank from best (0) to worst
+------+-----------+
|1 2857|5 1 4 3 2 0|
|3  856|           |
|2  792|           |
|2  835|           |
|2  857|           |
|4   10|           |
+------+-----------+

The first column of the 2-column ranking is the major hand type, where higher is better; the second column is the ranking within the hand type. This second column indicates that this straight is the highest possible one. Similarly, the second column distinguishes the three two-pairs from worst - 792 for aces and sevens - to best - 857 for aces and kings.

Putting Simulations into a Jd Database

The code I have so far has allowed me to run 300 million simulations for two to eleven players which gives me quite a lot of data: slightly over a terabyte.

As I mentioned last meeting, this much data practically requires a database. I said then that I was thinking of putting this into a q database, but Bob T. suggested that J's Jd might be more straightforward. I'm glad I took his suggestion, as my database layout took a couple of tries to get right and it was easier to figure out in a language I know well.

Initial Design Considerations

The first design problem has to do with the variable size of the data for each simulation type. As can be seen from the array above, each array in a row of the simulation table, except the last, has a tally equal to the number of players. These arrays are parallel: the position within each array (except the last) implies the player to which the value applies.

   #&>{.sim6_01
6 6 6 6 5

Pondering the potential inefficiency of dealing with variable-sized fields, I made the first design decision: separate sets of tables for each number of players. This increases the number of tables we have to deal with, but there is no obvious reason to combine 2-player data with 11-player data except perhaps at some aggregate level. Separating the tables by number of players greatly simplifies the design at little obvious cost.

Having the tally of the arrays imply the player position builds in some efficiency. I was disappointed to find that Jd limits our data items to vectors and scalars, so I could not simply drop each data item in a simulation vector into its own table.

Here was my first attempt at defining the Jd tables for the six-player case:

   jdadminnew 'D:/amisc/work/Poker/OmahaSim'
   jd 'createtable /replace sim6 simnum int, player int, holecard byte 4, highhand byte 5, htMajor int, htMinor int, hr int'
   jd 'createtable /replace sim6community simnum int, cc byte 5'
   jd 'info schema'
+-------------+--------+----+-----+
|table        |column  |type|shape|
+-------------+--------+----+-----+
|sim6         |simnum  |int |_    |
|sim6         |player  |int |_    |
|sim6         |holecard|byte|4    |
|sim6         |highhand|byte|5    |
|sim6         |htMajor |int |_    |
|sim6         |htMinor |int |_    |
|sim6         |hr      |int |_    |
|sim6community|simnum  |int |_    |
|sim6community|cc      |byte|5    |
+-------------+--------+----+-----+

Since I have to break the six-item arrays into six separate records, I have to tag each one with a player number, 0-5 in this case. Also, whereas the simulation number was also implied by the row number of the table of simulation results in my original data structure, I have to specify each simulation number as well.
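Building those tag columns is cheap in J; a sketch for the six-player case (the variable names here are mine):

```j
NB. Sketch: each simulation becomes 6 per-player rows, so each row
NB. needs a simulation number and a player number.
nsim=. 1000000
simnumcol=. 6 # i. nsim         NB. 0 0 0 0 0 0 1 1 1 1 1 1 ...
playercol=. (6 * nsim) $ i. 6   NB. 0 1 2 3 4 5 0 1 2 3 4 5 ...
```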

An Initial Pain Point

Notice how the first Jd line above, the jdadminnew command, defines the database as a top-level folder under which the tables are built, each with its own sub-folder. My first try at populating one of my tables was much slower than it should have been because I foolishly wrote something to write each simulation a row at a time. However, when I thought to concatenate the cells in each column together instead, my initial try wiped out my earlier work, because the jdadminnew command will wipe out any existing database without warning. As much as I dislike the "Are you sure?" hesitations frequently given for mundane tasks, in this case I would have appreciated the warning, as this was a costly mistake. It was also one I made again later on in this work.

The take-home lesson here is beware of jdadminnew and jdadminx. Either one will wipe out an existing database to create the new one you specify without so much as a sorry!

Problems with Initial Design

The first thing to do with your newly-loaded data is, of course, to check it.

The original simulation files were created in parallel but in a crude, manual fashion. Essentially, I had about six separate J sessions running simultaneously in which I would kick off a set of 10 million simulations and save the results as variables to file. However, I occasionally forgot to randomize the random number seed between sessions, so had some duplicate runs.

We will look at an example of this manual parallelization later.

So, my first task is to find and remove duplicates, then run new simulations to re-populate each set so each has 300 million distinct simulations.

The way I set the seed is rather crude but probably good enough. Doing a better job than this is the subject of a future show-and-tell. This is how I did it for one of the simpler RNGs:

   9!:1](|-/7!:1'')+<.13#.|.qts''
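Reading that expression from the right (this gloss is my own interpretation):

```j
NB. qts''            current timestamp: Y M D h m s
NB. |. qts''         reversed, so the fast-changing fields lead
NB. 13 #. |. qts''   collapsed to a single number in base 13
NB. <.               floored to an integer
NB. |-/ 7!:1''       absolute difference of memory-usage counters,
NB.                  which vary from session to session
NB. 9!:1             sets the RNG seed to the sum
```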

However, my initial design gets in the way of finding duplicates because I have to re-assemble the pieces of the simulation I broke apart and tagged with simulation and player numbers. This leads to some code like this:

   6!:2 'hcs=. ,>(<1 0){jd ''reads holecard from sim2'''
3.19848
   6!:2 'simnum=. ,>(<1 0){jd ''reads simnum from sim2'''
8.95898
   6!:2 'ptn=. ,_2~:\simnum'
120.357
   10{.ptn
1 0 1 0 1 0 1 0 1 0

This turned into a dead-end because of the extreme slowness of code like this:

   6!:2 'numUnq=. #~.ptn<;.1 hcs'
166.947
   numUnq
269822842
   7!:1''
15011542208 66352727200
   6!:2 'whDupes=. -.whUnq ptn<;.1 hcs'

This last expression ran for days before I pulled the plug on it.

Embarrassingly enough, given today's Beginner's Regatta, a major problem was my hand-written whUnq code, which could simply be replaced by nub sieve ~:. This may be a very old piece of code, or it just took me a while to discover the J primitive.
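For the record, one plausible hand-rolled formulation of whUnq (a guess at the idea, not my original code) does exactly what nub sieve does:

```j
   whUnq=: i.@#@] e. i.~      NB. is each index the index of a first occurrence?
   (whUnq -: ~:) 1 1 2 2 3 2 1 3
1
```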

A Better Database Design

Considering these problems, I eventually realized there was a simple way to avoid breaking up data for a single simulation. Since I already imply the number of players by having a separate set of tables for each number, why not use that in conjunction with the variable-length fields available to character columns in Jd? I had to come up with a character representation for the two-column hand type table, which I was able to fit into three bytes for each row, but other than this the main trick was to use raveled arrays. I have not yet written the decoders necessary to convert them back into more usable forms but this should be trivial.
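The three-byte hand-type encoding is not spelled out above; one way it could work (an assumption, not necessarily my actual scheme) is a byte for the major type and a base-256 pair for the minor rank:

```j
   encHT=: 3 : 'a. {~ ({. y) , 256 256 #: {: y'  NB. (major, minor) -> 3 chars
   decHT=: 3 : '({. n) , 256 #. }. n=. a. i. y'  NB. 3 chars -> (major, minor)
   decHT encHT 4 10
4 10
   decHT encHT 1 2857
1 2857
```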

One advantage of this new design is that it reduced the size of the database by almost 60%, albeit at some future performance cost, though I anticipate this will be small compared to other parts of the simulation. It's still a little over 400 GB, but this size can grow arbitrarily large depending on how many simulations I choose to run in the future.

To accomplish this, I parameterized the table definition to create different length fields for different numbers of players, by creating a verb like this:

createFlatTable=: 3 : 0
   np=. y
   jd 'createtable /replace sim',(":np),' simnum int, holecard byte ',(":np*4),', highhand byte ',(":np*5),', handtype byte ',(":np*3),', fhrank byte ',":np
   jd 'createtable /replace sim',(":np),'community simnum int,cc byte 5'
)

Simply calling this with the number of players for each set of tables made short work of setting up the tables in a new folder I called OmahaFlat instead of my previous "OmahaSim".
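With that in place, a single line sets up all ten sets of tables (assuming the OmahaFlat database has just been created):

```j
   createFlatTable &.> 2 + i. 10    NB. tables for 2 through 11 players
```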

The new schema looks like this:

   jd 'info schema'
+--------------+--------+----+-----+
|table         |column  |type|shape|
+--------------+--------+----+-----+
|sim11         |simnum  |int |_    |
|sim11         |holecard|byte|44   |
|sim11         |highhand|byte|55   |
|sim11         |handtype|byte|33   |
|sim11         |fhrank  |byte|11   |
|sim11community|simnum  |int |_    |
|sim11community|cc      |byte|5    |
|sim10         |simnum  |int |_    |
...
|sim3community |cc      |byte|5    |
|sim2          |simnum  |int |_    |
|sim2          |holecard|byte|8    |
|sim2          |highhand|byte|10   |
|sim2          |handtype|byte|6    |
|sim2          |fhrank  |byte|2    |
|sim2community |simnum  |int |_    |
|sim2community |cc      |byte|5    |
+--------------+--------+----+-----+

Some Preliminary Timings and Memory Usage

It took quite a while to load the data into my first set of tables because I foolishly wrote a looping, single-record-at-a-time loader. Once I figured out that loops are bad, I got what I thought was pretty good performance for populating the tables by loading the concatenated rows all at once.

My simulations were run 10 million at a time, so I have a number of result tables, 30 for each number of players, stored to file as variables using my trusty old WS namespace.

Here we see timings for loading 10 million 9-player simulations:

      6!:2 '(9;''sim9'') addToTables sim90' [ smoutput 6!:2 '''D:/amisc/work/Poker/'' unfileVar_WS_ ''sim90'''
47.2141
189.095

Reading the variable sim90 from file took about 47 seconds and loading it into a pair of database tables took about 189 seconds. We can extrapolate from this to estimate that loading all 30 variables should take only about 7000 seconds which does not seem too bad for 300 million 9-player simulations.
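The extrapolation is just:

```j
   30 * 47.2141 + 189.095    NB. seconds to load all 30 variables
7089.27
```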

However, there is time unaccounted for by this simple extrapolation. Since each of these simulation variables is several gigabytes in size, it is prudent to erase each one after it's been loaded so as not to exceed available RAM. We see here that erasing a 9-player simulation variable registers as taking less than a second but the elapsed time is closer to 110 seconds:

   qts'' [ smoutput 6!:2 '4!:55 <''sim9''' [ smoutput qts''
2021 10 11 22 1 19.425
0.892342
2021 10 11 22 3 9.94
   2021 10 11 22 3 9.94 TSDiff 2021 10 11 22 1 19.425
+-------+----------------+
|110.515|0 0 0 0 1 50.515|
+-------+----------------+

This raises our estimated total load time for 30 9-player simulations to over 10,000 seconds; the actual elapsed time for this was 9,468 seconds, or about 2 1/2 hours. This still does not seem bad for loading almost 173 GB of data.
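Adding the erase time to each load gives the revised estimate:

```j
   30 * 110.515 + 47.2141 + 189.095   NB. seconds, with erasures included
10404.7
```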

However, we did max out available RAM:

Memory freed upon exiting J.jpg

Using the Data

Since the load is a one-time cost, this time is less important than operations we intend to do repeatedly. However, a recurring issue for the loads and much of the subsequent usage is how much memory gets allocated but not subsequently unallocated as can be seen in the graph above.

The following examples work out what I want to do: find the duplicates by combining the community cards and hole cards of each simulation into a single table.

   6!:2 'all=. (>(<1 0){jd ''reads cc from sim4community''),.>(<1 0){jd ''reads holecard from sim4'''
58.3138
   6!:2 'whDups4=. -.~:all'
11.0129
   6!:2 'nu=. #~.all'
13.4461
   nu
269824044
   +/whDups4 
30175956
   3e8-nu       NB. Confirm the duplicate count
30175956

I will rely on the fact that the two tables were initially built at the same time but would have to work with the simulation numbers for the more general case:

   6!:2 'simnum=. >(<1 0){jd ''reads simnum from sim4'''
21.7932
   6!:2 'simnumcc=. >(<1 0){jd ''reads simnum from sim4community'''
21.8242
   simnum-:simnumcc
1
   'simnum simnumcc'=. ,&.>simnum;simnumcc
   $simnum
300000000
   6!:2 'jd ''delete sim4'';whDups4#simnum'
90.2998
   6!:2 'jd ''delete sim4community'';whDups4#simnumcc'
2.8698

Here's the size of the sim4 tables after we do this:

   $&.>jd 'reads from sim4'
+-----------+------------+------------+------------+-----------+
|6          |8           |8           |8           |6          |
+-----------+------------+------------+------------+-----------+
|269824044 1|269824044 16|269824044 20|269824044 12|269824044 4|
+-----------+------------+------------+------------+-----------+

Now I've allocated much of my available RAM:

Memory usage removing dupes from sim4.jpg

We can verify from within J that we have about 51 GB allocated:

   7!:0''
51518119328
   10^.7!:0''
10.712

So I have to exit my J session and restart it to free up the RAM.

   q''

Process shell<2> finished

If I wasn't counting on the simnums being sequential and aligned, I would have to do the deletion in a more general fashion like this:

   6!:2 'jd ''delete sim4community'';(simnumcc e. whDups4#simnum)#simnumcc'

Running Simulations in Parallel

We need to run some new simulations to replace the duplicates we removed, so we may as well run several in parallel using the crude method to which I earlier alluded. Say we decide to run four of these, each doing 30175956%4, or 7543989, simulations.

We start up four new J console sessions under emacs, randomize the RNG seed, and kick them off like this:

Just the 4 parallel simulations.jpg

You can see that we have an emacs window with 4 panes, each one running a separate session as shown by the distinct names at the bottom of each pane: sim0pll211012.log through sim3Pll211012.log.

Each one will assign a different variable - sim4Refill0 to sim4Refill3 - and save it to file.

The four J sessions are each using about one core's worth of CPU with an increasing memory allocation as the simulation table grows.

Resource usage by 4 parallel processes.jpg

The J session with minimal resource usage at the bottom of the jconsole list is a separate, idle session.

When they all finish, after about 1500 seconds, we see how much of the memory is unallocated.

Memory usage drops off at end of parallel runs-graph only.jpg

Each of the parallel sessions drops down to about 1 GB allocated from a high of close to 7 GB.

Refilling the Removed Duplicates

Now that we have the refill files, it's a simple matter to load them into the database and check that we have no duplicate records.

   6!:2'(4;''D:/amisc/work/Poker'') populateFromVars ''sim4Refill0'';''sim4Refill1'';''sim4Refill2'';''sim4Refill3'''
231.025
   $&.>jd 'reads from sim4community'
+-----------+-----------+
|6          |2          |
+-----------+-----------+
|300000000 1|300000000 5|
+-----------+-----------+
   6!:2 'all=. (>(<1 0){jd ''reads cc from sim4community''),.>(<1 0){jd ''reads holecard from sim4'''
41.9041
   6!:2 'whDups4=. -.~:all'
11.4311
   6!:2 'nu=. #~.all'
13.4841
   nu  
300000000

Advanced topics

Simulation Versus Calculation

In a recent Omaha game, one of my fellow players opined that we were seeing unusually many pairs in the flop; the flop is the name for the first three community cards which are dealt all at once.

This raises the question of how many pairs one would expect to see in a flop. Without loss of generality, we could phrase this as "How many pairs should we expect in three cards taken from a deck of 52?"

This seems simple enough to calculate. The first of three cards can be any one of the 52, the second can be one of the remaining three that matches the first one, and the third any one of the 48 that do not match the first two. We divide this quantity by the number of combinations of three cards taken from 52, or, in J:

   (*/52 3 48)%3!52
0.338824

This seems straightforward: by this reckoning, we should see a pair about 34% of the time. Since we have a large sample of such deals in our database, we can check this number by looking at the first three community cards from each simulation.

We need to convert the character representations to numbers, then look at those numbers modulus 13 to get the face values of the cards. This gives us the following expression which we time, as usual.

   6!:2 'npairs=. +/(2 e. #/.~)"1 ] 13|a. i. 3{."1 >(<1 0){jd ''reads cc from sim4community'''
64.9463
   npairs
50825400
   npairs%3e8
0.169418

How odd!

To four digits of precision, this has a value half of what I calculated:

   2*0.169418
0.338836

Can anyone think of a way to explain this difference?
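One hint toward resolving it: the numerator 52*3*48 counts the two cards of the pair in order, so each unordered hand containing a pair is counted twice, while the denominator 3!52 counts unordered hands. Counting unordered hands directly - a rank for the pair, two of its four suits, and any of the 48 odd cards - matches the simulation:

```j
   (13 * (2 ! 4) * 48) % 3 ! 52
0.169412
```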

My feeling is that the simulation is more trustworthy, so this is the number I gave to my fellow player, but that also raises the question of why would I think that is the better number?

The Peculiarities of Porting J

This is the blog of a developer who has previously ported GNU APL and Kona (an open-source version of K) to OpenBSD. He figured he would go for the hat trick and port J as well.

He had previously attempted this and failed, but this time he succeeded, either because he has gotten better at porting to OpenBSD, or because J has gotten "a lot more platform-independent and platform-agnostic", or both.

His first hurdle was that J lacks a build system but what "...J has today is a combination of shell scripts that control GNU Makefiles. OK, that's not so bad. They're even POSIX shell scripts, no hidden GNU Bash dependencies. Thank you J team!"

His next problem was that the J build_all.sh script did not recognize OpenBSD. He was of the opinion that it should have tried and failed rather than giving up immediately. He solved this by finding the "magic location ... in the file jsrc/js.h [where] there is an #ifdef __linux__ line that [he] changed to #ifdef __OpenBSD__" and was surprised to find that it ran to completion without error!

He had some complaints about the choices of compile flags and about J's predilection for AVX instructions, which kept it from compiling on his AMD hardware. There were some other minor hiccups, but overall he was pleased with his experience.

Seven Most Basic Programming Languages

When I saw this, I thought "APL better be in there or I'm not taking it seriously." TL;DR It is.

The seven programming ur-languages

Learning and Teaching J

The APL Revolution

This essay by Alan Kay - What made APL programming so revolutionary? - is worth a look. One important point he makes is that APL had the advantage of being worked out by thinking about the language itself without thought of implementation.

He illustrates the advantage of a succinct notation with an example of the history of the evolution of Maxwell's equations for electro-magnetic fields, culminating in these expressions of them.

Great T-shirt Due to Better Notation

He is generally very positive about APL's advantages but ends with a criticism of J and K as being too much like APL. He hopes someone can adapt these ideas into a new and great language that advances beyond what we already have.

Personally, I would be happy if the other languages could catch up a bit before we think about moving forward, not to say there aren't things the APL community can learn from the mainstream.

New Efforts

There is a new effort to catalog useful APL idioms. Some time ago, the Finn APL Idiom Library was the apotheosis of such a catalog; now there is an online enhancement, with almost 900 entries, which tries to modernize that decades-old tome.

J has so many idioms on the J site that it raises the question of how to find the one you need. Does anyone have ideas for better indexing these? I'm thinking along the lines of RosettaCode.

This modern version is on the APL Wiki, which is also running a contest to pick a new APL logo.

Here are a few examples plus a few examples that are not A Programming Language logos. Many, many more can be found here.

50yearsapple 50p.png Cube APL logo light.png DirectionAndMagnitude.png Display matrix.png
Nested bitmaps logo.png Nested bitmaps logo dark.png Parallel Lines.png QuadAPLAlt.png
Script logo.png Script logo filled.png American President Lines (logo)-16-9.jpg Apl logo 27801.jpg
Apl-logo-AssamPetro.png 2016450 1AcademyForPrecisionLearning.png APL-logo 1000px.jpg Apl logo-PharmaSpecials.jpg