NYCJUG/2021-09-14

Beginner's Regatta: Looking at Memory Use

Recently I was reminded of something I have known a long time: it's often better to allocate a large array, then fill it up, than it is to build the large array in pieces.

Sometimes this difference shows up as a performance improvement but recently the same technique solved a memory usage problem. I was running many simulations resulting in a large array of simulation results. Since simulations take arbitrarily long depending on how many you want to run, I was manually parallelizing the computations by spinning off multiple ones at a time. However, I found that a given J session was using so much memory after one set of simulations that I could not queue up multiple runs in succession in the same session.

For example, consider our initial memory allocation.

   7!:3 ''    NB. Memory blocks free
  64  101
 128 6268
 256 1235
 512  102
1024   69

The foreign "7!:3" shows us a J session's memory allocation, in terms of blocks allocated but now free, in a two-column table with sizes of memory blocks and the number of blocks of that size in each row. The result above shows that, in a new J session, we have over a thousand blocks of size 256 bytes. These are free for further use by J but are not available to other processes. Multiplying the columns and summing the results gives us these summary figures:

   (+/;]) */"1 ] 7!:3''
+-------+------------------------------+
|1248192|6464 802432 315904 51712 71680|
+-------+------------------------------+

This shows that we have about 1.2 megabytes allocated but freed. However, after we run a million simulations for six players, we see that the allocation increases substantially.

  
   6!:2 'sim6_00=: 1 play1ToEnd&>1e6$6'
263.746

   7!:3''
  64     101
 128     891
 256 1000082
 512     102
1024      70
   (+/;]) */"1 ] 7!:3''
+---------+---------------------------------+
|256265024|6464 114176 256020992 51712 71680|
+---------+---------------------------------+

Now, instead of about 1,000 256-byte blocks, we are allocating over a million of them. Our total usage has ballooned to 256 megabytes from about one megabyte. This can also be seen from the system level here:

The number shown by the system does not agree with what we see as freed blocks but it also includes memory allocated and in use.

Once I noticed this problem, I used the old technique of pre-allocating a large array to see if this helped. Did it ever!

   7!:3''     NB. New session about the same as the one above.
  64  101
 128 6268
 256 1234
 512  102
1024   70

NB. Now do the same thing but pre-allocate the result array before running;
NB. runs in explicit loop.
   simLoop=: 3 : 0
   'nl np'=. y   NB. Number of loops, number of players.
   cell=. (np,&.>4 5)$&.><' '
   cell=. (<5$' '),~cell,(np,&.>2;a:)$&.><2-2
   rr=. (nl,5)$cell
   for_ix. i.nl  do. rr=. (1 play1ToEnd np) ix}rr end.
)

   6!:2 'sim6_01=: simLoop 1e6 6'
264.4

   7!:3 ''
  64  96
 128 866
 256 146
 512 101
1024  69
   (+/;]) */"1 ] 7!:3''
+------+-----------------------------+
|276352|6144 110976 37376 51200 70656|
+------+-----------------------------+

The running time is basically the same but the resulting memory usage is significantly different. In fact, we have less freed memory allocated after running the simulation this way than we did to start with. The total memory used between these two techniques differs by about a factor of a thousand.

   256265024%276352
927.314

Show-and-tell

Conditional Probabilities in Poker

Following a recent poker loss where my aces-full full house with kings lost to four-of-a-kind, I wondered how commonly this happens. Fortunately, I happen to have results from a few hundred million simulations, so I can come up with a reasonable estimate of how badly I was beaten.

I have simulation results, in groups of 10 million, stored in J variables written to file. Retrieving one of these variables, for 8-player simulations in this case, we can look at the simulation element which shows the final, best hand for each player represented as a two-column array like this:

   >(<0 2){sim80
5 1213
1 2194
2   55
4    0
2  829
0 1273
1  870
1 1957

The first column is a number denoting the type of hand, as named here:

   HandTypes
+-+--------------+
|#|Name          |
+-+--------------+
|0|nothing       |
+-+--------------+
|1|pair          |
+-+--------------+
|2|2-pair        |
+-+--------------+
|3|3-of-a-kind   |
+-+--------------+
|4|straight      |
+-+--------------+
|5|flush         |
+-+--------------+
|6|full house    |
+-+--------------+
|7|4-of-a-kind   |
+-+--------------+
|8|straight flush|
+-+--------------+

The 2nd column is a ranking within the type of hand. So, in the example above "4 0" is the lowest possible straight (five-high):

   showCards {.RH5#~HANDRATED *./ . = 4 0
+--+--+--+--+--+
|A♦|5♣|4♣|3♣|2♣|
+--+--+--+--+--+

In this case, we are interested in a specific full house which is hand type 6 156:

   showCards {.RH5#~HANDRATED *./ . = 6 156
+--+--+--+--+--+
|A♣|A♦|A♥|K♣|K♦|
+--+--+--+--+--+

There are 24 equally-ranked variants of this hand because of the different possible combinations of suits.

   +/HANDRATED *./ . = 6 156
24

So, we first find all the instances where we have the full house in question:

   6!:2 'wh=. (2{"1 sim80) ([: I. [: +./&> [ *./ .=&.> [: < ]) 6 156'
34.8073

One interesting thing I've noticed when dealing with these fairly large arrays is that there is some unaccounted for time when we are measuring how long various operations take. In the case above, if we compare elapsed time to the result of 6!:2, there is a noticeable difference.

   qts'' [ smoutput 6!:2 'wh=. (2{"1 sim80) ([: I. [: +./&> [ *./ .=&.> [: < ]) 6 156' [ smoutput qts''
      2021 9 13 23 14 47.089
36.3103
2021 9 13 23 16 30.383
   ($sim80);7!:5 <'sim80'   NB. How large is sim80?
+----------+---------+
|10000000 5|8.21687e9|
+----------+---------+
   load 'dt'
   2021 9 13 23 16 30.383 TSDiff 2021 9 13 23 14 47.089
+-------+----------------+
|103.294|0 0 0 0 1 43.294|
+-------+----------------+

We see that figuring out and assigning "wh" takes about 36 seconds but we do not get our J prompt back for 103 seconds.

In any case, the final calculation is quick.

   6!:2 'aa=.(7 (e.) 0 { "1 >)&> wh{2{"1 sim80' [ smoutput qts''
0.0655278
   +/aa
3176

So we have 3,176 instances out of 10 million where this full house is beaten by 4-of-a-kind, about a 0.03% chance.

Advanced topics

Using "Agenda" for Conditional Conversion

Continuing to flog poker simulations, here we see a handy use of "agenda" (@.) and a user-defined adverb.

These simulation results were originally stored as a matrix general array with one row per simulation but converting some of these numeric matrixes to character gave tremendous speed-up in the simulations. Each row has a determinate set of cells. For N players, the elements are now:

 0: Nx4 character representation of each player's hole cards.
 1: Nx5 character representation of each player's best high hand.
 2: Nx2 table of hand types where 0{"1 is the major type (index into "HandType"), 1{"1 is sub-type where higher is higher.
 3: N-vector of integers ranking hands from best (0) to worst (N-1); there may be ties for best in which case there is more than one 0 ranking.
 4: 5 element character vector of common cards dealt in game.

Notice that we have a mix of character and numeric arrays in the current form of simulations. However, the older simulations are all numeric. Since the character-based version is not only faster to use but also takes less disk space to store, I wanted to convert my earlier simulations to the character form. The complication arises because we don't want to convert all the cells in a row.

Originally, I did this simply in this ad hoc fashion:

charifySim=: 3 : '(<a.{~>0{y) 0}(<a.{~>1{y) 1}(<a.{~>4{y) 4}y'"1

But it seemed too specific to the problem and not very elegant or flexible.

I also played around with an adverb like this:

   toCu=: 1 : 'toChar&.>`] @. (u)"1 y'
   isChar toCu 2 3$'hi';(65+i.2 3);'ho';<97+i. 3 3
+---+---+---+
|hi |ABC|ho |
|   |DEF|   |
+---+---+---+
|abc|hi |ABC|
|def|   |DEF|
|ghi|   |   |
+---+---+---+

where

   isChar=: (' ' = [: ({.) 1 ({.) 0 $ ,)&>"1

Howvever, we do not want to convert all numbers to characters as a couple of the cells are less amenable to this. Basically, we only want to convert the cells representing cards, as represented by numbers from 0 to 51. This complication was solved by a nice use of agenda and a double use of the rank conjunction, along with a Boolean specifying which cells in a row should be converted.

Our main verb either converts a cell to character or leaves it the same:

   toC=: 4 : '((]&.>)`((<a.){~&.>]) @. x) y'

The left argument is 1 for each cell to be converted or 0 for a cell to be left alone. The cells that will be left alone are the ones which are not character in the current version of the simulation results.

   isChar=: (' ' = [: ({.) 1 ({.) 0 $ ,)&>"1
   isChar {.sim80
1 1 0 0 1

So, we apply or conditional converter to each row of an older simulation matrix like this:

simToC=: 1 1 0 0 1&toC"0"1 ]

The bound left argument is the Boolean we saw above to avoid converting the two cells in a row corresponding to the zeros.

The double use of rank is something I have not used before so a little explanation may help. The first rank conjunction evaluated, the far-right one, tells us we will work primarily on rows of the table of simulations. However, within each row, we want to apply "toC" to each cell congruent with the Boolean left argument.

Discussion of Recent Proposals

We have had a couple of proposals on the J forums recently: the idea of "accessor" objects and a proposal to move the J development eco-system more completely on to Github.

Accessors

The idea of accessors was explained in an email from Michal Wallace <michal.wallace@gmail.com>, sent on September 14th, 2021 at 12:06 PM to the Jprogramming forum.

I have been experimenting with a tacit style where the y "argument" represents the state of some big complicated object or system... for example, the state of a parser or a virtual machine.
I've found that "accessor" verbs are really handy, and allow you to decouple your tacit code from the actual implementation of the object.
An accessor 'gsn' (for "get/set 'n'") is a ambivalent verb that works like this:

   gsn y  -> ignore y and return S
   x gsn y -> ignore y. sets n to x and return S
   S here indicates whatever the "state of the whole system" is...

'gsn' here is a pronoun (proverb?) -- If you had three state variables n0,n1,n2, you would make three such verbs, gsn0, gsn1, gsn2.
This is quite flexible. If you want to store the state of your system in an object or namespace, you can implement gsn like so:

   gsn0 =: {{  n0__state  }} :: {{ n0__state =: x }}

Then the y argument you pass is just or whatever you want, since it's ignored.
Or you can choose to implement the state as some physical array structure, which gets accessed and modified in place:

   gsn1 =: {{  1 { y }}  :: {{ x 1 } y }}

It would be nice if these accessors could be created automatically.

Moving J Issues to Github

Michal Wallace also proposed this, which seems less controversial than the previous suggestion.

I suggest we take this:
https://code.jsoftware.com/wiki/System/Interpreter/Requests
... and migrate everything here:
https://github.com/jsoftware/jsource/issues

  Pros:
  - permalink to issues (currently the link breaks when moved to "done")
  - issues can be assigned, searched, etc
  - can be associated with pull requests / code reviews on github
  - can be marked as "easy" or "help wanted" to encourage outsiders
  - commit messages can auto-link to the issue
  - free and fairly standard for github accounts
  - github accounts don't require manual setup (vs j wiki accounts)

  Cons:
  - requests would no longer show up in j wiki search
  - requestors would need a github account
  - might be other options that are even better
  - inertia?

Thoughts?
(I am happy to help with / do all of the migration work, if people aren't opposed to the idea...)