NYCJUG/2021-04-13

From J Wiki
Jump to navigation Jump to search

Meeting Agenda for NYCJUG 20210413

Show-and-tell

Video Time Codes

Dan will tell us about his code - at https://github.com/danhirschi/types_timecode - for dealing with video time codes.

These are codes embedded in a video sequence telling how far along we are in the video. Complications arise because the frame rate of a video is not quite a whole number of frames per second.

As Dan explains:

Video frames are numbered in the format hours:minutes:seconde:frames (01:00:23:04)

There are several frame rates possible.

Broadcast in the Americas uses a rate of about 29.97 frames per second (FPS), and with HDTV, an option to use 59.94 FPS.*

Beginning Poker

The card game poker has nine types of (five card) hands if there are no wild cards. These hand types are shown here in descending order from least to best:

Hand Type # Type # Sub-type % Type Cum. % Rev. Cum. % Type/Hands
nothing 1302540 1277 50.118 50.12 100 2
pair 1098240 2860 42.257 92.37 49.88 2.4
2-pair 123552 858 4.754 97.13 7.62 21
3-of-a-kind 54912 858 2.113 99.24 2.87 47.3
straight 10200 10 0.392 99.63 0.759 254.8
flush 5108 1277 0.196 99.83 0.366 508.8
full house 3744 156 0.144 99.97 0.170 694.2
4-of-a-kind 624 156 0.024 99.998 0.026 4165
straight flush 40 10 0.0015 100 0.00154 64974
Total 2598960 7462 100 - - -

Poker is an interesting game to simulate, particularly as it is non-deterministic. However, before we can begin simulating poker, we have to be able to determine the best hand one can make with an arbitrary set of five cards.

If number the cards in a deck from 0 to 51, we want to be able to assign a rank (value) and a suit to each card. An obvious way to do this would be:

suitRank=: |:@((4 13)&#:)

This treats each number as a mixed base (4,13) value with the suit as the first row and the rank as the second one. So, the first five cards would form this hand:

   suitRank i.5
0 0 0 0 0
0 1 2 3 4

We need to establish some naming conventions to turn these numbers into the way we normally name a card, e.g. "the jack of diamonds". Since the ace is generally the highest card and two the lowest, we will translate these ranks into indexes of a global variable "RANK". The suit numbers will index a global "SUITS". These are defined thusly:

RANK=: ,&.>(]&.>'23456789'),'10';'J';'Q';'K';'A'
SUIT=: 'CDHS'

This would render the hand above like this:

   |.(<"1 suitRank i.5){&>(<"0 SUIT);<RANK
+-+-+-+-+-+
|2|3|4|5|6|
+-+-+-+-+-+
|C|C|C|C|C|
+-+-+-+-+-+

That is two through six, all clubs.

The Size of the Poker Universe

A poker hand consists of five cards chosen at random from a deck of 52 cards. This means there are 5!52, or 2,598,960 possible hands. With modern computers and J, this means it is simple to generate all the possible poker hands like this:

   h5=: 5 AllCombos 52
   h5=: h5{"1~/:"(1) 13|h5      NB. Order each hand by card rank.

We arbitrarily order the cards in a hand by their ranks to reduce the number of permutations we may have to consider.

The AllCombos verb is lifted from the J wiki:

AllCombos=: 4 : 0
NB.* AllCombos: make mat of all possible distinct y things taken x at a time.
   comb=. ,.basis=. i.y
   while. 0<x=. <:x do.
       maxs=. >./"1 comb
       newcol=. maxs rmlt&.><basis
       lens=. ;#&.>newcol
       comb=. (lens#comb),.;newcol
   end.
   comb
)   

If we have a set of functions to determine the type of a 5-card hand, we can apply them to each item of the complete list to rank every possible hand. Having this array may help performance by allowing us to determine the type of an arbitrary set of five cards, ordered in some standard way, with a table look-up. Doing this allows us to amortize the cost of calculating a hand's type by doing all the work up front and saving the result.

The timings of each of these steps are as follows:

   6!:2 'h5=: 5 AllCombos 52'
0.28242
   6!:2 'h5=: h5{"1~/:"(1) 13|h5'      NB. Order each hand by card rank.
0.234936
   6!:2 'whall=. (isNothing"1 h5),.(isPair"1 h5),.(is2Pair"1 h5),.(is3ok"1 h5),.(isOnlyStraight"1 h5),.(isOnlyFlush"1 h5),.(isFullHouse"1 h5),.(is4ok"1 h5),.isStraightFlush"1 h5'
62.0951

If these are one-time costs, they would seem pretty modest. For running simulations, we are concerned about performance as the number of simulations may be arbitrarily large.

Example Hands

Here are examples of each of the type of poker hand.

Type Example Rank within Type
Nothing King-high King-high
Pair Fours Fours
2-Pair Nines and Threes Nines and Threes
3-of-a-kind Sixes Sixes
Straight Queen-high Queen-high
Flush Jack-high Jack-high
Full House Jacks Full Jacks Full
4-of-a-kind Quad Nines Quad Nines
Straight Flush Royal Flush Royal Flush

Rank within Type

Once we have determined the types of a set of hands, we are not necessarily done unless all hands are of a different type. Frequently, however, we have to determine the relative rankings of hands of the same type: their order from worst to best or vice versa. So, for instance if the two highest hands in the set are both 2-pairs, the one with the higher high pair wins; if the high pairs match, the higher low pair wins; if the low pairs match, the higher fifth card - the "kicker" - wins; if all of these match, the two hands tie. Suits are irrelevant to ordering hands in poker and there may be multi-way ties.

If hands tie, they split equally.

For a pair of nothings, the winner is determined by the highest card, or the next highest if these are the same, and so on down to the lowest ranking card in each hand.

For a pair, the higher pair wins; if these tie, then the highest unpaired card determines the winner, and so on.

For 2-pair, see above.

For 3-of-a-kind, the higher triplet wins. With no wild cards, this is as far as you have to go but, if there are wild cards, the non-triplet cards. from the highest in each hand down, are used as in the above examples.

For a straight, the one with the higher high card wins; if these are the same, they tie. There is a special lowest straight, A-2-3-4-5, called a "wheel" in which the high card is the five since the ace counts as a one.

For a flush, the one with the higher high card wins; if these are the same, compare the next-highest and so on down to the lowest. If all ranks match, the hands tie.

For a full house, the higher triplet wins. Again, if there are wild cards, there is a possibility of a triplet match, so the higher pair determines the winner in this case.

For 4-of-a-kind, the higher set wins. Again, if there are wild cards so that a tie is possible, the kicker determines the winner.

A straight flush works just like a straight. "Royal Flush" is the name given to the highest straight flush. Since suits play no part in determination of hand value, there are four possible highest hands.

Strictly Ranking All Possible Hands

As can be inferred from the above rules, the determination of rankings within types is more complicated than determining types. Rather than explain this in detail, we simply show the code here.

rankAllHands=: 3 : 0
NB.* rankAllHands: rank all hands by type of hand and within type.
   'whall h5'=. y
   whall=. |:whall
   allranks=. (2,#h5)$_1
   for_hctr. i.#whall do.
       ht=. hctr{whall
       subrank=. rankWithinHandtype ht#h5
NB. Row 0 is hand type; 1 is rank within type
       allranks=. ((ht*hctr)+(-.ht)*0{allranks) 0}allranks
       allranks=. (ht}(1{allranks),:(ht exp subrank)) 1}allranks
   end.
NB. Fix wheels: zero-out subtype rank of straights and straight flushes
NB. that are wheels (A-5) so they rate lower than 6-highs.
   as=. 13*i.4                NB. All suits
   whs=. (0{allranks)e. 4 8   NB. Straights and straight-flushes
   whsa=. whs*.((0{"1 h5)e. as)*.(4{"1 h5)e.12+as NB. with a 2 and an Ace
   allranks=. ((-.whsa)*1{allranks) 1}allranks    NB. Make these subtype 0
NB. since this is lowest straight because the one case where Ace=1.
)

Note that we take h5 and whall as defined above as our inputs since these functions will rank all possible hands up front to potentially speed up subsequent processing.

rankWithinHandtype=: 3 : 0
   hrnk=. 1{1 0 2|:suitRank"1 y
NB. "hr" is ,2 col. mat: col. 0 is count (how many dupes), col. 1 is card;
NB. so hand is rated by value of which card has most duplicates, value of
NB. which card has 2nd most dupes, etc.
   hr=. (,@\:~@|:@(~.,:~((+/@|:)@=)))"(1) hrnk
   gv=. /:hr
   whdif=. 1,2(+./ . ~:)/\gv{hrnk
   subrank=. (/:gv){+/\whdif
)

In a game with wild cards, there is a higher hand than a straight flush - the 5-of-a-kind - but we will ignore the possibility of wild cards in this essay from now on.

Determining Hand Type

Again, we simply present the code.

isNothing=: 3 : 0
   if. 2~:#$y do. y=. suitRank y end.
   'suit rank'=. /:~&.><"1 y
NB. >1 suit,    and    no duplicates,
   rc=. (1~:#~.suit)*.(#rank)=#~.rank
NB. and >1 apart, including special case of low Ace.
   rc=. rc*.-.(*./ 1=2-~/\rank)+.*./ 1=2-~/\/:~(rank=12)}rank,:_1
)

We have "Any" and "Only" versions of the straight and flush handlers because of the possibility of a hand being both, i.e. a straight flush.

isAnyFlush=: 3 : 0
   if. 2~:#$y do. y=. suitRank y end.
   *./2=/\0{y
)

isOnlyFlush=: 3 : 0
   if. 2~:#$y do. y=. suitRank y end.
   (isAnyFlush y)*.-.isAnyStraight y       NB. Not straight flush
)

isAnyStraight=: 3 : 0
   if. 2~:#$y do. y=. suitRank y end.
   rc=. *./_1 =2-/\/:~rnk=. 1{y
   if. 12 e. rnk do.                       NB. Handle ambivalent ace
       rc=. rc +. *./ _1=2-/\/:~(rnk=12)}rnk,:_1
   end.
   rc
)

isOnlyStraight=: 3 : 0
   if. 2~:#$y do. y=. suitRank y end.
   (isAnyStraight y)*.-.isAnyFlush y       NB. Not straight flush
)

isPair=: 3 : 0
   if. 2~:#$y do. y=. suitRank y end.
NB.   1=+/2=/\/:~1{y
   1 1 1 2-:/:~;#&.>(1,2~:/\/:~rnk)<;.1 rnk=. 1{y
)

is2Pair=: 3 : 0
   if. 2~:#$y do. y=. suitRank y end.
NB.   2=#~.rnk#~0,~2=/\rnk=. /:~1{y
   1 2 2-:/:~;#&.>(1,2~:/\/:~rnk)<;.1 rnk=. 1{y
)

is3ok=: 3 : 0
   if. 2~:#$y do. y=. suitRank y end.
   1 1 3-:/:~;#&.>(1,2~:/\/:~rnk)<;.1 rnk=. 1{y
)

isFullHouse=: 3 : 0
   if. 2~:#$y do. y=. suitRank y end.
NB.   0 1 1 1-:/:~2=/\/:~1{y
   2 3-:/:~;#&.>(1,2~:/\/:~rnk)<;.1 rnk=. 1{y
)

is4ok=: 3 : 0
   if. 2~:#$y do. y=. suitRank y end.
   1 4-:/:~;#&.>(1,2~:/\/:~rnk)<;.1 rnk=. 1{y
)

isStraightFlush=: 3 : 0
   if. 2~:#$y do. y=. suitRank y end.
   (isAnyFlush y)*.isAnyStraight y         NB. Is straight and flush.
)

NB.* isnSF: is n-card straight flush?
isnSF=: ([: +./ [ (([ = [: # ]) *. [: isAnyStraight ])&> [: </./ [: suitRank ])"(0 1)
NB.EG 1 1 0 -: 5 isnSF 0 1 2 3 4 17,0 1 2 3 12 17,:0 1 2 3 6 17

NB. suitRank=: |:@((4 13)&#:)
suitRank=: 3 : 0
NB.* suitRank: convert nums: 2 rows: suit, rank or 2 rows->nums 0-51.
   if. 2=#$y do.
       13#.|:y
   else.
       |:@((4 13)&#:)y
   end.
)

Note that the evolved version of suitRank is set up to be its own inverse.

Learning and Teaching J

Podcast on Array Computation

Bob Therriault will tell us about his plans for a podcast on array languages and will solicit suggestions and discussion toward this end.

The Trouble with Arrays?

This recent note should give us something to discuss.

From: Emir U <emir@usgroupltd.uk>
Date: Mon, Apr 12, 2021 at 5:06 AM
Subject: [Jprogramming] Farewell for now!
To: programming@jsoftware.com <programming@jsoftware.com>

I've invested something around 50 hours in J, I tried to write half a dozen serious algorithms, and to bind some external libraries. I've concluded thereafter that J just doesn't fit my workload. Here are my main reasons for giving up on J for now in hope that it helps with future direction:

  • Package management and batteries. The addon eco-system is tiny and the packages in it are barebones at best. This is scary because it takes no time at all to find things you need that you can't implement yourself quickly. E.g. loading an Avro/Parquet/ORC file, run simulated annealing, fit a Bayesian model using hamiltonian multi carlo, write a linear programming optimisation, call Neo4J, and so on. I'm use to being able to expect that what I need will have a package for it already. Other languages get around this a bit by either using a common VM as more established language (e.g. Clojure) or by having very ergonomic FFI (e.g. Julia with Python). Further, I think there needs to be a formal system for the packing, versioning, deployment, and dependency management of code, else people will revert to literally copy pasting all dependencies into a directory to make sure their code works on another environment.
  • FFI. When the function in question takes a C struct or an otherwise complicated data structure then you can't easily call the function from J. It seems you have to write a bunch of C to convert the target interface into something that can be called more easily from J. Contrast this to Julia's Python FFI interface as an example, or Rcpp in R.
  • No non-array data structures: hash tables, trie, graphs, etc. This is a show stopper because so many problems are easier to reason about and to solve by picking a more suitable data structure. In the wiki, Dan Bron says: One of J's primary advantages is that it makes (array) programming easy.  If its notation is inapplicable to your problem (more or less), there is not much reason left to use it. This lead me to dismay.
  • No parallelism. This is a show stopper. I work on large data and hard problems which cannot be crunched quickly on a single processor. J seemingly has no multi-threading, multi-processing or GPU support. Parallelism for me is almost always more complicated than just splitting the data and running the same thing many times, so J offers no way to solve problems needing parallelism that can't be emulated at an OS level by running many copies of a script.

So its farewell for now, but I'll keep an eye on developments in hope for the future!

Emir