NYCJUG/2012-03-14

teaching by example, data analysis of VIX (volatility option), heat-map, poker simulation, advances in gtk graphics for J7, OpenJ, J and APL meetings, visual interface, general learning, general problem-solving

Location:: ThomasNet

Agenda

             Meeting Agenda for NYCJUG 20120314
             ----------------------------------
1. Beginner's regatta: we need interesting, practical, domain-oriented
J intros: see "Teach J by not Teaching J.pdf" and "Quick and Dirty Data
Analysis with J.pdf".

2. Show-and-tell: example of a graphic that should be easier to create:
see "Example of type of heat-map.pdf".

Some advances in a poker simulation: see "Defining Break Points for
Categorizing Hands.pdf" - discuss relations between different facets
of aggression in poker.

3. News: advances in J7: see "Announcement of gtkwd and grid availability
under jconsole.pdf" and "Chess Board in APL and J.pdf".

OpenJ: news?  See "Google Building Online Chrome Application Shop.pdf", talk
about what is a good Android device to target.

APL Moot at the YHA Lee Valley Youth Hostel, Cheshunt, on Friday-Sunday,
27-29 April.

J Conference - July 23/24 (Monday/Tuesday) 2012 - Toronto: both
days start with a continental breakfast at 9am and sessions run to 6pm.
The first day ends with an open bar reception and banquet.  Register early -
before April 1st - and save!  See "JSoftware Conference Speakers.pdf".

The Dyalog 2012 APL programming contest for students is now open!  Prizes for
participants and referrers!

4. Learning, teaching and promoting J, et al.: see "Visual Interface for J.pdf"
and "visualJ4.pdf".

General learning and problem-solving: see "Probabilistic Graphic Models -
Stanford course by Daphne Koller.pdf", "Viewing the world through
mathematical lens.pdf" and "Getting past being stumped by a problem.pdf".

Beginner's regatta

Before the meeting began, our newest member, Scott, had some questions about J. He had looked at some of the introductory materials and was dissatisfied with the level at which they started - he said they dropped you right into the thick weeds.

He also asked whether J was implemented using the LLVM (Low-Level Virtual Machine) framework. The goal of the LLVM project is to provide "a modern, SSA-based compilation strategy capable of supporting both static and dynamic compilation of arbitrary programming languages", as stated on its main website. Scott described it as a layer between language implementations and different hardware and OS platforms. It sounds very relevant to the current dispersion of J onto different platforms, but J continues to suffer the arrows that plague true pioneers. I hastened to point out that J is not a compiled language and is not parsable within a Backus-Naur scheme of grammar due to its dynamic nature. Perhaps we'll benefit indirectly from LLVM as C compilers are adopted more readily on each new platform.

Once we started the meeting, we looked at Skip Cave's suggestion that J web tutorials focus on teaching math with J as a tool rather than on the language itself. He pointed to the Khan Academy tutorials as an example of compelling instruction that has proven very popular. He also emphasized the utility of having J tutorials available on cloud servers - this would take advantage of J's recent separation of the front end from the computational engine. Skip also suggested that such a server could be hosted on existing facilities like Amazon at a nominal cost.

We also considered Richard John's suggestion that the followers of reddit would welcome a well-written tutorial on how to solve a problem using J.

These ideas, along with the restriction that a successful tutorial should be organized in small modules, inspired me to take a stab at this genre with a one-page example of using J code to download financial data from the web and analyze it. This is presented as the File:Quick and Dirty Data Analysis with J.pdf paper.

We start with motivation from a Wall Street Journal article about the VIX - the volatility index. The article makes some verifiable claims about the level of this index, so one of the challenges is to check these claims.

We start by showing where we find the data - on Yahoo Finance (finance.yahoo.com) - and assume we downloaded the entire daily price history of the VIX into a .csv file. The J code, commented fairly extensively, shows how we might read this file into some variables and select a portion of it (dates and adjusted closing prices) to analyze. We show different things we can figure out from the data, including verifying the claims in the article. Along the way, we also illustrate alternate J formulations to accomplish the same task, as well as a brief example of deriving a tacit expression from explicit code.
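For readers who do not yet read J, the workflow described above (read the downloaded file, keep dates and adjusted closes, then test a claim about levels) might be sketched in Python roughly as follows. The column names are an assumption about the layout of the downloaded .csv, the rows are made-up numbers rather than real VIX prices, and the threshold of 20 is only a stand-in for whatever level an article might mention.

```python
import csv
import io

# Made-up rows in the column layout assumed for the downloaded file;
# these are NOT real VIX prices.
SAMPLE_CSV = """Date,Open,High,Low,Close,Volume,Adj Close
2012-03-09,17.5,18.0,17.0,17.1,0,17.1
2012-03-08,19.0,19.2,17.9,18.0,0,18.0
2012-03-07,20.5,21.0,19.0,19.1,0,19.1
2012-03-06,19.0,21.5,18.9,20.9,0,20.9
"""

def load_adjusted_closes(text):
    """Return (date, adjusted close) pairs, oldest first."""
    rows = csv.DictReader(io.StringIO(text))
    pairs = [(r["Date"], float(r["Adj Close"])) for r in rows]
    return sorted(pairs)  # ISO dates sort chronologically as strings

def days_below(pairs, level):
    """Dates on which the adjusted close was under `level`."""
    return [date for date, close in pairs if close < level]

pairs = load_adjusted_closes(SAMPLE_CSV)
lowest = min(pairs, key=lambda p: p[1])  # (date, value) of the lowest close
print(lowest)
print(days_below(pairs, 20.0))
```

With a real file, `SAMPLE_CSV` would be replaced by the text read from disk; everything else stays the same.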

Show-and-Tell

In looking at the graphic in File:Example of type of heat-map.pdf (below), we see how a large amount of data can be shown in a manner that lets us draw some useful inferences from it quite easily. Even looking at the black-and-white version in our printed materials, it's easy to see that three of the items stand out from the others as dark vertical lines across the entire time period. These three items represent the industries "multi-utilities", "gas utilities", and "electric utilities", which are obviously closely related to each other. We also see a cross-industry effect of some kind starting at the beginning of 2011 and persisting until near the end of that year.

This kind of graphic, which was generated by viewmat, is obviously very helpful. However, the difficulty with the example shown is that all the labels and the key had to be inserted into the image manually. It's not too hard to see that this could be handled more programmatically but we don't presently have an easy and flexible way to insert arbitrary text into an image. Ric, joining us online, mentioned that R is able to generate graphics like this fairly easily. It remains a future project for when we get more comfortable with the capabilities of the GTK-based graphics.

Our next section, File:Defining Break Points for Categorizing Hands.pdf, focused on a discussion of some preliminary work on a poker simulation. Like the earlier section on "quick and dirty data analysis", this material could be suitable for a tutorial on J. The paper explains work done to build a foundation for parameterizing qualitative features like "conservative" or "aggressive" play. We show how we develop some simple tools to analyze the results of many simulated games and discover some features of the data extracted from these simulations.

Defining Break Points for Categorizing Hands

The eventual objective of these simulations is to study conservative versus aggressive playing and better understand how these styles interact. There are three aspects to this spectrum of aggressiveness: playing a hand, betting a hand, and updating estimations of opponents. In this section, we will consider mostly the first of these: how aggressively a player plays a hand.

We define a most conservative player as one who only plays a hand that he believes is likely to be the best hand at the table. This implies, e.g. in a 6-player game, that the most conservative player should fold 5/6ths of his initial hands on average. How do we know how an initial hand rates? A simple, numerical measure can be based on simulating many games and analyzing the distribution of initial hands. Having this empirical distribution allows us to set breakpoints: the hand values that partition the distribution into ranges from the worst to the best hand at each point in the deal. Obviously, the distribution and the set of associated breakpoints also vary by the number of players.
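The "fold 5/6ths" rule can be illustrated with a small Python sketch. It assumes we already have a sample of hand strengths and simply finds the strength above which the top 1/n of the sample lies; the evenly spaced strengths used here are invented for the illustration and do not come from the simulations.

```python
def play_threshold(strengths, n_players):
    """Smallest strength that is still in the top 1/n_players of the sample."""
    ordered = sorted(strengths)
    cut = len(ordered) - len(ordered) // n_players
    return ordered[cut]

def fold_fraction(strengths, threshold):
    """Share of hands that fall below the threshold and so would be folded."""
    return sum(s < threshold for s in strengths) / len(strengths)

# Invented sample: 60 evenly spaced strengths from 0.90 to 1.49.
sample = [i / 100 for i in range(90, 150)]
threshold = play_threshold(sample, 6)
print(threshold, fold_fraction(sample, threshold))
```

A less conservative player could be modelled by choosing a lower cut, which is where the intermediate breakpoints become useful.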

We ran a number of simulations for games ranging from two to seven players and generated hand estimations at each round in the deal.

The first column of the first five rows of each six-by-two table has the results of each player's estimation of the strength of the others' hands – an N by N matrix of hand strengths from zero to less than nine. The second column of the first five rows shows each player's estimate of his own hand. The last row has a return code in the first column and the final hands in the second.

The data from each simulation is saved in a 6x2 table, giving us variables like this, one for each number of players per game.

Name	Shape
aggr2p	21913 6 2
aggr3p	20440 6 2
aggr4p	23618 6 2
aggr5p	20141 6 2
aggr6p	20347 6 2
aggr7p	21729 6 2

Here is an example from a 2-player game.

   >0{aggr2p
+-----------+--------------------+
|0.946 1.186|1.1284547 1.0970961 |
| 0.96  1.19|                    |
+-----------+--------------------+
|0.871 1.323|1.0516112 1.1561319 |
|0.895 1.377|                    |
+-----------+--------------------+
|1.459 1.163|1.112044 1.2534589  |
|1.478 1.199|                    |
+-----------+--------------------+
|1.244 2.128|0.99216431 1.6778151|
|1.231 2.166|                    |
+-----------+--------------------+
|1.244 2.132|0.79497332 1.6406574|
|1.236 2.166|                    |
+-----------+--------------------+
|1          |51 28 26  7  0 16 21|
|           |29  2 35 25 42 12 34|
+-----------+--------------------+

The first pair of entries shows that both players think the second player has the stronger hand showing – column 0 – based only on the up cards, but the first player has a higher estimate of his own hand based on all the cards he can see – column 1.

We can see why this is by looking at the final cell in the last row to see what the final hands are after we put these in a more human-readable form along with a numerical rating for each.

   showTypeHand >(<_1 _1){0{aggr2p
+-----+--+--+--+--+--+--+---+
|1 193|AS|4H|2H|9C|2C|5D|10D|
+-----+--+--+--+--+--+--+---+
|2 768|5H|4C|JH|AD|5S|AC|10H|
+-----+--+--+--+--+--+--+---+

The ratings in the first column tell us that the first hand ends up as a low pair and the other is a high two-pair.
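To make the first number of each rating concrete, here is a rough Python sketch that classifies a seven-card hand by repeated ranks only. The codes 1 (pair), 2 (two pair) and 3 (three of a kind) are inferred from the hands shown in these notes; the codes 6 and 7 are an assumption that the scale follows the usual poker ordering. Straights and flushes are deliberately ignored, and the second number of the rating is not reproduced, so this is not a reimplementation of showTypeHand.

```python
from collections import Counter

def rank_of(card):
    """'10D' -> '10', 'AS' -> 'A': everything except the final suit letter."""
    return card[:-1]

def count_based_type(cards):
    """Classify by repeated ranks only; straights and flushes are ignored."""
    counts = sorted(Counter(rank_of(c) for c in cards).values(), reverse=True)
    counts.append(0)  # guard so counts[1] always exists
    if counts[0] == 4:
        return 7  # four of a kind (assumed code)
    if counts[0] == 3 and counts[1] >= 2:
        return 6  # full house (assumed code)
    if counts[0] == 3:
        return 3  # three of a kind
    if counts[0] == 2 and counts[1] == 2:
        return 2  # two pair
    if counts[0] == 2:
        return 1  # one pair
    return 0      # nothing repeated

first = "AS 4H 2H 9C 2C 5D 10D".split()
second = "5H 4C JH AD 5S AC 10H".split()
print(count_based_type(first), count_based_type(second))
```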

Define Break Points for Each Betting Round

   (showTypeHand >(<_1 _1){0{aggr7p);(<0 0){0{aggr7p
+-------------------------------+-----------------------------------------+
|+------+--+--+---+--+---+--+--+| 0.99 1.246 1.049 1.096 0.966  0.96 1.141|
||4 10  |AH|JC|10D|2S|QH |5S|KD||    1 1.255 1.074 1.237     1 0.995 1.166|
|+------+--+--+---+--+---+--+--+|1.019 1.288 1.066 1.255 1.019  1.01 1.084|
||1 2256|5H|QC|QS |4H|8D |9D|7H||    1 1.272 1.084 1.217     1 0.991 1.164|
|+------+--+--+---+--+---+--+--+|1.003 1.276 1.083 1.237 0.997     1 1.165|
||3 127 |8H|3S|3C |AS|3D |6S|2D||1.005 1.279 1.077 1.238 1.018 0.986 1.168|
|+------+--+--+---+--+---+--+--+|0.992  1.27 1.077 1.123 0.993 0.987 1.149|
||1 2196|6H|JS|JD |KH|AD |7C|2C||                                         |
|+------+--+--+---+--+---+--+--+|                                         |
||1 1940|4S|7D|10H|2H|10C|AC|6C||                                         |
|+------+--+--+---+--+---+--+--+|                                         |
||1 824 |6D|5C|10S|5D|QD |9S|KS||                                         |
|+------+--+--+---+--+---+--+--+|                                         |
||1 1475|9C|JH|8S |8C|7S |4D|KC||                                         |
|+------+--+--+---+--+---+--+--+|                                         |
+-------------------------------+-----------------------------------------+
   >(<0 1){0{aggr7p
1.1499581 1.1685127 1.0405408 1.1439841 1.0595902 1.0607454 1.1326227
   3{.>(<0 1){"2 aggr7p
1.1499581 1.1685127 1.0405408 1.1439841 1.0595902 1.0607454 1.1326227
1.1079401 1.1196687 1.0851783 1.2412727 1.1369334 1.1205553 1.0671706
1.0513211   1.13703 1.2481303 1.0217042 1.0393133  1.183652 1.1264991
   load 'mystats'
   $,>(<0 1){"2 aggr7p
152103
   usus ,>(<0 1){"2 aggr7p  NB. USual Stats: Min, Max, Mean, SD
0.87239003 1.3596847 1.1166189 0.063583797

   'ev*' names 3
evenlyPartition   evenlyPartitionBy
medianAbsDev      signGrpDev
   evenlyPartition
4 : '(x(([:i.]) e. ([:i.[) * [:>.%~)#y)<;.1 y'
   evenlyPartitionBy
4 : '(+/ (+/\>1{x) >/~ (}.i.>0{x)*(>0{x)%~+/>1{x) </. y'
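Read informally, evenlyPartition appears to cut its right argument into x consecutive pieces, each of length ceiling(#y % x), with the last piece possibly shorter. A Python sketch of that reading, assuming a non-empty list, might look like this (the function name is ours, chosen to mirror the J verb):

```python
def evenly_partition(n_parts, items):
    """Cut items into consecutive pieces of length ceil(len(items) / n_parts)."""
    step = -(-len(items) // n_parts)  # integer ceiling; assumes items is non-empty
    return [items[start:start + step] for start in range(0, len(items), step)]

print(evenly_partition(3, list(range(10))))
```

The start positions of the pieces are the same multiples of the step that the breakpoint calculation uses.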

Drop initial partition index - want only internal points.

   7 ([: }. ((([: i. [) * [: >. %~) ])) #,>(<0 1){"2 aggr7p
21729 43458 65187 86916 108645 130374
   brkpts=. 7 ([: }. ((([: i. [) * [: >. %~) ])) #,>(<0 1){"2 aggr7p

Look at points on either side to ensure no jumps…

   (brkpts+/_1 0 1){/:~,>(<0 1){"2 aggr7p
1.0493213 1.0493226 1.0493284
1.0762388  1.076239 1.0762397
1.0995731  1.099574 1.0995744
1.1234658 1.1234669 1.1234675
1.1506786 1.1506802 1.1506838
1.1871864 1.1871874 1.1871903

Breakpoints and SDs for 7-player, 1st round estimate of own hand.

   (mean,stddev)"1 (brkpts+/_1 0 1){/:~,>(<0 1){"2 aggr7p
1.0493241 3.7527935e_6
1.0762392 4.9394471e_7
1.0995738 6.7729404e_7
1.1234667 8.4423125e_7
1.1506808 2.6525002e_6
 1.187188 2.0266247e_6
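The steps so far (find the interior cut positions, sort the values, then summarize the three values around each cut) might be sketched in Python as follows. The use of the sample standard deviation is an assumption about what stddev computes in the session, and the sketch assumes the sample is much larger than the number of parts so that every three-value window exists.

```python
import statistics

def breakpoint_indices(n_parts, n_values):
    """Interior cut positions: k * ceil(n_values / n_parts) for k = 1 .. n_parts-1."""
    step = -(-n_values // n_parts)  # integer ceiling of n_values / n_parts
    return [k * step for k in range(1, n_parts)]

def breakpoints_with_spread(n_parts, values):
    """(mean, sample sd) of the sorted values just below, at and above each cut."""
    ordered = sorted(values)
    summary = []
    for i in breakpoint_indices(n_parts, len(ordered)):
        window = ordered[i - 1:i + 2]
        summary.append((statistics.mean(window), statistics.stdev(window)))
    return summary

# Same counts as the 7-player session above: 152103 values cut into 7 parts.
print(breakpoint_indices(7, 152103))
```

A small deviation within each window indicates that the sorted values have no jump at the cut, so the mean is a reasonable value to use as the breakpoint.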

   dd=. 'C:\amisc\work\AgentBasedSimulation\FullGameNoBet\'
   ;0{&.>(<dd) unfileVar_WS_ &.>(<'aggr'),&.>'23456',&.>'p'
+-+-+-+-+-+
|1|1|1|1|1|
+-+-+-+-+-+
   brkpts=. 2 ([: }. ((([: i. [) * [: >. %~) ])) #,>(<0 1){"2 aggr2p
   (mean,stddev)"1 (brkpts+/_1 0 1){/:~,>(<0 1){"2 aggr2p
1.1134959 6.243821e_6

Bring together what we've done so far in a dyadic verb nPlayerBreakpoints:

nPlayerBreakpoints=: 4 : 0
   brkpts=. x ([: }. ((([: i. [) * [: >. %~) ])) #,>(<0 1){"2 y
   (mean,stddev)"1 (brkpts+/_1 0 1){/:~,>(<0 1){"2 y
)

   2 nPlayerBreakpoints aggr2p
1.1134959 6.243821e_6
   (<'aggr'),&.>'234567',&.>'p'
+------+------+------+------+------+------+
|aggr2p|aggr3p|aggr4p|aggr5p|aggr6p|aggr7p|
+------+------+------+------+------+------+
   $2 nPlayerBreakpoints aggr2p
1 2
   $npBrkPts=: 2 3 4 5 6 7 nPlayerBreakpoints&.>".&.>(<'aggr'),&.>'234567',&.>'p'
6
   #&.>sds=. 1{"1&.>npBrkPts
+-+-+-+-+-+-+
|1|2|3|4|5|6|
+-+-+-+-+-+-+
   >./;sds
6.243821e_6

Low deviations imply that these will give us breaks that are not overly sensitive.

   3{.npBrkPts  NB. Round 0 break points for 2, 3, and 4 players…
+---------------------+----------------------+----------------------+
|1.1134959 6.243821e_6|1.0878133 3.1880772e_6|1.0746147 9.8761903e_7|
|                     |1.1392025  1.176969e_6|1.1121257 1.0085457e_6|
|                     |                      |1.1558089 5.9432099e_6|
+---------------------+----------------------+----------------------+

Generalize to work on arbitrary round ( 1{x ) instead of round 0.

nPlayerBreakpoints=: 4 : 0
   NB. x is (number of players, round); y is the simulation results array.
   brkpts=. (0{x) ([: }. ((([: i. [) * [: >. %~) ])) #,>(<1,~1{x){"2 y
   (mean,stddev)"1 (brkpts+/_1 0 1){/:~,>(<1,~1{x){"2 y
)

Now do it for all rounds and numbers of players:

   >nrXnp=. <"1 <"1(i.5),"0~/2 3 4 5 6 7
+---+---+---+---+---+---+
|2 0|3 0|4 0|5 0|6 0|7 0|
+---+---+---+---+---+---+
|2 1|3 1|4 1|5 1|6 1|7 1|
+---+---+---+---+---+---+
|2 2|3 2|4 2|5 2|6 2|7 2|
+---+---+---+---+---+---+
|2 3|3 3|4 3|5 3|6 3|7 3|
+---+---+---+---+---+---+
|2 4|3 4|4 4|5 4|6 4|7 4|
+---+---+---+---+---+---+

   $npBrkPts=: nrXnp nPlayerBreakpoints&.>&><".&.>(<'aggr'),&.>'234567',&.>'p'
5 6
   $&.>npBrkPts
+---+---+---+---+---+---+
|1 2|2 2|3 2|4 2|5 2|6 2|
+---+---+---+---+---+---+
|1 2|2 2|3 2|4 2|5 2|6 2|
+---+---+---+---+---+---+
|1 2|2 2|3 2|4 2|5 2|6 2|
+---+---+---+---+---+---+
|1 2|2 2|3 2|4 2|5 2|6 2|
+---+---+---+---+---+---+
|1 2|2 2|3 2|4 2|5 2|6 2|
+---+---+---+---+---+---+
   sds=. 1{"1&.>npBrkPts
   >./;,sds       NB. Find greatest deviation
4.0257581e_5
   +./&>sds e.&.>>./;,sds  NB. Location of greatest deviation
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 1 0 0 0 0
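In the table above the single 1 sits in row 4, column 1, which by the layout of nrXnp means the last round of the 3-player games. The same "where is the maximum" search over a grid of lists might be sketched in Python like this; the deviations in the example grid are made up and much smaller than the real table.

```python
def locate_max(grid):
    """Return ((row, col), value) for the largest number in a grid of lists."""
    best = None
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            for value in cell:
                if best is None or value > best[1]:
                    best = ((r, c), value)
    return best

# Made-up deviations: rows are betting rounds, columns are player counts 2..4.
sds_grid = [
    [[6e-6], [3e-6, 1e-6], [1e-6, 1e-6, 6e-6]],
    [[2e-6], [4e-5, 2e-6], [3e-6, 2e-6, 1e-6]],
]
where, worst = locate_max(sds_grid)
players = list(range(2, 5))
print("round", where[0], "players", players[where[1]], "sd", worst)
```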
   npBrkPts=. 0{"1&.>npBrkPts   NB. Only need mean...

   >0{"1 npBrkPts   NB. Does it make sense that these decrease by round?
1.1134959
1.1035786
1.0800876
1.0576629
1.0462553

   >1{"1 npBrkPts   NB. Lower break point decreases but higher one increases
 1.0878133 1.1392025
 1.0542439  1.152115
 0.9988874 1.1774303
0.93475572 1.2265425
0.85996527 1.2861884

   >2{"1 npBrkPts
 1.0746147 1.1121257 1.1558089
 1.0277973  1.101737 1.1855156
0.95291753 1.0821581  1.243669
0.86532895 1.0663832  1.321222
0.76023436 1.0576779 1.3913404

   0.01 roundNums >5{"1 npBrkPts
1.05 1.08  1.1 1.12 1.15 1.19
0.98 1.03 1.08 1.12 1.18 1.25
0.88 0.97 1.04 1.12 1.22 1.37
0.75 0.89 1.01 1.13 1.28 1.48
0.63  0.8 0.97 1.15 1.34 1.63

   brXnpBrkPts=: npBrkPts        NB. More mnemonic name:
   dd fileVar_WS_ 'brXnpBrkPts'  NB. Betting Round by Number of Players…
+-+---------------+
|1|BRXNPBRKPTS.DAT|
+-+---------------+

“Mystery” of the Declining Break Points Values Solved by Pictures

News

Our regular "Advanced Topics" section was replaced by "News" this month as there was a lot to talk about. There has been a lot of work done on various facets of "OpenJ" and there are a number of upcoming J-related events for which we should be planning.

Perhaps the most exciting recent work in OpenJ was Michael Dykman's completion of his initial Android port of jconsole. As we were discussing issues around this and the desirability of porting J to other platforms like Chrome, Zach downloaded the new Android port onto his phone and showed us all the working version of J; the JHS version also runs on his phone. There was talk of a version of Ubuntu Linux that runs on Android as well as the availability of Android on Intel platforms. Intel appears to be targeting their Atom chip for the Android implementation as this is a low-power chip more suitable for mobile devices than are their main products.

One issue that came up is that, with the multitude of platforms and directions in which J could evolve, the efforts on the language are getting somewhat diffuse: we're going in many directions at once. This is probably a good thing in general but, given the smallness of our community, it slows the gains in any particular direction. In spite of this, we are seeing progress in a number of areas, not the least of which is getting a full set of GTK features accessible from J - both in the console and JHS versions. It looks like there's growing grid functionality, which leads to a number of interesting possibilities.

We talked briefly about the upcoming JSoftware conference in Toronto (July 23-24). A few of us in the room are either attending or speaking at this much-anticipated event, though some of us have only a vague notion of what we might speak about.

There's also an APL moot next month - April 27-29 - just outside of London. I may be the only J person going to this but it promises to be very enjoyable and intellectually stimulating.

There are also conferences by STSC (April 22-24 in Jersey City, NJ) and Dyalog (Elsinore, Denmark, from Sunday, October 14) later this year, as well as the Dyalog APL programming contest, but we didn't really talk much about these other than to note the Dyalog contest has prizes for non-students.

Learning, teaching, and problem-solving

Moving on, we discussed the idea of a visual interface for J - what this might mean and how it might look - by considering a few existing visual interfaces geared toward programming largely by assembling visual icons. We looked at the visual formula builder for Clarifi - a CapitalIQ product on which I work - as well as a web-based tool called Waterbear and an example (sample below) of File:VisualJ4.pdf sent to us by Ronan Reilly.

The visual metaphor has some attraction but we all seemed to be in agreement that the compelling case for it has yet to be made. It's hard to judge these novel visual representations fairly against a notation with which many of us are quite familiar. By comparison to existing J, the visual systems seemed somewhat bulky and not sufficiently well-ordered to be unambiguous aids to understanding: it appears that any of the pictures could be drawn in quite a few different ways. A typical visual representation seems to have a tendency to grow across the screen rather quickly compared to the more compact notation it purports to represent.

We also discussed an online Stanford course on Probabilistic Graphical Models and the difficulty of understanding much of what has been written about Bayesian networks.

The meeting ran later than usual but we still didn't have time to get to the more general material on teaching mathematics to fairly young children (and their ability to understand these concepts) and research on techniques for getting around mental blocks to problem-solving.