NYCJUG/2009-01-13

Reading J, talent pool, meteor puzzle, depth-first versus breadth-first, multi-dimensional application of key, various graphics projects, popularity of other data analysis languages.

Location: Heartland Brewery, 34th and 5th, NYC

Summary of Meeting

We talked about how to read J and help others to understand it. This led us to discuss the pros and cons of "raw" code versus extensive cover functions. We discussed the depth of the J talent pool and how the community might address its shallowness.

John reported on his work on the Meteor puzzle and the particular difficulties this kind of problem poses for J: it is more amenable to a depth-first approach, whereas J is better suited to breadth-first solutions.

We looked at how the key adverb might be applied to multi-column data with consistent, repeated values to transform it into an equivalent multi-dimensional array.

We then briefly examined the potpourri of graphics applications already written in J and speculated about how these might contribute to a project of packaging J for graphical exploration of information. This led naturally to discussion of a recent New York Times article about the popularity of the R language.

Meeting Agenda for NYCJUG 20090113

1. Beginner's regatta: how to read J?

Is the J talent pool deep enough?  How might we cope with its shallowness?


2. Show-and-tell: Meteor report?

How to do multi-dimensional key?


3. Advanced topics: various graphics-related projects: "isigraph Paint Demo.doc",
"J OpenGL.doc", "plottingRepeatedly3D.txt", "TimeSeriesScriptsJWiki.doc", and
first few pages of "OpenGL Example with Keystrokes.doc".


4. Learning, teaching and promoting J: "Data Analysts Captivated by R.doc",
"Introducing Programming in High School Using Python_inpGrPeBaSa06a.pdf", and
"Great Hackers by Paul Graham.doc".

            +.--------------------+.

Debts never go undischarged -
if the borrower doesn't pay,
the lender does.
  - Grant's wise man

Proceedings

There was some general discussion about working environments, during which Dan mentioned that sockets don't work in jconsole. I mentioned how happy I am with my current environment, running jconsole from within an Emacs session, though a few difficulties remain, such as a slight difference in the behavior of "plot": a bug prevents you from showing a surface plot without grid lines.

We also talked about how useful it would be to have a tree datatype in J. Dan was strongly in favor of this, but he missed some of the earlier meetings where we discussed the idea and had a hard time coming up with good problem domains for using trees extensively.

Beginner's regatta

[I see that John has some material on the following topic in his "The Rough Guide to Reading J" on the wiki.]

John mentioned that some other participants in Project Euler saw his J code for the average (+/%#) and assumed it was some kind of mnemonic, as they were not used to terse notation. This raises one issue with breaking J expressions down into named pieces (an important technique, as we see below): is it worthwhile when the full names of cover functions are significantly longer than the J expressions they stand for? Terse notation cuts two ways: an English-like expression offers superficial ease of reading, while a dense, consistent notation offers a generality and depth of understanding that the English-like version gives up.
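To make the trade-off concrete, here is a minimal sketch contrasting the terse fork with a version spelled out in English-like cover names. The names sum, count and average are hypothetical, chosen only for illustration; mean is simply John's +/%# given a name.

```j
NB. The terse form: a fork that divides the sum (+/) by the count (#)
mean =: +/ % #

NB. Hypothetical cover functions with English-like names
sum     =: +/
count   =: #
average =: sum % count      NB. still the same fork, spelled with longer words

mean 1 2 3 4                NB. 2.5
average 1 2 3 4             NB. 2.5
```

Both definitions compute the same thing; the cover names change only the spelling of the fork, not what it can express.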

Reading J

We looked at an example of helping novices understand J code by breaking up expressions into named pieces. This was from Raul Miller:

[Jgeneral] How to read J
from	 Miller, Raul D <rdmiller@usatoday.com>
to	 General forum <general@jsoftware.com>
date	 Fri, Apr 28, 2006 at 12:08 PM
subject	 [Jgeneral] How to read J
	
Looking at some of the recent comments on this forum, some people have problems reading J.

So perhaps a thread on "how to read J" would be helpful.

Fundamentally, to understand a sentence, you need to understand the words used in that sentence, and the grammar used in that sentence.  You also need to understand the context in which that sentence exists.

In the context of J (an imperative language), this usually means understanding what happens when the sentence is executed.

Here's an example, based on code I posted earlier today:

len=:[: # #/.
rot=:|."_1
pad=:#/. - #
compress=: 1 #"0~ #/.
expand=: len {."1 pad rot 1j1 (#"1) compress
r45=:<"1@expand #inv&> compress <@#"_1 ]/.

Without the context, this is extremely difficult to understand.  Are we looking at monads?  Dyads?  Is this meant to work on scalars?  Lists?  Is the domain meant to be complex numbers?  What is going on here?

And, the answer to these questions is that this is meant to work on square matrices.  The word r45 is a monad, and given an n by n square matrix, it produces an (_1+2*n) by (_1+2*n) square matrix with the original matrix rotated by 45 degrees and the original elements separated by the default fill for that matrix.

  r45 'ab',:'cd'
 a
b c
 d

This illustrates an important concept for understanding most any program: programs are understood in terms of what they do.  Understanding what they are supposed to do is a critical piece of knowledge in understanding how they do that.

So another question would be: how does it do that?

In other words, how does the word 'r45' work?

That's equivalent to asking how does its definition work?  In other words, how does this sentence work:

  m=:'ab',:'cd'
  (<"1@expand #inv&> compress <@#"_1 ]/.) m

?

As this is a fork, it's equivalent to

  (<"1@expand m) #inv&> (compress m) <@#"_1 (]/. m)

Working through these pieces,

  <"1@expand m
+-----+-----+-----+
|0 1 0|1 0 1|0 1 0|
+-----+-----+-----+

  compress m
1 0
1 1
1 0

  ]/. m
a
bc
d

In other words, the very first thing we do is rotate m 45 degrees.  Most of the code is concerned with padding the various elements of m.  ]/. m gives us a three row, two column matrix and we want a three row, three column matrix.

This is typical.  Most J code is concerned with fussy little details, not with the overall result.  A corollary is that many J programs can be significantly improved (made simpler, faster, and/or easier to understand) by changing the details of how the result is represented.

I could go on and work through the details of how each step is accomplished.  But I think I've illustrated the principles:  at each step of the way you need to understand what the word means.  And, for the most part in J, that means understanding what the word does.  For verbs, that means understanding how the verb translates nouns from the verb's domain to the verb's range.

And, I don't think I can say this too many times:  to understand a sentence properly, you need to understand the context where it's used.

--
Raul
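One way to check Raul's reading is to paste his definitions into a J session and compare r45 with the expanded form of its train. The definitions below are copied unchanged from his message; the intermediate values in the comments come from working his 2-by-2 example by hand. Strictly, r45 is a train of five verbs, which J reads as the nested fork f g (h i j); because J evaluates right to left, the expanded sentence needs no extra parentheses around its right-hand part. This assumes a standard J session in which the standard library's inv is defined.

```j
NB. Raul Miller's definitions, unchanged
len=:[: # #/.
rot=:|."_1
pad=:#/. - #
compress=: 1 #"0~ #/.
expand=: len {."1 pad rot 1j1 (#"1) compress
r45=:<"1@expand #inv&> compress <@#"_1 ]/.

m =: 'ab',:'cd'
#/. m          NB. 1 2 1: lengths of the oblique diagonals
len m          NB. 3: number of diagonals, the side of the result
pad m          NB. _1 0 _1: how far each expanded row is rotated
expand m       NB. 3 by 3 mask of where the original elements land

NB. the train applied to m, written out piece by piece as Raul does
(r45 m) -: (<"1@expand m) #inv&> (compress m) <@#"_1 (]/. m)   NB. 1
```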

We all agreed with Dan's endorsement of Raul's point that the bulk of the code for solving any problem deals with getting input and output into good shape for processing or reading. Often the core of the problem-solving is done in just a few lines.

These suggestions from Roger Hui might also be useful for people looking to better understand how to read J:

from	 Roger Hui <RHui000@shaw.ca>
to	 General forum <general@jsoftware.com>
date	 Fri, Apr 28, 2006 at 5:10 PM
subject	 Re: [Jgeneral] How to read J
	
The following essays in the J wiki use a style of
writing J that facilitates its reading and understanding:

http://www.jsoftware.com/jwiki/Essays/The_Ball_Clock_Problem
http://www.jsoftware.com/jwiki/Essays/Sudoku

Interestingly, shortly before the meeting, the J Forum saw another posting, from Viktor Cerovski, in which he explained a dense bit of code in a fashion similar to Raul's. However, his explanation is sufficiently involved that it deserves its own page.

The J Talent Pool

A query about using J with C (not at all well-documented, protests Dan) led to a discussion about the viability of using J in a commercial venture and the problem a buyer might have with how few J programmers there are.

One idea is to form a loose consortium of backup programmers who would have access to the code base and would be instructed in its basic architecture.

At the moment, this is more of a solution in search of a problem given the dearth of commercial J applications.

Show-and-tell

Report on the Meteor Puzzle

John reported on his attempts to come up with a good J solution to the Meteor puzzle being worked on at the Alioth Shoot-Out. He introduced the problem by showing us a page on optimizing Java performance. There was an earlier discussion of this problem posted on the wiki with some preliminary ideas about how to approach it.

John's code was posted to the J Programming Forum. He was dissatisfied with its performance, so he did not submit it to the Shoot-Out. He thinks that some of the difficulty, typical of many of the Alioth puzzles, comes from restrictions on the algorithms allowed.

In particular, the solution was required to run more quickly when finding 10 solutions than when finding 1000. This implies a depth-first search, which runs contrary to the breadth-first approach that is more natural in J.
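The Meteor board is too large for a short example, so this sketch uses the n-queens problem as a stand-in; it illustrates the two search orders and is not John's code. Each placement is a list giving the column of the queen in each row. The breadth-first verb bfs extends every partial placement by every column at once and filters the whole table in one step, the array style J favors, but no complete solution appears until the final step produces all of them together. The depth-first verb dfs extends one placement at a time in an explicit loop and returns as soon as it finds a solution, which is why a rule rewarding fewer requested solutions favors it. The names ok, grow, bfs and dfs are hypothetical.

```j
NB. x: partial placement (queen's column in each row so far); y: candidate column
NB. 1 if the candidate shares no column or diagonal with any queen already placed
ok =: 4 : '*./ (y ~: x) *. (| y - x) ~: (#x) - i. #x'

NB. Breadth-first step. x: board size; y: table of partial placements, one per row
grow =: 4 : 0
 c =. ,/ y ,"1 0/ i. x        NB. every placement extended by every column
 ((}: ok {:)"1 c) # c         NB. keep only the safe extensions
)
bfs =: 3 : '(y&grow)^:y (1 0 $ 0)'   NB. start from one empty placement

NB. Depth-first. x: board size; y: one partial placement
NB. returns the first complete placement found, or an empty list if there is none
dfs =: 4 : 0
 if. x = # y do. y return. end.
 for_c. i. x do.
  if. y ok c do.
   r =. x dfs y , c
   if. # r do. r return. end.
  end.
 end.
 i. 0
)
```

Here bfs 8 should return all 92 placements as a 92-by-8 table, while 8 dfs i. 0 returns only the first placement it reaches, 0 4 7 5 2 6 1 3.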

Apparently, based on message traffic on Alioth, much of the work toward finding solutions concentrated on finding "islands": contiguous groups of empty cells. One suggestion was to discard any position containing an island whose size is not a multiple of five, since every Meteor piece covers exactly five cells. However, as a general rule, fast J solutions are those which do a few things to large arrays, not those which need to make many small decisions.
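The island test itself can be phrased as an array computation. This sketch makes simplifying assumptions: a rectangular board where each cell has four neighbours, rather than the Meteor puzzle's hexagonal board, and hypothetical verb names relax, islandsizes and viable. Each empty cell starts with its own index as a label and filled cells get infinity (_); relax repeatedly gives every empty cell the smallest label among itself and its neighbours until nothing changes, so the cells sharing a label form one island.

```j
NB. x: boolean board, 1 = empty cell, 0 = filled cell
NB. y: current labels; one step lets each empty cell take the smallest label
NB. among itself and its four neighbours (|.!._ shifts in _ at the edges)
relax =: 4 : 0
 m =. y <. (1 |.!._ y) <. (_1 |.!._ y)
 m =. m <. (1 |.!._"1 y) <. (_1 |.!._"1 y)
 m + _ * -. x                 NB. filled cells go back to _
)

NB. sizes of the islands (connected groups of empty cells), in raveled order
islandsizes =: 3 : 0
 lab =. (i. $ y) + _ * -. y   NB. empty cells: own index; filled cells: _
 lab =. (y&relax)^:_ lab      NB. repeat until the labels stop changing
 #/.~ (, y) # , lab           NB. count the empty cells carrying each label
)

NB. worth exploring only if every island could be covered by 5-cell pieces
viable =: 3 : '*./ 0 = 5 | islandsizes y'

b =: 3 5 $ 1 1 0 1 1  1 1 0 1 1  0 0 0 1 0
islandsizes b                 NB. 4 5
viable b                      NB. 0: the 4-cell island can never be filled
```

Note that the pruning test runs over the whole board at once; it is the per-position decision of whether to keep searching that remains scalar.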

So, until someone comes up with some brilliant mathematical equivalence to translate the Meteor problem into another kind of problem, it remains one of those tasks for which J is not as well-suited as other kinds of languages.

Multi-Dimensional Key

Given a table which starts like this,

Date	BenchmarkGics	CurrencyCode	RatingScore	BenchmarkSpread
20070402	0	USD	AAA	5.601159
20070402	0	USD	AA+	6.186868
20070402	0	USD	AA	8.290190
20070402	0	USD	AA-	10.400237
20070402	0	USD	A+	13.762790
20070402	0	USD	A	18.255246
20070402	0	USD	A-	25.248607
20070402	0	USD	BBB+	36.491174
20070402	0	USD	BBB	51.022000
20070402	0	USD	BBB-	79.230125
...	...	...	...	...

where the header is a vector of character vectors “hdr” and the data is the matrix “dat”, how do we partition the “BenchmarkSpread” column as a multi-dimensional array, based on various columns, using the J “key” adverb “/.”?
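Before turning to the real data, it may help to recall what key does with a single column: x u/. y applies u to each group of items of y that share the same value in x, taking the groups in order of first appearance. A minimal sketch with made-up values (the names cur and spr are hypothetical and not taken from the table above):

```j
cur =: ;: 'USD EUR USD JPY EUR'   NB. hypothetical currency column
spr =: 5.6 7.1 6.2 9 8.3          NB. hypothetical spreads, one per row

cur </. spr       NB. the spreads for each currency, boxed: USD, then EUR, then JPY
cur #/. spr       NB. 2 2 1: number of rows for each currency
~. cur            NB. the keys themselves, in the same order as the groups
```

Grouping by a second column means applying key again inside each of these boxes, one level per column.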

Taking a look at some of the characteristics of this data, we might put it into vectors:

   hdr
+----+-------------+------------+-----------+---------------+
|Date|BenchmarkGics|CurrencyCode|RatingScore|BenchmarkSpread|
+----+-------------+------------+-----------+---------------+
   'dt gics sprd'=. <"1|:".&>dat{"1~ hdr i. 'Date';'BenchmarkGics';'BenchmarkSpread'
   $dat
43225 5
   'ccy scr'=. <"1|:dat{"1~ hdr i. 'CurrencyCode';'RatingScore'
   (#@~.)&>dt;gics;ccy;scr;<sprd
65 11 3 19 43225

So, we have 65 unique dates, 11 GICS codes, 3 currencies, and 19 ratings. The values of these latter three are:

   >~.&.>(<"0 gics);ccy;<scr
+---+---+---+---+--+--+--+----+---+----+---+--+---+--+-+--+----+---+----+
|0  |10 |15 |20 |25|30|35|40  |45 |50  |55 |  |   |  | |  |    |   |    |
+---+---+---+---+--+--+--+----+---+----+---+--+---+--+-+--+----+---+----+
|USD|EUR|JPY|   |  |  |  |    |   |    |   |  |   |  | |  |    |   |    |
+---+---+---+---+--+--+--+----+---+----+---+--+---+--+-+--+----+---+----+
|AAA|AA+|AA |AA-|A+|A |A-|BBB+|BBB|BBB-|BB+|BB|BB-|B+|B|B-|CCC+|CCC|CCC-|
+---+---+---+---+--+--+--+----+---+----+---+--+---+--+-+--+----+---+----+

How might we use this information to reshape the spreads as a 3 x 11 x 19 x 65 (currency by GICS by rating by date) array?

How to Apply Key Multi-Dimensionally

Here's one solution using the vectors:

   'k1g k1r k1d k1s'=. (<ccy) </. &.> gics;scr;dt;sprd
   'k2r k2d k2s'=. (<k1g) </. &.>&.> k1r;k1d;<k1s
   'k3d k3s'=. (<k2r) </. &.>&.>&.> k2d;<k2s
   $k4s=. k3d </. &.>&.>&.> k3s
3
   $>>>k4s
3 11 19 65

Are there neater solutions?

Someone suggested that the axis vectors could be converted to indexes which could then be used to populate the array with values.
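A minimal sketch of that suggestion, using a hypothetical verb keyarray: each key column is turned into indexes into its own list of unique values, the indexes are stacked into one coordinate per row, and the values are scattered into an array pre-filled with infinity (_), so any missing combination stays _. One caution follows from the counts above: 3*11*19*65 is 40755, fewer than the 43225 rows, so some combinations must occur more than once, and a scatter like this keeps only one value per cell. The toy columns are made up.

```j
NB. x: boxed list of key columns; y: the value column (one value per row)
NB. result: one axis per key, each axis ordered by first appearance of its values
keyarray =: 4 : 0
 ix =. (~. i. ])&.> x         NB. each key column -> indexes into its unique values
 shape =. #@~.&> x            NB. number of unique values in each key column
 y (<"1 |: > ix) } shape $ _  NB. scatter each value to its coordinates
)

cur =: ;: 'USD USD EUR EUR'   NB. hypothetical toy columns
rat =: ;: 'AAA AA AAA AA'
spr =: 5.6 6.2 7.1 8.3
(cur;<rat) keyarray spr       NB. 2 by 2: rows USD, EUR; columns AAA, AA
```

On the meeting's vectors, (ccy;gics;scr;<dt) keyarray sprd should give a 3 11 19 65 array of single values rather than the nested boxes of the key-based solution, subject to the duplicate caveat above.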

Advanced Topics

We examined various graphics projects already done in J with an eye to applying this work to some kind of integrated graphics package. Packages of this sort are part of the reason for the popularity of a language like R, which was recently the subject of a fairly long article in the New York Times. The author of the article also posted a follow-up on a blog.

Some parts of the article which may be relevant to J: "...R helps people deal with large volumes of data in a wide variety of industries.... Also of note, the software is open source, meaning people can pick it up for free and make their own changes to the code. Such flexibility has inspired statistically minded people of all stripes to get behind R and make it a real success story."

Some of the graphics packages and information we looked at included the J OpenGL introduction, some uses of J for work with time series, the isigraph/paint demo, and the "tank" or "helicopter" demo.

Also of note was the extensive code dedicated to illustrating contra-dance steps.

Part of the difficulty with proceeding on this project, an interactive graphing tool for J, is the embarrassment of riches. It's hard to know whether we want to use OpenGL or go with a specialized, external package such as gnuplot. Also, either alternative requires a substantial investment of time to understand the tradeoffs of using one package over another.

There was some consensus that we ought to focus on J's strengths and not reinvent the wheel. One example of this is using J to interface with LAPACK instead of using J's native linear solver: this is a good use of the language, as it leverages the enormous work that has gone into an external package.

Learning and Teaching J

What little we covered of this topic is mentioned in the previous section with the discussion of R.
