NYCJUG/2009-02-10


J style, array programming, interactive graphics, array data, statistics teaching.

Location: BEST, 33-41 Newark Street, PH, Hoboken, NJ


Meeting Summary

We talked about what distinguishes J code that speaks to the strengths of J versus code that doesn't, how certain styles of coding reflect a scalar approach, not an array-based one.

Then we looked at an example or two of some tentative steps toward interactive graphics.

We also examined how languages other than J have had some success and might offer some ideas for doing things better. There's a well-developed Perl Data Language (PDL), which makes a good pass at dealing with arrays. Also, we looked at the interesting mathematics behind the logo of Mathworks and talked about how we might come up with a logo worthy of J.

There's also an APL-based effort on a freely-available array-language: NARS (see below).

Continuing our examination of good ideas from other languages, we perused a survey of online statistics teaching tools and an example of introducing high-school students to programming using an interactive language (Python).

Finally, we contemplated some ideas on what makes a great hacker.

Agenda for NYCJUG of 20090210

1. Beginner's regatta: how to explain J versus non-J?
See "Candlestick Chart Attempt.doc".

2. Show-and-tell: interactive graphics attempt; see "GridAndPlot0.doc";
also "plot3Dinteractive.ijs".

3. Other language comparisons: Perl Data Language tour vs. JDB (see
"PDL Tour.doc"), and Mathworks logo story (see "Mathworks Logo story.doc")
 -> J logo?

4. Learning, teaching and promoting J: see "ArrayOfOnlineStatTeachingTools.pdf",
"Introducing Programming in High School Using Python_inpGrPeBaSa06a.pdf", and
"Great Hackers by Paul Graham.doc".

Proceedings

We take a look at a J solution that works more like code in languages not as fluent in arrays: a submission on implementing point-and-figure charts of equity prices, which followed an earlier forum discussion of candlestick, or OHLC (Open-High-Low-Close), charts.

Beginner's regatta

We considered some code that works correctly but is not very "J-like". We then discussed what separates a J approach from a less array-based one. We looked at code mentioned in a discussion on how to implement "point-and-figure" charting:

from	[somebody@someplace.something]
to	Programming forum <programming@jsoftware.com>
date	Sun, Feb 8, 2009 at 7:11 PM
subject	Re: [Jprogramming] Point & Figure charting in J?

On 23 Dec 2008, I wrote:
> Recently, there was discussion of OHLC and Candlestick charting of
> stock market data.  Rather than reinvent the wheel, I thought I'd ask
> first: has anyone created J code that will convert stock market data
> into a point & figure chart and that they'd be willing to share?
There being no responses with such code, I went ahead and wrote an explicit verb that creates an array containing P&F chart data (see bottom of this post).  Admittedly, it's not very J-ish at all, but it does seem to work.
...

There's a question about passing arguments to a function:

(1) How do I pass *six* arguments to the verb?
...
these might possibly be additional arguments, for a total of *eight* arguments) How would any of these multiple arguments be made "optional" (i.e., they could be present or not)?

This problem is avoided for the first cut of the code by hard-coding some values with which to run the example.
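
One common way to handle the "many arguments, some optional" question is to pass a boxed list and fill in whatever is missing from a list of defaults. The following is only a sketch of that idea, not the poster's code: the names pfDefaults and pfArgs are hypothetical, and the three defaults are the hard-coded constants from the script (box size, reversal boxes, show-date flag).

```j
NB. Hypothetical defaults: nBox ; nRevBoxes ; nShowDate
pfDefaults=: 150 ; 3 ; 1

NB. pfArgs: y is zero or more leading arguments (boxed if more than one);
NB. any arguments not supplied are taken from pfDefaults.
pfArgs=: 3 : 0
  args=. boxxopen y                 NB. stdlib: box y if it is open and non-empty
  'nBox nRevBoxes nShowDate'=. args , (#args) }. pfDefaults
  nBox , nRevBoxes , nShowDate      NB. return the values actually in effect
)

pfArgs ''        NB. all defaults
pfArgs 100       NB. override the box size only
pfArgs 100 ; 5   NB. override the first two
```

The other usual idiom is a default left argument: the monadic case calls the dyadic case with a preset x, which is what figureHistogram further down this page does.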

There's also a question about graphical display:

(2) How can the resulting array be "plotted" (or converted into some sort of resizable diagram/chart/whatever) in a way that shows all the (perhaps tiny) X's, O's, and date digits?  It would be nice, too, to impose a gridwork surrounding the array elements.

This is followed by some J code that is too scalar-oriented to be fluent, idiomatic J. For example,

pfchart=: 3 : 0

NB. constants (should be input by user instead?):
nBox=. 150    NB. size of charting box (related to nMaxRows below))
nRevBoxes=. 3    NB. number of boxes needed for reversal
nMaxColumns=. 125
nMaxRows=. 50    NB. this can affect value of "nBox" if calculated by formula
nShowDate=. 1    NB. flag to display dates or not

NB. initialize variables:
nBoxHigh=. 0
nBoxLow=. 0
nHigh=. 0
nLow=. 0
nClose=. 0
nNewHigh=. 0
nNewLow=. 0
nCurrCol=. 0
nFirst=. 1    NB. flag is turned "off" (false) after first entry
nXCol=. 1
nOCol=. 0
nColType=. nXCol
sLastMonthPlotted=. ''
sLastYearPlotted=. ''
sYr=. ''
sMn=. ''
nMon=. 0
sMonth=. ''
nRowOffset=. 0

Here we see potential array elements broken out into scalars: there is a "box" high and low, a "high, low, close" group, and a "new" high and low pair. Finally, the year and month are kept as separate items even though they are part of the same time dimension.

This is followed by some code to read in (Yahoo!) data broken down into pieces instead of being treated as a whole:

NB. read in (Yahoo) market data file and cull out date/high/low/close values:
bMktData=. readcsv (jpath '~user\data\DJI-r-pf.csv')
bDate=.  0 {"1 bMktData   NB. column 0 is date (format: yyyy-mm-dd)
bHigh=.  2 {"1 bMktData   NB. column 2 is high
bLow=.   3 {"1 bMktData   NB. column 3 is low
bClose=. 4 {"1 bMktData   NB. column 4 is close

However, this does break the data into vectors which are probably the more convenient array form for what follows. Unfortunately, these vectors are used in an enormous "for" loop where they are indexed by a counter which is simply sequential:

for_i. i.((#bMktData)-1) do.

instead of casting what is inside the loop as a function to be applied across arrays.
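
As a rough illustration of what "a function applied across arrays" can look like, here is a minimal sketch. It is not the poster's algorithm: it assumes, purely for illustration, that a price's box number is the floor of the price divided by the box size. The names close, box and move are hypothetical; the four prices are closing values from the sample DJI rows shown later on this page.

```j
nBox=: 150                                 NB. box size, as hard-coded in the script
close=: 8691.33 8761.42 8565.09 8629.68    NB. four daily closes
box=: <. close % nBox                      NB. box number for every day at once
move=: (}. - }:) box                       NB. day-to-day change in box number
```

No counter and no loop are needed: the same two expressions work unchanged on four prices or on eighty years of them.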

The loop begins by assigning numerous scalars.

 sYr=. 4 {. > i { bDate
 sMn=. 2 {. 5 }. > i { bDate
 nMon=. ". sMn
 if. (9 < nMon) do.
   sMonth=. (nMon-10) { 'abc'
 else.
   sMonth=. ": 1 } sMn
 end.
 nHigh=. (". > i { bHigh) - nChartLow
 nLow=. (". > i { bLow) - nChartLow
 nClose=. (". > i { bClose) - nChartLow

Also, as these last few lines demonstrate, the potential parallelism of array-processing is lost by treating individual scalars in isolation as though there is no commonality between them.
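
A hedged sketch of how the same per-row assignments might be done for all rows at once follows. The sample values and the value of nChartLow are made up for the example (nChartLow is used but not defined in the excerpt above); the variable names otherwise follow the original.

```j
NB. Hypothetical stand-ins for the boxed columns that readcsv returns.
bDate=: '2008-09-30' ; '2008-10-01' ; '2008-12-09'
bHigh=: '110.5' ; '112' ; '108.25'
bLow=:  '101' ; '109.5' ; '100.75'
nChartLow=: 100                        NB. assumed value, for illustration only

dates=: > bDate                        NB. 3 by 10 character matrix
sYr=: 4 {."1 dates                     NB. every year: first 4 characters of each row
nMon=: ". 5 6 {"1 dates                NB. every month number
sMonth=: '123456789abc' {~ <: nMon     NB. months 1-9 -> digit, 10-12 -> a b c
hl=: ". > bHigh ,. bLow                NB. 3 by 2 numeric table: high, low
hlAdj=: hl - nChartLow                 NB. one subtraction for the whole table
```

Keeping high and low together in one table means the subtraction of nChartLow is written once rather than once per column.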

The last indicator of excessive scalarization we will note is the repetition of a number of extremely similar, deeply nested combinations of "if" statements and "for" loops as typified by this example:

     if. (1 <: (nBoxLow-nNewLow)) do.

       for_b. (1+i.(nBoxLow-nNewLow)) do.
         sChart=. 'O' (< ((nBoxLow-b)+nRowOffset),nCurrCol) } sChart
       end.
       if. 1 = nShowDate do.
         if. (0 = (sLastMonthPlotted -: sMonth)) do.
           sChart=. sMonth (< ((nBoxLow-1)+nRowOffset),nCurrCol) } sChart
           sLastMonthPlotted=. sMonth
           if. 1 = ".sMonth do.
             if. (0 = (sLastYearPlotted -: sYr)) do.
               for_c. i._4 do.
                 sChart=. (c{sYr) (< (3-c),nCurrCol) } sChart
               end.
               sLastYearPlotted=. sYr
             end.
           end.
         end.
       end.
       nBoxLow=. nNewLow
       nBoxHigh=. nBoxLow + 1    NB. for drawing purposes, 1 box above lowest 'O'
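
The inner for_b. loop in this excerpt writes one 'O' per iteration. As a sketch of the array alternative, a single amend can place all of them at once when given a list of boxed row/column indices. The chart size and the starting values below are hypothetical, chosen only to make the example self-contained.

```j
sChart=: 6 4 $ ' '       NB. small blank chart (hypothetical size)
nBoxLow=: 5
nNewLow=: 2
nRowOffset=: 0
nCurrCol=: 1

NB. Same rows the loop visits: (nBoxLow-b)+nRowOffset for b = 1 2 3
rows=: nRowOffset + nBoxLow - 1 + i. nBoxLow - nNewLow
sChart=: 'O' (<"1 rows ,. nCurrCol) } sChart    NB. one amend, three cells
```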

I chose this code as an example because it does a good job of illustrating how a programmer from a background of conventional languages might break down a problem in a manner which reflects the scalar limitations of those kinds of languages. Also, the code works, so it can serve as a specification for a re-factored version. Here's an example of its output:

PfChartEGDJI2007-2008.png

However, I'd leave it to someone else to re-cast this into more J-like code, as it's a non-trivial task and I personally do not have much use for point-and-figure charts (a different technique from the "candlestick" or "OHLC" charts discussed earlier on the forum). If someone is interested in doing this, the data is readily available, as is the existing code.

Show-and-tell

Here we continue the project Thomas demonstrated a few months ago: tools for working interactively with graphics, specifically data plots.

Combine Grid and Plot

Adapting some code from Gosi in the J-Programming Group on Google shows us how we might combine a grid object displaying data with a time-series plot of some of the data.

require 'jzgrid jzplot'
cocurrent 'myform'

NB. CELLDATA=: *./~i.20
CELLDATA=:x >./    x=:i.9

GDEMO=: 0 : 0
pc gdemo;
xywh 26 42 200 133;cc grid isigraph rightmove bottommove;
xywh 249 43 190 132;cc g0 isigraph;
pas 0 0;
rem form end;
)

gdemo_run=: 3 : 0
wd GDEMO
grid=: '' conew 'jzgrid'
show__grid 'celldata'

loc=: conew 'jzplot'             NB. create plot object
PForm__loc=: 'myplot'            NB. define PForm in loc
PFormhwnd__loc=: wd 'qhwndp'     NB. define PFormhwnd in loc
PId__loc=: 'g0'                  NB. define PId in loc

plot__loc 7| 1 2 3 4 5 2 2 4    NB. draw plot on the form
pd__loc 1 2 1 12 4 5 2
pd__loc 1 3  4 5
pd__loc'show'

wd 'pshow'
)

create=: gdemo_run

destroy=: 3 : 0
destroy__grid''
wd 'pclose'
codestroy''
)

gdemo_close=: destroy
create''
end;

Subsequently, I wanted to use this with more realistic data, so first I get some prices:

NB.* getDJI.ijs: get Dow Jones price data into usable forms.

load 'mystats csv'

getData=: 3 : 0
   'djtit djipxs morets umd'=: datXform readcsv y
NB.EG 'djtit djipxs morets umd'=: getData 'C:\Data\DJIpxs19281001-20081214.csv'
)

datXform=: 3 : 0
   'djtit dj'=. y
   djdts=. ".&>'-'-.~&.>dj{"1~djtit i. <'Date'
   djipxs=. ".&>dj{"1~djtit i. <'Adj Close'
   mop=. (<.100%~djdts) </. djipxs
   morets=. }:ret1p (".>(<0 1){dj),{:&>mop   NB. Monthly returns
   umd=. }:~.<.100%~djdts                    NB. Unique Monthly Dates
   djtit;djipxs;morets;umd    NB. Titles, price etc., monthly returns, YYYYMM dates.
)
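
The grouping step in datXform (the line that builds mop) can be tried in isolation. The sketch below uses made-up dates and prices, not real index values, to show how key (/.) boxes the prices by YYYYMM and how the last price of each month is then picked out.

```j
djdts=: 20081128 20081201 20081202 20081231 20090102   NB. hypothetical YYYYMMDD dates
djipxs=: 100 101 103 102 105                           NB. hypothetical prices

yyyymm=: <. djdts % 100        NB. drop the day: 200811 200812 200812 200812 200901
mop=: yyyymm </. djipxs        NB. prices boxed by month
moend=: {:&> mop               NB. last price in each month
umonths=: ~. yyyymm            NB. the distinct months, in order of appearance
```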

This data can be embedded in the script file, as in the following, or read from the table downloaded from Yahoo! directly in much the same fashion as shown here.

'djtit djipxs morets umd'=: datXform DJIData=: <;._1&><;._2]0 : 0
Date,Open,High,Low,Close,Volume,Adj Close
1928-10-01,239.43,242.46,238.24,240.01,3500000,240.01
1928-10-02,240.01,241.54,235.42,238.14,3850000,238.14
. . .
2008-12-09,8934.10,8978.14,8591.69,8691.33,5693110000,8691.33
2008-12-10,8693.00,8942.46,8589.86,8761.42,5942130000,8761.42
2008-12-11,8750.13,8861.86,8480.18,8565.09,5513840000,8565.09
2008-12-12,8563.10,8705.43,8272.22,8629.68,5959590000,8629.68
)

This display is elided for readability as we don't want to show all 80 years of daily data here.
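
For readers who want to experiment with embedded data, here is one way, offered only as a sketch, to turn a small comma-separated text block into a table of boxed fields and pull out a numeric column by its title. The names raw, tbl, hdr, dat and closes are hypothetical, and the columns are a subset of the real file's; the two data rows reuse values from the listing above.

```j
raw=: 0 : 0
Date,Open,Close
2008-12-11,8750.13,8565.09
2008-12-12,8563.10,8629.68
)

NB. Cut into lines at LF, then cut each line at commas (prefixing a comma
NB. so that ;._1 has its delimiter as the first character).
tbl=: > (<;._1@(','&,))&.> <;._2 raw    NB. 3 by 3 table of boxed fields
hdr=: {. tbl                            NB. column titles
dat=: }. tbl                            NB. data rows
closes=: ". > (hdr i. <'Close') {"1 dat NB. numeric Close column
```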

Next we use this to populate the grid and provide data for a couple of graphs. There is a problem with resizing the form currently. Also, the use of the arbitrary numbers for the initial size bothers me.

GridWellBehavedPlotNot.png GridAndPlotInitialGriddedNumbered 50pct.png
Resizing the form: the grid changes but the plot doesn't. Perhaps something like this would help on subsequent alterations to the items on the form? The numbered grid helps us to make judgements about placement and sizes of the pieces.

These problems were mitigated somewhat by a modification to the initial form definition:

GDEMO=: 0 : 0
pc gdemo;
xywh 26 42 200 270;cc grid isigraph bottommove rightmove;
xywh 249 43 190 132;cc g0 isigraph bottommove rightmove;
pas 0 0;
rem form end;
)

However, the resized plot does not automatically re-draw:

RightmoveBottommoveNotWorkingOnGraph smaller.png The re-sized plot partially covers and whites out the grid.

The following helps somewhat but the plot still fails to re-draw without some extra effort:

GDEMO=: 0 : 0
pc gdemo;
xywh 26 42 200 270;cc grid isigraph bottommove;
xywh 249 43 190 132;cc g0 isigraph rightmove;
pas 0 0;
rem form end;
)
PartialFixBottommoveGridRightmoveGraph smaller.png

Further complications arise as we add more graphs.

require 'jzgrid jzplot'
SHOWSTARTINFO=: 0
cocurrent 'myform'

load 'getDJI.ijs'                  NB. Get Dow Jones returns (from Yahoo).
load 'plot stats'
load jpath '~system\packages\math\matfacto.ijs'
SHOWSTARTINFO=: 0
load 'mystats dhmutils'

CELLDATA=: 80 12$_2}.morets

GDEMO=: 0 : 0
pc gdemo;
xywh 26 42 200 270;cc grid isigraph bottommove;
xywh 249 43 190 132;cc g0 isigraph rightmove;
xywh 249 180 190 132;cc g1 isigraph rightmove;
pas 0 0;
rem form end;
)

gdemo_run=: 3 : 0
   wd GDEMO
   grid=: '' conew 'jzgrid'
   show__grid 'celldata'

   loc=: conew 'jzplot'            NB. create plot object
   PForm__loc=: 'myplot'           NB. define PForm in loc
   PFormhwnd__loc=: wd 'qhwndp'    NB. define PFormhwnd in loc
   PId__loc=: 'g0'                 NB. define PId in loc

   pargs=. 'title Year-to-Year Correlation of Monthly Returns'
   pargs plot__loc corr/&>2<\_12[\_2}.morets
   pd__loc'show'

    PId__loc=: 'g1'                  NB. define PId in loc for 2nd plot
    'hargs hdat'=. 'Monthly Return Frequencies' figureHistogram morets [ PCT=: 1
    hargs plot__loc >>&.>2{&.>hdat
    pd__loc'show'

   wd 'pshow'
)

create=: gdemo_run

destroy=: 3 : 0
destroy__grid''
wd 'pclose'
codestroy''
)

NB.* figureHistogram: figure parameters and data for histogram.
figureHistogram=: 3 : 0
   ('Histogram';'') figureHistogram y
:
   if. 0>4!:0 <'NB' do. NB=. 21<.>:#~.,openbox y end.     NB. Num Buckets
   y=. (]`,: @. ((0=L. y)*.(1=#$y)*.1~:#y)) y NB. Vec->1 row mat
   'pltit keylabel'=. 2{.boxopen x
   if. 0>4!:0 <'PCT' do. PCT=. 1 end.   NB. Assume numbers should be percents.
   mmy=. 13 : '(<./;<./&.>y),>./;>./&.>y' y  NB. Min and max values of all ys
   if. 0>4!:0 <'BKTS' do.               NB. "0.99*" -> division < least.
       BKTS=: steps ((0.99*0{mmy),1.00001*1{mmy),NB
   end.
   if. 0=4!:0 <'FORCEBKT' do. NB. Force a particular bucket division, eg. 0.
       BKTS=. BKTS+FORCEBKT-BKTS{~0{/:|BKTS-FORCEBKT
   end.

   xlbls=. ;(": &.>(PCT{0.01 0.1) roundNums (PCT{1 100)*BKTS),&.><(PCT#'%'),' '
   xlbls=. ('_';'-') stringreplace xlbls NB.  plot because seems more useful.
   plargs=. 'bar;plotcaption Histogram;title ',pltit,';xlabel ',xlbls
   plargs=. plargs,';labelfont "Courier New" 14;keyfont "Courier New" 16'
   if. 0=#keylabel do. keylabel=. ":i.#y end.
   plargs=. plargs,(1~:#y)#';key ',keylabel NB. No key if only 1 data series.
   if. 0=4!:0 <'OTHERPLOTARGS' do. plargs=. plargs,';',OTHERPLOTARGS end.
   if. 0=L. y do. y=. <"1 y end.     NB. Make all boxed data
   BKTS=. <BKTS
   plargs;<BKTS histoDistrib&.>y
NB.EG pla plot__loc >>&.>2{&.>pld [ 'pla pld'=. figureHistogram data
)

gdemo_close=: destroy
create''
end;

Initial2graphs smaller.png

As we add more pieces to the form, we quickly see the need for a way to specify the layout: how do the pieces relate to each other? Thomas mentioned that Java deals with this by specifying three general types of layout: horizontal, vertical, and grid. What we would really like is a general way to link parts within a form to each other and have them respond to resizing of the parent form in useful, logical ways.

3-D Interactive Plot

We looked at a simple 3-D interactive plotting routine which was submitted by a forum participant in October, 2008. It displays a surface plot that can be rotated on each of three axes by means of sliders.

For example,

3DGraphInteractive00.png Initial view of a randomly-generated surface.
3DGraphInteractive01.png Same surface rotated slightly by moving the "Y" slider.
3DGraphInteractive02.png Subsequent rotation in the "Z" axis.

This is a good starting point, but it raises questions about how to better design the rotation interface: a slider has a limited range of motion and does not map well to the idea of rotation. What we'd really like is a widget that works like the iPod's wheel interface.

Other Language Comparisons

We looked at what Perl people have done in coming up with a Perl Data Language. They do a nice job of accommodating multi-dimensional arrays. Here's part of the introduction to give a flavor of how they go about it.

Perl Data Language

Generation of PDLs

PDLs or "piddles" are N-dimensional data cubes. There are several ways of generating piddles:

We generate a zero filled 5x5 matrix:

perldl> $a = zeroes 5,5;
perldl> print $a;

[
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
 [0 0 0 0 0]
]

Now, don't think that the number of dimensions is limited to two:

perldl> $m = zeroes(3,2,2); # 3x2x2 cube
perldl> print $m;

[
 [
  [0 0 0]
  [0 0 0]
 ]
 [
  [0 0 0]
  [0 0 0]
 ]
]

From here, the site details how to manipulate these multi-dimensional arrays. See below for a screenshot of the homepage of this site.

Cleve Moler has posted an essay explaining the significance of the Mathworks logo. Mathworks is the company that markets Matlab.

He introduces the wave equation and goes on to talk about the logo, which is an L-shaped membrane, and how it is mathematically interesting. Or rather, numerically interesting, as it is one of the simplest geometries for which solutions to the wave equation cannot be expressed analytically, so numerical computation is necessary. The 270° nonconvex corner causes a singularity in the solution.

This is a good logo because it looks intriguing and it has an interesting story behind it that is relevant to one of Matlab's strengths - numerical computation.

We've talked on occasion about a J logo. It would be good to have one that meets these criteria. There was some recent discussion about the relation of GCD to LCM, and Oleg Kobchenko came up with some interesting graphics related to this, so something along those lines would be good.

   load 'viewmat'
   viewmat (*. % +.)"0/~ 2+i.100
GCDLCM0.png
   viewmat | (*. % +.)"0/~ 2+i:50
GCDLCM1.png
   viewmat |%:(*. % +.)"0/~ 2+i.100
GCDLCM2.png
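
To see what these pictures are made of, it may help to look at a small corner of the table as numbers. The sketch below applies the same fork, LCM divided by GCD, to the integers 2 through 6; the names small and ok are hypothetical. It also checks the familiar relation that LCM times GCD equals the plain product.

```j
small=: (*. % +.)"0/~ 2+i.5    NB. LCM % GCD for every pair drawn from 2 3 4 5 6
NB. small displays as:
NB.  1  6  2 10  3
NB.  6  1 12 15  2
NB.  2 12  1 20  6
NB. 10 15 20  1 30
NB.  3  2  6 30  1

ok=: (*/~ -: (*. * +.)"0/~) 2+i.5    NB. 1 if LCM * GCD matches the product table
```

The ones on the diagonal and the small values wherever two numbers share a large common factor are what give the viewmat images their structure.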

We also looked at a 3-D emacs logo designed by someone who writes his own rendering code. There's no real story behind this, but it does look good and has a high geek factor.

EmacsLogo.png

Learning and teaching J

We finished up the evening by looking at some articles. One (abstract here) surveyed the statistics teaching tools available online. We speculated about how to get J noticed in this space; it probably boils down to having more useful labs and examples available. Here's another, more general article from the Journal of Technology Studies on how to integrate online teaching tools in the classroom.

The next article was about some success integrating Python into a high-school programming curriculum. Many of the strengths the authors cite for Python apply equally well to J.

Finally, just for fun, we looked at Paul Graham's blog entry on great hackers and what makes them great. Many of his points should resonate with J aficionados.

Scan of Meeting Notes

NYCJUGMeetingNotes090210 30pct.jpg NARS2000 intro start.png PerlDataLanguageHomePage.png