NYCJUG/2023-10-10

From J Wiki
Jump to navigation Jump to search

Beginner's regatta

We look at padding and trimming arrays. That is, we want to surround a matrix with zeros (or spaces for a character matrix) or trim its end rows and columns.

Padding an Array

There are times we want to pad the edges of an array. We may want a smaller array to have the same shape as a larger array so that we can apply one to the other. Also, for instance, when running the "Game of Life", as detailed in one of the following sections, we may want to put zeros all around an array of cells we are about to evaluate so that there is room for growth.

Simple Approach

Say we have a numeric matrix

   ]mat=. >:i. 3 3
1 2 3
4 5 6
7 8 9

If we simply concatenate a zero, we get a row of zeros on the first dimension (rows). We get a vector (one dimensional) of zeros even though we append a scalar (zero dimensional) because "scalar extension" does the convenient thing in this case.

   mat,0
1 2 3
4 5 6
7 8 9
0 0 0

We can add a top and bottom row simply enough:

   0,mat,0
0 0 0
1 2 3
4 5 6
7 8 9
0 0 0

By specifying another axis, we can extend this to all four sides.

   0,"1 (0,mat,0),"1 ] 0
0 0 0 0 0
0 1 2 3 0
0 4 5 6 0
0 7 8 9 0
0 0 0 0 0

Another Way

We could also pad by overtaking the array. We can do this for two of the sides by taking with one more than each dimension as shown here:

   (>:$mat){.mat
1 2 3 0
4 5 6 0
7 8 9 0
0 0 0 0

The expression >:$mat increments the shape of mat by one. Overtaking, or taking more elements than are available, uses the default fill character to pad the overtaken array to be the specified shape. The default fill is 0 for numeric, (space) for character, and a. (enclosed empty or ace) for boxed arrays.

We can pad the row and column on the left and top rather than on the right and bottom by overtaking with the negative of one more than each dimension.

   (->:$mat){.mat
0 0 0 0
0 1 2 3
0 4 5 6
0 7 8 9

Putting these two together tacitly and generalizing to create a function, we get this:

   13 : '(>:$y){.y'
] {.~ [: >: $
   13 : '(->:$y){.y'
] {.~ [: - [: >: $
   (] {.~ [: - [: >: $)@:(] {.~ [: >: $) mat
0 0 0 0 0
0 1 2 3 0
0 4 5 6 0
0 7 8 9 0
0 0 0 0 0

This method has the advantage of generalizing to character arrays:

   (] {.~ [: - [: >: $)@:(] {.~ [: >: $) 3 3$'ABCDEFGHI'
     
 ABC 
 DEF 
 GHI 
     

A Power Approach

One more way we might look at this is that we can pad the top and the bottom, transpose the matrix, then do the same thing. If we write this to transpose the array with the top and bottom zeros added, we can apply it twice using the power conjunction:

   (3 : '|:0,y,0')^:2]mat  NB. explicit definition
0 0 0 0 0
0 1 2 3 0
0 4 5 6 0
0 7 8 9 0
0 0 0 0 0

   {{|:0,y,0}}^:2]mat     NB. direct definition
0 0 0 0 0
0 1 2 3 0
0 4 5 6 0
0 7 8 9 0
0 0 0 0 0

The timings for all three of these are pretty similar, so choosing one over the other is mostly a matter of aesthetics.

Now let's look at the inverse operation: trimming the edges of a matrix.

Trimming an Array

Starting with a padded version of our original, how might we remove the padding?

   ]fatmat=. {{|:0,y,0}}^:2]mat  NB. Use "power" version of padding with direct definition.
0 0 0 0 0
0 1 2 3 0
0 4 5 6 0
0 7 8 9 0
0 0 0 0 0

Most simply, we can use drop first }. and drop last}::

   }:"1 }."1 }:}.fatmat
1 2 3
4 5 6
7 8 9

This expression is also type-agnostic, i.e. it works equally well on non-numeric matrixes.

Show-and-tell

Sum Same Things

I recently needed to process some trade records that include plain stock trades combined with option trades on those stocks. These transactions are in a file, looking something like this:

Example Transactions.JPG

In this case we see trades of stocks and of options on those stocks. The problem is, how do we add up the total amount we have gained or lost on a given stock, including its associated options trades?

Initially, it seemed as though this might require a tedious slog through the transactions in date order, keeping track of each new issue as it is originally purchased, adding up the costs and profits for the options written on that stock, and figuring out the profit or loss as the stock is sold.

However, on reflection, its much simpler than that once we think in terms of the tools J provides. In short, we only have to sum the amounts column individually for each stock ticker. We don't need to separate the underlying stock from the options written on it, nor do we care about the order in which these transactions occur. We can simply sum the amounts based on the ticker symbol which is present in both the options trades and the stock trades. So, our code looks like this.

First we read in the CSV file, like the one above, using J's standard library function to do this, where flnm is a character string denoting the CSV file's fully-pathed name.

   dat=. readcsv flnm
   'tit dat'=. split dat   NB. Titles on first line, data is all other lines,
   dat=. }:dat             NB.  except the last one.

We split the title from the data using J's standard split function and drop the data's last line which is an irrelevant summary. We use the title row to look up columns of interest in a self-documenting way as shown here where we extract the Symbols column.

   syms=. dat{"1~tit i. <'Symbol'

Since the symbol is the first (space-delimited) item on a line for an option, we extract it and apply the expression ' ' (] {.~ i.~) to return whatever is before the first space on the line. This gives us the same symbol for a traded security whether it's for an option or for a stock on which that option is based.

   syms=. ' ' (] {.~ i.~) &.> syms

Next we extract the amounts. However, we have to fix the entries of this column to be recognizable as numbers to J by removing dollar signs and commas, then converting any minus sign to J's minus sign (_).

   amts=. dat{"1~tit i. <'Amount'         NB. Get amounts, 
   amts=. (<'-_') rplc~&.>amts-.&.><'$,'  NB.  remove characters '$,', ensure minus sign is J minus.   
   amts=. ".&>(0=#&>amts)}amts,:<'0'      NB. Put zero in any empty amount, convert to numeric vector.

Once we've done this, the total profit or loss for each symbol is calculated using J's key (/.) adverb with summation as the verb.

    (~.syms),.<"0 syms +//. amts

In this case we also concatenate column-wise the unique symbols (~.syms) of each corresponding sum so we can see which sum corresponds to which symbol.

A Simpler Way

In the meeting, someone mentioned that the new key dyad avoids having to nub (return unique elements) of the same thing twice, as in the final J statement above where we concatenate ~.syms to our amount summation. We could instead write this:

   syms {{ x, < +/ y }}/.. amts

Here we are making use of the new key dyad "/..".

Advanced topics

Going outside J for a bit, let's take a look at how APL achieves multi-threading for coarse-grained (task level) parallelism.

Isolates in APL

Dyalog APL has a spawn primitive & for simple multi-threading which appears to be much like J's multi-threading (t. and T.). Dyalog APL also a number of system commands for threading: ⎕TID, ⎕TCNUMS, ⎕TNUMS, ⎕TKILL, and ⎕TSYNC. However, isolates are a different, perhaps simpler way, to multi-thread.

The big innovation with isolates is that each new thread runs it its own namespace. A Dyalog namespace is equivalent to J's class/objects system as implemented by the colib library. A namespace is a region separate from the root namespace which is the default starting point of an APL session.

Unlike the spawn primitive, a namespace does not act on the parent namespace directly. This segregates names and processing into a distinct region of memory which can be more well-controlled than threads which directly affect the parent from which they are spawned.

Example of Isolates

Here is a simple example of creating two isolates and running them in parallel. Similarly to J, APL indents user inputs by six spaces whereas output is flush with the left margin.

      )copy isolate
C:\Program Files\Dyalog\Dyalog APL-64 18.2 Unicode\ws\isolate.dws saved Fri Mar 18 14:04:46 2022

There is a workspace isolate with the code for using isolates.

Here we create an empty namespace from the APL command line, assign a variable within it, clone that namespace to form a second namespace, re-assign the variable in the new namespace, then run an operation in parallel in both namespaces. The operation in this case is simple summation.

      ns←⎕NS ''       ⍝ Create an empty namespace
      ns.X←1 2 3 4     ⍝ Give the name X a value within the NS
      is1←ø ''         ⍝ Create an empty isolate
      is1.X←5 6 7 8 9  ⍝ Give X a different value in the isolate
      is2←ø ns         ⍝ Clone the namespace as another isolate
      isolates←is1 is2 ⍝ References to isolates can form an array
      isolates.(+/X)   ⍝ Execute (+/X) in 2 isolates in parallel
35 10

As you can see, this is a very neat and simple way to accomplish well-controlled multi-threading where the complicated details are hidden from the developer.

There's a nice example of using isolates to speed up calculations in this essay by Brian Becker of Dyalog where he shows an example of calculating the Mandelbrot set.

Learning and Teaching J

Here we look at some more APL and compare an implementation of the game of Life in APL to one in J. We also make use of the padding work we covered in the Beginner's Regatta above.

The Anticipated Renaissance of APL

In this article, the author asserts that "APL deserves its renaissance" but takes an interesting approach by starting with this line of APL:

life←{↑1 ⍵∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵}

The author then explains that

In fact, it didn’t originate as a computer language at all. It was proposed as a better notation for tensor algebra by Harvard mathematician Kenneth E. Iverson. It was meant to be written by hand on a blackboard to transfer mathematic ideas from one person to another.

The article goes on to illustrate the effect of each step in the above expression to demonstrate the power and utility of APL.

So far, so good. However, it concludes

My theory is, since it was designed to be written by hand, it didn't work out very well with an ASCII. Didn't survive the standardization. Of course, APL has its ASCII friendly descendants inherited its expressiveness but frankly, they are all ugly far beyond the possibility of public success. Roughly 86% of all the fun you get from APL programming comes from the mysterious symbols and the magic behind them.

I think the J and K communities can agree to disagree with this conclusion.

Playing with Life

The one-liner for the famous game of "Life"[1], shown above in APL, has some limitations which become evident when we run it with an argument of a simple Boolean matrix like this:

      life←{↑1 ⍵∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵}
      rp←↑(⊂2 2 2)⊤¨3 6 2
      rp
0 1 1
1 1 0
0 1 0

Here, rp is the pattern for the R pentonimo. However, when we run life on this, we get an unexpected result:

      life rp
0 0 0
0 0 0
0 0 0

What if we pad the matrix with a border of zeros first? Now it works as expected:

      pad0s←{⍉0,(⍉0,⍵,0),0}
      life pad0s rp
0 0 0 0 0
0 1 1 1 0
0 1 0 0 0
0 1 1 0 0
0 0 0 0 0

We know this is correct by referring to Ewart Shaw's signature in J which we mentioned in last month's meeting:

   3  ((4&({*.(=+/))++/=3:)@([:,/0&,^:(i.3)@|:"2^:2))&.>@]^:(i.@[)  <#:3 6 2
+-----+---------+-------------+
|0 1 1|0 0 0 0 0|0 0 0 0 0 0 0|
|1 1 0|0 1 1 1 0|0 0 0 1 0 0 0|
|0 1 0|0 1 0 0 0|0 0 1 1 0 0 0|
|     |0 1 1 0 0|0 1 0 0 1 0 0|
|     |0 0 0 0 0|0 0 1 1 0 0 0|
|     |         |0 0 0 0 0 0 0|
|     |         |0 0 0 0 0 0 0|
+-----+---------+-------------+

Shaw's code is significantly more complex than the APL version but that's because it does more. It pads the input with zeros (0&,^:(i.3)@|:"2^:2) and runs through multiple generations as specified by the left argument (^:(i.@[).

Improving Life

One potential problem we see here is that by padding with zeros indiscriminately, our array grows fairly rapidly even when the borders are unused. Working in APL for the moment, how might we deal with this?

Looking at the last matrix above, we can see that there are all zeros in the first row and column but also in the last two rows and columns, so this is not as simple as the trimming function we created in Beginner's Regatta. We need a more nuanced way to trim only the all-zero rows and columns. Fortunately, this is easy to do in APL, as shown here:

      trim0s←{⍉{(∨⌿⍵)/⍵} ⍉{(∨⌿⍵)/⍵} ⍵}

The expression in curly braces is APL's form of direct definition, much like J's double curly braces. In a J direct definition, the left and right arguments are assumed to be x and y, but in APL these are named α (alpha) and ω (omega). Notice also that the above has directly defined functions within a direct definition.

APL also handles axis specification differently from J, with special symbols like to denote reduction on the first axis rather than the default last axis.

Combining this with our padding function defined above, we can create a version of life more similar to Shaw's J version:

      lifeGen←{trim0s life pad0s ⍵}

Trying it out:

      lifeGen rp
1 1 1
1 0 1
1 0 0
      lifeGen lifeGen rp
0 0 1 0
0 1 1 0
1 0 0 1
0 1 1 0
      lifeGen lifeGen lifeGen rp
0 0 1 0
1 1 0 1
1 1 0 1
0 1 1 0
      (lifeGen⍣3) rp    ⍝  Third generation using "power"
0 0 1 0
1 1 0 1
1 1 0 1
0 1 1 0

The J equivalents are left as an exercise for the reader.

Further Improvements

The astute observer may have noticed a fatal flaw in the trimming method we have chosen. It works fine for the few simple cases so far but what happens when we have embedded columns or rows of zeros? Here we look at a J version to illustrate the problem.

   pad0s=: {{|:0,y,0}}^:2
   ]lmat=. pad0s 0 1 1 0 0 1 1 1 0 0 1,0 1 0 0 0 1 0 1 0 1 1,0 0 1 0 0 0 1 1 0 1 0,0 0 1 0 0 1 0 1 0 0 1,0 1 1 0 0 1 1 1 0 1 1,:0 0 1 0 0 0 1 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 1 0 0 1 1 1 0 0 1 0
0 0 1 0 0 0 1 0 1 0 1 1 0
0 0 0 1 0 0 0 1 1 0 1 0 0
0 0 0 1 0 0 1 0 1 0 0 1 0
0 0 1 1 0 0 1 1 1 0 1 1 0
0 0 0 1 0 0 0 1 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0

Notice how lmat has three internal columns that are all zeros.

   trim0s=: {{|:(+./0~:y)#"1 y}}^:2
   trim0s lmat
1 1 1 1 1 0 1
1 0 1 0 1 1 1
0 1 0 1 1 1 0
0 1 1 0 1 0 1
1 1 1 1 1 1 1
0 1 0 1 0 1 0

Our trim0s inadvertently removes all-zero rows or columns not on the periphery. Fixing this complicates the function, giving us something like this:

   trim0s=: {{|:y#"1~(+./\+./y)*.|.+./\+./|."1 y}}^:2
   trim0s lmat
1 1 0 0 1 1 1 0 0 1
1 0 0 0 1 0 1 0 1 1
0 1 0 0 0 1 1 0 1 0
0 1 0 0 1 0 1 0 0 1
1 1 0 0 1 1 1 0 1 1
0 1 0 0 0 1 0 0 1 0

In this case, we fix the compression vector by performing an or scan in both directions and anding the results. The effect of or scan on a Boolean is to return only the leading zeros and setting everything after the first one to be one. Note that this assumes the matrix is a Boolean; a non-Boolean argument will give odd results.