NYCJUG/2020-10-13

From J Wiki
Jump to navigation Jump to search

Meeting Agenda for NYCJUG 20201013

1. Beginner's Regatta: Building an Array Based on a Template

2. Show and tell: Will Gajate will present what he's learned about the new "Fold" conjunction.

3. Advanced topics: considering the Expression Problem

4. Learning and teaching J: Publicity

5. Coda: Bad Graphs?

Beginner's Regatta: Building an Array Based on a Template

Thomas Bulka asked how to build a three-dimensional array based on a two-dimensional template by filling in the replaceable items with vectors going "down" into the plane.

From: Thomas Bulka <thomas.bulka@posteo.de>
Date: Sat, Aug 8, 2020 at 4:13 AM
Subject: [Jprogramming] Constructing arrays by filling "templates"
To: <programming@jsoftware.com>

Hello everyone,

I'm stuck with a problem, which (I think) should be really easy to solve in J, but I somehow am not able to do it. This is what I want to do:

Let's assume I have two arrays:

arr1 =: 1 2 3
arr2 =: 4 5 6

Now I do imagine some kind of "template" for a matrix which looks like this:

X 0
0 Y

What I want to do now is to construct a rank-3 array (shape 3 2 2), in which in each consecutive plane X has been replaced by one element of arr1 and Y has been replaced with one element of arr2. The final array should therefore be:

finalarr =: 3 2 2 $ 1 0 0 4 2 0 0 5 3 0 0 6

[Which looks like this:

 1 0
 0 4
 
 2 0
 0 5
 
 3 0
 0 6

]

I'm looking for a sentence/function, that constructs an array of shape n 2 2 (in the example given above) where n equals # arr1 (we can assume that (# arr1) = # arr2). As I said - I've got the feeling that this should be not too hard to do in J, but somehow I'm stuck.

Thank you very much for your help!

Regards,

Thomas

He received this prompt reply from "xash@λ.land":

(0 0;1 1)}&(2 2$0)"1 (arr1 ,. arr2)

This answer from R. E. Boss followed shortly thereafter:

2 _2 {."0"1 |: arr1,:arr2

The First Answer: How Does it Work?

The first response has unnecessary parentheses around the last term. In general, there is seldom a need for a J expression to have a terminal closing parenthesis because of the right-to-left evaluation order. So, how does (0 0;1 1)}&(2 2$0)"1 arr1,.arr2 work?

Evaluating the latter part of the expression in isolation shows that we start with this array, shown here alongside the correct answer to illustrate that it is a step in the right direction:

   ((2 2$0)"1 arr1,.arr2);finalarr
+---+---+
|0 0|1 0|
|0 0|0 4|
|   |   |
|0 0|2 0|
|0 0|0 5|
|   |   |
|0 0|3 0|
|0 0|0 6|
+---+---+

I found this result to be a little mysterious until I looked up this page explaining the use of the rank conjunction with noun arguments. The case of the pair of nouns arguments is called "Constant Verb" and the page has this to say:

 Creates a verb having rank n, that produces the result m for each n-cell of its argument(s).
 ...
 The verb (m"n) has both dyadic and monadic forms. But they all return the constant result m.

Since n specifies rank 1, the 2 by 2 table on the left is replicated for each of the three rows of the two arrays concatenated column-wise:

   arr1,.arr2
1 4
2 5
3 6

This gives us the all-zero three-dimensional array into which we can insert the two vectors at the positions (0,0) and (1,1) in the original 2 by 2 table of zeros; the two-element indexes work on the three-dimensional array by specifying only the two leading axes; the third axis is implicit in the one-dimensional insertions. The }& variation of insert requires the ampersand to form a derived verb that binds the starting three-dimensional array of zeros as the right argument of the insert. This pairs with the rank conjunction as illustrated at the end of this essay.

Also, from the "Composite Item" page:

 Phrase  m} y is equivalent to  m {"0 1&.|: y , with the restriction that m must be numeric.

So this looks like a way to insert into the last dimension, columns, in this case.

However, I failed to find documentation to explain how the missing left x argument uses the two-column arr1,.arr2 portion of the expression, so I could not fully explain how this expression works until I was able to dissect it using the handy dissect tool.

Phrase Dissection Illustrated

Here is an illustration, using the "dissect" tool, of how the phrase is parsed.

Expression 0 parsed1.png

This makes it look like the insert (}&) is bound to the 2 by 2 table of zeros before the insertion so the rightmost array, the lamination of arr1 and arr2, is available to be used for the monadic variant of insert. In fact, selecting the rank symbol in the tool highlights the symbol simultaneously with the insertion binding }&, as shown above. OK, it's one of those occult binding things that associates these three actors in this way but once you see that, you can't unsee it.

The Second Answer

The second answer 2 _2 {."0"1 |: arr1,:arr2 from R. E. Boss works differently. First notice that the right-hand portion |: arr1,:arr2 is actually the same as arr1,.arr2 so this can be written slightly more simply as 2 _2{."0"1 arr1,.arr2.

This expression relies on the fact that over-taking pads with zeros, so is not quite as general as the first expression. For example, we could build a starting array with values of 99 in the first expression by replacing the right-hand phrase by (2 2$99)"1 arr1,.arr2 but this second expression does not offer that flexibility.

The left-hand portion 2 _2{."0"1 specifies over-taking each scalar (rank 0) item on the right by working on each of its vectors (rank 1 objects). So, for the first row of arr1,.arr2, which is "1 4", 2{.1 gives us 1 0 and _2{.4 gives us 0 4; the two results are concatenated one after the other giving us a two-dimensional result for each row of the right argument.

Show-and-tell: the Fold Conjunction

We will have a presentation on the new J conjunction "Fold", specifically these:

u F:. v Fold Multiple Forward Conjunction
u F:: v Fold Multiple Reverse Conjunction
u F: v Fold Multiple Conjunction


u F.. v Fold Single Forward Conjunction
u F.: v Fold Single Reverse Conjunction
u F. v Fold Single Conjunction


This table compares the variants of the J conjunction to their equivalents in the k language along with some examples of use.

Advanced Topics: The Expression Problem

There is something called "The Expression Problem", as seen here, which is a phrase describing a problem for which both OOP (Object-oriented programming) and functional programming seem to have trouble addressing satisfactorily. This example purports to illustrate the shortcomings of both approaches:

 Suppose you want to specify a number of simple planar shapes, such as rectangles and circles, and be able to compute the area for each one given one or more parameters.

A Functional Approach

In a functional specification of this problem, we might define a data type like this:

 type Shape = Square of side
             | Circle of radius

We could then define an area function to address each of these cases, something like this:

 define area = fun x -> case x of
     Square of side => (side * side)
     Circle of radius => (3.14159 * radius * radius)

Clearly this is working in the context of some arbitrary functional language.

An OOP Approach

In object-oriented fashion, one might define a data type like this:

 class Shape <: Object
   virtual fun area : () -> double
 
 class Square <: Shape
   side : double
   area() =  side * side
 
 class Circle <: Shape
   radius : double
   area() = 3.14 * radius * radius

The Expression Problem

The problem shows up when you have to extend these definitions to include new shapes or calculations associated with them. For instances, adding a triangle in the OOP approach is as simple as adding a new class but but is more onerous in the functional approach because you will have to edit every function working with a shape parameter.

However, if you want to add a new calculation of the perimeter for each shape, it is simpler in functional programming because you simply add a new "perimeter" function but OOP requires adding this calculation to each of the classes.

This problem is part of a larger class of problems known as "cross-cutting concerns" where the "concerns" in this example are the shapes and the calculations you want to perform on each.

Suggestions to Address the Expression Problem

The aforementioned statement of this problem mentions that many languages include ways to address it such as functions that can be extended with pattern matches, data types that can be extended with new patterns, open polymorphic functions that apply to an open set of classes, and something called "PredicateDispatching". The essay also expounds on the implications of this problem for both paradigms but devotes more time to the OOP side of things. However, the author notes that this does not imply that functional programming better answers the problem. There is also a link to how one might avoid the problem in C++.

Another Approach

Coming from a functional language like J, we might note that the "functional" approach sketched out above looks to be heavily influenced by the OOP paradigm. It looks nothing like the way we would approach the problem in J since we do not customarily define new data types. Also, the in-built array orientation of J suggests one possible way to address it.

Suppose we rely on a more data-centric approach by bundling together our shapes and their respective calculations into a table like this:

   'hdr shapes'=. split <;._1&>TAB,&.><;._2 ] LF (] , [ #~ [ ~: [: {: ]) CR-.~0 : 0
Shape	Area	Perimeter	Parameters
Square	([:*:{.)	(4*{.)	length of side
Circle	([:o.[:*:{.)	([:+:[:o.{.)	radius
Triangle	([:-:[:*/2&{.)		base, height
)

   hdr
+-----+----+---------+----------+
|Shape|Area|Perimeter|Parameters|
+-----+----+---------+----------+

   shapes
+--------+--------------+------------+--------------+
|Square  |([:*:{.)      |(4*{.)      |length of side|
+--------+--------------+------------+--------------+
|Circle  |([:o.[:*:{.)  |([:+:[:o.{.)|radius        |
+--------+--------------+------------+--------------+
|Triangle|([:-:[:*/2&{.)|            |base, height  |
+--------+--------------+------------+--------------+

This structure allows us to look up the shape in the first column of shapes and select the desired calculation from the hdr vector. Once we know which shape and which calculation we want, we can select the formula from the shapes table and apply it to the given parameters.

Clearly there still are problems. In the case of the triangle, I don't think we can calculate a perimeter given only the length of the base and the height but this would be a problem with any of the above proposals. As far as it goes, this J solution does at least present the elements necessary to address the problem in a compact, easily modifiable form. Perhaps more importantly, it subsumes much of the tedious boiler-plate code into generalized array operations: it puts the burden on the data structure which is static, easily viewed, and easily modified.

Thinking more about one might want to extend this exercise, how would any of these schemes handle an arbitrary planar shape defined by a set of two-dimensional points? We already have a simple solution in J using what Wikipedia calls the "Shoelace formula" but which is also known as the "land surveyor's algorithm". This algorithm was the subject of a NYCJUG meeting on 11/13/2012 and is quite interesting and solves an infinitely larger set of shape/area problems than the approaches outlined above. The perimeter calculation might also be amenable to this more general statement as well but at a high computational cost.

In fact, thinking about the problem this way might lead us to consider curved shapes much differently from shapes that can be defined with point pairs. For instance, in researching formulas for these two calculations for a variety of shapes, I discovered that the perimeter calculation for an ellipse is considered to be rather difficult. Well, maybe not that difficult.

Learning and Teaching J: Publicity

At our last meeting, we talked about some under-used venues for J exposition. In particular, we saw that both Twitter and Reddit have J sub-groups that are lightly populated but that reach people who do not subscribe to the J forums and are otherwise outside our insular community. At that meeting, and since then, I have urged J aficionados to be more active on these sites.

Since then I have tried to lead by example and I have posted some links on those two sites. These are links to some work I did a while ago but remains largely unknown to the broader programming community. It occurs to me that we have a wealth of material demonstrating the virtues of J but that few people outside the array-programming world know about it. In putting together the sections for today's meeting I myself discovered parts of the language I did not know about - such as the noun uses of "rank" - and some very useful pages on the J wiki such as one on solutions to geometry problems.

There are also new trends for which we are well-positioned to contribute, such as topics in machine learning and one I've noticed gaining traction recently: data-oriented design; see this essay which contrasts it to OOP. This is at least a cousin to array-oriented programming, an idea also gaining popularity recently; see this essay about getting rid of loops by using NumPy data structures.

Another reason for publicizing existing work is the knowledge of how search algorithms rank pages based on the number and quality of links to them.

Coda: Bad Graphs?

This is not a J topic per se but speaks to the idea of presenting data graphically. Recently I learned about a type of graph known as an "alluvial flow" diagram, also known as a "Sankey diagram". This way of presenting data is based on data-rich maps showing how the flow of a river evolves over time, such as one shown here:

Themeandermap.jpg

However, examples of these diagrams range from useful to incomprehensible as seen in these examples.

Here is perhaps the most famous of this type of graph which was popularized in the seminal book "The Visual Display of Quantitative Information" by Edward Tufte. This is a six-dimensional (?) illustration of the reduction in Napoleon's troops over the course of their disastrous invasion of Russia in the early 1800s.

Minard.png

This illustration of where energy goes in power generation by a steam plant seems useful as well.

1280px-Sankeysteam.png

However, like any information graphic, the format can easily be abused. Here is an attempt to classify the survival of different types of passengers in the Titanic disaster. Looking at it, can you easily figure out how many 1st-class female children survived?

Alluvial diagram of the Titanic dataset 1 ov7EnESNigBdhdJ3NYBoBg.png

What struck me when searching for "alluvial flow diagram" is that there a number of software packages for generating these but little to no critical evaluation of how effective they are or guidelines for how they should be structured for good comprehension of the data being presented.