From J Wiki
Jump to navigation Jump to search

Beginner's regatta

Bob Therriault tells us about what has changed in the revised J Primer. He is interested in feedback on this. We will also look at the very useful function dyadic I. and what it does.

J Primer

I suppose that the first question that should be answered is 'Why update the J Primer?'

Nycjug primer 1.png

Aside from updating the content - only oldtimers remember the J Form Editor - most of the changes were to improve readability and navigation.

Nycjug primer 2a.png

Particularly of note is simplification of navigation for both the top

Nycjug primer 3a.png

and the bottom sections.

Nycjug primer 4a.png

As well as clearer highlighting of example code and primitives.

Nycjug primer 5a.png

It's important to have a number of easily accessible introduction for beginners. In the J Primer, a newcomer can start from scratch in J and progress all the way to implementing a basic GUI application.

Now that the J Primer has been updated to work with current versions of J in the JQt environment, we hope that newcomers can be directed towards the J Primer as a learning tool and have reduced frustration due to out of date references.

For your consideration, we present the Updated J Primer.

Dyadic I.

Only recently did I take a look at dyadic I. "interval index" and I was able to solve a moderately complicated little problem with ease using it. Some of the context of this problem is given below in Show and Tell.

Getting File Data in a Useful Form

I have a large number of files containing directory listings of backup discs I've made over the years. They look something like what is shown here in an elided form:

   Volume in drive D is Ph060414_0122
   Volume Serial Number is 8989-8C54
   Directory of D:\ 
  04/13/2006  08:48 PM    <DIR>          amisc
  04/13/2006  04:16 PM    <DIR>          Data
  04/05/2006  02:18 PM            37,088 .emacs
                 1 File(s)         37,088 bytes  
   Directory of D:\amisc\J 
  04/13/2006  08:48 PM    <DIR>          ..
  04/29/1999  02:24 AM               154 util.txt
                32 File(s)        487,914 bytes 
   Directory of D:\amisc\pix\Photos\2006Q2 
  04/14/2006  01:16 AM    <DIR>          20060413
  04/13/2006  12:54 PM    <DIR>          20060412-13
                    0 File(s)              0 bytes  
   Directory of D:\amisc\pix\Photos\2006Q2\20060412-13    
  04/13/2006  08:48 PM    <DIR>          ..
  04/12/2006  01:41 PM         3,036,574 DSCF7064.JPG
                  202 File(s)    534,786,174 bytes 
   Directory of D:\amisc\pix\Photos\2006Q2\20060413 
  04/14/2006  01:16 AM    <DIR>          ..
  04/14/2006  01:16 AM         4,740,394 pspbrwse.jbf
  04/13/2006  01:07 PM         3,252,373 DSCF7379.JPG
                  413 File(s)  1,137,214,458 bytes 
   Directory of D:\temp\WD\JDrive   
  04/13/2006  06:16 PM    <DIR>          ..
  03/16/2006  01:49 PM           328,900 FLSZS.DAT
                    7 File(s)      2,636,746 bytes 

This directory listing is the result of the DOS command "dir /o-d /s ..." which orders entries within a directory by ascending date and recursively lists all sub-directories as well.

Notice how each sub-directory section is preceded by the text Directory of ... and ends with text like 413 File(s) .... Using these text strings, I want to extract what's between each pair of starting/ending text indicators. This is complicated by the fact that I only care about certain sub-directories, only those under "\amisc\pix\Photos".

To match a string of text on each line, I used e.'s (member) bigger sister E. (find matches). Whereas e. works on atoms, E. understands vectors. For example,

  'hi' E. 'hihohi'
1 0 0 0 1 0
  'hi' e. 'hihohi'
1 1

We see that E. lets us search for occurrences of vectors unlike e. which works on scalars.

To extract the data we want, we first read in the file and break it into a vector of lines of text based on the assumption that line-feed characters terminate each line.

   #fl=. fread 'Ph060414_0122.dir'
  $fl=. <;._2 ] LF (],[#~[~:[:{:]) CR-.~fl
| Volume in drive D is Ph060414_0122| Volume Serial Number is 8989-8C54||

So we have about 1.3 million bytes and over 27 thousand lines in this case and we can see that the first three lines are the same as the first three lines of the sample file as shown above.

Extracting Information Based on Start/End Strings

In order to extract information about selected sub-directories, we need to find the start and end line number of the sections in which we are interested.

We locate the start of a directory section by finding occurrences of the string "\amisc\pix\Photos\" since these are the only ones we care about.

   #whst=. I. +./&> (<'\amisc\pix\Photos\') E.&.>fl  NB. Start of sub-directories of interest
21498 21506 21714

Now that we have our three starting line numbers, where does each end?

We find the end of all sub-directories by searching for the string " Files(s) ".

   #whend=. I. +./&> (<' File(s) ') E.&.>fl          NB. End of each sub-directory listing

We have many more ending than starting positions. How do we find the endings which correspond only to the starting lines of interest? Why, by using monadic I. or "interval index".

   whend I. whst
971 972 973

This indicates that the starting points are in the intervals indexed by the result. That is,

   whend{~whend I. whst            NB. Which end-points correspond to these starting points?
21504 21712 22131
   whst,.whend{~whend I. whst      NB. Show each start/end pair
21498 21504
21506 21712
21714 22131

Having these points makes it easy to extract only the sections of interest from the large file or simply to count the number of entries in each of the sections of interest by subtracting the starting point from the ending point (and subtracting 4 to account for the two header and one trailer lines as well as the entries for "." and "..").

   4-~whst-~whend{~whend I. whst
2 202 413

We can verify the first one of these simply by inspection of the source file:

 Directory of D:\amisc\pix\Photos\2006Q2

04/14/2006  01:16 AM    <DIR>          20060413
04/13/2006  08:48 PM    <DIR>          .
04/13/2006  08:48 PM    <DIR>          ..
04/13/2006  12:54 PM    <DIR>          20060412-13
               0 File(s)              0 bytes

We see that there are indeed only two entries for this sub-directory, discounting the ubiquitous directory entries for "." and "..".


Ed Gottsman will demonstrate his information perusal tool J Viewer which is a J GUI application for delving into the various facets of the J wiki, forums and J code on GitHub.

Searching the Wiki, Forums and GitHub

The J Viewer, which supplies fast, fluid access to the J Wiki and the J Forums, now supports full-text code/English search of the Wiki, the Forums, and most of the jsoftware section of GitHub. The image below shows a full-text search for /:/: with the results divided into categories. We're actively looking for beta testers to help us resolve any remaining bugs--please send an email to if you're interested in participating. (Thanks.)

If you do happen to try out the beta, there is a video demo that may give you some more areas to explore.

Screenshot 2023-09-12 at 9.36.53 AM.png

Reconciling Large Sets of Files

Lately I've been disposing of the two thousand or so DVDs, CDs, and BluRays I've been using as a primary long-term backup medium. Unfortunately I can't simply refill the recycling bin downstairs and be done with it, oh no, I've got to look over each disc before I get rid of it to ensure I'm not losing my only copy of something important.

This effort requires comparing what's on each disc with what I have on my main RAID drive to figure out what I'm missing. Fortunately, when I created most of these discs, I also copied all the directory information for the whole disk to a uniquely-named file. This allows me to figure out what I may need to copy from a disc without actually having to read it unless it has files that I need.

This effort led to my discovery of the usefulness of dyadic I. as shown in the Beginner's Regatta above, as well as the "toggle" expression examined below in Advanced Topics.

Much of what I did is specific to this project and not of general interest, so I will show just the result of one of the tools I developed. The display below searches a directory file - as seen in the Dyadic I. section above - for files under two specific directories - "\amisc\pix\Photos" and "\amisc\pix\Sel" - and counts the number of .JPG files both there and on the RAID drive.

Here is an example of running this for files offloaded on 1/20/2010. The df variable is set to the stem of the name of the directory listing file which is assumed to be in the current directory.

      TYPE ZvsCD dirFlNm=. 'D:\amisc\ofldinfo\',df,'.dir'[smoutput TYPE=: ;TYPES{~-.TYPES i. <TYPE [ df=. 'Ofld100120_2230'
|0 |D:\amisc\pix\Sel\2006Q1            |3 |
|25|D:\amisc\pix\Sel\2006Q1\20060322   |25|
      TYPE ZvsCD dirFlNm=. 'D:\amisc\ofldinfo\',df,'.dir'[smoutput TYPE=: ;TYPES{~-.TYPES i. <TYPE [ df=. 'Ofld100120_2230'
|0  |D:\amisc\pix\Photos\2009Q3              |9  |
|37 |D:\amisc\pix\Photos\2009Q3\20090831-1010|39 |
|111|D:\amisc\pix\Photos\2009Q3\20090825     |112|
|372|D:\amisc\pix\Photos\2009Q3\20090829-30  |373|
|173|D:\amisc\pix\Photos\2009Q3\20090902-03  |174|
|98 |D:\amisc\pix\Photos\2009Q3\20090903     |99 |
|132|D:\amisc\pix\Photos\2009Q3\20090904     |133|
|134|D:\amisc\pix\Photos\2009Q3\20090910     |135|
|252|D:\amisc\pix\Photos\2009Q3\20090907-08  |84 |
|351|D:\amisc\pix\Photos\2009Q3\20090909     |124|

In the first invocation we look at files under "\amisc\pix\Sel" and see that the numbers match so there is no reason to read the actual disc to retrieve these files. The second invocation toggles the directory of interest to "\amisc\pix\Photos" and shows similar results though the numbers differ slightly. Small differences like this exist for a number of reasons so there is again no point in reading the disc.

This helped speed up the selection process considerably since it avoids many fruitless readings of the discs.

Advanced topics

Here we revisit a topic we've touched on in other meetings: solving a problem by exhaustive versus generative methods.

Finding a Special Number

Recently there was an exchange on the forums to solve this problem from the Quora site, relayed by Skip Cave: Can you arrange the digits 1-9 to make a nine-digit number such that the first digit is divisible by one, the first two digits are divisible by two, the first three divisible by three and so on?

Initial (Exhaustive) Solution

Skip found the answer like this:

perm=: i.@! A. i.     NB. From
to=: [+[:i.[:>:]-[    NB. My guess based on context
NB. ea=.&.>           NB. Don't bother

Skip uses a cover for "each" (&.>) but I prefer the raw J. He also uses perm which I found defined on the J wiki, and on, for which I deduced a likely definition based on the context in which it is used below. I also eliminated a spurious assignment, giving us this:

   {:"1 at#~*./"1[0=a|"1 at=.>10#. &.> (a=.362880 9$1 to 9){. &.> { >:perm 9

Checking that this answer is correct, we break it into its numeric digits, take the cumulative product of those digits and check that each of these products is evenly divisible by one through nine:

3 8 1 6 5 4 7 2 9
3 24 24 144 720 2880 20160 40320 362880
   (>:i.9)|*/\;".&.>":381654729         NB. Should have all zeros
0 0 0 0 0 0 0 0 0 

Taking a look at how this works, we see that this is an exhaustive approach: generate all possible nine-digit numbers with no zeros, with >:perm 9 and filter out the ones not meeting the criteria. We can guess that this is exhaustive by noticing that the number 362880 is !9 which is the total number of combinations of nine things.

This approach can be expensive as seen here; running the expression ten times gives an average of 2.8 seconds on my machine.

   (10) 6!:2 '{:"1 at#~*./"1[0=a|"1 at=.>10#. &.> (a=.362880 9$1 to 9){. &.> { >:perm 9'

This expense is negligible considering this is needs only to be run once but, as we will see, there are much more efficient ways to do this.

Other (Generative) Solutions

Progressively Building a Solution

A reply from Raul Miller shows a different kind of solution. Rather than generating all possibilities and filtering them down, this code builds up a solution a digit at a time by starting with all possible digits for the last digit of the result because any number based on a permutation of the digits one through nine is divisible by nine (otherwise the problem is unsolvable because of "casting out nines"[1]..

Raul Miller via 
Sun, Sep 10, 6:09 PM (2 days ago)
to programming

I guess to simplify it, I would filter the possibilities at each step.

first digit divisible by 1:
   N1=: 1+i.9

Second digit divisible by 2:
   N2=: ;(10*N1)+each (<2*1+i.4)-.each N

Third digit divisible by 3:
   digits=: 10&#.inv
   N3=: (#~ 0=3|]);(10*N2)+each (<1+i.9) (-. digits) each N2

And, so on... and since we're basically doing the same thing at each
step, we could encapsulate and parameterize this step as a function.

   F=: {{(#~ 0=x|]);(10*y)+each (<N1) (-. digits) each y}}
   9 F 8 F 7 F 6 F 5 F 4 F 3 F 2 F N1

   timespacex '9 F 8 F 7 F 6 F 5 F 4 F 3 F 2 F N1'
0.0006679 38400

Fixing a small typo in the definition of F and timing this on the same machine as for the above timing, we get this:

   digits=: 10&#.^:(_1)
   F=: {{(#~ 0=x|]);(10*y)+each (<>:i.9) (-. digits) each y}}
   $2 F >:i.9
   9 F 8 F 7 F 6 F 5 F 4 F 3 F 2 F >:i.9
   (10) 6!:2 '9 F 8 F 7 F 6 F 5 F 4 F 3 F 2 F >:i.9'

So this generative solution is about 6,000 times as fast as the exhaustive one.

Thoughtful Generation

Ben Gorte asked

If we do a bit of work by ourselves? Consider that the 5 has to be in fifth position (starting from 1), and that there must be even digits (four of them) on the even positions. The other four odd digits go to the remaining odd positions. Then there are only 576 possibilities left.

He proceeds to build the solution like this:

   odds =: (i.24) A. '1379' NB. there's a better trick, is there?
   evens =: (i.24) A. '2468'
   all =: ,/ odds {{ }. x 1 3 7 9 } y 2 4 6 8 } '0oeoe5eoeo' }}"1/ evens
   all #~ (1+i.9) {{*./ 0 = x | ". x {."0 1 y }}"1 all

Like Raul, Ben made use of direct definition by including a phrase in double curly braces.

Combining the final two lines of this definition, we see that it performs very well.

   (10) 6!:2 'all #~ (1+i.9) {{*./ 0 = x | ". x {."0 1 y }}"1 ,/ odds {{ }. x 1 3 7 9 } y 2 4 6 8 } ''0oeoe5eoeo'' }}"1/ evens'

Using Power

Michael Day also submitted a direct definition solution which uses the power conjunction (^:).

   if. 10=nd =. >: #":{.y do. y return.
   elseif. 2 = nd do.
      y  =.1 3 5 7 9 end.
   par=. 2|<:{:y    NB. plagiarising Ben Gorte's idea re odds & evens!
   q  =. q#~(=<.) nd%~ q=. ,(par+2*i.5) +/ 10*y
   q#~(-:~.)@":"0 q

This gives the answer very quickly.

   app^:_] 0
   (10) 6!:2 'app^:_] 0'

Speed Champ

A later entry from Pascal Jasmin seemed to hold the speed record.

   I =: ]F.: NB. insert using fold for shape flexibility
   (10) 6!:2 '(>:i.9) ([ (] #~ [ = 10&#.inv(#@~."1@:)@]) [ (] #~ 0 = |) (>: i.9) +("1 0)(,@:) 10 * ])I 9 8 7 6 5 4 3 2'

This is about twice as fast as the next fastest.

However, Pascal then topped himself with this improvement of about 10%:

   BASE10DIVbyINDEX =: '';(>:i.9); 2 4 6 8;(>:i.9); 2 4 6 8;5;2 4 6 8;(>:i.9);2 4 6 8;(>:i.9)
   v =: ([ (] #~ [ = 10&#.inv(#@~."1@:)@]) [ (] #~ 0 = |) (BASE10DIVbyINDEX {::~ [) +("1 0)(,@:) 10 * ])I
   (10) 6!:2 '(>:i.9) v 9 8 7 6 5 4 3 2'

A Stochastic Method

Jose Quintana submitted a semi-exhaustive solution that randomly samples the search space until it finds an answer.

   o=. @:
   entry=. 0&{::
   try=. 1&{:: 
   sample=. (>: o ?~ o 9:) ; 1 + try NB. counting tries to avoid a premature convergence
   good=. (-: 0&*) o ((1+i.9) | 10&#:^:_1\)
   tries=.  ', random tries ' , ": o try
   answer=. 'Answer ', ": o (10&#:^:_1) o entry
   while=. ^: (^:_) NB. parentheses are not necessary for the latest generation
   solve=. (answer , tries) o (sample while (-. o good o entry)) o (0 ;~ i.o 9:) f.
Answer 381654729, random tries 295587
   (10) 6!:2 'solve'''''

We see that this performs better, on average, than the fully exhaustive solution by a factor of more than two.

General Conclusions

Exhaustive solutions are often attractive because they are apparently correct because if we start with all possibilities, the one we care about has to be in there as well. Also, in a loose language like J, it's trivially easy to whip up a monster solution with very little effort. There's an old saying that a day of programming can replace an hour of thought.

An exhaustive solution is good as far as it goes but its Achilles heel is poor scalability. What if the proposed number is in hexadecimal? Unlike the !9 (362880) possibilities for this problem, we would have !15 (1.3e12) for the hexadecimal extension of it.

Though a generative solution will nearly always outperform an exhaustive one, it requires more thought up front. So, especially for limited utility code like this one for finding a single number once, it seems the short time to code an exhaustive solution may overwhelm the time it takes to think carefully about the problem. In general, we have to trade off up-front design time against time lost from inefficient computation. For a problem like this, it would seem all that beautiful efficient code was wasted time since in only took 2 seconds to run the exhaustive version.

However, if we had been asked to find a 15-digit hexadecimal number with the same properties as the 9-digit one, the trade-off would shift strongly to favor up-front design.

How much time might the exhaustive solution take in this case, assuming we could straightforwardly extrapolate our existing timings?

Learning and Teaching J

This is not so much about teaching J as warning that some simple code can have unexpected behavior in part due to J's promiscuity: the basic tokens of J are easy to mix together to get code that runs. This is often a positive but there are times where J's willingness to do something with such a wide variety of inputs can lead to unexpected behavior if apply edge cases.

A Toggle Expression

I had a situation where I needed a global variable to toggle between two different text strings each time I re-executed a particular command. By toggling the global variable TYPE between the strings photos and sel made it easy to run the two cases in which I was interested.

The initial assignment sets two globals: TYPES which is a vector of the strings between which we are switching and TYPE which is one or the other of the strings.

TYPE=: ;{. TYPES=: 'Photos';'Sel' NB. Init to 'Photos' 
TYPE=: ;TYPES{~-.TYPES i. <TYPE   NB. Switch type each time

So, it's easy to run for the two types by repeating the same code, like doSomething TYPE=: ;TYPES{~-.TYPES i. <TYPE, twice in a row. Most of the time, both results are negative which tells us we can get rid of this disc and move on to the next. However, sometimes one or the other type will have to be dealt with case-by-case, depending on the output of our doSomething function.

Testing an Expression

We see that the toggle above works by indexng the current value of TYPE in the vector of allowed strings, then re-assigning TYPE by indexing with the logical not of the result of the look up. This works because there are only two items so the allowable indexes are zero and one which are Boolean values to which we can apply logical not -.. However, the monadic domain of this verb extends to integers, so you start to get very odd results if you add items to TYPES without changing the toggle code.

First, let's see how the original toggle code works in action by re-running the same line a few times and displaying the result.

   ]TYPE=: ;{. TYPES=: 'Photos';'Sel'  NB. Initialize TYPE to 1st item
   ]TYPE=: ;TYPES{~-.TYPES i. <TYPE    NB. Switch type each time
   ]TYPE=: ;TYPES{~-.TYPES i. <TYPE    NB. Switch type each time

So we have the toggle we want: every time we assign it this way, the value flips between the two strings. What if we add a new item to TYPES but do not change the toggle code to accommodate the change?

   ]TYPE=: ;{. TYPES=: 'Photos';'Sel';'Foo'  NB. Initialize to 1st item

It seems to make no difference if we initialize TYPE with either of our original two values. What if we start with the new item?

   ]TYPE=: ;{: TYPES=: 'Photos';'Sel';'Foo'  NB. Initialize to last item

Now our toggle does not work because we are stuck on the one value. Why is this? The fault is in J's extension of logical not to be 1-n for any integer n. This conveniently switches between one and zero for Boolean arguments but does something else entirely for integer arguments greater than one.

   TYPES i. <'Foo'
   -.TYPES i. <'Foo'

So, applying -. to the index makes it a negative number which promiscuous J is happy to use to select from an array selecting, in this case, the _1{ or last item in TYPES.

What happens if we extend TYPES further?

   ]TYPE=: ;{: TYPES=: 'Photos';'Sel';'Foo';'Bar'

It cycles between the latter two values if we start with one of them. What if we start with one of our original values?

   ]TYPE=: ;{. TYPES=: 'Photos';'Sel';'Foo';'Bar'

The good news is that the original behavior is the same. The bad news is that it's not at all obvious what the toggle rule is as you extend TYPES.

Odd Behavior Explained

Eventually, by experimenting with ever longer versions of TYPES, we found a way to describe the behavior of toggle when applied to vectors longer than two items.

Looking at these experiments, we see that the toggle only ever toggles between two values or remains the same. The reason for this difference has to do with whether TYPES has an even or odd number of elements. For an odd number, some value will only toggle to itself but other initializations toggle between two values.

   ]TYPE=: ;{: TYPES=: 'Photos';'Sel';'Foo';'Bar';'Fiz';'Buz';'Golly';'Jee'

So, ignoring the first two items, we see that we are toggling between the first and last values of the latter six Foo and Jee.

Initializing to the 3{ value of Bar, we see that the values toggle between this and the second-to-last item Golly.

   ]TYPE=: ;3{ TYPES=: 'Photos';'Sel';'Foo';'Bar';'Fiz';'Buz';'Golly';'Jee'

Finally, the last pair of toggle values is the 4{ and the 5{ of TYPES.

   ]TYPE=: ;4{ TYPES=: 'Photos';'Sel';'Foo';'Bar';'Fiz';'Buz';'Golly';'Jee'

So, we can see here how toggling behaves depending on the initial value selected.

Diagram of weird toggle behavior.jpg

This can be extended obviously as the vector lengthens: each new pair toggles between opposite values other either end (of all but the first pair in the vector) and adding one new value to the domain also does this except there's now a middle element which toggles to itself.

Domain Error?

What if we initialize TYPE to an illegal value? The toggle behaves well in that it reverts to its default behavior between our two first choices.

   TYPES=: 'Photos';'Sel';'Foo';'Bar';'Fiz';'Buz';'Golly';'Jee'
   TYPE=: ' not a known string'


The obvious message here is that simple, quick and dirty code can lead to unexpected results if its arguments are unexpected. The more subtle warning is to pay attention to J's generous extensions to different data types.

End Matter

Ewart Shaw recently sent a note to the J forum to mention that he has recently used J to design book covers for a series of math books published by the UK Maths Trust.

Here's his most recent one which displays a recently discovered aperiodic tiling by turtles as discovered by amateur mathematician David Smith in 2023.

Mathematical Olympiad Primer II - cover by Ewart Shaw with J.jpg

Mr. Shaw has been famous in the J community because of his one line Mandelbrot in J in his signature. However, I noticed that his signature code looked different this time so I ran it to see what it does:

   3  ((4&({*.(=+/))++/=3:)@([:,/0&,^:(i.3)@|:"2^:2))&.>@]^:(i.@[)  <#:3 6 2
|0 1 1|0 0 0 0 0|0 0 0 0 0 0 0|
|1 1 0|0 1 1 1 0|0 0 0 1 0 0 0|
|0 1 0|0 1 0 0 0|0 0 1 1 0 0 0|
|     |0 1 1 0 0|0 1 0 0 1 0 0|
|     |0 0 0 0 0|0 0 1 1 0 0 0|
|     |         |0 0 0 0 0 0 0|
|     |         |0 0 0 0 0 0 0|

We see that this implements a cellular automata as made popular by John Horton Conway's well-known game of Life. In this example, it produces three iterations of the configuration starting with the R pentomino[2], known as the most active configuration of fewer than six live cells.

At the meeting, someone commented that this code expands the size of the grid as the automata grows unlike many implementations which maintain a fixed grid that wraps top to bottom and side to side, effectively acting as a torus rather than as an infinite plane. One could speculate that this might trace back to the difficulty of dealing with a changing grid size when using a compiled language, which may have led to this change from the original specification.