Beginner's Regatta: Consolidating Whitespace

How might we replace all whitespace in a line of text with a single, selected whitespace character? In this case, by "whitespace" we mean only spaces and tabs but the category could be expanded.

A Complicated Solution

We look at a simple expression for replacing whitespace in a text string with a single tab character for each expanse of whitespace. This is handy when we have text cut from a web page to be pasted into an Excel worksheet, for example. Note that the tab character corresponds to 9{a..

Our initial implementation was a bit verbose and relies on a standard library function rplc.

spaces2TAB=: 3 : 0"1
   WHITESPC=. ' 	'  NB. SPACE, TAB
   DEST=. '	'          NB. TAB is what we want to end up with
   nonWSP=. -.y e. WHITESPC [ insTAB=. (],[: -.{:) WHITESPC (2 >/\ e.~) y
   x1=. -.insTAB#~x0=. nonWSP +. insTAB
   tabsOnly=. (' ',DEST) (4 : '({.x) (~: ([ #^:_1!.({:x) #)]) y') x1#^:_1 x1#x0#y
   tabsOnly=. tabsOnly rplc (10 9{a.);10{a.     NB. Remove initial TAB on line
   tabsOnly=. ((9{a.)={.tabsOnly)}.tabsOnly
)

Here we see it in action:

   samp=. 'Here  is some    text',(' ',TAB,' '),'with',TAB,'different',('  ',TAB),'kinds of whitespace    embedded in it.   '
   spaces2TAB samp
Here	is	some	text	with	different	kinds	of	whitespace	embedded	in	it.

Check that the resulting whitespace consists only of tab characters:

   TAB-.~spaces2TAB samp
Hereissometextwithdifferentkindsofwhitespaceembeddedinit.

However, another test exposes a bug:

   samp2=. TAB,'Line with    initial ',TAB,'  tab'
   spaces2TAB samp2
Line	with	initial	ta

We see that the final non-whitespace character is dropped incorrectly. It's easy enough to hack a fix for this by appending a space to the end of the input but it's starting to seem too complex.

A Better Solution

Upon reflection, there is a much simpler solution using a common J idiom for boxing a line of text using J parsing rules with the words verb ;:. This is documented in the vocabulary - it actually represents an example of a state machine which is implemented by the dyadic use of the adverb. In the monadic case, the state machine defaults to the J parser so any sequence of text is broken apart into distinct J tokens; in the case of most text, this mostly corresponds to words.

So, for this:

   spaces2TAB=: 13 : '}:;TAB,~&.>;:y'
   spaces2TAB
[: }: [: ; (a.{~9) ,~&.> ;:

Using this on our two test cases:

   spaces2TAB samp
Here	is	some	text	with	different	kinds	of	whitespace	embedded	in	it.
   TAB-.~spaces2TAB samp
Hereissometextwithdifferentkindsofwhitespaceembeddedinit.
   TAB-.~spaces2TAB samp2
Linewithinitialtab
   spaces2TAB samp2
Line	with	initial	tab

This gives us a much simpler verb.

Show-and-tell

We look at a more general solution to a problem we solved in last month's meeting and solutions to a question posed on the forum about prefixing strings.

Tetration

Last month we looked at the convergence of raising i to the i power repeatedly. This sort of superpower operation is called tetration.

From: John Randall <randall@newark.rutgers.edu>
Date: Sat, Apr 6, 2024 at 11:32 AM
Subject: Footnote to "A Brief Convergence"
To: Devon McCormick <devonmcc@gmail.com>

I just wanted to point out another way of getting the limit of i^i^.... (By the way, I think this type of thing is called "tetration".)
Complex exponents can be tricky. If z and w are complex numbers, then z^w is defined to be e^(w log z). By definition, log z is some y such that e^y=z. If y is complex with y = r e^(i theta) and satisfies this, then so does r e^(i theta + 2 k pi i) for any k, and all solutions are obtained in this way. We can get a unique solution y = r e^(i theta) by picking the solution with theta in the interval [-pi/2,pi/2], which I think is what J does. This solution is called the principal value of the logarithm.
If z is the limit of i^i^..., then i^z=z, which means e^(z log i)= e^(log z), or f(z) = log z - (log i) z =0. We can find a root of this by using Newton's method with starting value i.
   require '~addons/math/calculus/calculus.ijs'
   newton=: 1 : 'y-(u y)%0 u sslope_jcalculus_ 1]y' NB. J9 onwards

   z=:0j1&^ ^:(_) 0j1
   f=:{{(^.y)-y*^.0j1}}
   z
0.438283j0.360592
   ]w=:f newton^:(_) 0j1
0.438283j0.360592
   z-:w
1
Here z is the limit as calculated in "A Brief Convergence" and w is the limit of using Newton's method to solve f(z)=0. They are identical.

Prefixing Strings

There was traffic on the J forums a few months ago about how to add prefixes to a boxed list of strings.

The original query was this:

From: Sanju Dutt <dutt.sc@gmail.com>
Date: Sat, Dec 16, 2023 at 1:50 AM
Subject: [Jgeneral] add prefix to a boxed list
To: <general@jsoftware.com>

   names=:'Aileen';'Brittney';'Gail';'Fiona'
   names
┌──────┬────────┬────┬─────┐
│Aileen│Brittney│Gail│Fiona│
└──────┴────────┴────┴─────┘
   'Mrs. ' "0&names
'Mrs. '"0&(<;._1 ' Aileen Brittney Gail Fiona')

I tried different approaches. I want to prefix 'Mrs.' to each name.

The result of this attempt above is essentially an error though none is specified. Sometimes J does this when there is an unrecognized token in an expression as seen here.

   undefined "0&names
undefined"0&(<;._1 ' Aileen Brittney Gail Fiona')

Note that "names" is a bad choice for a variable name as it is a standard cover function for list_z_@nl.

A Solution

It might seem the obvious solution would be something like this:

   (<'Mrs. '),&.>names
+-----------+-------------+---------+----------+
|Mrs. Aileen|Mrs. Brittney|Mrs. Gail|Mrs. Fiona|
+-----------+-------------+---------+----------+

This was the proposal from Paul Jackson who also mentioned this version to avoid parentheses:

   names ,~&.> <'Mrs. '
+-----------+-------------+---------+----------+
|Mrs. Aileen|Mrs. Brittney|Mrs. Gail|Mrs. Fiona|
+-----------+-------------+---------+----------+

Alternate Solution

An interesting alternate solution was mentioned by Bob Therriault:

   'Mrs. ',&.(a:`>) names
+-----------+-------------+---------+----------+
|Mrs. Aileen|Mrs. Brittney|Mrs. Gail|Mrs. Fiona|
+-----------+-------------+---------+----------+

This was a bit puzzling, even to Henry Rich who is deeply involved in maintaining the J source code. However, as he points out, this is a variation of the dual adverb &. known as a semidual.

This relies on using ace (a:) this way:

Semiduals x u&.(a:`v) y and x u&.(v`a:) y
The operation encapsulated by u&.v is used often and is important as a notation to aid thought. When the operation is dyadic, sometimes the sequence of transforming/operating/transforming back should be applied to just one argument, with the other argument being passed unchanged into u. This can be represented by a gerund form where a: indicates which argument is to be used unmodified. The form using &. is defined in terms of the infinite-rank version &.::
x u&.(a:`v) y is x u&.:(a:`v)"(lu,mv) where lu is the left rank of u and mv is the monadic rank of v
x u&.(v`a:) y is x u&.:(v`a:)"(mv,ru) where ru is the right rank of u and mv is the monadic rank of v

Using the semidual avoids boxing the left argument.

Another Alternate Solution

When we covered this material in a meeting, Dan Hirschi commented that he would do this sort of thing this way:

   ('Mrs. ',]) each names

This uses a standard keyword "each" which is a cover function for &.>.

There was some discussion about whether one should use the name or the J symbols for this operation. One participant mentioned using the keyword as kind of an alert: if he sees a lot of these on the same line, it's an indication that the various uses should be combined into a single tacit expression applied with only a single "each". This person also often sees better performance upon making this sort of change. Whether this is some kind of optimization possible with a tacit expression or simply due to less work boxing and unboxing elements is not clear.

Advanced topics

We look at an interesting difference between APL and J's handling of vector arguments to the iota function and learn about fuzz testing.

Vector Argument to Iota

This question popped up on the Reddit group "r/apljk" - a group for the languages APL, J, and K.

Here someone is inquiring about a vector argument to monadic iota (⍳) in APL:

What are some uses of iota accepting a vector of non-negative numbers?
I recently learned that iota can accept a vector, like this:
      ⍳2 3
1 1  1 2  1 3
2 1  2 2  2 3
What are some popular uses of this feature?

Notice that the APL example returns a boxed result of 2-element vectors with the outer dimension of 2 3.

      ⍴⍳2 3     ⍝ Dimensions of entire array
2 3
      ⍴¨⍳2 3    ⍝ Dimensions of each element of array
 2  2  2 
 2  2  2 
      ]box on   ⍝ Show boxing
Was OFF
      ⍳2 3
┌───┬───┬───┐
│1 1│1 2│1 3│
├───┼───┼───┤
│2 1│2 2│2 3│
└───┴───┴───┘

Notice how this result differs significantly from that of the equivalent J expression:

   i. 2 3
0 1 2
3 4 5

Odometer

This response to the question about APL was illuminating:

John_Earnest · 9 hr. ago
In K the equivalent of this is sometimes called "odometer".
It's useful for combinatoric puzzles, cellular automata, and board games; situations where an imperative language might use nested for loops to perform a search, map, or aggregation over a coordinate space.

This is interesting because J also has the same odometer concept though it is not implemented with a primitive. We see in this essay how odometer may be implemented in J:

   odometer=: #: i.@(*/)
   odometer 2 3
0 0
0 1
0 2
1 0
1 1
1 2

There is also this use of the catalog verb, as stated on the J wiki page "The odometer is the opened ravelled catalogue of i.0{x, i.1{x, i.2{x, etc."

   i.&.>2 3
+---+-----+
|0 1|0 1 2|
+---+-----+
    {i.&.>2 3
+---+---+---+
|0 0|0 1|0 2|
+---+---+---+
|1 0|1 1|1 2|
+---+---+---+
      
   >,{i.&.>2 3
0 0
0 1
0 2
1 0
1 1
1 2

Following the description given above, this may be more useful as a simple array:

      >,{i.&.>2 3           NB. Open ravel -> simple mat
0 0
0 1
0 2
1 0
1 1
1 2

For the APL example to return an unboxed result with the outer dimension of 2 3 so, to get the equivalent of the simple J odometer result, we could do this:

Fuzz Testing

This article outlines a simple but powerful testing technique called fuzz testing. The idea of this is to feed semi-random inputs to your programs to see if they break. This is a technique well-suited to J because of our ability to easily generate semi-random sequences and J's functional orientation.

We may have inadvertently stumbled on this technique in the past, as illustrated in a previous NYCJUG meeting where we tested randomly-generated J expressions and discovered a few dangerous ones. The objective then was to answer the question: in a random 10 by 10 table of J tokens, how many adjacent triples – vertically or horizontally – form valid J expressions? Generating and evaluating these randomly generated J phrases turned up some surprises.

Example in J

As an example of fuzz testing, let's look at our earlier code for turning multiple whitespace into single tab characters.

Let's generate a new sample for testing by randomly selecting plain ASCII characters from the atomic vector a.. We ensure that we have a lot of whitespace by prepending multiple instances of tabs and spaces to these low-bit characters like this:

   samp3=. 99 (] { ~ [ ?@$ [: # ]) (50$' ',TAB),32}.128{.a.
   samp3
Mr	XuZ	)d	=z< 3`wp+ }}8	jqD;oQ	d  vP	 ' V9;Q	<S|zA y	i	 7	\	:	 1 gk]^ndh    	l	  	X  J	 .n	F`Z@5?1k
   spaces2TAB samp3
|open quote: spaces2TAB
|spaces2TAB[0]
   samp3=. 99 (] { ~ [ ?@$ [: # ]) (50$' ',TAB),32}.128{.a.
   samp3
Mr	XuZ	)d	=z< 3`wp+ }}8	jqD;oQ	d  vP	 ' V9;Q	<S|zA y	i	 7	\	:	 1 gk]^ndh    	l	  	X  J	 .n	F`Z@5?1k
   spaces2TAB samp3
|open quote: spaces2TAB
|spaces2TAB[0]

Right away we get an error caused by having a single quote in our string. Let's fix this instance by doubling any single quotes and re-trying the test case.

   samp3=. (1+samp3='''')#samp3
   spaces2TAB samp3
Mr	XuZ	)	d	=	z	<	3	`	wp	+	}}	8	jqD	;	oQ	d	vP	''	V9	;	Q	<	S	|	zA	y	i	7	\	:	1	gk	]	^	ndh	l	X	J	.	n	F	`	Z	@	5	?	1k

   samp4=. 99 (] { ~ [ ?@$ [: # ]) (50$' ',TAB),32}.a.   NB. Include higher ASCII chars
   spaces2TAB samp4
 	IF	 	y	Y4	 	 	EB	!	 	 	?	 	 	 	"	 	i	G3w	 	O	 	_	 	#	r	 	]	k	 	 	 	_viB	 	4	 	D	 	6	|	9	 	Q	 	 	 	 	=	3	}	 	Vl	 	 	 	nD	 	 	 	 	$	 	 	 	F	 	@	 	Gz	 	 	 	 	 
   samp4
 	IF y Y4  EB!Æ? φ" i G3w O _  #r ]kҔ _viB 4   D 	6|9 Q    =3}		 Vl	    nD    $    F @ Gz	 	    
   TAB-.~spaces2TAB samp4
 IF yY4  EB!Æ? φ" iG3w O _ #r ]kҔ _viB 4 D 6|9 Q    =3} Vl   nD    $   F @ Gz

In this latter test, we see characters that look like whitespace when shown in a browser on this wiki page. Should we treat these as whitespace too? It would complicate the code but it would depend on the ultimate target for the output. Here we see how these characters render in a Consolas font:

Learning and Teaching J: A Simple Language

In this essay on the Gleam programming language, the author posits language features which are useful for a simple programming language, which Gleam purports to be.

The author begins

I love simple programming languages, like Gleam, Go, and C. I know I'm not alone; there's something wonderful about using a simple language: reading it, using it in a team, coming back to it after a long time, the list goes on.
In this post I want to make this kind of simplicity more precise and talk about some reasons it's important. I propose five key ideas for simple programming languages: ready-at-hand features, fast iteration cycles, a single way of doing things, first-order reasoning principles, and simple static type systems. I discuss each of these at length below.

I imagine most of us in the J community would agree with most of these principles, the exceptions being "a single way of doing things" and, possibly, "simple static type systems". The first of these appears problematic from a philosophical point of view since it conflicts with the generality of a language.

Let's briefly go over what the author means by each of these feautures.

Ready-at-hand

The author begins this section by distinguishing between "presence-at-hand" and "readiness-at-hand" where the former is something (a tool) which occupies our immediate consciousness but, for the latter, "it's like we can't even tell that it's there until we want to use it." This is "...like the idea of a small working memory that limits the number of concepts ... in your mind at once, but generalized to include how we filter out noise ... we're constantly bombarded with." Interestingly, the author connects this philosophy to Heidegger.

It seems Heidegger argued that technology is a mode of revealing the world, that it shapes our understanding of the world and frames our relationship to it.

Iteration Cycles

This section starts by explaining the value of quick iteration cycles.

A very fast iteration cycle (meaning compile times, mostly) is a very nice feature that simple languages aim for. Prototyping and experimentation is very cheap and the developer can stay in a flow-state that a 2+ second compile time would make impossible.

From the array language perspective, it is a little risible that the author would emphasize compilation time but most of us would agree with the emphasis on cheap prototyping and experimentation.

One Way of Doing Things

This is the one we find the most debatable but it may make sense in the context of a compiled language because "[d]esigning a language for fast compile times often means a lot of fancy features aren't possible." The author makes claims some implementations of functional programming are too difficult in practice, citing the example of the Go language's lambdas, but continues "Gleam takes this idea even further. Following its functional lineage, there are no loop constructs, just recursion and things like map and fold."

So, this "one way" seems to refer to a limited feature set rather than enforcing a code smell. The author makes the case for this:

Gleam explicitly makes a small, synergystic feature set the goal, optimizing for fast learning times and ease of reading code. A tagline of the language is that you can learn it in an afternoon. This focus is a big deal, and definitely resonates with what I like about Go as well. You won't understand how useful this is until you experience it for a while yourself.

It would be nice to claim one could learn J in an afternoon but this would be possible only in a very superficial way that does not do justice to the depth of the language.

The example code given for the Fibonacci series does not appear to be what we might call functional in a J sense:

pub fn fib(n: Int) -> Int {
  case n < 2 {
    True -> n
    False -> fib(n - 1) + fib(n - 2)
  }
}

Compare this to the wonderful J phrase (adapted from this essay):

f7=: 3 : '{.{: (+/ .*)/ (+/ .*)~^:(I.|.#:y) 2 2$0 1 1 1x'

Which, in addition to being purely functional, is astoundingly performant:

   6!:2 'f7 x: 1e6'    NB. How long to calculate the millionth Fibonacci number?
0.092191
   f7 x: 1e6           NB. What does the result look like?
1953282128707757731632014947596256332443542996591873396953405194571625257887015694766641987634150146128879524335220236084625510912019560233744015438115196636156919962125642894303370113827800638002767411527927466669865578379318822832061271497583230334854893...
   #":f7 x: 1e6        NB. How many digits is this result?
208988

First Order Reasoning

This desideratum is a bit vaguer than the others.

In academia there's a language called OBJ designed to be like a functional language with no lambdas (technically it's a "term rewriting" language). The scholars behind it argue that higher-order functions are difficult for humans to reason about, and they offer interesting ways to recover much of the expressivity of closures in other (vaguely object-oriented) ways.

For more on this, we are referred here which is an essay on how important type-checking is and how much code it takes to implement.

However, we in the J community might argue against higher-order functions being difficult to reason about if we understand the relation of verbs to adverbs.

Type Systems

This is perhaps the most controversial of the points on this list (from a J standpoint).

One might wonder why Python doesn't make my list. The reason is that python code feels very different to write, because of refactoring. The type systems of Gleam, Go, and C are very helpful when I make big changes to my code, allowing me to not keep everything in my head at once. Python makes me feel lost and alone, feeling around in the dark, wondering what runtime type error I'll discover next.

The author does make allowances for exceptions to strict typing:

Simple languages straddle an interesting balance between expressiveness and restrictiveness. C, Go, and Gleam all offer some form of dynamic typing that's explicitly for a small number of usecases. Then, they all offer a little bit of fanciness to express what you want without dynamic typing: Go interfaces, Gleam's powerful polymorphism, and C preprocessor macros and casts. After that, the type systems are very bare-bones and restrictive. This balance struck by simple programming languages feels very nice to work in. It's like how perfect parents find a balance between their child's freedom and their own control, keeping their child both safe and happy.

OO Taint

To give this argument its full due, we might look at a link purporting to show us how useful typing is. We don't really buy into the argument given here, perhaps because of the contamination of object-orientation making simple things more complex than they need to be.

For instance, here's our old friend Fibonacci as rendered in Rust:

struct Fib(u64, u64);

impl Default for Fib {
  fn default() -> Self {
    Self(0, 1)
  }
}

impl Iterator for Fib {
  type Item = u64;

  fn next(&mut self) -> Option<Self::Item> {
    (self.0, self.1) = (self.1, self.0 + self.1);
    Some(self.0)
  }
}

And, to make things even simpler, here's all you have to do to invoke this code:

let fib = Fib::default()
  .take(10)
  .collect::<Vec<_>>();
println!("fib: {:?}", fib);

The Gleam implementation of Fibonacci given in this second article looks like this:

fn next(state: #(Int, Int)) -> #(Int, Int) {
  let #(a, b) = state
  #(b, a + b)
}

pub fn new() -> iterator.Iterator(Int) {
  #(0, 1)
  |> iterator.iterate(next)
  |> iterator.map(pair.second)
}

This can be invoked like this:

fib.new()
|> iterator.take(10)
|> iterator.to_list()
|> io.debug()

This Gleam implementation, differing substantially from the one in the parent essay, seems to give lie to the "one way to do things" mantra if we interpret this as some kind of convergence of coding style but that does not seem to be what the phrase implies.

NYCJUG/2024-04-09

Contents