NYCJUG/2023-02-14

Beginner's regatta

Substring Search

Devon proposed the following, which uses the dot conjunction, to streamline a substring search in which we look for an occurrence of the string on the left in each of the boxed items on the right:

    (<'ab')  (+./ . E.)&> 'ac';'cab';'bca';'bab'
0 1 0 1

This works by applying E. (find matches) to locate occurrences of the left argument in the right argument. E. differs from the similar e. (member) in that it compares the whole left argument against substrings of the right argument and marks the start of each match, rather than testing individual items for membership. For instance, we find each occurrence of the two substrings 'co' and 'na' here:

   ('co';'na') E.&>/<'rococo banana'
0 0 1 0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 0 1 0
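
To make the contrast with e. concrete, here is a small sketch that was not part of the meeting; the strings are only illustrative. E. marks where the whole left argument begins, while e. reports, for each item of the left argument, whether it appears anywhere in the right argument:

   'co' E. 'rococo'
0 0 1 0 1 0
   'co' e. 'rococo'
1 1
   'cx' e. 'rococo'
1 0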

We see above that the result of this substring search is reduced by or (+./) to give a one wherever there is any occurrence, which is why the shape of the result of the first expression matches the shape of the right argument. However, notice that the or-reduce is attached to E. by a single dot. This is the dot conjunction, which applies two functions in sequence, where the left function is often a reduction. It is probably most commonly used for matrix multiplication, +/ . * , which multiplies the rows of the left matrix with the columns of the right one and then sums each of these row/column products.
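
As a minimal illustration (the small matrices here are made up for the example), the familiar matrix product and the substring test follow the same pattern: the right verb is applied first, then the reduction on the left is applied to its result.

   (2 2$1 2 3 4) +/ . * 2 2$5 6 7 8
19 22
43 50
   'ab' (+./ . E.) 'cab'
1
   +./ 'ab' E. 'cab'
1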

More Elegant

Dan replied with the more elegant

    (<'ab')  (1 e. E.)&>  'ac';'cab';'bca';'bab'  NB. quicker?
0 1 0 1

Here the or-reduction of the first expression is replaced by simply checking whether a one occurs anywhere in the result of E. (that is, 1 e. applied to the match mask).
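
One way to check that the two phrasings agree, and to probe the "quicker?" question, is sketched below. The names pat and strs are introduced only for this example, and no timings are shown because they depend on the machine and the J version:

   pat=: <'ab'
   strs=: 'ac';'cab';'bca';'bab'
   (pat (+./ . E.)&> strs) -: pat (1 e. E.)&> strs
1
   100 (6!:2) 'pat (+./ . E.)&> strs'   NB. average seconds over 100 runs; varies
   100 (6!:2) 'pat (1 e. E.)&> strs'    NB. compare with the line above

On a right argument this small any difference is likely to be lost in the noise, so a longer list of strings would give a fairer comparison.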

Eleganter Still

Then Raul blew us all away with

   'ab'  (1 e. E.)S:0  'ac';'cab';'bca';'bab'
0 1 0 1

We see that he incorporates the 1 e. E. that Dan suggested but avoids &> by using S:0, or spread. This conjunction applies the verb on the left, (1 e. E.), to the leaves (level 0) of the boxed array on the right. This may be more than we need, but it means the expression also works with more deeply nested right arguments like this:

   'coconut';('banana';'cola');('encode';(<'foo';<'bar';'corn'))
+-------+-------------+------+----------------+
|coconut|+------+----+|encode|+---+----------+|
|       ||banana|cola||      ||foo|+---+----+||
|       |+------+----+|      ||   ||bar|corn|||
|       |             |      ||   |+---+----+||
|       |             |      |+---+----------+|
+-------+-------------+------+----------------+
   (<'co') (1 e. E.)S:0 'coconut';('banana';'cola');('encode';(<'foo';<'bar';'corn'))
1 0 1 1 0 0 1

However, this expression also loses the alignment between the shape of the right argument and the shape of the result: the result has seven items, one per leaf, whereas the right argument has only four.

   $'coconut';('banana';'cola');('encode';(<'foo';<'bar';'corn'))
4
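
If the alignment matters, one possible alternative (not discussed at the meeting, so treat it as a sketch) is the related conjunction L: (level at), which applies the verb at the leaves but keeps the boxed structure of the right argument. The name nested is introduced here only for convenience:

   nested=: 'coconut';('banana';'cola');('encode';(<'foo';<'bar';'corn'))
   $ 'co' (1 e. E.)S:0 nested
7
   $ 'co' (1 e. E.)L:0 nested
4
   'ab' (1 e. E.)L:0 'ac';'cab'
+-+-+
|0|1|
+-+-+

The price is that the result is boxed, so it has to be opened before it can be used as a Boolean mask.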

Show-and-tell

Bad Memory

In last month's NYCJUG meeting, we noted some odd behavior where J apparently takes a long time to free up memory. As I posted to Henry Rich, here are the steps to reproduce the odd behavior:

   RH6=: 6 comb 52
   6!:2 'hash0=. 52#."(0 1) RH6' [ smoutput ts0=. qts''
   ts0=. (qts''),:ts0

Here "comb" is from the wiki. Compare the time reported by 6!:2 with the difference between the two timestamps in ts0.

Henry got back to me with the following observations:

1. The version of comb that was chosen is memoized and kept a set of tables lying around, holding ~10GB of memory. Using RE Boss's comb saved that.

2. In the phrase (52#."0 1 RH6), the rank is unnecessary. When I removed it the line completed in 0.11 seconds compared to 120sec with the rank included (!)

This second point is one we can all take to heart: avoid unnecessary rank.
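
A small sketch of why the rank is superfluous here: dyadic #. already has rank 1 1, so with an atom on the left it is applied to each row of the right argument anyway. The arrays m and big below are hypothetical stand-ins for RH6, chosen only so the example runs quickly:

   #. b. 0                      NB. monadic, left and right ranks of #.
1 1 1
   m=: 2 3$0 1 2 3 4 5          NB. tiny stand-in for RH6
   52 #. m
54 8325
   (52 #. m) -: 52 #."0 1 m     NB. same result with or without the rank
1
   big=: ? 1e6 6 $ 52           NB. larger random stand-in
   6!:2 '52 #. big'             NB. timings vary, so none are shown
   6!:2 '52 #."0 1 big'

On a J version that includes the interpreter change Henry describes below, the two timings may well come out about the same.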

The note about the potential costliness of memoization is also illuminating, especially considering that I had deliberately chosen the memoized version of comb under the assumption that it would run faster.

I also ran this exercise starting with "RH7=: 7 comb 52" and it took over two days to complete the hash. Without the rank, it took less than eight seconds! Since two days is 172,800 seconds, that is a speedup of more than 20,000 times.

In addition to the J code changes to speed things up, this exercise did uncover an actual bug in the J implementation of at least one use of rank, an "underlying problem in #."n . It was not releasing blocks allocated in subroutines...."

Henry concludes by describing how this exercise also led to an underlying improvement in the J interpreter. As Henry said:

After thinking about it, I realized that I can detect the case of

52 #."0 1 bigarray

when I start to execute #."0 1, and by looking at the ranks of #. I can suppress the rank conjunction, giving you the fastest result even with the rank specified. It'll be in the next version. Note that I can't tell by looking at (#."0 1) that the rank is superfluous - it isn't if x isn't an atom.

Excessive use of rank is a problem for most beginners and even experienced coders like us make the mistake from time to time. For years I have wanted to do something to help, and your example was just what I needed.

Nonce Error Solved

Another problem I noticed last month was this one:

From: Devon McCormick <devonmcc@gmail.com>
Date: Mon, Jan 30, 2023 at 3:01 PM
Subject: "Nonce error" on monadic "x:"?
To: J-programming forum <programming@jsoftware.com>
   Hi,
   
   Does anyone know why I'm suddenly getting this?
         x: 99
   |nonce error, executing monad x:
   |       x:99

Fortunately, the simple fix was provided by Raul:

From: Raul Miller <rauldmiller@gmail.com>
Date: Mon, Jan 30, 2023 at 4:46 PM
Subject: Re: [Jprogramming] "Nonce error" on monadic "x:"?
   Nonce error should mean that you have not run install'gmp'
   Maybe we should get some eformat support for the error message reminding people that that's needed.

Miscellaneous

Another binary which now needs to be installed for J is the one for the new LAPACK. We need to get it this way:

      getbin_jlapack2_''
LAPACK binary installed.

Also, it looks like some of the LAPACK routines in J have had their names changed so, for instance, "dgeev" is now "geev".

Advanced topics

What's in a Name?

This article, Taming Names in Software Development, has a lot of suggestions about good naming practices. There's an old joke that the two most difficult problems in programming are naming things, memory management, and off-by-one errors but, like many jokes, it contains some truth.

This author makes the point that a name allows us to neatly encapsulate a complex idea and that a "good name is succinct, evocative, fitting. It reduces cognitive load and stands out in your mind. Bad names are obscure, misleading, fuzzy or outright lies." The emphasis on reducing cognitive load is practically reading from the J playbook.

The article also argues for a balance between extremely long and extremely short names, saying:

In software, really good names are meaningful, descriptive, short, consistent, and distinct. You will notice that ‘descriptive’ and ‘short’ are diametrically opposed. As are ‘consistent’ and ‘distinct’. There is no solution, only tradeoffs.

Descriptive names are safe, legible, clear. They tell you what exactly you’re dealing with, bring you up to speed, don’t require you to be an expert in the codebase or a mind reader. I understand exactly what BasicReviewableFlaggedPostSerializer is on my first time seeing it. But they can also be bulky and unwieldy.

Short names are easy to use, easy to scan, pithy and convenient. They use abbreviations and shorthands to get out of your way so you can focus on the logic. pc_auth_token is so much easier to say than premium_customer_http_authentication_token. But short names can also be confusing and opaque.

The author wraps up this section by warning us that "[b]alancing these opposing principles is what makes good naming so hard" and that he favors names that are "...descriptive and conventional by default, and reserve shorter names for oft-repeated variables and classes." Again this has echoes in J conventions where we use very short names for briefly-used local variables and longer names for higher-level objects like top-level functions or important global variables.
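
As a rough illustration of that J convention, here is a hypothetical verb (the name containsSubstring is invented for this example) that wraps the substring test from earlier in these notes. The top-level verb gets a descriptive name while the throwaway local gets a one-letter name:

   containsSubstring=: 4 : 0
  m=. x E. y      NB. m: short local name for the match mask
  1 e. m
)
   'ab' containsSubstring 'cab'
1
   'ab' containsSubstring 'bca'
0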

One of the responses to this essay suggested a method for handling names in general:

...one of the things we can do is avoid worrying about it while writing code. Leave ‘naming’ to the ‘editing’ phase thereby separating cognitive load a bit. Also, while coding, things change a lot, so no sense in undue hardship. Make it work, then clean it up.

How Names Can Lie

The essay makes a number of useful suggestions for dealing with names and includes a "name audit" example to show some of these principles in action. Before that, the author argues against name complexity, again touching on the blight of cognitive overhead and on how names can lie to you.

The problem is increased cognitive overhead, developer time wasted deciphering outdated terminology, burnout and buggy code. That last one, buggy code, is especially bad. A common source of bugs is when what you think should happen is badly mismatched with what will happen. Deceitful names are dangerous.

Once I wrote a memorable bug by calling deleteResource() and assuming it would delete the resource. Silly me! I spent the afternoon hunting all over the codebase for the logic flagging a resource as deleted. I naively assumed that logic would live in deleteResource(). No? Well maybe sqlSetResourceDeleted()? Huh. sqlCoreDeleted()? Nada… Ah… there it is, right in prepResourceOperation(). Of course, why didn’t I think of that!?

Remember when I said bad names could be outright lies? A badly-named deleteResource() function will lie right to your face and prepResourceOperation() will stand there not saying a single word.

Ruby Naming Conventions

Apparently the Ruby language has a number of interesting naming conventions as mentioned here.

Conventions communicate intent, both by form and content. For example, Ruby naming conventions recommend classes be written PascalCase (form), and preferably as nouns, concrete and thing-y (content). So you can see User or CustomerAccount and recognize them as classes. Ruby methods on the other hand should be snake_case, and preferably unabbreviated verbs (e.g. publish, invite_user, find_all). A method ending in an exclamation mark, like archive!, warns that it modifies data when called. A question mark a la archived?, on the other hand, implies the method will return a boolean true or false.

Learning and Teaching J

The new J wiki continues to make progress. Here is what the proposed landing page looks like:

NewJHomePage0.png
NewJHomePage1.png

One important innovation on the new wiki is the use of categories to link together different entries which cover related topics.

Advice from ChatGPT

Here is what the famous ChatGPT says about learning and teaching J.

Tips for learning and teaching J programming language

  • A. Understand the basic concepts and syntax
  • B. Practice regularly and use online resources
  • C. Start with small, manageable projects
  • D. Collaborate with other learners and experts

Array-Language Meetings

Here are some upcoming meetings that may be of interest to the J community.