NYCJUG/2009-09-08/ArrayHabitsHarmful

From J Wiki

We pondered the following lesson from Joey Tuttle:

How Array Habits Can Be Harmful

from: Joey K Tuttle <jkt@qued.com>
to: Programming forum <programming@jsoftware.com>
date: Tue, Mar 10, 2009 at 2:34 AM
subject: [Jprogramming] Habits can be harmful

I had a recent experience that others might find useful. I have some J code in a Linux #! script (jwork) that takes pairs of files in a sendmail queue (header and body files) and puts them together into a usable email object. Long years of habit had me starting with an empty result and doing something like -

  result =: result, grind files

inside a for. loop -- and then, after the work was done -

  stdout result

By moving the stdout call inside the for. loop and removing the catenation, each partial result is written to the standard output pipe as soon as it is produced, and the result file is built up one iteration at a time. That is, something like

  #  ls q* | jwork > output

causes output to be built as jwork iterates through the files from the list.

This change caused a speedup of 4 or more times because of the simple elimination of copies of the catenated temporary result variable. This made things work a whole lot more reasonably and take a lot less memory - and has the interesting side effect of generating a perfectly usable partial result if interrupted during operation.

The point of this post is that sometimes what seems like "natural array thinking" is counterproductive - maybe others know this instinctively, but it was an eye opener for me.
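
As a rough illustration of the two loop shapes described in the message, here is a minimal J sketch. It is not the actual jwork code: the names grind, emit, accumulate and stream are hypothetical, and grind here simply appends a linefeed in place of the real per-file work.

```j
NB. Hypothetical stand-in for the per-file work; the real grind is not shown in the post.
grind =: 3 : 'y , LF'

NB. Output verb used by the streaming version; stdout comes from the J standard library.
emit =: stdout

NB. Habitual style: catenate every partial result and return it for one final write.
accumulate =: 3 : 0
  result =. ''
  for_f. y do.
    result =. result , grind > f
  end.
  result
)

NB. Streaming style: write each partial result as soon as it exists.
stream =: 3 : 0
  for_f. y do.
    emit grind > f
  end.
  i. 0 0
)
```

With a boxed list of names such as 'a';'bc', stdout accumulate 'a';'bc' and stream 'a';'bc' should write the same bytes. The difference is that stream never holds more than one partial result, so whatever has already been written survives an interruption.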


We discussed the trade-offs of an array-based approach, which often requires us to bring an entire object into memory before we can do anything with it, versus a streaming approach, which brings only a modest part of an object into memory at a time.

Sometimes in J we find that a piece of code that ran well on a small amount of test data has difficulty with a realistic volume of actual data. In a case like this, we are often forced to wrap our nice, small, elegant piece of code in an ungainly loop in order to process the data in pieces. It would be nice if there were a general, transparent way to make this transition.
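
One hand-written version of such a loop, offered only as a sketch: the adverb below, with the hypothetical name blockwise, uses the file foreigns 1!:4 (file size) and 1!:11 (indexed read) to apply a verb to successive blocks of a file. It assumes the work can be done on arbitrary byte blocks. Record-oriented processing would also have to cope with records split across block boundaries, which is part of why the transition is not transparent.

```j
NB. x u blockwise y : apply u to successive blocks of at most x bytes of file y.
NB. u is expected to dispose of its own result (for example, write it out);
NB. nothing is accumulated here.
blockwise =: 1 : 0
:
  size =. 1!:4 < y
  pos =. 0
  while. pos < size do.
    len =. x <. size - pos
    u 1!:11 y ; pos , len
    pos =. pos + len
  end.
  i. 0 0
)
```

For example, 65536 (stdout@toupper) blockwise 'big.txt' would upper-case a file (the name big.txt is made up) while holding at most 64 KB of it in memory at a time.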