Vocabulary/ArrayProcessing

From J Wiki
Jump to: navigation, search

Back to: Vocabulary

Array Processing

How do you process an array?

You can view this question in two ways:

  • How do you (currently) process an array?
  • How might you ideally process an array, if you could throw away the crutches of crippled thinking?

Data comes in arrays.

Well no, it doesn't always. But when it doesn't (as it doesn't with SQL or COM) you know deep down inside it really should.

So data processing is array processing. At least it is up there, in programmers' heaven.

But just try to collect your data into one big array in preparation for a really bold approach, and you find that SQL breaks your fingers for you upfront, right from the word go. Every finger bar the first, which leaves you only able to pick through records one-by-one.


How do you currently do it?

Other people's languages have arrays. They have functions that work on an array, call it A. Function f say, which calls function g. Some import the whole array A upfront into the namespace of the function f and then expect you to chip away at it with a for-loop.

Sometimes you're acutely aware of having two copies of A. Sometimes the language is smart enough not to pass a snapshot of A to f, but only a pointer to its commencement in memory. Some people think that's as smart as it's possible to be, when passing arrays between functions. But it means, to make sense of the world at the end of the pointer, you need a pointer-based template which you push around by dead-reckoning in carefully-computed steps in A-space.

Are the numbers in array A bytes, integers or floating point? Signed or unsigned? What precision? Your template needs to know all that if your dead-reckoning isn't to leave you dead in the water. You cross your fingers that the guy who built the A-table was using the same header file that you did, at the same level of update.

Something as simple as an integer can make it all come unstuck when you move your code to a different platform. In 10 years the integer in that header file can swell from 16-bit to 32-bit, can swap its bytes, and suddenly start paying attention to its sign-bit.

If you're lucky, g has been written to expect exactly the same kind of array as f. Think of all the possible arrays in the world! Are you still feeling lucky?

And if, at last, you manage to point your one good finger at a valid number, does that number mean what you hope it does? Is it inches or centimetres? Spacecraft have blundered into Mars due to wrong assumptions in that department.

Okay, you've got all that right? Now remember that if A has 256 entries, the last entry is not  A[256] but  A[255] . Something like 1 in 3 programming bugs boil down to getting that wrong.


How would it be nice to do it?

J is an Array-Processing Language (an APL).

Other people's languages (OPLs) have arrays. They have functions that work on those arrays. So that makes them APLs too, right?

Wrong.

An OPL fetches array A in one form or another (e.g. "by name" or "by value"), and dumps it down in your lap, so to speak. Then it leaves you to poke inside it using nested for-loops. Or you can lay a paper-trail of pointers and use them to mark where you've been, to go back to. That's the world as IBM created it, and it's never going to change.

Pardon me, it just has.

J is an APL, not an OPL. What makes an APL different from an OPL is that it doesn't just dump A in the lap of function f to flip through with one finger. An APL has primitives which are array-savvy. That means you can develop your functions from the primitives using individual numbers, but then you discover to your delight that they automatically generalize from individual numbers to whole arrays.

For example, consider the J primitive (+), which adds two numbers. Adding 1 to a number is perfectly simple:

   6 + 1
7

Now suppose you have an array A defined as follows:

   A =: 6 5 9 2 4 9 0 7 0 4

You want to add 1 to every element of A. Do you have to write a loop to step through A applying (+1)? No, you don't. Adding 1 to A is just as simple as adding 1 to 6

   A + 1
7 6 10 3 5 10 1 8 1 5

That's because (+) is array-savvy. A function f built using (+), and other such J primitives, will automatically generalize from individual numbers to whole arrays.

Function f need never know the size of A, nor even its structure. It need never get passed a copy of the whole of A, nor even a pointer to its length-field. Not even if it needs to call function g to contract-out some of the processing of A. And what goes for f goes for g too.

In fact you can write f to take delivery of just a single number, or a row of numbers (if A is a matrix), leaving it to J to scale-up f to handle A whatever its size. Or even to behave sensibly if A runs out of numbers altogether. You can even leave it to the APL to decide the precision of those number at runtime. J lets you process numbers as numbers, ignoring questions of precision (unless such questions are directly of concern to you).


How can I get hold of J to be able to do all that?

You can view this question in two ways:

  • How do you download and install J?
  • How do you get to use J as it should be used?

The first question is easy. Just download and install J from http://jsoftware.com . Its free, needs Mbytes not Gbytes of disk space, and installation is easy.

The second question is the tough one.

The trouble is this. J can easily be used like an OPL. You'll recognise all the language features it offers and will have no trouble putting them to work instantly. They'll (mostly) do what you expect. When they don't, you soon spot the problem, plus learn the solution.

But experience has shown that people trained on OPLs have a lot of trouble using J like an APL, as it's meant to be used. They simply can't relax enough to let J do the hard work for them. They want to do the things they're used to doing, to build the ship out of matchsticks, because that way they feel in-control. They end up with code that's a hundred times longer than it need be, and a hundred times slower. This ball-park figure is literal, not metaphorical, as we shall demonstrate.

The purpose of this article is to show you how to use J like an APL right from the start. Not use it for a year (enjoying the neat IDE), before cautiously introducing tricky little features you don't altogether understand.


Show me J in use, first as an OPL, then as an APL

Let's make an array of numbers, A, and process it using two simple functions f and g, where f calls g.

  • We'll do it the OPL way, with code which will be clear enough to an OPL-trained programmer.
  • Then we'll do it the APL way (with only a brief explanation for now).
  • Then we'll time both solutions, showing a better-than-hundredfold improvement.

Here is A, created using the primitive (?.) which generates a list of random numbers. Let's make A heavy enough by giving it 100,000 entries instead of just 10

smoutput A=: ?. 100000 $ 10   NB. 100000 random #s - each 0 thru 9
6 5 9 2 4 9 0 7 0 4 6 8 3 8 1 2 8 0 0 2 1 6 0 4 4 1 3 3 6 6 5 7 8 8 2 1 9 7 9 9 1 4 3 9 7 2 4 3 7 5 3 8 3 6 ...

(?.) is a variant of (?). The advantage of (?.) for us here is that it generates the same sequence of numbers each time it's used. Good for developing demos, like this one, and for testing your apps. Not so good for Monte Carlo simulations.

Now let's write a function: av4 to calculate the average of all the entries in A greater than 4.

In J we normally implement a mathematical function as a verb.

First write it the traditional way, using the looping constructs in J which are familiar to any programmer trained on an OPL. Then run it with argument A

   require 'stats/base/univariate'   NB. needed for the verb: mean

   av4 =: verb define
z=. i.0                 NB. initialize buffer: z to the empty list
for_i. i.$y do.         NB. loop thru the entries of y
  j=. i { y             NB. j is the i'th element of y
  if. j>4 do.           NB. if j>4 then
    z=. z,j             NB. append j to z
  end.
end.
mean z			NB. average the buffer: z
)

   smoutput av4 A       NB. run verb av4 on array A
7.00823

Now rewrite av4 using function composition, which is J's special strength.

We'll give the verb a different name: av4j so that it can coexist with av4 for comparison purposes.

Let's define two verbs f and g

   f =: >&4             NB. returns 1 if y>4
   g =: mean            NB. g is the function: mean

We're not forced to do this. We could use (>&4) instead of f and mean instead of g. But it shows the process more clearly.

Now let's define av4j as a simple-looking composition of three functions: f, g and a primitive: (#). Then run it with argument A as we did with av4

av4j=: [: g f # ]	NB. combine the verbs f, g and # to act in 1-shot

smoutput av4j A		NB. run verb av4j on array A
7.00823

It gives the same answer as av4, which is reassuring.

But in practice you'd want to conduct more exhaustive tests to convince yourself the two verbs are functionally equivalent.

Now time the two expressions. For this we need a test verb timer

   timer =: 6!:2

Don't bother how timer works. All we need to know is that it estimates the time taken to run a given string of J code, returning the answer in (fractional) seconds.

Let's run timer for av4 and av4j in turn, acting on the array A

   smoutput timer 'av4 A'
0.340405
   smoutput timer 'av4j A'
0.00117706

   0.340405 % 0.00117706   NB. divide one result by the other
289.199

So we find that av4j ran 289.199 times faster than av4 on this occasion.

The figure will vary slightly as you repeat the experiment.

That's a big gain in speed. But not atypical when rewriting - in the APL way - a verb written in the conventional OPL way.


Download a script of the entire demo: File:Av4.ijs


Now tell me how it works

The verb av4j uses a construct called a Fork, firstly to combine f and # , and next to combine the result with g.

If you read the page: Fork, this will introduce you to further primitives, which you can look up in the NuVoc portal.

But first take a look at the ancillary page Vocabulary/Loopless, which goes deeper into the topic of loopless programming, of which we've only scratched the surface here.

Each J primitive has a page to itself. Special topics are explored in the ancillary pages listed at the end of NuVoc.

Welcome to the world of J!