Help / JforC / Loopless Code V: Partitions

From J Wiki
Jump to navigation Jump to search

>> << Pri JfC LJ Phr Dic Voc !: Rel NuVoc wd Help J for C Programmers

                                                                          23. Loopless Code V: Partitions

The adverb \ operated on subsets of the operand y that were taken in a regular way.  Now we will take the next step, and operate on irregular subsets of .  This will finally give us the chance to do interesting work with character strings.

Find Unique Items: Monad ~. and Monad ~:

Monad ~. has infinite rank.  ~. y is y with duplicate items removed and is called the nub of .  The items of ~. y are in the order of their first appearance in :

   ~. 'Green grow the lilacs'
Gren gowthliacs
   ]a =. _2 ]\ 0 1 1 2 0 1 2 3 1 3 1 2
0 1
1 2
0 1
2 3
1 3
1 2
0 1
1 2
2 3
1 3

y can have any rank, and ~. y gives the unique items.

Monad ~: has infinite rank and tells you which items monad ~. would pick.  ~: y is a Boolean list where each element is 1 if the corresponding item of y would have been selected for the nub of :

   ~: 'Green grow the lilacs'
1 1 1 0 1 1 1 0 1 1 0 1 1 0 0 1 1 0 1 1 1
   ~: a
1 1 0 1 1 0
   (~: a) # a
0 1
1 2
2 3
1 3

~. y is equivalent to (~: y) # y .

Apply On Subsets: Dyad u/.

Dyad u/. has infinite rank.  x u/. y applies u to subsets of y for which the corresponding items of x (called the keys) are identical.  I will skip the formal description in favor of a verbal one: First find the nub of x, call it nx .  Then, for each item of nx, find all matching items of x; make an array of the corresponding items of y (this array will always have rank one more than the rank of an item of y), and apply u to that array; the result becomes one item of the result of x u/. y .  The items of x u/. y thus correspond to items of the nub of .  Note that the subsets of y may have different shapes (they will all have the same rank, being made of items of y, but each subset may have a different number of items).  For this reason, we usually have u produce a boxed result to avoid framing fills.

   ]a =. 2 0 1 0 2 </. 'Hello'

The subsets were created, and each one was boxed.  Note that each subset is a list, even the one with only one element:

   $&.> a
   3 0 3 0 3 +//. 100 1 200 2 300
600 3

The summing was performed for each subset.  Note that the result is sorted not on the value of the keys, but rather on the smallest index of each key in .

   ]a =. _2 ]\ 'Fred';100;'Joe';200;'Fred';50;'Sam';30
|Joe |200|
|Fred|50 |
|Sam |30 |

A small database of amounts we owe.  We can quickly get a total for each creditor:

   ]b =. ({."1 a) +/@:>/. ({:"1 a)
150 200 30

({."1 a) gives the list of names, which we use as keys for the data given by ({:"1 a) .  For each subset, we open the boxes and add up the results.  Note that +/@>/. would not do in place of +/@:>/., and understand why.

It would be nice to have names associated with the totals:

   (~. {."1 a) ,. <"0 b
|Joe |200|
|Sam |30 |

We box the totals using <"0 before we join items of the totals to the items of the nub.  Recall that x ,. y concatenates each item of x with the corresponding item of .We take advantage of the fact that the results of u/. have the same order as the nub.

Apply On Partitions: Monad u;.1 and u;.2

Unlike dyad u/. which applies u to scattered subsets of y, u;.n applies u to sequential subsets of .  In monad u;.n, the subsets are computed from information in y itself; in dyad u;.n, x specifies the subsets.  u;.n is really 4 different cases distinguished by the number n; we will start with the case where n is 1, 2, _1, or _2 .

Avoid the error of thinking that an operation on a subset of y somehow modifies .  Never happens.  In J, the result of a verb is always a new noun.  You may assign that noun to the same name as one of the operands, but until you do, the old operand is unchanged, and both it and the result of the verb are available for further use.

u;.1 y partitions y by finding the items of y that match the first item of y (i. e. 0{y); each such item, called a fret, is the start of an interval of y which runs from the fret to the last item before the next fret (the last interval runs to the end of y).  Monad u is applied to each interval (which is always a list of items of y), and the results of u become the items of the overall result of u;.1 y .

   <;.1 ' a list of words '
| a| list| of| words| |

Each ' ' character, even the one at the end, started a new interval.  We are not restricted to boxing the intervals:

   #;.1 ' a list of words '
2 5 3 6 1

Here we report the length of each word.

u;._1 y is like u;.1 y except that the fret itself is omitted from the interval.  The interval could be empty in that case:

   <;._1 ' a list of words '
   #;._1 ' a list of words '
1 4 2 5 0

u;.2 y and u;._2 y are like u;.1 y and u;._1 y except that the frets are those items of y that match its last item, and each marks the end of an interval:

   <;.2 'Mississippi'

Creating An Array Using 0 :0 And ;._2

Use u;.2 y to split a list when you know the ending marker for the intervals.  An important use of u;._2 is to execute u on each line of a noun that is defined with 0 :0 :

   ;: ;._2 (0 : 0)
Fred 500
Joe 200
Fred 50
Sam 30
|Joe |200|
|Fred|50 |
|Sam |30 |

Here (0 : 0) creates a noun from the lines that follow.  Each line ends with an unseen LF character which provides a convenient fret for splitting the lines.  We use ;._2 to apply monad ;: to the text of each line (not including the fret, which is dropped by ;._2).  Monad ;: splits each string into words and boxes the words.  The result is not the same as the array we created earlier, because the second column is character rather than numeric.  We could have written noun define instead of (0 : 0) .

The verb you apply to each line of the noun depends on what is in your noun.  One common choice is "., to be discussed in the next chapter, which executes each line as a sentence, returning the result.

Monad u;.n has rank 1 _ when n is 1, 2, _1, or _2 .

Apply On Specified Partitions: Dyad u;.1 and u;.2

When n is 1, 2, _1, or _2, x u;.n y has infinite rank and, in the one-dimensional case where x is a Boolean list of 0s and 1s, resembles u;.n y .  The difference is that the frets are given by the positions of 1s in x rather than by values of the items of .  We can define u;.1 y as (({.y)="_ _1 y) u;.1 y, and u;.n y for the other values of n similarly.

   0 1 0 1 0 +/;.1 (20 30 40 50 60)
70 110

As in this example, some leading items of y may not be in any interval.

In the general case x is a list of boxed Boolean lists, with j{::x supplying the frets for axis .  The partition is then multidimensional:

   (0 1 1;1 0 0 1) <;.1 i. 3 4
|4 5 6 |7 |
|8 9 10|11|

In both the monadic and dyadic cases of u;.1, u;._1, u;.2, and u;._2, the partitions to which u is applied have the same rank as y (but the shapes along the leading axes are reduced by the partitioning).

Find Sequence Of Items: Dyad E.

Dyad E. has infinite rank and is used to slide a pattern across an array and look for positions at which the pattern matches the items of the array.  x and y should have the same rank, and the sliding occurs along all axes of y, giving a Boolean result with the same shape as .  We will consider only the cases where y is a list.

   'is' E. 'Mississippi'
0 1 0 0 1 0 0 0 0 0 0

The 1s in the result tell where the pattern starts.  The result of E. is often used as input to u;.1 :

   ('is' E. 'Mississippi') <;.1 'Mississippi'

A more ambitious example:

   html =. 0 : 0
<th><a href='page1.html'>Press here to go back</a></th>
<th><a href='page2.html'>Press here to go home</a></th>
<th><a href='page3.html'>Press here to go away</a></th>
   ('<a' E. html) {.@:(<@:(8&}.);._1)@:('>'&,);.1 html

That looks like a useful result, but what a mess to produce it!  If you want, you may try to understand that, but a better approach is to break it up:

   extracthref =: <@:(8&}.) ;._1 @:('>'&,)

This seems understandable, with effort: the execution order is ((<@:(8&}.)) ;._1) @:('>'&,); we prepend '>' to y, then we use ;._1 on the string.  That will make the prepended '>' the fret, and so we will break the string into parts that start with '>', throw away the first 8 characters of each part, and box each trimmed-down part.  We can even try it on a test bit:

   extracthref 'abcdefghijkl>xxx'

Now we can look again at our original line, rewritten:

   ('<a' E. html) {.@:extracthref ;.1 html

This makes some sense now.  The execution order is ('<a' E. html) ({.@:extracthref ;.1) html . We use E. to find all the starting positions of '<a' tags; then, for each one, we split what follows the '<a' into blocks terminated by '>', and then we take the first one, which will have the data before the '>' that matched the '<a' .

Multidimensional Partitions

The left argument to the partitioning dyads u;.n can be a list of boxes.  In this case the first box contains the partition marks for axis 0, then next box contains the marks for axis 1, and so on.  If a set of partition marks is an empty list, the corresponding axis will be unpartitioned.

Apply On Subarray: Dyad u;.0

Dyad u;.0 has rank 2 _ x u;.0 y uses x to specify a subarray of y; the subarray is extracted, and u is applied to it to produce the final result.  We will discuss the simple case where x is a rank-2 array.

The first item of x (call that s) gives the starting corner of the subarray.  The second item of x gives the length of each axis of the subarray.  If x is shorter than the rank of y, unspecified axes are taken in full.  The following examples use the test array :

   ]a =. a. {~ (a. i. 'a') + i. 4 4

A cute little expression in its own right.  See why this produces the 4x4 array of characters shown.  Remember that a. is the alphabet of all ASCII characters in order.  An even more elegant way to produce the same result would be (i. 4 4)&+&.(a.&i.) 'a' .

   (0 0 ,: 2 2) ];.0 a

Starting corner 0 0; lengths 2 2; result is a 2x2 subarray, left unchanged by monad .

   (1 2 ,: 3 2) ];.0 a

Starting corner 1 2; lengths 3 2; result is a 3x2 subarray, left unchanged by monad .  You get the idea.

If an item of s is negative, the negative index selects a starting point counting back from the end of the array, just as a negative index does in dyad .  The interval extends backwards from there:

   (2 _1 ,: 2 2) ];.0 a

Starting corner 2 _1 (the character 'l'); lengths 2 2, but running backward in axis 1; result is a 2x2 subarray, left unchanged by monad .  Note that the axis extends backward, but the items retain their normal order--you have merely specified the interval by its endpoint rather than its beginning point.  If you want to reverse the order of the axis, you can do that too, by making the corresponding length negative:

   (2 _1 ,: 2 _2) ];.0 a

Starting corner 2 _1 (the character 'l'); lengths 2 2, but running backward in axis 1; result is a 2x2 subarray with axis 1 reversed, left unchanged by monad .

The subarray must end at the limits of the array.  If the length requests more than that, the subarray will be shorter than was requested.  A length of _ or __ will always fetch from the starting point to the limit of the array.

Dyad u;.0 is a great way to pick out a subarray to work on.  Always consider using it whenever you find yourself using more than one application of {.and }. to select a portion of an array.  Even if you are working with a list, using dyad u;.0 is a good way to work on a portion of the list without copying any part you aren't working on.

Apply On All Subarrays: Dyad u;.3 and u;._3

x u;._3 y applies u to subarrays of y as specified by .  The operation is similar to dyad u\ (infix), but dyad u\ slides a one-dimensional window across y to select sequences of items of y, while dyad u;._3 moves a multidimensional window throughout y to select multidimensional regions.  When y has only one dimension, the operation is like dyad u\ but with a little extra control over the positions of the window.

The second item of x (actually, | 1{x) gives the size of the window (if an item is negative the corresponding axis is reversed before u is applied to the window; an infinite value means 'take all the way to the end of the array').  The first item of x (0{x) is the movement vector: the window is positioned at every point at which each item of the window position is an integral multiple of the corresponding item of the movement vector, as long as the window fits inside .  If an item of the movement vector is smaller than the corresponding item of the size, the windows will overlap along that axis.

If the rank of y is greater than the length of an item of x (in other words if the window has lower rank than y), the omitted trailing dimensions have starting point 0 and length .

In all cases the verb u is applied to an array of cells in which the cells have the shape of the window and the frame of the array with respect to those cells has the rank of .

Dyad u;._3 is useful in imaging applications.  The following example averages 2x2-pixel regions in an image using a simple box filter:

   ]image =. ? 8 8 $ 100
31 88 65 15 68 38 38 49
14 58 84 59 95 55 14 98
40 14 56 25 48 46 96 12
19 31 62 12 65 62 80 24
47 38 20  2 90 42 14 94
41 13 88  9 16  7 36 25
13 78 45 34 45 80 93 65
21 67 90 25 86 47 50 60
   (2 2,:2 2) (*&0.25)@:(+/)@:, ;._3 image
47.75 55.75    64 49.75
   26 38.75 55.25    53
34.75 29.75 38.75 42.25
44.75  48.5  64.5    67

The rank of each operand presented to u is the same as the rank of .  The shape of each operand presented to u is the shape of y with the leading items replaced by |1{x (formally, this is (|1{x),(#x)}.$y).

If x has rank less than 2, it is processed as if it were 1,:x which will try the window at all possible locations.

x u;.3 y is similar to x u;._3 y, but the window is positioned at every starting point that is within y, even if the entire window will not fit within .  For those positions at which the window will not fit, the operand to u is truncated.

Extracting Variable-Length Fields Using ^:a: and ;.1

Breaking a string of variable-length fields into its individual fields seems to be a problem requiring a loop.  How can you find the second field until you have examined the length of the first?  There happens to be a good way to do this in loopless J.  We will use as an example a set of character-string records where each record is preceded by a one-digit length (maximum string length is 9 characters).  For example, the string

   data =. '5There2is1a4tide2in3the7affairs2of3men'

contains 9 records, each containing a single English word.  How do we split the string into its records?

First, we examine each possible starting position and calculate how long a record would be if it started at that position.  The calculated length must be the entire record length including that of the length field itself.  Obviously, we will be calculating spurious lengths at the places that turn out not to be record-start points, but that is the small price we pay for loopless coding.  Here, the data length is given by the difference between the ASCII value of each byte and the ASCII value of '0', and we add one to account for the length of the length field:

   ]l =. >: (a. i. data) - a. i. '0'
6 37 57 54 67 54 3 58 68 2 50 5 69 58 53 54 3 ...

Next, we calculate, for each position, where the next record will start assuming that a record starts at that position.  Clearly, we do this by adding the putative record length to the offset of the position.  If the resulting total is past the end of the string, we limit the position to one character past the end of the string:

   ]n =. (#l) <. l + i. # l
6 38 38 38 38 38 9 38 38 11 ...

Now for the trick.  The first record starts at offset 0.  The next record starts at offset 0{n .  The record after that starts at offset (0{n){n, and so on.  To get the list of all the starting positions, we use

   ]pos =. (n,_1) {~^:a: 0
0 6 9 11 16 19 23 31 34 38 _1

The special power u^:a: means 'keep applying u until the result stops changing, and return the vector of results'.  It is like u^:_ in that it applies u until the result stops changing, but u^:a: returns all the intermediate results from u, not just the last one.  The result from {~^:a: is the list of start-of-record positions, and all that remains is to collect the records.  We discard record pointers that are off the end, and box the records starting at the positions given.  We do this by converting the list of valid record-start points to a Boolean vector, and using the partitioning functions to box the records:

   ((i. #data) e. pos) <;._1 data

Example: Combining Adjacent Boxes

Sometimes the mapping of a problem into J is not obvious.  Consider this problem: we have a list of boxes, each containing a set of positive numbers in a list.  If the last item of one box equals the first item of the next, we want to join the two boxes, replacing them by a single box whose contents are all the unique items in both boxes.  If a box is to be joined to both the previous and the following box, all three are to be combined into one box; generally, if there is a sequence of n boxes each to be joined to the next, those n boxes, plus the following box, are replaced by a single box containing all the unique items of the n+1 boxes.

For example, the input

   ]a =. 1 2;2 3;3 4;5 6;(,7);7 8 9;10
|1 2|2 3|3 4|5 6|7|7 8 9|10|

should produce the output

|1 2 3 4|5 6|7 8 9|10|

The trick is to see that this is a job for partitioning.  Each group of boxes to be joined should be in a separate partition, and each box not to be joined should be in a partition by itself.  Then, we just need a verb that will do the join if there are multiple boxes, and do nothing if there is only one box.

First, let's write that verb.  We could do an explicit test, using the If conjunction ^: or the agenda conjunction @., but the simplest solution is the verb


; runs together the contents of the boxes, ~. throws out any duplicates, and < puts the result in a single box.  It does what we want whether it's given one box or many.

Next, we need to establish the partitions.  We come up with the rule: if a box is not to be joined to the previous box, it starts a new partition.  So in any string of boxes to be joined together, the first will start a partition and the others will not, with the result that they will all be in the same partition.

A box is not joined to the previous box if its first item is different from the previous box's last item (or if it is the first box).  The first item of all boxes except the first is

   }.@:({.@>) a
2 3 5 7 7 10

(we take the first item of each box and discard the first item of the result--make sure you know why we use {.@> rather than {.@:>, and also why }.@:{.@>, leaving out the parentheses, would fail but {.@>@}. would work [hint: it's all about rank]).  The last item of all boxes except the last is

   }:@:({:@>) a
2 3 4 6 7 9

We set the partition mask based on inequality of these two lists, add on a leading 1 for the first partition, and apply our joining verb to the partitions:

   (1 , (}:@:({:@>) a) ~: (}.@:({.@>) a))  <@:~.@:; ;.1  a
|1 2 3 4|5 6|7 8 9|10|

>> << Pri JfC LJ Phr Dic Voc !: Rel NuVoc wd Help J for C Programmers