Fifty Shades of J/Chapter 11

Time for amendment of data

Principal Topics

} (amend, item amend), ? (roll), " (rank conjunction), gerund, ‘cleaning’ small numbers to zero, multiple choice tests.

Updating – the amend adverb

Updating part of a list telescopes two processes into one. The first is a data transformation which selects those parts which are to be changed, while the second does the actual replacement with a second set of data. Syntactically, amend (}) is an adverb qualifying a selector (a noun) to its left to produce a verb whose left and right arguments are the new and old data respectively, so ‘new selector } old’ should be read as ‘new (selector}) old’.

   a=:i.2 3 4
   ((9) 1}a);((9) 1}"1 a);(9) 1}"2 a
┌─────────┬──────────┬───────────┐
│0 1  2  3│ 0 9  2  3│ 0  1  2  3│
│4 5  6  7│ 4 9  6  7│ 9  9  9  9│
│8 9 10 11│ 8 9 10 11│ 8  9 10 11│
│         │          │           │
│9 9  9  9│12 9 14 15│12 13 14 15│
│9 9  9  9│16 9 18 19│ 9  9  9  9│
│9 9  9  9│20 9 22 23│20 21 22 23│
└─────────┴──────────┴───────────┘

The phrase (9) 1} a demonstrates that a J adverb, unlike an adverb in ordinary grammar, may qualify either a noun or a verb. By default, selection takes place at the level of the items of the whole argument, which for the rank 3 array a are its two 3 by 4 tables, but the rank conjunction allows indexing to apply at lower levels. In particular (9)0}"0 a replaces all atoms with 9.
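The same behaviour can be seen on smaller arguments. The following is only a sketch; the names v and m are illustrative and not part of the chapter's script.

```j
v=: 10 20 30 40 50
99 (1 3)} v          NB. one atom replaces both selected positions: 10 99 30 99 50
7 8 (1 3)} v         NB. one new value per selected position:       10 7 30 8 50
v                    NB. v itself is unchanged:                     10 20 30 40 50

m=: i.2 3            NB. rows are 0 1 2 and 3 4 5
9 (0)} m             NB. the items of a table are its rows: rows become 9 9 9 and 3 4 5
9 (0)}"1 m           NB. at rank 1 every row is amended:    rows become 9 1 2 and 9 4 5
```

Amend returns a new value, so a name has to be reassigned if the change is to be kept.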

} also provides a quick way of generating coarse plots of data presented as boxed co-ordinate pairs, each of which acts as a (row, column) index into the array being amended, e.g.

   z=:0 0;1 1;2 4;3 10
   '*'  z}4 11$' '
*
 *
    *
          *
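The boxing of each co-ordinate pair matters. As a small sketch (the name grid is hypothetical), compare a boxed index with an open one:

```j
grid=: 3 4$'.'       NB. a 3 by 4 background of dots
'*' (<1 2)} grid     NB. boxed index: only row 1, column 2 changes
NB. ....
NB. ..*.
NB. ....
'*' (1 2)} grid      NB. open index: items (rows) 1 and 2 are replaced in full
NB. ....
NB. ****
NB. ****
```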

An updating problem: a choice of methods

One way to change the initial letter of a set of words is

   words=:'blood';'blight';'bear'
   words
┌─────┬──────┬────┐
│blood│blight│bear│
└─────┴──────┴────┘
   (<'B'),each }.each words
┌─────┬──────┬────┐
│Blood│Blight│Bear│
└─────┴──────┴────┘

which involves two essential operations, drop and append. Amend allows these to be telescoped into one.

   'B' 0}each words
┌─────┬──────┬────┐
│Blood│Blight│Bear│
└─────┴──────┴────┘
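Because each pairs the atoms of its left argument with the boxes of its right argument, the new data may also differ from word to word. A brief sketch, with an arbitrary choice of replacement letters:

```j
words=: 'blood';'blight';'bear'
'XYZ' 0}each words   NB. boxed result: Xlood, Ylight, Zear
```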

Selectors do not have to be explicit; they can be returned by verbs, as in the next example. If open (>) is applied to words the result is a table whose rows are padded with blanks to a common length, and so in order to change the last characters, it is necessary to compute the ‘coordinates’ of the final non-blank characters, as the following section shows.

Changing last characters

Suppose you want to replace the last character of each word in the list

   llc=:<:@i.&' '               NB. locate last character
   ilc=:<"1@(i.@# ,. llc"1)     NB. indexes of last characters
   ilc >words
┌───┬───┬───┐
│0 4│1 5│2 3│
└───┴───┴───┘
   'xsk' (ilc >words)}>words
bloox
blighs
beak

Repeating the name words in the above phrase is inelegant. However, amend also accepts a gerund of three verbs which compute the new data, the selector and the old data respectively, that is x (v0`v1`v2)} y is (x v0 y) (x v1 y)} (x v2 y). This tidies the phrase up and at the same time allows the replacement characters to appear as the left argument

   replasts=:[`(ilc@])`]}
   'xsk'replasts >words
bloox
blighs
beak
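The way the three verbs of the gerund are used can be checked by writing the amend out by hand. In this sketch zapmax, new and old are hypothetical names chosen for illustration.

```j
NB. replace the first occurrence of the largest item of a list
zapmax=: [`((i. >./)@])`]}
0 zapmax 3 9 4 1 5                                NB. 3 0 4 1 5

NB. the same amend spelled out as (x v0 y) (x v1 y)} (x v2 y)
new=: 0
old=: 3 9 4 1 5
(new [ old) (new (i. >./)@] old)} (new ] old)     NB. 3 0 4 1 5
```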

Without the gerund option, it is hard to accommodate amend in explicit definitions. A nontrivial transformation of the old data might be to convert lowercase characters to upper case

   lctouc=:monad :'(t-32*96<t=.a.i.y){a.'
   lctouc each words
┌─────┬──────┬────┐
│BLOOD│BLIGHT│BEAR│
└─────┴──────┴────┘
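One possible way of combining lctouc with the gerund form is to use it as the third verb, so that the old data is transformed before the replacement is made. This is a sketch only; REPLASTS is a hypothetical name, and the other definitions are repeated from above so that the block stands alone.

```j
words=: 'blood';'blight';'bear'
llc=: <:@i.&' '                          NB. locate last character
ilc=: <"1@(i.@# ,. llc"1)                NB. indexes of last characters
lctouc=: monad :'(t-32*96<t=.a.i.y){a.'  NB. lower case to upper case
REPLASTS=: [`(ilc@])`(lctouc@])}         NB. third verb upper-cases the old data
'xsk' REPLASTS >words
NB. BLOOx
NB. BLIGHs
NB. BEAk
```

The selector is still computed from the untransformed right argument, which is harmless here because lctouc leaves blanks unchanged.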

Using a boxed list presents a difficulty because each item must be opened before it can be indexed. To get round this, an amend-based verb can be defined to work on a single item and then applied to each of the items in the object

   RepLASTS=:[`(<:@#@])`] }
   'xsk'RepLASTS each words
┌─────┬──────┬────┐
│bloox│blighs│beak│
└─────┴──────┴────┘

Item amend

So far, the adverbially qualified verb ‘selector}’ has been used dyadically. It can also be used monadically, in which case } is called item amend. The result has the shape of a single item of the right argument y, and each of its atoms is taken from the corresponding position in the item of y chosen by the matching index in the selector. Take for example a simulation of answers to a multiple choice test. The data is five items, each comprising twenty repetitions of the same character; the result of each execution is a further list of twenty characters, each of which comes from just one of the original five items.

   ]mch=:|:20 5$'ABCDE'    NB. Construct char matrix mch
AAAAAAAAAAAAAAAAAAAA
BBBBBBBBBBBBBBBBBBBB
CCCCCCCCCCCCCCCCCCCC
DDDDDDDDDDDDDDDDDDDD
EEEEEEEEEEEEEEEEEEEE

The following is a random selection of responses

   (?20$5) } mch           NB. (?20$5) is a noun, (?20$5)} is a verb
ADDEBCAAECDABACDCEEC       NB. result may be different!

or if the responses are required in strict sequence

   rint=:({. | i.@{:)@$    NB. Repeat row indices to length of
   rint mch                NB. number of columns
0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 4
   rint} mch
ABCDEABCDEABCDEABCDE
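A smaller case may make the selection rule easier to follow. This sketch uses a hypothetical table m and assumes an interpreter in which the monadic form of } is available, as it is in the session shown above.

```j
m=: 3 4$'abcdefghijkl'   NB. rows are abcd, efgh and ijkl
0 1 2 0} m               NB. position j is taken from row j{0 1 2 0, giving afkd
```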

Selection by criterion

Amend along with a selector generating verb allows updating using arbitrary indexes, and hence updating by criterion. For example, suppose an array contains some instances of 0

   ]p=:3 5$i.3
0 1 2 0 1
2 0 1 2 0
1 2 0 1 2

The instances of 0 can be identified by

   (=&0)p
1 0 0 1 0
0 1 0 0 1
0 0 1 0 0

This mask can be transformed into an array of indices of the atoms satisfying the criterion by multiplying it by i.$p, which numbers the atoms of p in row-major order

   (i.$p)*(=&0)p
0 0  0 3 0
0 6  0 0 9
0 0 12 0 0

To change all instances of 0 into 99 say

   99((i.@$*=&0)@])}p
99  1  2 99  1
 2 99  1  2 99
 1  2 99  1  2

This amendment has been applied for one specific criterion, namely ‘equals zero’, whereas the technique is clearly generalisable, suggesting an adverb which transforms a criterion verb into the verb which gives the matching indices in y

   ind=:adverb : '(i.@$*u)@]'    NB. u is the criterion verb
   99(=&0 ind)}p
99  1  2 99  1
 2 99  1  2 99
 1  2 99  1  2
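For lists, a different technique from ind is to let the primitive I. turn the mask into indices; it returns only the positions of the 1s, so atoms that fail the criterion contribute no index at all. The sketch below applies this row by row with the rank conjunction and is offered as an illustration, not as the chapter's own method.

```j
p=: 3 5$i.3
99 (I.@(=&0)@])} 0 1 2 0 1    NB. on a list: 99 1 2 99 1
99 (I.@(=&0)@])}"1 p          NB. on each row of p in turn
NB. 99  1  2 99  1
NB.  2 99  1  2 99
NB.  1  2 99  1  2
```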

A good use of this technique is to ‘clean’ numeric arrays of very small near-zero numbers which typically arise from floating point calculations

   clean=:0&((<&1e_6@| ind)})
   clean %1000000*_5+i.10
0 0 0 0 _1e_6 _ 1e_6 0 0 0

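To round numbers which are very close to integers use swingu, which moves a number to its nearest integer. Here cleani applies swingu only when a number lies within 1e_6 of that integer, and otherwise leaves it unchanged; the last value below is unchanged and is merely displayed to six significant digits.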

   cleani=:]`swingu@.(<&1e_6@(|@- swingu))
   swingu=:<.@+&0.5    NB. move to nearest integer
   cleani every 5.999999 5.99999 6.000001 6.000009
6 5.99999 6 6.00001
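The two verbs can be tried separately on values where the outcome is not in doubt. The arguments below are arbitrary illustrations; the definitions are repeated so that the block stands alone.

```j
swingu=: <.@+&0.5                          NB. move to nearest integer
cleani=: ]`swingu@.(<&1e_6@(|@- swingu))   NB. snap only when within 1e_6
swingu 2.4 2.5 _2.5                        NB. 2 3 _2
cleani every 2.0000001 2.5 3               NB. 2 2.5 3
```

cleani is applied with every because the test in the agenda must yield a single 0 or 1 for each number.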

Code Summary

ilc=:<"1@(i.@# ,. llc"1)                   NB. indexes of last chars
llc=:<:@i.&' '                             NB. locate last char
lctouc=:monad :'(t-32*96<t=.a.i.y){a.'     NB. change l/c to u/c
replasts=:[`(ilc@])`]}                     NB. replace last chars
RepLASTS=:[`(<:@#@])`] }                   NB. ditto for boxed lists
rint=:({. | i.@{:)@$                       NB. repeat row indices
ind=:adverb : '(i.@$*u)@]'                 NB. indices matching criterion u
clean=:0&((<&1e_6@| ind)})                 NB. clean small values to zero
cleani=:]`swingu@.(<&1e_6@(|@- swingu))    NB. snap near-integers to integers
swingu=:<.@+&0.5                           NB. move to nearest integer

Script

File:Fsojc11.ijs