Essays/Attribute-Value Processing


KenLettow posed the following problem to the J Programming Forum on 2008-11-11:

Each line of a file contains strings of the form attribute=value separated by the & character (see below). Generate a table of the values for each line of the file for a specified list of attributes.

y=: 0 : 0
att0=4010&att7=2457&att2=439
att3=902&att2=413&att5=4262&att4=4967
att5=4040&att1=465

att4=2733
att3=2397&att2=1104&att6=2625
)


A Solution

The problem can be solved by using cut (;.) as follows. The solution works on all the lines at once rather than a line at a time.

NB. y: lines of  att=value&att=value&att=value& ...  terminated by LF
NB. x: required attributes
tab=: 4 : 0
 av=. a: -.~ (y e. LF,'=&') <;._2 y  NB. attribute-value pairs
 a=. av #~ (#av)$1 0                 NB. attributes
 v=. av #~ (#av)$0 1                 NB. values
 n=. (y=LF) +/;._2 y='='             NB. # attributes in each line
 }:"1 v (<"1 (I.n),.x i. a)} ((#n),1+#x)$a:
)

For example:

   ] x=: <;._1 ' att0 att1 att2'
┌────┬────┬────┐
│att0│att1│att2│
└────┴────┴────┘
   x tab y
┌────┬───┬────┐
│4010│   │439 │
├────┼───┼────┤
│    │   │413 │
├────┼───┼────┤
│    │465│    │
├────┼───┼────┤
│    │   │    │
├────┼───┼────┤
│    │   │    │
├────┼───┼────┤
│    │   │1104│
└────┴───┴────┘

Program Logic

0. If y is cut at each LF, =, and & character, treating each as a trailing delimiter, and empty boxes are removed, the result is a boxed vector of   attribute value attribute value ...

   ] av=. a: -.~ (y e. LF,'=&') <;._2 y
┌────┬────┬────┬────┬────┬───┬────┬───┬────┬───┬────┬────┬────┬────┬────┬────┬────┬───┬────┬────┬
│att0│4010│att7│2457│att2│439│att3│902│att2│413│att5│4262│att4│4967│att5│4040│att1│465│att4│2733│...
└────┴────┴────┴────┴────┴───┴────┴───┴────┴───┴────┴────┴────┴────┴────┴────┴────┴───┴────┴────┴
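
The empty boxes come from blank lines, where two delimiters are adjacent. A minimal sketch on a short sample (the name s and its contents are hypothetical, not part of the original problem):

```j
   s=. 'a=1',LF,LF,'b=2',LF            NB. hypothetical sample: two pairs around a blank line
   # (s e. LF,'=&') <;._2 s            NB. five pieces, one of them empty
5
   # a: -.~ (s e. LF,'=&') <;._2 s     NB. the empty piece has been removed
4
   (<;._1 ' a 1 b 2') -: a: -.~ (s e. LF,'=&') <;._2 s
1
```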

1. The even-numbered entries (indices 0, 2, 4, ...) are the attribute names.

   ] a=. av #~ (#av)$1 0
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│att0│att7│att2│att3│att2│att5│att4│att5│att1│att4│att3│att2│att6│
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

2. The odd-numbered entries (indices 1, 3, 5, ...) are the corresponding values.

   ] v=. av #~ (#av)$0 1
┌────┬────┬───┬───┬───┬────┬────┬────┬───┬────┬────┬────┬────┐
│4010│2457│439│902│413│4262│4967│4040│465│2733│2397│1104│2625│
└────┴────┴───┴───┴───┴────┴────┴────┴───┴────┴────┴────┴────┘

3. The number of a-v pairs on each line is obtained by a partitioned sum: the = characters are counted within each LF-terminated line.

   ] n=. (y=LF) +/;._2 y='='
3 4 2 0 1 3
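
The same partitioned sum on a short sample may make the idea easier to follow. This is a sketch only; s is a hypothetical sample, not part of the original data. Applying I. to the counts repeats each line number once per pair, which is how the program later obtains the row index of every value.

```j
   s=. 'a=1&b=2',LF,LF,'c=3',LF    NB. hypothetical sample: 2 pairs, a blank line, 1 pair
   (s=LF) +/;._2 s='='             NB. count of = within each LF-terminated line
2 0 1
   I. 2 0 1                        NB. line number of each pair
0 0 2
```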

4. The overall result has shape (# lines),(# attributes of interest). The value part of each a-v pair amends entry i,j where i is the line number and j is x i. a. The program temporarily works with a table with one extra column: an attribute not in x has index #x, so its value amends that extra column, which }:"1 drops at the end.

   (#n),1+#x
6 4
   ] i=. (I.n) ,. x i. a
0 0
0 3
0 2
1 3
1 2
1 3
1 3
2 3
2 1
4 3
5 3
5 2
5 3
   v (<"1 i)} 6 4$a:
┌────┬───┬────┬────┐
│4010│   │439 │2457│
├────┼───┼────┼────┤
│    │   │413 │4967│
├────┼───┼────┼────┤
│    │465│    │4040│
├────┼───┼────┼────┤
│    │   │    │    │
├────┼───┼────┼────┤
│    │   │    │2733│
├────┼───┼────┼────┤
│    │   │1104│2625│
└────┴───┴────┴────┘
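
The column indices and the last step of tab can be checked in the same session. The sketch below assumes that x, y, a, v and i are still defined as in the steps above.

```j
   x i. a                          NB. 3 (that is, #x) marks an attribute not in x
0 3 2 3 2 3 3 3 1 3 3 2 3
   $ }:"1 v (<"1 i)} 6 4$a:        NB. curtail each row: the extra column is gone
6 3
   (x tab y) -: }:"1 v (<"1 i)} 6 4$a:
1
```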

Line-at-a-Time

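NB. y: one line  att=value&att=value& ...  terminated by LF
NB. x: required attributes
line=: 4 : 0
 av=. a: -.~ (y e. LF,'=&') <;._2 y  NB. attribute-value pairs
 a=. av #~ (#av)$1 0                 NB. attributes
 v=. av #~ (#av)$0 1                 NB. values
 (a i. x) { v,a:                     NB. value for each of x; a: if absent
)

NB. apply x&line to each LF-terminated line of y
tab2=: 4 : 'x&line;.2 y'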

   x (tab -: tab2) y
1
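
As a small illustration, line can be applied to a single line and compared with the matching row of the table. The sketch assumes x and y as defined earlier; a1 is a hypothetical name holding the attributes of the third line of y. An attribute of x that is missing from the line gets index #a1, which selects the a: appended to the values.

```j
   a1=. <;._1 ' att5 att1'              NB. hypothetical: attributes of the third line of y
   a1 i. x                              NB. 2 (that is, #a1) marks an attribute missing from the line
2 1 2
   (x line 'att5=4040&att1=465',LF) -: 2 { x tab y
1
```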

Notes

  • If lines of y are terminated by CRLF rather than by LF, the CR characters must first be removed: x tab y-.CR
  • If lines of y are separated by LF rather than terminated by LF, append an LF to y: x tab y,LF
  • The program does not handle a value that contains = or & (or even LF), nor a value enclosed in quotes.
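
The first two notes could be combined into one preprocessing step. The verb below is a minimal sketch; prep is a hypothetical helper that is not part of the original solution, and it assumes the input is a plain character list.

```j
NB. prep (hypothetical helper): normalize text before calling tab
prep=: 3 : 0
 t=. y -. CR                         NB. drop CR so that CRLF becomes LF
 t , LF #~ -. (,LF) -: _1 {. t       NB. append LF unless t already ends with one
)

   (prep 'att0=1&att1=2',CR,LF,'att2=3') -: 'att0=1&att1=2',LF,'att2=3',LF
1
   (x tab prep y) -: x tab y         NB. already-normalized text is unchanged
1
```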



Contributed by Roger Hui.