Essays/Inverted Table

From J Wiki
Jump to: navigation, search

Introduction

A table is 2-dimensional data in which the items of a column have the same data type; an inverted table is a table in which the columns are boxed. An inverted table is more efficient in time and space than other representations, such as one in which each atom of the table is boxed.

x below is an example of an inverted table. It has 5 columns and 8 rows. The i-th row is i{&.>x .

   x0=: ];._1 ' Smith Jones Chan Wilson Saxon Angelo Smith Wilson'
   x1=: ];._1 ' John Dakota Wilson Diana Joan Roberto John John'
   x2=: 0 1 0 1 1 0 0 1
   x3=: 23 29 47 23 31 19 23 23
   x4=: 1.25 0.97 2.11 1.25 2.8 1.11 1.25 1.25
   x=: x0;x1;x2;x3;x4

   x
┌──────┬───────┬───────────────┬───────────────────────┬──────────────────────────────────────┐
│Smith │John   │0 1 0 1 1 0 0 1│23 29 47 23 31 19 23 23│1.25 0.97 2.11 1.25 2.8 1.11 1.25 1.25│
│Jones │Dakota │               │                       │                                      │
│Chan  │Wilson │               │                       │                                      │
│Wilson│Diana  │               │                       │                                      │
│Saxon │Joan   │               │                       │                                      │
│Angelo│Roberto│               │                       │                                      │
│Smith │John   │               │                       │                                      │
│Wilson│John   │               │                       │                                      │
└──────┴───────┴───────────────┴───────────────────────┴──────────────────────────────────────┘
   ,.&.> x
┌──────┬───────┬─┬──┬────┐
│Smith │John   │0│23│1.25│
│Jones │Dakota │1│29│0.97│
│Chan  │Wilson │0│47│2.11│
│Wilson│Diana  │1│23│1.25│
│Saxon │Joan   │1│31│ 2.8│
│Angelo│Roberto│0│19│1.11│
│Smith │John   │0│23│1.25│
│Wilson│John   │1│23│1.25│
└──────┴───────┴─┴──┴────┘

   $ x
5
   #&> x
8 8 8 8 8
   (<_1 2) {&.> x
┌──────┬───────┬───┬─────┬─────────┐
│Wilson│John   │1 0│23 47│1.25 2.11│
│Chan  │Wilson │   │     │         │
└──────┴───────┴───┴─────┴─────────┘

Alternative Representation

As indicated above, an alternative representation of a table is one where the table atoms are individually boxed. It is straightforward to convert from an inverted table to this representation and vice versa.

   ] x1=: |:@:(<"_1&>) x
┌──────┬───────┬─┬──┬────┐
│Smith │John   │0│23│1.25│
├──────┼───────┼─┼──┼────┤
│Jones │Dakota │1│29│0.97│
├──────┼───────┼─┼──┼────┤
│Chan  │Wilson │0│47│2.11│
├──────┼───────┼─┼──┼────┤
│Wilson│Diana  │1│23│1.25│
├──────┼───────┼─┼──┼────┤
│Saxon │Joan   │1│31│2.8 │
├──────┼───────┼─┼──┼────┤
│Angelo│Roberto│0│19│1.11│
├──────┼───────┼─┼──┼────┤
│Smith │John   │0│23│1.25│
├──────┼───────┼─┼──┼────┤
│Wilson│John   │1│23│1.25│
└──────┴───────┴─┴──┴────┘
   7!:5 ;:'x x1'  NB. # bytes
640 2816

   x2=: <@(>"1)@|: x1
   x -: x2
1

The remainder of this note presents utilities for working with inverted tables. If a table has "key" columns, then computations such as index of should be applied to those columns rather than to the entire table.

Assertions

tassert asserts that the argument is an inverted table: a boxed vector with a least one element, having the same number of items in each box.

tassert=: 3 : 0
 assert. (1>:#$y) *. 32=3!:0 y  NB. boxed scalar or vector
 assert. 1=#~.#&>y              NB. same # items in each box (with at least one box)
 1
)

   tassert x
1

   tassert x1
|assertion failure: tassert
|   (1=#$y)*.32=3!:0 y

   tassert (i.5);i.6
|assertion failure: tassert
|   1=#~.#&>y

Tally

The number of rows in an inverted table is the tally (#) of one of its opened atoms.

   ttally=: #@>@{.
   ttally x
8

From

i tfrom x selects record(s) i from table x .

   tfrom=: <@[ {&.> ]
   4 3 tfrom x
┌──────┬───────┬───┬─────┬────────┐
│Saxon │Joan   │1 1│31 23│2.8 1.25│
│Wilson│Diana  │   │     │        │
└──────┴───────┴───┴─────┴────────┘

Index of

To compute index-of (analog of i.), replace column i by its indices in column i of x , then compute i. as usual on the transposed tables of integers.

   y=: 3 1 1 2 tfrom x
   |: x i.&> x
0 0 0 0 0
1 1 1 1 1
2 2 0 2 2
3 3 1 0 0
4 4 1 4 4
5 5 0 5 5
0 0 0 0 0
3 0 1 0 0
   |: x i.&> y
3 3 1 0 0
1 1 1 1 1
1 1 1 1 1
2 2 0 2 2
   (x i.&> x) i.&|: (x i.&> y)
3 1 1 2
   tindexof=: i.&>~@[ i.&|: i.&>
   x tindexof y
3 1 1 2

Member of

x tmemberof y are those indices in y tindexof x that are less than the number of rows in y ; alternatively, it can be computed more directly from the transposed tables of integer indices on columns of y rather than on columns of x . Thus:

   y tindexof x
4 1 3 0 4 4 4 4
   ttally y
4
   (y tindexof x) < ttally y
0 1 1 1 0 0 0 0
   tmemberof1=: tindexof~ < ttally@]
   x tmemberof1 y
0 1 1 1 0 0 0 0

   y i.&> x
4 1 3 0 4 4 4 0
4 1 3 0 4 4 4 4
3 0 3 0 0 3 3 0
0 1 3 0 4 4 0 0
0 1 3 0 4 4 0 0
   y i.&> y
0 1 1 3
0 1 1 3
0 0 0 3
0 1 1 3
0 1 1 3
   (y i.&> x) e.&|: (y i.&> y)
0 1 1 1 0 0 0 0
   tmemberof=: i.&>~ e.&|: i.&>~@]
   x tmemberof y
0 1 1 1 0 0 0 0

Less

x tless y are rows of x which are not rows of y . Thus:

   tless=: <@:-.@tmemberof #&.> [
   ,.&.> x tless y
┌──────┬───────┬──┬─┬────┐
│Smith │John   │23│0│1.25│
│Saxon │Joan   │31│1│ 2.8│
│Angelo│Roberto│19│0│1.11│
└──────┴───────┴──┴─┴────┘

Nub Sieve and Nub

The nub sieve of a table can be computed by comparison of its self-tindexof values against the indices from 0 to n-1 ; alternatively, it can be computed directly from the transposed table of integer indices.

   x tindexof x
0 1 2 3 4 5 0 7
   i. ttally x
0 1 2 3 4 5 6 7
   (x tindexof x) = i. ttally x
1 1 1 1 1 1 0 1
   tnubsieve1=: tindexof~ = i.@ttally
   tnubsieve1 x
1 1 1 1 1 1 0 1

   x i.&> x
0 1 2 3 4 5 0 3
0 1 2 3 4 5 0 0
0 1 0 1 1 0 0 1
0 1 2 0 4 5 0 0
0 1 2 0 4 5 0 0
   ~: |: x i.&> x
1 1 1 1 1 1 0 1
   tnubsieve=: ~:@|:@:(i.&>)~
   tnubsieve x
1 1 1 1 1 1 0 1

Nub obtains by selecting each column using the nub sieve.

   tnub=: <@tnubsieve #&.> ]
   tnub x
┌──────┬───────┬─────────────┬────────────────────┬─────────────────────────────────┐
│Smith │John   │0 1 0 1 1 0 1│23 29 47 23 31 19 23│1.25 0.97 2.11 1.25 2.8 1.11 1.25│
│Jones │Dakota │             │                    │                                 │
│Chan  │Wilson │             │                    │                                 │
│Wilson│Diana  │             │                    │                                 │
│Saxon │Joan   │             │                    │                                 │
│Angelo│Roberto│             │                    │                                 │
│Wilson│John   │             │                    │                                 │
└──────┴───────┴─────────────┴────────────────────┴─────────────────────────────────┘
   ,.&.> tnub x
┌──────┬───────┬─┬──┬────┐
│Smith │John   │0│23│1.25│
│Jones │Dakota │1│29│0.97│
│Chan  │Wilson │0│47│2.11│
│Wilson│Diana  │1│23│1.25│
│Saxon │Joan   │1│31│ 2.8│
│Angelo│Roberto│0│19│1.11│
│Wilson│John   │1│23│1.25│
└──────┴───────┴─┴──┴────┘

Key

x and y are inverted tables; rows of x specify keys for the corresponding rows of y . Then: x u tkey y applies u to rows of y having identical keys.

tkey=: 1 : '<@tindexof~@[ u/.&.> ]'

   (0 1{x) < tkey 2 3 4{x
┌─────────────────┬─────────────────────────┬────────────────────────────────────────┐
│┌───┬─┬─┬─┬─┬─┬─┐│┌─────┬──┬──┬──┬──┬──┬──┐│┌─────────┬────┬────┬────┬───┬────┬────┐│
││0 0│1│0│1│1│0│1│││23 23│29│47│23│31│19│23│││1.25 1.25│0.97│2.11│1.25│2.8│1.11│1.25││
│└───┴─┴─┴─┴─┴─┴─┘│└─────┴──┴──┴──┴──┴──┴──┘│└─────────┴────┴────┴────┴───┴────┴────┘│
└─────────────────┴─────────────────────────┴────────────────────────────────────────┘
   (0 1{x) +/ tkey 2 3 4{x
┌─────────────┬────────────────────┬────────────────────────────────┐
│0 1 0 1 1 0 1│46 29 47 23 31 19 23│2.5 0.97 2.11 1.25 2.8 1.11 1.25│
└─────────────┴────────────────────┴────────────────────────────────┘

Grade and Sort

A table can be graded by grading column c-1 , then column c-2 , and so on, to column 0 .

   (0{::x) (]/:{~) (1{::x) (]/:{~) (2{::x) (]/:{~) (3{::x) (]/:{~) /:4{::x
5 2 1 4 0 6 3 7
   tgrade    =: > @ ((] /: {~)&.>/) @ (}: , /:&.>@{:)
   tgradedown=: > @ ((] \: {~)&.>/) @ (}: , \:&.>@{:)
   tgrade x
5 2 1 4 0 6 3 7
   tgradedown x
7 3 0 6 4 1 2 5

   ,.&.> (<tgrade x) {&.> x
┌──────┬───────┬─┬──┬────┐
│Angelo│Roberto│0│19│1.11│
│Chan  │Wilson │0│47│2.11│
│Jones │Dakota │1│29│0.97│
│Saxon │Joan   │1│31│ 2.8│
│Smith │John   │0│23│1.25│
│Smith │John   │0│23│1.25│
│Wilson│Diana  │1│23│1.25│
│Wilson│John   │1│23│1.25│
└──────┴───────┴─┴──┴────┘
   ,.&.> (<tgradedown x) {&.> x
┌──────┬───────┬─┬──┬────┐
│Wilson│John   │1│23│1.25│
│Wilson│Diana  │1│23│1.25│
│Smith │John   │0│23│1.25│
│Smith │John   │0│23│1.25│
│Saxon │Joan   │1│31│ 2.8│
│Jones │Dakota │1│29│0.97│
│Chan  │Wilson │0│47│2.11│
│Angelo│Roberto│0│19│1.11│
└──────┴───────┴─┴──┴────┘

The grade can also be computed by replacing each column with the rankings within each column, with equal items being assigned equal rankings. (!.0 is used because /: is non-tolerant.) Thus:

   ranking=: i.!.0~ { /:@/:
   ranking&> x
4 2 1 6 3 0 4 6
3 0 7 1 2 6 3 3
0 4 0 4 4 0 0 4
1 5 7 1 6 0 1 1
2 0 6 2 7 1 2 2
   /: |: ranking&> x
5 2 1 4 0 6 3 7
   tgrade1=: /: @ |: @: (ranking&>)
   tgrade1 x
5 2 1 4 0 6 3 7

Sort obtains by indexing with the grade.

   tsort=: <@tgrade {&.> ]
   ,.&.> tsort x
┌──────┬───────┬─┬──┬────┐
│Angelo│Roberto│0│19│1.11│
│Chan  │Wilson │0│47│2.11│
│Jones │Dakota │1│29│0.97│
│Saxon │Joan   │1│31│ 2.8│
│Smith │John   │0│23│1.25│
│Smith │John   │0│23│1.25│
│Wilson│Diana  │1│23│1.25│
│Wilson│John   │1│23│1.25│
└──────┴───────┴─┴──┴────┘

Selfie

Index-of with both arguments being the same, an instance of a “selfie”. x⍳x are like ID numbers; questions of identity on x can often be answered more efficiently on x⍳x than on x itself.

   tselfie=: |:@:(i.~&>)
   tselfie x
0 0 0 0 0
1 1 1 1 1
2 2 0 2 2
3 3 1 0 0
4 4 1 4 4
5 5 0 5 5
0 0 0 0 0
3 0 1 0 0

Collected Definitions

ifa =: <@(>"1)@|:               NB. inverted from atoms
afi =: |:@:(<"_1@>)             NB. atoms from inverted

tassert=: 3 : 0
 assert. (1>:#$y) *. 32=3!:0 y  NB. boxed scalar or vector
 assert. 1=#~.#&>y              NB. same # items in each box (with at least one box)
 1
)

ttally    =: #@>@{.
tfrom     =: <@[ {&.> ]
tindexof  =: i.&>~@[ i.&|: i.&>
tmemberof =: i.&>~ e.&|: i.&>~@]
tless     =: <@:-.@tmemberof #&.> [
tnubsieve =: ~:@|:@:(i.&>)~
tnub      =: <@tnubsieve #&.> ]
tkey      =: 1 : '<@tindexof~@[ u/.&.> ]'
tgrade    =: > @ ((] /: {~)&.>/) @ (}: , /:&.>@{:)
tgradedown=: > @ ((] \: {~)&.>/) @ (}: , \:&.>@{:)
tsort     =: <@tgrade {&.> ]
tselfie   =: |:@:(i.~&>)

NB. alternatives
tmemberof1=: tindexof~ < ttally@]
tnubsieve1=: tindexof~ = i.@ttally
ranking   =: i.!.0~ { /:@/:
tgrade1   =: /: @ |: @: (ranking&>)



Contributed by Roger Hui.