User:Devon McCormick/Trees

From J Wiki
Jump to: navigation, search

Here's an exposition of one method for representing trees in J. <<TableOfContents>>

Parent Index Vector

The tree is a vector of integers where each element is the index of the parent node but with _1 for the root's parent. This is handy for modeling directory trees and it works well for that.

Say we have a directory tree like this:

C:
|__n0
|   |_n00
|   |_n01
|
|__n1
    |__n10
    |   |__n100
    |__n11
    |   |__n110
    |   |__n111
    |   |__n112
    |__n12

We typically separate the tree structure from the corresponding vector of nodes which, in this case, are the directory or file names.

For example, we can translate the above tree as follows using breadth-first ordering:

   nmsb=. 'C:';'n0';'n1';'n00';'n01';'n10';'n11';'n12';'n100';'n110';'n111';'n112'
   trb=. _1    0    0    1     1     2      2     2     5      6      6      6

We can also translate it using depth-first ordering:

   nmsd=. 'C:';'n0';'n00';'n01';'n1';'n10';'n100';'n11';'n110';'n111';'n112';'n12'
   trd=.  _1   0    1     1     0    4     5      4     7      7      7      4

Whichever we choose is not functionally significant: all the following code works the same on either version. This representation of trees is good for looking up an item and locating its parent, or joining sub-trees into a larger tree. By "good", we mean simple to code and fast-running.

Basic Navigation

Here are some basic verbs with examples of using them.

   whChild=: [: I. [ e. ]     NB.* whChild: where are children of x in tree y
   whRoot=: [:I. _1=]         NB.* whRoot: where roots are in tree y
   nextLevel=: [ whChild ]    NB.* nextLevel: indexes of descendants of x in tree y

   trb whChild whRoot trb     NB. Indexes of children of root
1 2
   trd whChild whRoot trd     NB. Indexes of children of root on depth-first version
1 4

Though the numbers differ, they refer to the same items in each case:

   nmsb{~trb whChild whRoot trb NB. Children of root using breadth-first
+--+--+
|n0|n1|
+--+--+
   nmsd{~trd whChild whRoot trd NB. Depth-first
+--+--+
|n0|n1|
+--+--+

Here we see how to generalize moving to levels further down the tree using the power conjunction " ^: ".

   trb nextLevel whRoot trb                NB. Next level down from root
1 2
   trb nextLevel trb nextLevel whRoot trb  NB. two levels down
3 4 5 6 7
   trb nextLevel^:2 ] whRoot trb           NB. two levels down
3 4 5 6 7

Now let's graft a new tree onto an existing one:

   newnms=. 'new0';'new01';'new02';'new010';'new011';'new020';'new021'
   newtree=. _1     0       0       1        1        2        2
   #&>newtree;<newnms    NB. Sanity check
7 7

graftRoot=: 3 : 0
   'ot jn nt'=. y        NB. old tree;join-node;new tree
   ot,(_1=nt)}(nt+#ot),:jn
)

   graftRoot trb;(nmsb i. <'n0');newtree
_1 0 0 1 1 2 2 2 5 6 6 6 1 12 12 13 13 14 14
   #newtr2=. graftRoot trb;(nmsb i. <'n0');newtree
19
   #newnms2=. nmsb,newnms
19
   trb nextLevel^:1 ] whRoot trb
1 2
   newnms2{~newtr2 nextLevel^:1 ] whRoot newtr2
+--+--+
|n0|n1|
+--+--+
   newnms2{~newtr2 nextLevel^:2 ] whRoot newtr2
+---+---+---+---+---+----+
|n00|n01|n10|n11|n12|new0|
+---+---+---+---+---+----+
   newnms2{~newtr2 nextLevel^:3 ] whRoot newtr2
+----+----+----+----+-----+-----+
|n100|n110|n111|n112|new01|new02|
+----+----+----+----+-----+-----+
   newnms2{~newtr2 nextLevel^:4 ] whRoot newtr2
+------+------+------+------+
|new010|new011|new020|new021|
+------+------+------+------+
   newnms2{~newtr2 nextLevel^:5 ] whRoot newtr2  NB. No fifth level

Validity Checking

This representation of trees allows multiple root nodes - which is called a "forest" - and can represent closed loops, which are invalid as trees. Since invalid representations are possible, it may behoove us to check if a given tree is valid. We can do this in the following way.

NB.* verifyTree: check that tree has >:1 root, no cycles, no unreachable nodes:
NB. 0 for bad node, 1 for good one.
verifyTree=: 3 : 0
   cc=. 0$~#y                 NB. Cycle counter
   nli=. whRoot y             NB. Next level indexes: start at root(s)
   cc=. (1+nli{cc) nli}cc     NB. Count # visits to node
   while. (0<#nli) *. 1=>./cc do.
       nli=. y nextLevel nli
       cc=. (1+nli{cc) nli}cc NB. Count # visits to node
   end.
  cc *. -.(_1=y) *. 1<+/_1= y    NB. Dis-allow forests
NB.EG (1 0;0 0;1 0 0) -: verifyTree &.> _1 1; 0 1; _1 2 1  NB. bad ones
NB.EG (1 1 1;1 1 1 1) -: verifyTree &.> _1 0 1; _1 0 _1 2  NB. good ones
NB.EG (0 0;0 1 0 1) -: verifyTree &.> _1 _1; _1 0 _1 2     NB. bad ones, multi-root
)

The "EG" comments at the end give examples of use and the results expected: each phrase should return a single "1" indicating the results match the outputs. The three initial bad trees fail verification (indicated by zeros corresponding to bad nodes) because the first one is cyclical because it has a node that is its own parent, the second one is cyclical and has no root, and the third has a cycle between 1 and 2.

Notice that we choose to treat multi-rooted trees as invalid only with the final condition. This is an arbitrary choice. Similarly, our exclusion of cycles is arbitrary as well. We could just as well allow forests or cycles if we wanted to generalize beyond trees to graphs of some kind. However, this "parent index" scheme would allow for only a limited subset of graphs, so is not a good choice to represent more general graphs.

Using "Tree Display"

There's an existing utility to display a tree in a printable format. To use this utility, we must convert our "parent index" form to one in which edges are listed as point pairs. One way to do this is shown as follows.

   trb=. _1 0 0 1 1 2 2 2 5 6 6 6
   |:]t1b=. }.(i.#trb),.~trb      NB. Assume root is at start (hence }.)
0 0 1 1 2 2 2 5 6  6  6
1 2 3 4 5 6 7 8 9 10 11

   tree t1b
+----------------------+
|           +- 3       |
|     +- 1 -+- 4       |
|     |     +- 5 --- 8 |
|- 0 -+     |     +- 9 |
|     +- 2 -+- 6 -+- 10|
|           |     +- 11|
|           +- 7       |
+----------------------+

   9!:7 '+++++++++|-'              NB. Simple ASCII characters for box-drawing
   11{.16}.a.
┌┬┐├┼┤└┴┘│─
   9!:7 ] 11{.16}.a.               NB. Box-drawing characters

   trd=. _1 0 1 1 0 4 5 4 7 7 7 4  NB. Depth-first version of tree
   |:]t1d=. }.(i.#trd),.~trd
0 1 1 0 4 5 4 7 7  7  4
1 2 3 4 5 6 7 8 9 10 11

   tree t1d
┌──────────────────────┐
│           +- 2       │
│     +- 1 -+- 3       │
│     |     +- 5 --- 6 │
│- 0 -+     |     +- 8 │
│     +- 4 -+- 7 -+- 9 │
│           |     +- 10│
│           +- 11      │
└──────────────────────┘

That this latter representation, based on the depth-first ordered tree, is the same as the former one can be see if we insert the node names to illuminate the correspondence between the indexes of the two tree representations.

   nmsd=. 'C:';'n0';'n00';'n01';'n1';'n10';'n100';'n11';'n110';'n111';'n112';'n12'
   nmsb=. 'C:';'n0';'n1';'n00';'n01';'n10';'n11';'n12';'n100';'n110';'n111';'n112'

   #&>nmsd
2 2 3 3 2 3 4 3 4 4 4 3
   >./#&>nmsd                      NB. Longest name is 4 characters long.
4

   tree t1d+1000                   NB. Pad numbers to 4 digits so replacement preserves spacing.
┌─────────────────────────────────┐
│                 +- 1002         │
│        +- 1001 -+- 1003         │
│        |        +- 1005 --- 1006│
│- 1000 -+        |        +- 1008│
│        +- 1004 -+- 1007 -+- 1009│
│                 |        +- 1010│
│                 +- 1011         │
└─────────────────────────────────┘
   $ttd=. >{.tree t1d+1000
7 33
   <($ttd)$(,ttd) rplc ,(":&.>1000+i.12),._4{.&.>nmsd
┌─────────────────────────────────┐
│                 +-  n00         │
│        +-   n0 -+-  n01         │
│        |        +-  n10 --- n100│
│-   C: -+        |        +- n110│
│        +-   n1 -+-  n11 -+- n111│
│                 |        +- n112│
│                 +-  n12         │
└─────────────────────────────────┘
   ttb=. >{.tree t1b+1000           NB. Do the same with the breadth-first version.
   <($ttb)$(,ttb) rplc ,(":&.>1000+i.12),._4{.&.>nmsb
┌─────────────────────────────────┐
│                 +-  n00         │
│        +-   n0 -+-  n01         │
│        |        +-  n10 --- n100│
│-   C: -+        |        +- n110│
│        +-   n1 -+-  n11 -+- n111│
│                 |        +- n112│
│                 +-  n12         │
└─────────────────────────────────┘

We can use a similar display trick to see our grafted tree example from above.

   ttn=. >{.tree 1e5+}.(i.#newtr2),.~newtr2
   <($ttn)$(,ttn) rplc ,(":&.>1e5+i.#newnms2),._6{.&.>newnms2,~&.><6$'-'
┌────────────────────────────────────────────────────┐
│                     ┌─ ---n00                      │
│                     ├─ ---n01                      │
│          ┌─ ----n0 ─┤                     ┌─ new010│
│          │          │          ┌─ -new01 ─┴─ new011│
│          │          └─ --new0 ─┤          ┌─ new020│
│─ ----C: ─┤                     └─ -new02 ─┴─ new021│
│          │          ┌─ ---n10 ─── --n100           │
│          │          │          ┌─ --n110           │
│          └─ ----n1 ─┼─ ---n11 ─┼─ --n111           │
│                     │          └─ --n112           │
│                     └─ ---n12                      │
└────────────────────────────────────────────────────┘

Code

The code so far is File:Tree.ijs