Essays/Tree Sum

From J Wiki
Jump to: navigation, search

Introduction

We represent a tree as a list p of indices where i{p is the parent of node i . (i=i{p if node i is a root and is parentless.) For example:

   p=: _1 p: i.15
   (i.15) ,: p
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
0 0 0 1 2 2 3 3 4 4  4  4  5  5  6

                  ┌─ 6 ─── 14
     ┌── 1 ─── 3 ─┴─ 7
     │            ┌─ 8
─ 0 ─┤      ┌─ 4 ─┼─ 9
     │      │     ├─ 10
     └── 2 ─┤     └─ 11
            │     ┌─ 12
            └─ 5 ─┴─ 13

List v are numeric values for each node. The tree sum of a node is the sum of the values of the subtree rooted at that node. (This is also known as summarized explosion or bill of materials.) Given p and v , find the tree sums of all the nodes.

   v=: 15 ?.@$ 20
   (i.15) , p ,: v
0  1  2  3  4  5 6  7 8  9 10 11 12 13 14
0  0  0  1  2  2 3  3 4  4  4  4  5  5  6
6 15 19 12 14 19 0 17 0 14  6 18 13 18 11

                                           ┌─ 6  ( 0 11) ─── 14 (11 11)
             ┌── 1 (15  55) ─── 3 (12 40) ─┴─ 7  (17 17)
             │                             ┌─ 8  ( 0  0)
─ 0 (6 182) ─┤               ┌─ 4 (14 52) ─┼─ 9  (14 14)
             │               │             ├─ 10 ( 6  6)
             └── 2 (19 121) ─┤             └─ 11 (18 18)
                             │             ┌─ 12 (13 13)
                             └─ 5 (19 50) ─┴─ 13 (18 18)

   v tsum p
182 55 121 40 52 50 11 17 0 14 6 18 13 18 11

In the diagram above, each node is labelled with its index, value, and tree sum.

The following verb can be used to generate test data. dataset k creates trees with _1+2^k nodes.

testdata=: 3 : 0
 m =: _1+2^y
 v =: m ?@$ 20
 pa=: _1 p: i.m
 pb=: 0,2#i.<.m%2                           NB. complete binary tree
 pc=: 0>.<:i.m                              NB. tall skinny tree
 pd=: m ($,) (c*i.c) ([,.+/) i.<:c=. <.%:m  NB. forest
 i.0 0
)

Adjacency Matrix

The tree sum is the product of the reflexive-transitive closure of the adjacency matrix and the list of values. Thus:

am   =: i.@,~@# e. # #. ] ,. i.@#
rc   =: +. =@i.@#
tc   =: +./ .*.~^:(2>.@^.#)
tsum_mat=: tc@rc@am@] +/ .* [


   {&'.x'&.> (am ; rc@am ; tc@rc@am) p
┌───────────────┬───────────────┬───────────────┐
│xxx............│xxx............│xxxxxxxxxxxxxxx│
│...x...........│.x.x...........│.x.x..xx......x│
│....xx.........│..x.xx.........│..x.xx..xxxxxx.│
│......xx.......│...x..xx.......│...x..xx......x│
│........xxxx...│....x...xxxx...│....x...xxxx...│
│............xx.│.....x......xx.│.....x......xx.│
│..............x│......x.......x│......x.......x│
│...............│.......x.......│.......x.......│
│...............│........x......│........x......│
│...............│.........x.....│.........x.....│
│...............│..........x....│..........x....│
│...............│...........x...│...........x...│
│...............│............x..│............x..│
│...............│.............x.│.............x.│
│...............│..............x│..............x│
└───────────────┴───────────────┴───────────────┘
   (tc rc am p) +/ .* v
182 55 121 40 52 50 11 17 0 14 6 18 13 18 11
   v tsum_mat p
182 55 121 40 52 50 11 17 0 14 6 18 13 18 11

Descendants

An adjacency matrix is impractical for large trees. For example, a tree with a million nodes has an adjacency matrix of size 1e6 by 1e6, requiring at least 1e12 bytes. A more efficient alternative is to work with lists of descendants, starting with lists of sons.

sons=: (~. i. i.@#) { a: ,~ (</. i.@#)
gen =: 3 : '(*c) #^:_1 (I.c) <@;/. (;y){y [ c=. #&> y'
ad  =: 4 : '(*c) #^:_1 (I.c)  +//. (;y){x [ c=. #&> y'

tsum_des=: 4 : 0
 x=. x + t=. x ad d=. (sons y) -.&.> i.#y
 while. 0 +./@:~: t do. x=. x + t=. x ad d=. gen d end.
)
   v tsum_des p
182 55 121 40 52 50 11 17 0 14 6 18 13 18 11

gen d computes the descendants between generations k and 2*k , if d are the descendants between generations 0.5*k and k . Starting with the list of sons, the loop will terminate after at most 2^.#p iterations.

tsum_des is effective for trees with hundreds of thousands of nodes.

Depth First Search

In scalar-oriented languages an efficient way to solve this problem can be described in 3 words: depth first search. That is:

treesum(x) is

  • value(x) if x is a leaf
  • value(x) + +/ treesum"0 (sons of x) if not

The time and space should be within a small factor of the time and space to do +/\values .

The following solution is based on the depth first search idea. In each iteration, tree sums for the leaves are added to the parents and then the edges with leaves at one end are removed. The iteration terminates when no edges remain.

tsum_dfs=: 4 : 0
 e=. y (~: #"1 ,:) i.#y
 while. {:$e do.
  b=. e.~/e
  'p q'=. (-.b)#"1 e
  e=. b#"1 e
  j=. ~.p
  x=. ((j{x)+p+//.q{x) j}x
 end.
 x
)

tsum_dfs is more efficient than the others for trees that are shallow relative to the number of nodes (say, logarithmic depth), and less so for tall skinny trees.

Collected Definitions

testdata=: 3 : 0
 m =: _1+2^y
 v =: m ?@$ 20
 pa=: _1 p: i.m
 pb=: 0,2#i.<.m%2                           NB. complete binary tree
 pc=: 0>.<:i.m                              NB. tall skinny tree
 pd=: m ($,) (c*i.c) ([,.+/) i.<:c=. <.%:m  NB. forest
 i.0 0
)

am   =: i.@,~@# e. # #. ] ,. i.@#
rc   =: +. =@i.@#
tc   =: +./ .*.~^:(2>.@^.#)
tsum_mat=: tc@rc@am@] +/ .* [

sons =: (~. i. i.@#) { a: ,~ (</. i.@#)
gen  =: 3 : '(*c) #^:_1 (I.c) <@;/. (;y){y [ c=. #&> y'
ad   =: 4 : '(*c) #^:_1 (I.c)  +//. (;y){x [ c=. #&> y'

tsum_des=: 4 : 0
 x=. x + t=. x ad d=. (sons y) -.&.> i.#y
 while. 0 +./@:~: t do. x=. x + t=. x ad d=. gen d end.
)

tsum_dfs=: 4 : 0
 e=. y (~: #"1 ,:) i.#y
 while. {:$e do.
  b=. e.~/e
  'p q'=. (-.b)#"1 e
  j=. ~.p
  x=. ((j{x)+p+//.q{x) j}x
  e=. b#"1 e
 end.
 x
)



Contributed by Roger Hui. The tree sum problem was posed by Raul Miller to the J programming forum on 2008-03-11.