User:Devon McCormick/Interactive3DScatterplots

From J Wiki
Jump to navigation Jump to search

Extending Diffusion-Limited Aggregation to Multiple Dimensions

“Diffusion-limited aggregation” is a simple process whereby an initial cluster of points, typically in two-dimensions, grows by having randomly-released particles stick to this central cluster. This gives us random patterns like the one to the right here. The feathery branches mimic some types of growth in the natural world.

We’ll look at some J code I developed a while ago to build diffusion-limited aggregates. The code was written as an exercise in re-vamping some non-J-like code into a more natural form for the language. The changes also generalized the code to, among other things, extend it to build aggregates in an arbitrary number of dimensions. First we’ll look at the two-dimensional case for which the code was originally developed, then consider a problem with generalizing this to three dimensions: it’s harder to examine a picture of the three-dimensional result. This difficulty motivates us to develop some browser-based code to help us visualize our three-dimensional results.

height="559px",width="519px"

Initializing a Point-set

The initialization code is a logical place to start, after we load the script and put its namespace in our environment:

   load '~Code/DiffLimAgg.ijs'
   coinsert 'dla'               NB. Path in the namespace for easier typing...

The code is defined in the namespace “dla”, so we insert this into our class object path after loading the script so we don’t have to suffix all the names with “_dla_” in the following exposition. This means we can simply enter the name “init” of the initialization function instead of “init_dla_”, as shown here, to see its definition.

init=: 3 : 0
   D=: y            NB. D-dimensions
   RNI=: D dimnbr 1 NB. Relative neighbor indexes: 1 cell around center.
   neigh2=: [ -.~ [: ~. [: ,/ RNI +"1/~ ]   NB. Empty neighbors of these
   PTS=: ,:y$0      NB. Start point list w/Origin.
NB.EG  init 2       NB. Initialize 2-dimensional space.
)

We see that running init with an argument of “2” initializes a two-dimensional list of points, starting at the origin, in the global PTS.

   init 2
0 0
   $PTS
1 2

Next we’ll look at the cover function created during the development of the code to aggregate points to the list according to a random walk. This cover function returns a lot of timing information as this is an important consideration in the development of this code. The original version of this process used a fixed-size Boolean matrix to represent the aggregation. This had the advantage of simplicity and is easy to handle in non-J languages as well as in J.

However, this data structure introduces hard limits at the outset to how large our result can be. On the other hand, a fixed matrix helps the run-time efficiency as it also limits how far from the central aggregate we can release a random point and the distance from a possible “sticking” point is the primary determinant of how long it will take a particle to add the structure.

By changing the data structure to represent the points explicitly, we allow more than two dimensions but we also open the release space to be infinitely large. So, we need a way to limit the release space to be near the existing cluster. We’ve worked out a way to do this by building a perimeter around the existing cluster which we’ll illustrate shortly. First, we’ll continue the logical progression of our code exposition by looking at the next verb we’d run in the sequence.

   do1=: 3 : 'tm,(#PTS),(sv-~#PTS),(tm%~sv-~#PTS),(usus nn),(#nn)%~nn+/ . =>./nn [ tm=. 6!:2 ''nn=. growMany y''[sv=. #PTS'

This rather long line buries the core work done by the growMany verb in a mess of statistics about the run-time performance. It’s easier to show before we explain, so we’ll do so now by giving an example of running this cover function.

   do1 25;20;5 2
0.0298599 5 4 133.959 7 25 22.25 6.02517 0.8

The arguments to do1 are number of steps (25, here) per random walk (before abandoning an attempt), number of points (20) to release, then a pair of numbers denoting how far out from the cluster to we draw the release perimeter (5 cells) and how wide to make this perimeter (2 cells). Note that the number of steps should be greater than or equal to the perimeter distance squared (5^2); “squared” because it’s a two-dimensional space and raising to this power reflects how many steps a random walk should take, on average, to travel that distance. For the higher-dimensional case, we would limit the step count by (5^D) where "D" is the dimensionality.

The results of this run are: number of seconds to complete (about 0.03), total number of resulting points (5), number of points added this time (4), number of points added per second (about 134); the five numbers following these first four give statistics on the number of random steps: the minimum (7), maximum (25), mean (22.25), standard deviation (6), and portion of walks reaching the maximum (80%).

Running this again with the same arguments results in about the same efficiency.

   do1 25;20;5 2
0.00197011 10 5 2537.93 5 25 22.55 5.2463 0.75

The time spent is much shorter but the little importance should be attached to differences between such small numbers. The more important numbers in these initial accretions are the statistics of the number of random steps which are essentially the same as for the earlier invocation. Before we look at the resulting set of points, we’ll run this one more time, doubling the number of points we try (to 40) for a slight increase in efficiency.

   do1 25;40;5 2
0.00315529 22 12 3803.14 2 25 20.975 7.48498 0.7

We can use the standard viewmat verb to visualize the set of points – shown here transposed for efficiency of display - we have so far:

   |:PTS
0 _1 _1 _2 _3 _2  0 _4 _5 _5 _6 _1 _6 1 0 _6 1 _7 0 1 _8  0
0 _1 _2 _2 _1 _3 _3 _2 _1 _3  0  1 _1 0 2 _4 3 _2 4 4 _2 _4
   viewmat 1 bordFill PTS
height="241px",width="209px" We view the result of the utility verb bordFill which converts the explicit list of points into a simple Boolean matrix for display purposes.
 
   bordFill=: 4 : '(1)(x adjPts y)}0$~D$(>:+:x)+(>./y)-<./y'

The left argument is the number of cells to put as a border around the cluster. The adjPts utility shifts the point values to be all non-negative by subtracting the smallest value in each column from each of the rows.

 
   adjPts=: [: <"1 ] -"1 [ -~ [: <./ ]

It also encloses the values in each row so we can use them as indexes to set the cell values in the simple matrix representation of the cluster.

Setting the “Release” Perimeter

This small cluster of points we’ve generated will better help us illustrate how we determine the cluster perimeter into which we release potential new points on their random walks. As mentioned before, the do1 verb is an informative wrapper for the core growMany verb, shown here.

growMany=: 3 : 0
   'maxwalk numpts pp'=. y
   pts=. numpts (]{~ ([<.[:#]) ?[:#]) calcPerim pp
   ctr=. _1 [ rr=. i.0
   while. maxwalk>ctr=. >:ctr *. (0<#pts)
   do. wcc=. PTS chkCollision pts=. walk pts
       if. 1 e. wcc do. addPt wcc#pts
           pts=. pts#~-.wcc [ rr=. rr,ctr$~+/wcc
       end.
   end.
   rr,maxwalk$~#pts
)

The three arguments to growMany are exactly those to do1; they are broken into named groups on the first line of this verb. The second line, where the release perimeter is calculated, is again best illustrated by first showing the sub-function calcPerim in action by itself. Its definition is thus:

calcPerim=: 3 : 0
   NP=: PTS neigh2^:(0{y)]PTS
   edges=. ~.PTS-.~NP -.~ ,/NP+"1/RNI
   edges=. ~.PTS neigh2^:(1{y)]edges
NB.EG PPTS_dla_=: calcPerim 5 2
)

We see that it directly references the global PTS noun in the first line, along with the first of its two inputs (0{y), as inputs to theneigh2 verb. We have already seen this verb as it was defined in the init code because it incorporates a global that depends on the dimensionality of our points. It was necessary to define this in the init function because it is an explicit verb, meaning, among other things, it resolves the global at the point of definition.

Here we see the value of neigh2, as well as its definition, along with the value of the global RNI'' that it incorporates.

   neigh2=: [ -.~ [: ~. [: ,/ RNI +"1/~ ]
   neigh2
[ -.~ [: ~. [: ,/ (8 2$_1 _1 _1 0 _1 1 0 _1 0 1 1 _1 1 0 1 1) +"1/~ ]
   |:RNI
_1 _1 _1  0 0  1 1 1
_1  0  1 _1 1 _1 0 1

This noun RNI, (which stands for Relative Neighbor Indexes) comprises the relative indexes of all the neighbors of a point in two-space, illustrated graphically here:

   3 3$1 2 3 4 0 5 6 7 8{(<''),<"1 RNI
+-----+----+----+
|_1 _1|_1 0|_1 1|
+-----+----+----+
|0 _1 |    |0 1 |
+-----+----+----+
|1 _1 |1 0 |1 1 |
+-----+----+----+

We can run the first line of calcPerim in isolation to see what it does.

   y=. 5 2                     NB. What y would have been…
   $NP=: PTS neigh2^:(0{y)]PTS NB. Array of 2D points
315 2
   viewmat 1 bordFill NP       NB. See what it looks like.
height="202px",width="179px" We can see that the current set of points matches the empty cells in the middle: NP is the set of neighboring points around this set.

Running the next line of calcPerim gives us this:

   $edges=. ~.PTS-.~NP -.~ ,/NP+"1/RNI
82 2
   viewmat 1 bordFill edges
height="202px",width="182px" We see that this gives us only the outermost edge of this surrounding region.

The last line of calcPerim is nearly the same as the first one but it is applied to these edges and uses the second element of y as the argument to the power conjunction.

Here we extend the border further out:
   $e2=. ~.PTS neigh2^:(1{y)]edges
410 2
   viewmat 1 bordFill edges
height="200px",width="182px"
  $1 bordFill edges,e2,NP       NB. Determine which set of points
28 27
   $1 bordFill edges,e2         NB. encompasses the largest bounding box
28 27
   $1 bordFill edges            NB. and sum the different sets to get all
24 23
   $1 bordFill e2               NB. in one picture with distinctive colors.
28 27
   viewmat (1 bordFill e2)+(1 bordFill edges,e2)+1 bordFill edges,e2,NP
height="202px",width="182px" So we can see the point set outlined in the center in blue and the result from calcPerim as the magenta border: this is the perimeter set of points from which we’ll choose to release our random walkers. This helps us understand that the second line of growMany selects numpts points from this perimeter.
   pts=. numpts (]{~ ([<.[:#]) ?[:#]) calcPerim pp

Looking at the next few lines after this,

   ctr=. _1 [ rr=. i.0
   while. maxwalk>ctr=. >:ctr *. (0<#pts)
   do. wcc=. PTS chkCollision pts=. walk pts

we see a counter ctr and result set rr being initialized, the start of a while loop, and the check for a collision of any of the points after they’ve been walked one random step. The last two lines within the while loop take care of adding any collided points to the global PTS and removing them from the set being randomly walked. The result set rr keeps track of how many steps were required to add any new points by repeating the current value of the counter by the number of newly-added points at any step.

       if. 1 e. wcc do. addPt wcc#pts
           pts=. pts#~-.wcc [ rr=. rr,ctr$~+/wcc

The final line of growMany returns the resulting new points to be added to the set as well as a count of how many released points reached the random walk limit without being added to the set.

   rr,maxwalk$~#pts

The two sub-functions referenced above are simply this:

   addPt=: 3 : 'PTS=: PTS,y'
   chkCollision=: 4 : '+./"1 (y+"1/RNI) e. x'"_

Looking again at the wrapper do1 for growMany, we can now better appreciate how it works:

      do1
3 : 'tm,(#PTS),(sv-~#PTS),(tm%~sv-~#PTS),(usus nn),(#nn)%~nn+/ . =>./nn [ tm=. 6!:2 ''nn=. growMany y''[sv=. #PTS'

We see that we start, going from right to left, by saving the number of starting points, then we time how long growMany takes to give us the result nn which is the vector of random walk lengths. The major part of the line uses these results to produce statistics to give us an idea of how efficiently we’re adding points to the cluster.

A Problem with Three Dimensions

Notice how useful we found the viewmat verb in the preceding exposition. This was equally true during the development of this code. Being able to look at a picture of the points being generated helped guide development of these algorithms and provided assurance that we were doing sensible things. However, when run this code to produce a three-dimensional DLA, as shown here, we face the problem of visualizing these new results.

   init 3
0 0 0
   do1 125;20;5 2
0.0243368 3 2 82.18 17 125 114.25 33.0882 0.9
   do1 125;20;5 2
0.0298845 5 2 66.9244 60 125 120.85 14.8759 0.9
…
   do1 150;100;5 2
0.14899 120 30 201.355 4 150 117.93 51.3275 0.7
   $PTS
120 3

So, it looks like we have 120 three-dimensional points. What do they look like?

Searching the web for a tool with which to visualize this point set yielded little satisfaction. Ideally, we’d like not only to be able to see the points but to move the display in real-time to look at the DLA from different angles. There is some functionality of this type in the R environment, but it appeared to be somewhat clumsy to use and it requires moving the data into that environment as well.

I did find a set of three-dimensional visualization scripts called “Dex” – a “data explorer” – that is based on the popular Javascript graphing package “d3.js”. It didn’t do exactly what I wanted but appeared to be close. So, in just a couple of hours, I was able to put together something that worked like this for our sample set of points.

   buildView PTS;(0$~#PTS);'sample3D120pts';'Sample of 120 three-dimensional points'
3072

This shows our points in a browser in a form easily rotated by moving the cursor around it, looking something like this: height="357px",width="346px"

Looking Ahead

The code is currently much cruder than I’d like, but it works fine for this case. In the future, I’d like to better integrate it into the JHS environment and produce the HTML more neatly but it’s fine for now. The J routines look like this:

buildView=: 3 : 0
   'pts clrs flpfx title'=. 4{.y
   if. 0=#title do. title=. flpfx end.

   tmplt=. fread 'ScatterPlot3D.tmplt'
   tmplt=. (tmplt rplc '{title}';title) rplc '{thisScat3D}';flpfx,'.js'
   (tmplt rplc '{points}';cvt2d3Array pts) fwrite flpfx,'.htm'
   
   set0=. (<cSets) rplc &.> (<'{i.#pts}');&.>":&.>i.#pts
   set1=. ;set0 rplc&.> (<'{colorIx}');&.>":&.>clrs
   mystmplt=. fread 'ScatterPlot3D1_files/myScatterPlot3DJS.tmplt'
   subfns=. CR-.~mystmplt rplc '{colorsets}';set1
   subfns fwrite 'ScatterPlot3D1_files/',flpfx,'.js'
)

cvt2d3Array=: 3 : '}:_1|.;}.&.>;&.><"1 (<''],[''),.~'','',&.>j2n&.>y'
rplc=: stringreplace            NB. From stdlib
j2n=: 'S'&J2StdCvtNums          NB. Convert numbers between J
J2StdCvtNums=: 3 : 0            NB.  and "standard" representations.
NB.* J2StdCvtNums: convert char rep of num from J to "Standard", or
NB. vice-versa if left arg is 'J' or '2J'; optional 2x2 left argument allows
NB. 2 arbitrary conversions: col 0 is "from", col 1 is "to" char.
NB. Monadic case changes Standard numeric representation to J representation.
   (2 2$'-_Ee') J2StdCvtNums y
:
   if. 'S'=x do. if. ' '={.,y do. y return. end. end.
   if. 0=#y do. '' return. end.
   pw16=. 0j16                        NB. Precision width: 16 digits>.
   diffChars=. 2 2$'-_Ee'             NB. Convert '-'->'_' & 'E'->'e'
   toStd=. -.'J'-:''$'2'-.~,x         NB. Flag conversion J->Standard
   if. 2 2-:$x do. diffChars=x        NB. if explicit conversion.
   elseif. toStd do. diffChars=. |."1 diffChars end.   NB. Convert other way
   if. 0=1{.0$y do.                   NB. Numeric to character
       fmts=. (8=>(3!:0)&.>y){0,pw16  NB. Full-precision floats only
       y=. fmts":y                    NB. If this is too slow, go back
   end.

   y=. y-.'+'                         NB. EG 1.23e+11 is ill-formed & the
   wh=. y=0{0{diffChars               NB.  '+' is unnecessary.
   cn=. (wh#1{0{diffChars) (wh#i. $y)}y    NB. Translate chars that need it
   wh=. y=0{1{diffChars                    NB.  but leave others alone.
   cn=. (wh#1{1{diffChars) (wh#i. $cn)}cn
   if. -.toStd do.                    NB. Special handling -> J nums
       if. '%'e. cn do.               NB. Convert nn% -> 0.nn
           cn=. pw16":0.01*".cn-. '%'
       end.
       cn=. cn-.','                   NB. No ',' in J numbers
   end.
   cn
NB.EG 'S' J2StdCvtNums _3.14 6.02e_23 NB. Convert J numbers to std rep
)

The main buildView routine simply customizes a pair of HTML templates by inserting the points, converted into a form suitable for the Javascript routines, and some title information into the templates. The main template, named ScatterPlot3D.tmplt, is assumed to be in the current directory. This is where the data points and the display title are inserted.

The second argument to buildView is a vector of colors corresponding to the matrix of points to be shown. These colors, currently 17 of them, are defined in the other template, called myScatterPlot3DJS.tmplt and assumed to be in the subdirectory ScatterPlot3D1_files. The Javascript code to set the colors is more elaborate than desired and is generated by buildView to be inserted in this latter template. This second template is named to match the main HTML file being created and written to the sub-directory. The name of these files, rather the file name prefix, is designated by the third element of the argument to buildView.

The fourth and final argument to buildView is text to title the resulting display.

These two templates are about 101, and 341 lines long, respectively.