User:Devon McCormick/DLA00

From J Wiki
Jump to: navigation, search


Diffusion-limited aggregation ("DLA") is a clustering of random particles according to simple rules: particles accumulate into a connected mass by randomly wandering around the mass until they encounter a particle already in the mass, at which point the new particle "sticks". A more complete definition is here.

This is an exercise in re-vamping J code to improve its functionality and to introduce concepts unique to J to coders coming from conventional programming backgrounds. This initial version of the code, found on the J wiki, simulates diffusion-limited aggregation but is written in a "conventional" manner (it loops unnecessarily and uses a fixed-size array) and has some other flaws as well.

After these initial, somewhat superficial changes, we'll consider more substantive issues of design and efficiency in another section. We also take a look at running the revised code and consider how we might measure its efficiency.

Initial Changes

For the first set of revisions, the goal is to tidy up the code only superficially but to keep it working largely the same. The exception to this is to include my first bug-fix to better check the out-of-bounds condition on which it was failing - "check_oob" in "grow" below.

So, we start with a leading comment in which we name the file and briefly describe what it does.

NB.* cdla6Revised0.ijs: Diffusion-limited aggregation from
NB. as initially revised.

Then, some basic inclusion and establishing our namespace since these logically have to come first.

load 'coutil'
cocurrent 'dla'

Now, the first major, though superficial, change is to move the main routine to the beginning so we immediately have an overview of what we're doing.

NB.* grow: grow cluster by random DLA, one point at a time.
grow=: 3 : 0
   c=. 0
   point=: SZ release CTR
   while. y > c do.
       point=: walk point
       if. (point outOfBounds CTR)+.check_oob point do.
           point=: SZ release CTR
       else. if. check_collision point do.
               add_to_cluster point
               SZ=: CTR dist point
       c=. >: c

I've tightened up the indenting as three or four spaces are enough for visual distinction without causing lines to be unnecessarily long. Also, I've incorporated the common standard of naming global variables in all upper-case.

Now, we set some initial values, again to be in logical order of processing, whereas putting the main routine first is for ease-of-understanding.

This is the original code:

NB. State settings
size =: 0 NB.Keep track of the size of the cluster
nbs =: 1 1j1 0j1 _1j1 _1 _1j_1 0j_1 1j_1 NB. Neighbors to a point in complex numbers
space =: 500 500 $ 0                     NB. Lattice to grow the cluster

NB. Operations and helpers
getCenter =: (j./) @ ( -: @ $)         NB. Get the center of the lattice
seed =: 3 : '1 (< +. getCenter y) } y' NB. place a single point in the center of lattice

In the original code, there is also this initialization section near the end of the file to be near the main routine which was at the very end:

NB. Init
center =: getCenter space
space =: seed space

Here is our revision to the first part of this in which we standardize names and formats of initial comments.

NB. State settings: initialize grid, set indexes of center.
SZ=: 0                   NB.* SZ: Track the maximum extent of the cluster
NB.* NBS: Neighboring points as complex numbers
NBS=: 1 1j1 0j1 _1j1 _1 _1j_1 0j_1 1j_1
SPC=: 0$~1000 1000       NB.* SPC: Lattice on which to grow cluster

Note how global names are short, because they'll be used frequently, and all upper-case to make them stand out. Also, the shortness of the names is mitigated by a defining comment where they are initialized. Note the form of an initial definition comment: "NB.* {name}: {short explanation}". This makes it easy to find these kinds of comments.

Tightening Code

This is fine so far, but the next section could use some work. I already shortened the assignments of "center" and "SPC" in an earlier pass through the code.

NB. Operations and helpers
getCenter =: (j./) @ ( -: @ $)         NB. Get the center of the lattice
seed =: 3 : '1 (< +. getCenter y) } y' NB. place a single point in the center of lattice

center=: getCenter SPC=: seed SPC

The first things to realize are that "getCenter" is only ever used once in "seed" and "seed" is only ever used once to set the global "center". It's not helpful to set aside a name, and the associated cognitive effort, for something that only ever gets used a single time. It makes more sense to combine these single-use phrases into a longer phrase and to comment this to ameliorate the effort of understanding the denser phrase.

Also, an interesting simplification happens when we start combining these. To see this, first take the expression named "getCenter" without the unnecessary parentheses or spaces, j./@(-:@$), and insert this into the definition of "seed", giving us this:

seed=: 3 : '1(<+.(j./@(-:@$))y)}y'

If you understand the J verbs "j." and "+.", you know that they are inverses - so why are they right next to each other like that? Once we see that these two things essentially cancel out each other, we can simplify this definition further:

   seed0=: 3 : '1(<(-:@:$)y)}y'

Verify that the old and new definitions give the same result on a table of random, complex numbers:

   (seed -: seed0) j./2 10 10?@$0

We define the new version with a different name and test it for equivalence with the more complex version. We might at this point also notice a potential error in this expression: what if the dimensions of the argument are not even numbers? Obviously this will fail.

   seed 3 3$0
|domain error: seed
|   1    (<+.(j./@(-:@$))y)}y
   seed0 3 3$0
|domain error: seed0
|   1    (<(-:@:$)y)}y

So, we risk changing the functionality of the code here because we can eliminate an obvious error. We'll risk displacing the "center" by one in each dimension, if the grid has odd-numbered dimensions, by rounding down the halving. However, testing this reveals something else.

   seed0=: 3 : '1 (<(<.@-:@$)y) }y'
   seed0 3 3$0
0 0 0
0 1 0
0 0 0
   seed0 4 4$0
0 0 0 0
0 0 0 0
0 0 1 0
0 0 0 0

Oops, the original expression works on even-dimensioned arrays but cannot actually center a seed in that case. We can only hit the exact center for odd-dimensioned arrays.

One final change: since we are defining this explicitly, we don't need so many parens and atops ("@:"). We can simplify this further to:

   seed0=: 3 : '1 (<<.-:$y) }y'

Of course, we test each change. In this case, our first test fails since the older version does not handle odd-dimensioned arrays, so we revise the test to get success.

   (seed -: seed0)&> 0$~&.>_2<\2#2+i.10
|domain error: seed
|   1    (<+.(j./@(-:@$))y)}y

   (seed -: seed0)&> 0$~&.>_2<\2#+:2+i.10
1 1 1 1 1 1 1 1 1 1

In any case, this digression to re-define "seed" is moot because, since it's only ever used once, there's no point in naming it. So, the original set of helper functions and initialization reduces to this:

SPC=: (3 : '1 (<<.-:$y) }y') 1000 1000$0     NB.* SPC: seed lattice in center.

I should note that "center" is actually used a few times in the main routine, so I'll define a global for it separately.

CTR=: (3 : 'j./<.-:$y') SPC                  NB.* CTR: center point of lattice.

This raises an issue to deal with in subsequent refinements to this code: the limitations of using complex numbers to deal with two-dimensional arrays. If you study this code, much of the work involves breaking apart complex numbers into two-element integer vectors and putting these vectors back into complex form. Not only does this seem unnecessary, it limits our application to only two dimensions. It seems that if we used two-column arrays instead of vectors of complex numbers, we might be able to extend the dimensionality of the DLA with little additional work. We can look at this later, though.

Simplifying Central Routines

Continuing with our initial code simplification, we see that only two of these three verbs are actually used, so we'll get rid of the unnecessary ones.

csign=: (+.^:_1) @ ( * @ +.)          NB. get the 'sign' of a complex number i.e 0.25j_0.5 -> 1j_1
walk=: 3 : 'y + (? # NBS) { NBS'      NB. Simple form of random walk
walk2=: 3 : 'y + (? 1 +  # NBS) { NBS, csign center - y' NB. Baised toward center of lattice

Also, eliminate unnecessary spaces and comment the initial definition in the standard way:

NB.* walk: Simple form of random walk: one-step.
walk=: 3 : 'y+(?#NBS){NBS'

The next few routines can all be shortened to be more succinct.

NB. Calculate the max radius of the cluster
dist =: 4 : 0
        d =:(0 { *. (x - y))
        >.(size >. d )
NB. release a new particle distance x + 5 from center y
release =: 4 : 0
        r =. (x + 5) r. (2p1 * ? 0)
        <. r + y
add_to_cluster =: 3 : 0
        space =: 1 (< +. y) } space

Also, this last one, "add_to_cluster", strikes me as a bit off since it's assigning a global internally when it could be generalized to take the lattice as an argument and return a result. This would be more aligned with the functional paradigm of J than the pseudo-OO style prevalent here, but, since this would require a change to the main routine, I'll leave it for now.

So we get the shorter versions in which I've retained a few of the internal spaces where I think it helps clarify the code.

NB.* dist: Calculate the max radius of the cluster
dist=: 4 : '>.SZ>.0{*.x-y'

NB.* release: Release a new particle distance x + 5 from center y
release=: 4 : '<.y+(x+5) r. 2p1*?0'

NB.* add_to_cluster: add point to cluster.
add_to_cluster=: 3 : 'SPC=: 1 (<+.y) } SPC'

The following group again defines single-use names which would be better folded together. Also, the repetition in the definition of "drange" is a clue that it could be more succintly defined. Finally, the double-axis summation in "check_collision" has a couple of flaws: it's specific to two dimensions and it returns a result - the sum of the cells in the neighborhood - that we don't really use.

NB. This probably could be improved. Used to find if a point has landed on the cluster
NB. Look at the 8 neighbors around a point and see if there are any occupied cells.
range=: <:,],>:
drange=: (range@(0&{@+.)); (range@(1&{@+.))
check_collision=: 3 : '+/+/(< drange y) { SPC'

By its name and use, "check_collision" should be returning a logical result, not a summation, the value of which is ignored.

However, let me get down from my high horse for a moment to acknowledge that this code taught me a bit of J I didn't know. The form of the argument generated by "drange" is a boxed pair of "row indexes" and "column indexes" which, when enclosed, are used orthogonally to extract the corresponding rectangle from the lattice matrix.

For example,

   drange 10j20
|9 10 11|19 20 21|
   (<drange_dla_ 10j20) { (10*i.40) +/ i.40
109 110 111
119 120 121
129 130 131

I did not know this form of indexing before I saw it used here.

Anyway, the preceding code gets reduced and checked for equivalence to the original version.

NB. This probably could be improved: finds if point has landed on cluster by
NB. looking at the its 8 neighbors to see if there are any occupied cells.
NB.* check_collision: check if point within 1 cell of another point.
check_collision0=: 3 : '1 e. ,(<([:<"1[: ]_1 0 1 +/~+.) y) { SPC'

   (check_collision-:check_collision0)"0 ] 400j400 500j500 501j501 502j502
1 1 1 1

Once it looks OK, we change the definition to the new one.

Finally, we look at the code to check if a point has wandered too far away from the cluster or off the edge of the lattice.

NB. Check to see if the point has wandered far away from the cluster.
outOfBounds =: 4 : '(size + 10) <(0 { *. (x - y))'

I added "check_oob" here and use it in the main routine because the existing code would fail if the point got out of indexing range for the lattice.

NB.* check_oob: Check if point close to out-of-bounds (within 1 of any edge).
check_oob=: 13 : '(2+./ . >:y)+. y+./ . >:$SPC [ y=. >:+.y'
NB.* outOfBounds: Check if point has wandered far away from cluster.
outOfBounds=: 4 : '(SZ+10)<0{*.x-y'

Conclusion of First Phase

This last verb brings up a larger, design question: when do we decide to abandon a point and try a new one? This question is implicitly answered by the form of the arguments to the main "grow" routine: we get a limit to the number of iterations we're willing to try before releasing a new particle: this controls the outermost "while." loop.

However, larger design questions will have to wait for the next phase of revisions. First, we have to run the revised code to ensure that it behaves the same - outside of bug-fixes - as the old code.

Also, before we move one, we probably want to take a look at how the revised code runs.