NYCJUG/2009-08-11
NYCJUG meeting, trees, walk directory tree, advantage of terseness, brevity, BAPLA'09, supercollider, Computer Science education, mathematics
Meeting Agenda for NYC JUG 20090811
1. Beginner's regatta: a simple recursive adverb to walk a directory tree, applying an arbitrary verb at each node - see "Walk Directory Tree.doc". 2. Show-and-tell: some comments on the BAPLA'09 conference and an excerpt from my talk there - see "Terser is Better.doc". Also, a counterpoint to this - see "Terse Confusion.doc". 3. Advanced topics: amending boxed arrays (Thomas) - see "Item Amend_Amend and Merge.doc" and "Mapped Boxed Arrays.doc". 4. Learning, teaching and promoting J: continuing difficulties in Computer Science education - see "US Computer Science Heading in Wrong Direction.doc", "WhyCSDoesn'tMatter.doc" and "Are We Losing Our Ability to Think Critically.doc". A look at another language - SuperCollider - and some ideas about extending it with concepts from J - see "J Concepts in SC.doc" et al.
Proceedings
We talked a lot about trees, both in the context of walking a directory tree and more generally in the context of generalized updating of a deeply-nested boxed array. We also discussed the arguments both for and against terse notation. We discussed briefly the spotty relation between skills and education in programming versus mathematics. Finally, we looked at a sound-synthesis language that has been explicitly influenced by J.
Beginner's Regatta: Walk Directory Tree
from Devon McCormick <devonmcc@gmail.com> date Fri, Jul 17, 2009 at 6:27 PM subject Climbing a tree
Members of the Forum -
I've recently been playing with a useful utility I'm calling "climb" because it climbs a directory tree, doing something at each node. This "doing something" is user-specifiable, so this naturally lends itself to an adverbial construction, to whit:
climb=: 1 : 0 svdir=. 1!:43 '' 1!:44 svdir [ rr=. u y [ 1!:44 y subds=. ((('d'e.&>4{"1])#0{"1])@:(1!:0@<)) y,'\*.*' NB. Only subdir names if. 0~:#subds do. rr=. rr;(u climb)&.>(<y,'\'),&.>subds end. )
The only very complicated line is the one assigning the subdirectory variable "subds". I had originally written this
subds=. jd dir y,'\*.*' NB. Only subdir names
but have since substituted the definitions for "jd" (just directories) and "dir" since this makes the function self-contained.
I'm interested in comments on a couple of design choices. First of all, is the name good? The more normal terminology has one "walk" a directory tree. In fact, the Rosetta code page "Walk Directory Tree" (http://rosettacode.org/mw/index.php?title=Walk_Directory_Tree) presents an application of this code though the problem statement is poorly worded. It states that this algorithm will "[w]alk a given directory tree and print files matching a given pattern" when it apparently means not "print files" but "display file information", based on the submissions.
Of course, the J function here is more general as it allows one to do anything while climbing the tree.
Another design choice has us changing directories to run the supplied verb but it's not clear this is really necessary.
Yet another choice is to make this recursive in order to simplify the logic. However, if we were to handle the sub-directory navigation more explicitly, this opens up the possibility of being able to specify either "depth-first" or "breadth-first" processing of the tree rather than the implicit "depth-first" approach of recursion.
Finally, when I first attempted to use this to list files matching a certain pattern, I had an initial problem specifying my verb. My first attempt,
aa=. (dir&'*Bis-B*.csv') climb 'C:\amisc'
failed. When I examined the behavior of my verb - which intentionally ignores the default argument inside "climb" of the current directory path - I found this:
(dir&'*Bis-B*.csv') 'C:\amisc' |length error: dir | (dir&'*Bis-B*.csv')''
Playing around with some variations led me to this
aa=. ((dir@:])&'*Bis-B*.csv') climb 'C:\amisc'
which worked fine but I'm not clear on why my initial attempt to bind the right argument of "dir" fails.
In any case, this is a useful utility that offers some room for improvement but I've already used it successfully in a couple of different ways.
from Devon McCormick <devonmcc@gmail.com> date Sun, Jul 19, 2009 at 2:04 PM
On Sun, Jul 19, 2009 at 1:47 PM, Zsbán Ambrus <ambrus@math.bme.hu> wrote:
... I think it's called _descending_ a directory tree because computer scientists have their trees upside down. Or just call it find. > However, if we were to handle the sub-directory navigation more > explicitly, this opens up the possibility of being able to specify either > "depth-first" or "breadth-first" processing of the tree rather than the > implicit "depth-first" approach of recursion. The term "depth-first" also implies a tree growing downward. Ambrus
These are good points. However, coming from the APL/J world, I've always felt little incentive to observe computer "science" tradition in the cases where it seems backwards, like upside-down trees. At the same time, I'd like to strike a balance as it's an ongoing problem with J that we use a different vocabulary than most everyone else and this presents a non-essential barrier to entry.
So, on the basis of spurning mistaken tradition, I would prefer "climb" to "descend". The word "find" is perhaps too broad though my "climb" is very much like the Unix "find", so that's maybe the way to go.
Bjorn chided me for incompleteness –
from Björn Helgason <gosinn@gmail.com> date Fri, Jul 24, 2009 at 4:13 AM
When I a wanted to try out your interesting example here I discovered that it needed a bit more to run.
To begin with, the dir command is missing
require'dir'
Then a simple example
aa=.dir climb 'C:\j601'
And something to display the results
,.aa
So like this to start with
require'dir' climb=: 1 : 0 svdir=. 1!:43 '' 1!:44 svdir [ rr=. u y [ 1!:44 y subds=. ((('d'e.&>4{"1])#0{"1])@:(1!:0@<)) y,'\*.*' NB. Only subdir names if. 0~:#subds do. rr=. rr;(u climb)&.>(<y,'\'),&.>subds end. ) aa=.dir climb 'C:\j601' $aa 5 $ &.> aa +-----+-+-+-+-+ |25 54|5|6|2|6| +-----+-+-+-+-+ ,.aa
The results from the example is still a bit complicated for a beginner I guess but it sure looks interesting.
I re-worked the function based on experimenting with its use. The final version is more well-commented and has a line at the end with a usage example. Also, I changed the order to extract the sub-directory names before applying the verb so that those names are available to it.
NB.* walkDirTree: walk directory tree performing "u y" at each node where NB. "y" is current directory name. walkDirTree=: 1 : 0 subds=. ((('d'e.&>4{"1])#0{"1])@:(1!:0@<)) y,'\*' NB. Only subdir names rr=. u y NB. Do it in current, then look to subdirs if. 0~:#subds do. rr=. rr;(u walkDirTree)&.>(<y,'\'),&.>subds end. NB.EG ] walkDirTree 'C:\Temp' )
My experiments revealed some interesting behavior: apparently the operating system (Windows XP) caches the result of a directory sweep on a local drive, so subsequent sweeps run much more quickly. The following screenshots illustrate this.
Some Apparent OS Caching on Local Drives When Walking Directory Tree
Here we see how the initial run takes almost 563 seconds but incurs no obvious system load (as shown by the Process Explorer information on the right.)
However, a subsequent run, even in a separate J session, completes in about five seconds.
This behavior does not appear to extend to network drives: drive “E:” shown here.
This display shows an “.ijx” window on top and a jee session running under emacs below.
All timings take over 300 seconds when sweeping the network drive.
Final Version
In the meeting, Thomas pointed out how I could simplify the code by getting rid of the "if" statement and simply applying the recursion to any subdirectories found - if there are none, the function returns an empty result. This gives us what I now consider to be my final version:
NB.* walkDirTree: walk directory tree performing "u y" at each node where NB. "y" is current directory name. walkDirTree=: 1 : 0 subds=. ((('d'e.&>4{"1])#0{"1])@:(1!:0@<)) y,'\*' NB. Only subdir names rr=. u y NB. Do it in current, then look to subdirs rr=. (<rr),(u walkDirTree)&.>subds ([,~&.>[:<'\',~]) y )
I obtain the subdirectories prior to invoking the user-supplied function with the idea of having that information available to the invocation.
Show-and-tell
When I put together my talk on J for the BAPLA'09 conference (videos here), I was initially daunted by the prospect of filling up forty minutes, so I pulled in a lot of material. However, once I was in the midst of the talk, I realized that I had far too much material and had to skip and minimize certain sections. It was like my one talk was two or three talks jammed together.
This is an attempt to extract one of those two or three talks into a stand-alone topic: the argument in favor of terseness. What follows are my slides with a little commentary sandwiched between each.
Terser is Better
Here’s a very simple problem with a very simple solution – why shouldn’t the code reflect this simplicity?
The common view is that less code is less legible. Meanwhile, people typically emphasize parts of their solution that are irrelevant, like thread-safety and parallelizability: how are high-performance, heavy-duty concepts like these relevant to such a trivial problem?
Along the same lines, how is “only 44 lines of code” in any way commensurate with this simple problem? That is a lot of code for a very trivial algorithm.
This equation of terseness to unreadability is common even where we should know better – in the APL community! We should turn this argument around: why is a page of baby-talk better than a few lines of concise, high-level, well-articulated expression?
Even though the APL and J solutions are superficially different, they are actually very similar. We can even line up the single line solution from each language to see how they both do the same things in the same order. Here we see them lined up with the same part of each line doing essentially the same thing.
Having a succinct solution allows us more flexibility to play with variants and discover potentially useful generalizations. This is something we see APL people do quite frequently: present a simple, succinct solution, then play with it to explore the boundaries of the solution and how it might be generalized. You don't have time for this once you've labored to put together many pages of code like you would in most conventional languages.
To be fair, we must concede the danger of being too telegraphic. However, the ease of reaching a workable solution allows us the luxury of mitigating some of the difficulty of parsing dense code by using a small, domain-specific language as shown here.
But to emphasize the folly of verbosity = readability, consider the following. Detail at too fine a level can obscure what’s important about an algorithm. There probably is some actual processing being done here that's relevant to the problem, it's just hard to find in the morass of irrelevant detail.
Another Illustration of the Virtue of Brevity
Here we show a fairly simple spec for calculating a check-digit.
The amount of code should reflect the complexity of the algorithm which, in this case, is not much.
Since this was so easy to code in the first place, we have plenty of bandwidth to break down the steps into little pieces to demonstrate our intermediate results to verify that we’re doing what the spec requires.
Here, we see the small, domain-specific language - or little language as Jon Bentley would have it - we’ve customized for this problem.
Since the implementation of the spec is so terse, it’s easy to compare differing implementations as shown by the last two lines of code here: the differences are obvious. Contrast this with the difficulty of comparing algorithms of even one or two pages.
See also Jon Bentley, of Programming Pearls" fame, on "little languages":
A "Programming Pearls" column on "little languages" (subscription required)
examples of good "little languages"
gloss on Jon Bentley's "little languages" idea
Advanced topics
Thomas spoke to us about the problems with amending deeply nested arrays using map }:: .
Learning and Teaching J
Some essays on the continuing difficulties in Computer Science education - see "US Computer Science Heading in Wrong Direction.doc", "WhyCSDoesn'tMatter.doc" and "Are We Losing Our Ability to Think Critically.doc"
J Concepts in SC
SuperCollider is a real-time audio synthesis programming language. It is maintained on SourceForge.
Some J concepts, like multiple assignment among others, have found their way into this community. Here is an essay on applying concepts from J to this (array-oriented) language.
The J programming language is a successor of APL; see http://www.jsoftware.com. These languages are made for processing arrays of data and are able to express complex notions of iteration implicitly.
The following are some concepts borrowed from or inspired by J. Thinking about multidimensional arrays can be both mind-bending and mind-expanding. It may take some effort to grasp what is happening in these examples.
// iota fills an array with a counter z = Array.iota(2, 3, 3); z.rank; // 3 dimensions z.shape; // gives the sizes of the dimensions z = z.reshape(3, 2, 3); // reshape changes the dimensions of an array z.rank; // 3 dimensions z.shape; // fill a 2D array Array.fill2D(3,3,{1.0.rand.round(0.01)}); Array.fill2D(2,3,{|i,j| i@j}); // fill a 3D array Array.fill3D(2,2,2,{1.0.rand.round(0.01)}); Array.fill3D(2,2,2,{|i,j,k| `[i,j,k]}); // using dup to create arrays (1..4) dup: 3; 100.rand dup: 10; {100.rand} dup: 10; {100.rand} dup: 3 dup: 4; {{100.rand} dup: 3} dup: 4; {|i| i.squared} dup: 10; {|i| i.nthPrime} dup: 10; // ! is an abbreviation of dup (1..4) ! 3; 100.rand ! 10; {100.rand} ! 10; {100.rand} ! 3 ! 4; {{100.rand} ! 3} ! 4; {|i| i.squared} ! 10; {|i| i.nthPrime} ! 10; // other ways to do the same thing: // partial application _.squared ! 10; _.nthPrime ! 10; // operating on a list (0..9).squared; (0..9).nthPrime; Operator Adverbs Adverbs are a third argument passed to binary operators that modifies how they iterate over SequenceableCollections or Streams. See the Adverbs help file. [10, 20, 30, 40, 50] + [1, 2, 3]; // normal [10, 20, 30, 40, 50] +.f [1, 2, 3]; // folded [10, 20, 30, 40, 50] +.s [1, 2, 3]; // shorter [10, 20, 30, 40, 50] +.x [1, 2, 3]; // cross [10, 20, 30, 40, 50] +.t [1, 2, 3]; // table Operator Depth J has a concept called verb rank, which is probably too complex to understand and implement in SC, but operator depth is similar and simpler. A binary operator can be given a depth at which to operate negative depths iterate the opposite operand. These are better understood by example. It is not currently possible to combine adverb and depth. z = Array.iota(3,3); y = [100, 200, 300]; z + y; z +.0 y; // same as the above. y added to each row of z z +.1 y; // y added to each column of z z +.2 y; // y added to each element of z z +.-1 y; // z added to each element of y // deepCollect operates a function at different dimensions or depths in // an array. z = Array.iota(3, 2, 3); f = {|item| item.reverse }; z.deepCollect(0, f); z.deepCollect(1, f); z.deepCollect(2, f); f = {|item| item.stutter }; z.deepCollect(0, f); z.deepCollect(1, f); z.deepCollect(2, f); // slice can get sections of multidimensional arrays. // nil gets all the indices of a dimension z = Array.iota(4,5); z.slice(nil, (1..3)); z.slice(2, (1..3)); z.slice((2..3), (0..2)); z.slice((1..3), 3); z.slice(2, 3); z = Array.iota(3,3,3); z.slice([0,1],[1,2],[0,2]); z.slice(nil,nil,[0,2]); z.slice(1); z.slice(nil,1); z.slice(nil,nil,1); z.slice(nil,2,1); z.slice(nil,1,(1..2)); z.slice(1,[0,1]); z.flop; Sorting Order z = {100.rand}.dup(10); // generate a random array; // order returns an array of indices representing what would be the // sorted order of the array. o = z.order; y = z[o]; // using the order as an index returns the sorted array // calling order on the order returns an array of indices that returns // the sorted array to the original scrambled order p = o.order; x = y[p]; // bubbling wraps an item in an array of one element. It takes the // depth and levels as arguments. z = Array.iota(4,4); z.bubble; z.bubble(1); z.bubble(2); z.bubble(0,2); z.bubble(1,2); z.bubble(2,2); // similarly, unbubble unwraps an Array if it contains a single element. 5.unbubble; [5].unbubble; [[5]].unbubble; [[5]].unbubble(0,2); [4,5].unbubble; [[4],[5]].unbubble; [[4],[5]].unbubble(1); z.bubble.unbubble; z.bubble(1).unbubble(1); z.bubble(2).unbubble(2); Laminating with the +++ Operator The +++ operator takes each item from the second list and appends it to the corresponding item in the first list. If the second list is shorter, it wraps. z = Array.iota(5,2); z +++ [77,88,99]; z +++ [[77,88,99]]; z +++ [[[77,88,99]]]; z +++ [ [[77]],[[88]],[[99]] ]; // same as: z +++ [77,88,99].bubble; z +++ [77,88,99].bubble(0,2); z +++ [77,88,99].bubble(1,2); z +++ [11,22,33].pyramidg; z +++ [11,22,33].pyramidg.bubble; z +++ [[11,22,33].pyramidg]; z +++ [[[11,22,33].pyramidg]]; ( z = (1..4); 10.do {|i| z.pyramid(i+1).postln; z.pyramidg(i+1).postln; "".postln; }; ) // reshapeLike allows you to make one nested array be restructured in // the same manner as another. a = [[10,20],[30, 40, 50], 60, 70, [80, 90]]; b = [[1, 2, [3, 4], [[5], 6], 7], 8, [[9]]]; a.reshapeLike(b); b.reshapeLike(a); // If the lengths are different, the default behaviour is to wrap: a = [[10,20],[30, 40, 50]]; b = [[1, 2, [3, 4], [[5], 6], 7], 8, [[9]]]; a.reshapeLike(b); // but you can specify other index operators: a.reshapeLike(b, \foldAt); a.reshapeLike(b, \clipAt); a.reshapeLike(b, \at); // allTuples will generate all combinations of the sub arrays [[1, 2, 3], [4, 5], 6].allTuples; [[1, 2, 3], [4, 5, 6, 7], [8, 9]].allTuples;