NYCJUG/2014-06-10

Fibonacci, J labs, graph theory, error messages, functional programming, keywords


Meeting Agenda for NYCJUG 20140610

1. Beginner's regatta: see "Example of Finding Small Changes in Large Files.doc".

2. Show-and-tell: see "Fibonacci word in J - and generalized.doc" as an example
of array thinking?

Better labs: see "Demo of Extended Labs.doc".

3. Advanced topics: which might be more interesting: topics in graph theory
(perhaps concentrating on trees) or "array-thinking"?  See "Graph Examples".

4. Learning, teaching and promoting J, et al.: how could programming be made
better?  See "Toward a better programming.doc" and (maybe)
"MeasuringEffectivenessOfErrorMessagesForNoviceProgrammers.pdf".

Examples of better ways: see "CShell interactive alternative for C# coding.doc"
and "A year of functional programming.doc".  Look at suggestions in "Pain we
forgot-Why programming remains difficult.doc".

Which is better: see "Pendulum Simulations in J and Javascript.doc".

A graph example: see "Keywords Hold Vocabulary Together in Memory.doc".

Beginner's Regatta

Here we give an example of locating a small difference between two files that are largely the same.

Example of Finding Small Changes in Large Files

   pp=. 'C:\amisc\Clarifi\Data\'
   fsize pp,'SP100_2014.txt'
318444

Here we're looking at a file containing information about the constituents of the S&P 100 index for a series of dates over which the index composition changed.

We start by reading in the data from the tab-delimited file, assigning the titles (first row) to the variable sptit and the remaining rows to the data variable spdat.

   'sptit spdat'=. split <;._1&>TAB,&.><;._2 CR-.~fread pp,'SP100_2014.txt'
   4{.sptit,spdat
+--------+---------+-------+-----------------------------+
|Date    |$issue_id|$ticker|PRCCD - Price - Close - Daily|
+--------+---------+-------+-----------------------------+
|20131231|00107801 |ABT    |38.33                        |
+--------+---------+-------+-----------------------------+
|20140102|00107801 |ABT    |38.23                        |
+--------+---------+-------+-----------------------------+
|20140103|00107801 |ABT    |38.64                        |
+--------+---------+-------+-----------------------------+
   sptit
+----+---------+-------+-----------------------------+
|Date|$issue_id|$ticker|PRCCD - Price - Close - Daily|
+----+---------+-------+-----------------------------+
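
The parsing idiom above can be illustrated on a tiny made-up string (txt and its contents are purely for illustration):

   txt=. 'a',TAB,'b',TAB,'c',LF,'1',TAB,'2',TAB,'3',LF
   <;._1&> TAB ,&.> <;._2 txt    NB. cut into lines, then cut each line on TAB
+-+-+-+
|a|b|c|
+-+-+-+
|1|2|3|
+-+-+-+
   'tit dat'=. split <;._1&> TAB ,&.> <;._2 txt
   tit                           NB. split separates the title row from the rest
+-+-+-+
|a|b|c|
+-+-+-+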

Now, we separate the dates and tickers (security identifiers) into their own vectors for ease of use.

   dts=. spdat{"1~sptit i. <'Date'
   tkrs=. spdat{"1~sptit i. <'$ticker'
   $~.dts
109
   (<./,>./) ".&>~.dts     NB. Date range
20131231 20140606

We see from the above that there are 109 dates spanning the range from 12/31/2013 through 6/6/2014. In order to track the day-to-day change of the constituents, we'll want to be sure that the dates are in strictly ascending order. We do this with the following statement which returns a one indicating that this is true.

   (-: /:~) ~.dts         NB. Confirm that dates are in order
1

This statement works on the unique set of dates ( ~.dts ); monadic "nub" ( ~. ) returns the unique items of a list, preserving the order of each item's first occurrence. To this unique set of dates we apply the hook in parentheses, which checks whether the list matches ( -: ) itself sorted into ascending order ( /:~ ). Since a monadic hook uses its argument both as the left argument of the first verb and as the argument of the second verb, this is equivalent to  (~.dts) -: /:~ ~.dts .
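
As a quick illustration of this test on made-up lists:

   (-: /:~) 3 1 2          NB. not in ascending order
0
   (-: /:~) 1 2 3          NB. already sorted
1
   (1 2 3) -: /:~ 1 2 3    NB. the same test written out explicitly
1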

Next we generate an integer vector giving the number of differences between the sets of tickers on adjacent dates. We use the "key" adverb ( /. ) to box ( < ) the tickers into one group per date. We then apply the exclusion ( -. ) under open ( &.> ) to each adjacent pair ( 2 ... /\ ) of these boxes, flipping ( ~ ) the arguments so that we remove yesterday's set of tickers from today's set. Finally, we tally ( # ) each unboxed ( &> ) set of differences.
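
To see how these pieces combine, here is the same phrase applied to a few made-up dates and tickers (dd and tt are toy data):

   dd=. '0101';'0102';'0102';'0102';'0103';'0103'   NB. toy dates, one per data row
   tt=. 'AAA';'BBB';'CCC';'AAA';'CCC';'AAA'         NB. toy tickers on those rows
   #&> dd </. tt               NB. key: how many tickers appear on each distinct date
1 3 2
   #&> 2 -.~&.>/\ dd </. tt    NB. tickers present on a date but not on the prior one
2 0

Applying the same phrase to the real data: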

   #&>2-.~&.>/\dts </. tkrs        NB. Day-by-day changes in tickers
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Since we have only zeros and ones - on most days the index constituents remained the same, and on the days they did change they changed by only one - we can find where the changes occurred by applying  I. to the result above. Dyadic  +/ then builds an addition table that pairs each change index with the index of the following day.

   I. 0~:#&>2-.~&.>/\dts </. tkrs  NB. Where there were changes
55 63
   55 63 +/0 1                     NB. Index of change and following entry.
55 56
63 64

   (~.dts){~55 63 +/0 1            NB. Dates of changes: old versus new sets
+--------+--------+
|20140321|20140324|
+--------+--------+
|20140402|20140403|
+--------+--------+

Now that we have the pairs of dates on which changes occurred, let's isolate what the changes were. First, we exclude ( -. ) the set of tickers on the 55th date from those on the 56th date to see what was added.

   -.~&>/55 56{dts </. tkrs   NB. New ticker added
+----+
|BIIB|
+----+

To see which ticker was removed, we do the exclusion in the opposite order by removing the "flip" ( ~ ) from the statement above. This excludes the tickers on the 56th date from the set on the 55th date.

   -.&>/55 56{dts </. tkrs    NB. Old ticker removed
+---+
|AEP|
+---+

Performing the same two steps on the other set of dates where a change occurred, we see that a new ticker was added to the index but none were dropped at that time.

   -.&>/63 64{dts </. tkrs    NB. Old ticker removed

   -.~&>/63 64{dts </. tkrs   NB. New ticker added
+----+
|GOOG|
+----+

Show and Tell

Fibonacci Word/Fractal

The Fibonacci word may be represented as a fractal as described here:

  For F_word_m start with F_wordChar_n=1
  Draw a segment forward
  If the current F_wordChar is 0:
    Turn left if n is even
    Turn right if n is odd
  next n and iterate until end of F_word

For this task, create and display a fractal similar to Fig 1.

from:  [[User:Raul Miller|Raul Miller]] <rauldmiller@gmail.com>
to:   Programming forum <programming@jsoftware.com>
date:  Sun, May 18, 2014 at 2:06 PM
subject: [Jprogramming] Turtle graphics, Gaussian integers and Mechanical action..

I added a rosettacode entry, yesterday, which was inspired by Brian Schott's turtle graphics work:

http://rosettacode.org/wiki/Fibonacci_word/fractal#J - repeating the code here:

  F_Words=: (,<@;@:{~&_1 _2)@]^:(2-~[)&('1';'0')
  require 'plot'
  plot }:+/\ 0,*/\(^~ 0j_1 0j1 $~ #)'0'=_1{::F_Words 20

This uses complex number arithmetic to achieve a sort of simplified "turtle graphics" system - basically just drawing a parametric curve constructed of unit length line segments with a heading which is a unit-length complex integer.

And then today, I saw this: http://507movements.com

This is a book from 1868, that someone must have taken years to write, originally. And, now Matt Keveney is working through it, translating the original woodcut images to code driving html5 canvas rendered animations. (This also looks like it took significant time to write, though the code looks like it might have been obfuscated and I do not know what tools and source material he used to build it out.)

This is obviously considerably more complex than my little bit of code (which Cliff Reiter probably has already written and discarded several times and maybe even forgotten about), but perhaps this sort of thing can inspire someone else's work?

(But it's a lot simpler than what people building airplanes deal with. I think...)

Thanks,

[Following from http://rosettacode.org/wiki/Fibonacci_word/fractal#J]

J version of Fibonacci Word on RosettaCode

Plotting the fractal as a parametric equation, this looks reasonably nice:

   require 'plot'
   plot }:+/\ 0,*/\(^~ 0j_1 0j1 $~ #)'0'=_1{::F_Words 20

Note that we need the definition of F_Words from the Fibonacci word page:

   F_Words=: (,<@;@:{~&_1 _2)@]^:(2-~[)&('1';'0')
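
For a sense of what F_Words produces, here are the first few Fibonacci words (each word is the previous word followed by the one before that):

   F_Words 6
+-+-+--+---+-----+--------+
|1|0|01|010|01001|01001010|
+-+-+--+---+-----+--------+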

However, image uploads are currently disabled, and rendering images of this sort as wikitext gets bulky.

Instead, I'll just describe the algorithm:

This draws a discrete parametric curve. Right turn is 0j_1, left turn is 0j1 (negative and positive square roots of negative 1), straight ahead is 1. So: build a list of alternating 0j_1 and 0j1 and raise them to the first power for the 0s in the fibonacci word list and raise them to the 0th power for the 1s in that list. Then compute the running product, shift a 0 onto the front of the list of products and compute the running sum. (Of course, this would translate to a rather simple loop, also, once you see the pattern.)
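
As a sketch of that "rather simple loop" (this explicit verb is our own illustration, not part of the RosettaCode entry), the same points can be traced step by step. It follows the tacit expression above: on each '0' the heading turns before the step is drawn, to the right on even (0-based) positions and to the left on odd ones.

fibFractalPath=: verb define
  w=. _1 {:: F_Words y       NB. the y-th Fibonacci word, as a string
  heading=. 1                NB. current direction: a unit complex number
  path=. pos=. 0             NB. start at the origin
  for_ch. w do.
    if. ch = '0' do.         NB. on a '0', rotate the heading
      heading=. heading * 0j_1 0j1 {~ 2 | ch_index
    end.
    pos=. pos + heading      NB. step one unit forward
    path=. path , pos
  end.
  path
)

   plot fibFractalPath 20    NB. same curve as the one-liner (plus its final point)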

FibonacciWordFractals.png

Better J Labs

We looked at a proposal from Bob Therriault for adding media to labs, as seen here. Bob plans to present his ideas at the J conference in Toronto this July.

Learning, teaching and promoting J, et al.

We looked at an essay by Chris Granger in which he outlined some of his thinking that led him to develop "Light Table", an IDE designed to provide instant feedback while developing code by showing how values change throughout the code flow. Many of his insights about making programming better should be familiar to the J community. Here are some excerpts from that essay, highlighting some of these familiar concepts.

Toward a Better Programming

27 Mar 2014 - Chris Granger

This post is based on my talk, "Finding a way out" [1], from Strange Loop 2013.

When I built the original prototype of Light Table I didn't have any grand purpose or goal in mind. I simply had some ideas on how programming could be better and I wanted to see how hard they would be to build. Until fairly recently, it never dawned on me that I've actually spent the past decade trying out ideas on how programming could be better, from web frameworks, to Visual Studio, to Light Table and its future. And it wasn't until I had that realization that I also came to the conclusion that I'd been going about this all wrong. As a matter of fact, I made a classic rookie mistake: I set out to answer a question I didn't understand.

How do we make programming better?

I kept asking myself "How can we make programming better?", but I never took a step back and concretely figured out what's wrong with programming. I've always been very reactionary to either my own work or the observations I had of others, and it turns out that has continually led me to local maxima. With Light Table for example, I thought shortening the feedback loop would make a tremendous difference, and while it does make a big impact, it's overshadowed by the fact that it's the same old broken version of programming. People still struggled. I still struggled. It didn't deliver us from the incredible frustration that is writing and debugging software. So if it's not just the feedback loop, what is it then? What's wrong with programming?

To answer that I needed more data, not just from my time behind one-way mirrors or my own experiences, but from the "real world". So I started asking everyone who would talk to me two questions: "What is programming and what's wrong with it?" I talked to people who had never programmed a day in their life, people in the middle of their careers, and even some of the best engineers in the industry. And what I found surprised me.

What is programming?

The answers I got to this were truly disheartening. Not once did I hear that programming is “solving problems." Instead, I found out that it's just clumping things together with half-dried glue and praying that it holds water long enough to replace it with a new clump of things. I heard over and over again that we're plumbers and glue factories, that programming is largely about working around its own deficiencies. This is particularly sad to me, because it turned out to be the only real commonality among the answers: we plumb things together to get something that kind of works. And the "kind of works" part is the source of an unbelievable amount of frustration.

Programming should be about solving problems, but somewhere along the way it turned into us solving the problems around our problems (a friend and ridiculously good engineer likes to call this a "self licking ice cream cone"). The glue factory answer certainly isn't a desirable one, so I've been trying to come up with something workable since. The one I like the best so far is that programming is our way of encoding thought such that the computer can help us with it. At the end of the day, what we're trying to do is build something that enables us to accomplish something new - programming just happens to be the way we talk to the computer to get it done.

And what's wrong with it?

The answers to this were why I ended up talking to so many people (around 400 in the end). They tended to fall into one of two categories: they were either so general that they didn't really provide any information, or they were so tactical that they didn't apply to anyone else (I get these deadlocks when I'm...). In aggregate, though, some patterns emerged and after hundreds of answers I was able to distill almost everyone's issues into three buckets for what's wrong with programming. Those buckets were that it is unobservable, indirect, and incidentally complex.

Programming is unobservable

This one seems to be getting the most attention lately thanks to Bret and our work on Light Table. We can't see how our programs execute. We can't see how our changes affect our programs. And we can't see how our programs are connected together. That basically means we can't observe anything. The state of the art in observability is a stepwise debugger, which forces us to stop the world and look at a single instant in time. But the truth is that few errors occur in an instant; most are found only by observing the passage of time. And for that, the best we have is print statements. This is ridiculous. We should be able to observe the entire execution of our program, forward, backward, even branched into new futures - not just when our breakpoint hits. Even sadder than that though, is that we seem to have embraced unobservability as an industry best practice. Think about a line of code like this:

person.walk();

What does it do? OOP's notion of encapsulation is by definition unobservable. I have no idea what person.walk() does. It probably does something sane, like set isWalking to true, but it could also be setting ateCarrots to true and it may have determined that I passed out from exhaustion - I have no idea and no way to tell. We are quite literally throwing darts in the dark and praying that we at least hit the board. We simply cannot see what our programs do and that's a huge problem whether you're just starting out or have written millions of lines of beautiful code.

Programming is indirect

Writing a program is an error-prone exercise in translation. Even math, from which our programming languages are born, has to be translated into something like this:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <cmath>
#include <vector>
#include <iterator>
#include <numeric>

template <typename Iterator>
double standard_dev( Iterator begin , Iterator end ) {
   double mean = std::accumulate( begin , end , 0 ) / std::distance( begin , end ) ;
   std::vector<double> squares ;
   for( Iterator vdi = begin ; vdi != end ; vdi++ )
      squares.push_back( std::pow( *vdi - mean , 2 ) ) ;
   return std::sqrt( std::accumulate( squares.begin( ) , squares.end( ) , 0 ) / squares.size( ) ) ;
}

int main( ) {
   double demoset[] = { 2 , 4 , 4 , 4 , 5 , 5 , 7 , 9 } ;
   int demosize = sizeof demoset / sizeof *demoset ;
   std::cout << "The standard deviation of\n" ;
   std::copy( demoset , demoset + demosize , std::ostream_iterator<double>( std::cout, " " ) ) ;
   std::cout << "\nis " << standard_dev( demoset , demoset + demosize ) << " !\n" ;
   return 0 ;
}

Why do we have to see that, instead of this?

height="81",width="270",v:shapes="_x0000_i1025",border="0"

And that's with something that at least translates fairly cleanly. What happens when it doesn't? All we get in code are symbols. We never see or interact with the real things, just symbolic representations of them.
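
For comparison - an aside of ours rather than the essay's - the same calculation in J stays close to its mathematical statement (the name stddev is just an illustrative choice):

   stddev=: [: %: (+/ % #)@:*:@(- +/ % #)   NB. root of the mean squared deviation from the mean
   stddev 2 4 4 4 5 5 7 9
2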

Ah yes, when playing cards, I love it when I get the cards[0][12]. We're writing a card game and cards have real representations, so why can't we just see this instead? AceOfSpades.png

Translation is hard and using symbols is error-prone, especially coupled with operations on top of other symbols. This indirectness, this inability to represent things usefully and directly manipulate them, is killing us. A vast number of programming errors are simple translation problems. We had the solution in our head, but in trying to turn it into code we just forgot something or translated it very slightly wrong. We have to get away from that translation. When we do UI, we should do it in a visual editor. When we do math, we should have something like Mathematica at our fingertips. We should express domains in the ways they most naturally present themselves in, not with our own homegrown obfuscations.

Programming is incidentally complex

There is an immense amount of incidental complexity in writing software, by which I mean there's a bunch of work that needs to be done that isn't directly related to the real problem you're trying to solve. Just think about how long it takes to even get something simple to run or the fact that people can spend the better part of a week just trying to set up a dev machine from scratch. These are some simple examples of unnecessary complexity at a systems level, but the truth is incidental concerns are pervasive throughout the entire process of writing software. One of the worst for us right now is at the logic level - managing time in our code. Most people tend to jump immediately to concurrency and parallelism when I say that, but it's actually more fundamental than that. Every time we add an event handler or create a callback, we're doing time management. We're saying at some point later we want this code to execute. Considering the increasingly more complex schemes of interaction that our programs have on top of the proliferation of multiple cores (and the very real concurrency problems that brings), we're quickly learning that callbacks are a terrible solution to this problem.

Every time we've found ourselves manually managing something, we've come up with a solution that does it for us. We were manually handling binary and so we created Fortran. We were managing memory and so we created garbage collectors. We were managing data constraints and so we got type systems. I think the next big step in terms of removing incidental complexity in code will come from automatically managing time. The implications of which would be tremendous for our ability to cleanly express intent.

There are so many examples of incidental complexity in programming it would be disheartening to try and enumerate all of them, but we have to start addressing them at some point. We should be focused on solving our problems - not solving the problems around solving our problems. At the end of the day, we should be able to go from nothing to a solution in the time it takes us to think of one, not the weeks and months it takes us now.

Chasing local maxima

If you look at much of the advances that have made it to the mainstream over the past 50 years, it turns out they largely increased our efficiency without really changing the act of programming. I think the reason why is something I hinted at in the very beginning of this post: it's all been reactionary and as a result we tend to only apply tactical fixes. As a matter of fact, almost every step we've taken fits cleanly into one of these buckets. We've made things better but we keep reaching local maxima because we assume that these things can somehow be addressed independently. The best analogy I've heard for what this has resulted in is teacups stacked on top of teacups. Each time we fix something, we push ourselves up some, but eventually we're working under so many layers of abstraction that the tower starts to lean and threatens to fall down. We have to stop thinking about these issues individually, and instead start imagining what it would take to address them all simultaneously.

A Study of Error Messages

We briefly considered this study which attempts to develop a framework for making error messages more effective in aiding novice programmers. The paper makes the point that "...error messages are one of the most important points of contact between the system and the programmer...." Furthermore, "[t]his is all the more critical in tools for novice programmers, who lack the experience to decipher complicated or poorly-constructed feedback."

There was some debate about the effectiveness of J's error messages. These seem too terse to aid novices, but the terseness is appreciated by more experienced J programmers, especially when compared to the enormous number of lines generated by other languages - like Java - often to no good effect.

Some Examples of Better Ways to Program

Continuing the theme - as in the "Light Table" essay above - of people "discovering" many of the good features that have been in J for years, we looked at an announcement for an "interactive C# scripting environment". One of this tool's advantages is that it allows developers to "bypass Visual Studio", which is an interesting comment on an environment that's supposed to make programming easier.

This essay details a "conventional" programmer's education in functional programming. Some of the more interesting parts from a J perspective are the sections on "Feeling Intimidated" and "Realisation: Abstractions". The intimidation upon encountering a rich and strange language like J is something that we tend to forget as we get more comfortable with its unconventional conventions.

Of particular interest is the writer's realization that FP (functional programming) in general is "a mindset change" about which the author wishes he'd known earlier "as it would've saved [him] frustration and doubt", especially as it becomes necessary to "unlearn what you think you know about coding". This is something we often see in questions posed on the J forum from those who have started to learn J: there's a tendency to want to cast known code directly into J with little change in the basic approach. We see this in questions about how to write a prompting function in J - like this one - addressed here.

It was particularly amusing to read the comments about the author revisiting code written in "good OO-style" and noticing "this is just a monoid and a bunch of crap because I didn't realise this is a monoid" and, based on this, rewriting it in "about a third of the code size".

We wrapped up this section on "better ways to program" by looking at another essay promoting the virtues of Light Table, titled Pain we forgot. This essay starts by noting

Much of the pain in programming is taken for granted. After years of repetition it fades into the background and is forgotten. The first step in making programming easier is to be conscious of what makes it hard.

One interesting pain point the author notes is simply "running code". This is particularly painful with compiled languages as there are numerous steps necessary to transform code into something runnable. Except for spurious line-breaks, it's refreshingly simple to cut-and-paste a line of J code from an e-mail into a J session and have it run without any further effort.

Comparing J to Javascript

An example of the ease and simplicity of J can be seen in this comparison of code, from RosettaCode, for animating a pendulum. The J code provides immediate gratification, even compared to the JavaScript version, though that language is reasonably compact and easy to use as well.

Here's the J code (that does need to be run in a GUI-capable environment like the J version 8 Qt IDE):

NB.* pendulums.ijs: some pendulum simulations.
NB. From http://rosettacode.org/wiki/Animate_a_pendulum#J:
NB. "modified for J8"
require 'gl2 trig'
coinsert 'jgl2'

DT   =: %30      NB. seconds
ANGLE=: 0.45p1   NB. radians
L    =: 1        NB. metres
G    =: 9.80665  NB. ms_2
VEL  =: 0        NB. ms_1

PEND=: noun define
pc pend;pn "Pendulum";
minwh 320 200; cc isi isigraph type flush;
)

pend_run=: verb define
  wd PEND,'pshow'
  wd 'timer ',":DT * 1000
)

pend_close=: verb define
  wd 'timer 0; pclose'
)

sys_timer_z_=: verb define
  recalcAngle_base_ ''
  wd 'psel pend; set isi invalid'
)

pend_isi_paint=: verb define
  drawPendulum ANGLE
)

recalcAngle=: verb define
  accel=. - (G % L) * sin ANGLE   NB. angular acceleration of a simple pendulum
  VEL  =: VEL + accel * DT        NB. Euler step: update the angular velocity,
  ANGLE=: ANGLE + VEL * DT        NB. then the angle
)

drawPendulum=: verb define
  width=. {. glqwh''
  ps=. (-: width) , 20
  pe=. ps + 150 <.@* (cos , sin) 0.5p1 + y    NB. adjust orientation
  glclear''
  glbrush glrgb 91 91 91                      NB. gray
  gllines ps , pe
  glellipse (,~ ps - -:) 40 15
  glrect 0 0, width, 20
  glbrush glrgb 255 255 0                     NB. yellow
  glellipse (,~ pe - -:) 15 15                NB. orb
)

pend_run''

This produces an animation that looks like this: PendulumExample.png.

However, the JavaScript version isn't too bad:

<!DOCTYPE html>
<!-- pendulum.html: animated pendulum from http://rosettacode.org/wiki/Animate_a_pendulum
-->
<html><head>  <title>Pendulum</title></head><body style="background: gray;">

<canvas id="canvas" width="600" height="600">
  <p>Sorry, your browser does not support the <canvas> used to display the pendulum animation.</p>
</canvas><script>
  function PendulumSim(length_m, gravity_mps2, initialAngle_rad, timestep_ms, callback) {
    var velocity = 0;
    var angle = initialAngle_rad;
    var k = -gravity_mps2/length_m;
    var timestep_s = timestep_ms / 1000;
    return setInterval(function () {
      var acceleration = k * Math.sin(angle);
      velocity += acceleration * timestep_s;
      angle    += velocity     * timestep_s;
      callback(angle);
    }, timestep_ms);
  }

  var canvas = document.getElementById('canvas');
  var context = canvas.getContext('2d');
  var prev=0;
  var sim = PendulumSim(1, 9.80665, Math.PI*99/100, 10, function (angle) {
    var rPend = Math.min(canvas.width, canvas.height) * 0.47;
    var rBall = Math.min(canvas.width, canvas.height) * 0.02;
    var rBar = Math.min(canvas.width, canvas.height) * 0.005;
    var ballX = Math.sin(angle) * rPend;
    var ballY = Math.cos(angle) * rPend;

    context.fillStyle = "rgba(255,255,255,0.51)";
    context.globalCompositeOperation = "destination-out";
    context.fillRect(0, 0, canvas.width, canvas.height);

    context.fillStyle = "yellow";
    context.strokeStyle = "rgba(0,0,0,"+Math.max(0,1-Math.abs(prev-angle)*10)+")";
    context.globalCompositeOperation = "source-over";

    context.save();context.translate(canvas.width/2, canvas.height/2);
      context.rotate(angle); context.beginPath();
      context.rect(-rBar, -rBar, rBar*2, rPend+rBar*2);
      context.fill(); context.stroke();
      context.beginPath(); context.arc(0, rPend, rBall, 0, Math.PI*2, false);
      context.fill();  context.stroke(); context.restore();  prev=angle;
  });
</script></body></html>

This gives us an animation that looks like this: PendulumJavascript.png.

Keywords Hold Vocabulary Together in Memory

We ended up looking at this research showing how memory is organized in clusters around particular keywords, allowing us to process these keywords "...more quickly and accurately than similar words that they hold together in our memory." This is illustrated by the following pair of examples: the first shows how a single network can be partitioned into two separate networks by removing the keyword "fish", while in the second, removing the similar but non-keyword "dish" does not break the network apart.

1-keywordshold.jpg

1. Network partitioning by keyword removal

2-keywordshold.jpg

2. Network remains in one piece after removal of non-keyword

Materials

-- Devon McCormick <<DateTime(2015-01-04T16:02:56-0200)>>