NYCJUG/2013-03-12

From J Wiki
Jump to navigation Jump to search

Postscript, rolling dice, group-think, functional code, function composition


Location:: ThomasNet

Beginner's regatta

We took a look at a problem in a recent issue of the Communications of the ACM - unfortunately, this is behind a paywall, but the first problem is simply this:

Six dice are rolled simultaneously, and the number N of different numbers that appear is determined; for example, if the dice show 3,4,1,6,5,6, then N = 5, and if they show 6,2,2,3,6,2, then N = 3. Clearly, N could be any number from one to six, but these values are not equally likely. What is the probability that N = 4?

We might solve this in J thusly:

   ND=: NS=: 6                         NB. Number of dice, number of sides
   rolls=. ND?@$&.>1e6$NS              NB. A million rolls
   3{.rolls                            NB. Check result to ensure it looks right
+-----------+-----------+-----------+
|1 1 5 4 2 5|3 4 3 2 3 1|0 1 3 2 0 2|
+-----------+-----------+-----------+

To count the number of distinct values in a roll, combine tally (#) and nub (~.) with atop (@:).

   $Ns=. (#@:~.)&>rolls                NB. Count number of distinct items per roll
1000000
   $Ns=/>:i.#ND                        NB. Table of number of possible values 1 to 6
1000000 6
   (>:i.#ND),:+/Ns=/>:i.#ND            NB. Show values over counts
  1     2      3      4      5     6
117 19921 231065 502317 231123 15457

From this sample, we can see the number of values peaks at 4. Assuming the sample size of one million gives us about three digits of precision, the probability that N = 4 is 50.2%.

We'll leave a follow-up problem to the reader:

Alice and Bob roll a single die repeatedly. Alice is waiting until all six of the die's faces appear at least once. Bob is waiting for some face (any face) to appear four times. The winner is the one who gets his or her wish first; for example, if the successive rolls are 2,5,4,5,3,6,6,5,1, then Alice wins, since all numbers have appeared, none more than three times. If the successive rolls instead happen to be 4,6,3,6,6,1,2,2,6, then Bob wins because he has seen four 6s and no 5 (yet). Now answer this easy question: What is the maximum number of rolls needed to determine a winner? And this more difficult question: Which player is more likely to win? This can be worked out with only a little bit of arithmetic, assuming you are clever enough.

Eric Iverson on JDB

Eric Iverson of JSoftware summarized the development of a J-based database system called JDB. This was originally developed for use by our evening's host - ThomasNet - to help them organize the large amounts of data they cull from hits on their website every day. JDB is a fully-functional, basic system that is freely available as an add-on to J.

However, due to the immense size of ThomasNet's data, they soon outgrew this solution, so JSoftware developed a more robust, higher performance version of JDB called Jd.

Advanced Topics

We looked at this essay on Why Functional Code is Shorter (than code in other types of languages). It begins with something relevant to the array-orientation of J:

Was rereading John Carmack's Functional Programming in C++ post. For the most part, it's an excellent discussion of FP and its benefits, but there are a few points I wanted to dissect. Here, Carmack suggests the conciseness of functional programs has more to do with features like pattern matching and list comprehensions...

Also relevant is this comment

Features like pattern matching and list comprehensions can indeed make code shorter, and they are convenient, but they're just constant factors. Like Carmack says, imperative programs could have access to these features too. The real decrease in code size when using FP comes from there being exponentially more code reuse possible due to the compositionality of pure code. I don't mean compositionality in the narrow sense of function composition, I mean it in the sense of being able to assemble complex functionality by sticking together simpler components in various ways. And compositionality dwarfs all other factors affecting our ability to write complex software.

It seems true that J's development with an eye to making function composition a primary consideration is one of the reasons for its conciseness. Also, J primitives do a good job of covering a lot of important, basic operations commonly done with data.

Why Functional Code is Shorter

[Inserting this essay here because the link above is now blocked]

Posted by Paul Chiusano on Saturday, February 23, 2013

Was rereading John Carmack's Functional Programming in C++ post. For the most part, it's an excellent discussion of FP and its benefits, but there are a few points I wanted to dissect. Here, Carmack suggests the conciseness of functional programs has more to do with features like pattern matching and list comprehensions:

The process of refactoring towards purity generally involves disentangling computation from the environment it operates in, which almost invariably means more parameter passing. This seems a bit curious – greater verbosity in programming languages is broadly reviled, and functional programming is often associated with code size reduction. The factors that allow programs in functional languages to sometimes be more concise than imperative implementations are pretty much orthogonal to the use of pure functions — garbage collection, powerful built in types, pattern matching, list comprehensions, function composition, various bits of syntactic sugar, etc. For the most part, these size reducers don’t have much to do with being functional, and can also be found in some imperative languages.

Features like pattern matching and list comprehensions can indeed make code shorter, and they are convenient, but they're just constant factors. Like Carmack says, imperative programs could have access to these features too. The real decrease in code size when using FP comes from there being exponentially more code reuse possible due to the compositionality of pure code. I don't mean compositionality in the narrow sense of function composition, I mean it in the sense of being able to assemble complex functionality by sticking together simpler components in various ways. And compositionality dwarfs all other factors affecting our ability to write complex software.

The 'convenience' features are important more because they reduce the friction of composition (for instance, if your syntax for function literals are too heavyweight you are discouraged from writing your list-processing code by composing use of higher-order functions like map and filter, and you are discouraged from ever writing higher-order functions yourself).

We have only just begun to appreciate how compositionality can change the nature of software. I keep getting surprised by it. The progression of functional programming shows the rabbit hole goes very deep--yes, we can write loops with less boilerplate using map and filter, but we can also write combinator libraries, which let us assemble programs compositionally within a domain. Going further, we can abstract over commonalities across libraries, letting us assemble libraries for a domain from a preexisting suite of compositional library construction components (typeclasses like Monad, Applicative, and so on). And further patterns emerge from there--FP has now discovered patterns of architecture, like stream processing libraries, in which the typeclasses themselves are simply building blocks.

Moving on, another point Carmack mentioned was that we can't maintain purity throughout our programs since we eventually have to interact with the outside world:

Not everything can be pure; unless the program is only operating on its own source code, at some point you need to interact with the outside world. It can be fun in a puzzly sort of way to try to push purity to great lengths, but the pragmatic break point acknowledges that side effects are necessary at some point, and manages them effectively.

One of the big discoveries of FP is that while effects need to occur, observable side effects are not necessary. This is more than an accounting trick; we can indeed write programs that have effects on the outside world and which mutate data, etc, all while preserving referential transparency of the overall system and letting us program in a nice compositional style. FP is not in any way a restriction on what can be expressed, only a restriction on how it is expressed. It only requires us to be 'honest' about what is occurring, not limiting what functionality is expressible.

1 comment

DevonMcC said...

I agree that functional composition helps make code more succinct. For example, in the functional language J (see jsoftware.com), we have the notion of function (what we call a "verb") insertion across an array, represented by the adverb "/".

So, to sum an array along its first dimension, we combine the adverb with the verb "+", giving us "+/array". If, instead of the sum we want the product, we use "*/array". This extends to user-defined verbs, so "foo/array" inserts our "foo" verb between elements of the array.

However, there's also an aspect to thinking functionally that can shorten code. I recently encountered some Java script with numerous case statements like this one:

windowY += 10;
if( windowY > photo.height - windowHeight)
{ windowY = photo.height - windowHeight;
}

We can shorten and simplify this logic with a functional variant like this:

windowY = min( windowY+10, photo.height - windowHeight);

This seems to me to be conceptually much simpler than the typical "if...then" logic it replaces.

February 25, 2013 at 2:17 PM

Function Composition and Elegance

We looked at this message from Dan Bron.

from:	 Dan Bron <j@bron.us>
to:	 programming@jsoftware.com
date:	 Wed, Feb 20, 2013 at 12:18 PM
subject: [Jprogramming] Function composition and elegance (was applying >1 ...)

I wrote:

> Statistical correlation
> (+/@:* % *&(+/)&.:*:)&(- +/%#)

Rephrase corr in Simplistic J
> ([ - [:(+/%#) [) (([:+/*) % [: %: ([: ([:+/*:) [) * [:([:+/*:) ]) ] - [:(+/%#) ]

I'll also take this opportunity to emphasize the differences between these two (equivalent) verbs, which result _directly_ from the language features I mentioned in my original message: > It's got specimens of all the major compositions in J: fork, hook, atop, compose, and under.

You can judge for yourself the consequences of avoiding these, and the wisdom of making that a blanket rule.

Still, you might be willing to pay a "complexity tax" to avoid explicit function composition in J. That's ok, but even paying that cost won't make you whole: you're still missing out on some of the major advantages of working in the language.

For example, later in the thread I mentioned, Oleg was able to refine his definition and produce an even more compact version [1]:

        +/@:*&(% +/&.:*:)&(- +/%#)

Again, he achieved this result through judicious application of conjunctions. Now, I still prefer the original because of its structural symmetries, but this version also has its merits: it makes it clear that correlation is simply the linear combination of the series, after some normalization/standardization. We're getting insight into the _math_ simply by trying different ways of expressing ourselves!

I think June Kim summed it up best [2]:
> While I try to translate a mathematical expression into a J expression, I often discover hidden patterns in the
> expression with great joy.
>
> It sometimes leads me to a path to new insights.
-Dan

  1. http://www.jsoftware.com/pipermail/programming/2007-June/007209.html
  2. http://www.jsoftware.com/pipermail/programming/2007-June/007181.html

PS: In case my point about +/@:*&(% +/&.:*:)&(- +/%#) wasn't clear:

     sum             +/
     of              @ 
     the_product     * 
     after           & 
     scaling         (% +/&.:*:) 
     after           & 
     standardizing   (- +/%#) 

Some Correlation Examples

Here we see an illustration of how different sets of points have characteristic correlation coefficients.

Multiple correlation examples.JPG

Several sets of (x, y) points, with the Pearson correlation coefficient of x and y for each set. Note that the correlation reflects the noisiness and direction of a linear relationship (top row), but not the slope of that relationship (middle), nor many aspects of nonlinear relationships (bottom). N.B.: the figure in the center has a slope of 0 but in that case the correlation coefficient is undefined because the variance of Y is zero. [From http://en.wikipedia.org/wiki/Correlation] -- The code to generate the examples above is Scalable Vector Graphics (SVG) code – apparently generated by R code - which looks something like this:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- Created with Inkscape (http://www.inkscape.org/) -->

<svg
   xmlns:dc="http://purl.org/dc/elements/1.1/"
…
  <circle
     cx="41.040001"
     cy="40.419998"
     r="0.40000001"
     id="circle12"
     style="fill:#00008b;fill-opacity:1;stroke:#ffffff;stroke-width:1px;stroke-opacity:0" />
  <circle
     cx="30.780001"
     cy="49"
     r="0.40000001"
     id="circle14"
     style="fill:#00008b;fill-opacity:1;stroke:#ffffff;stroke-width:1px;stroke-opacity:0" />
  <circle
...

And so on for about 75 thousand lines.

Learning and Teaching J

The Rise of the New Groupthink

Opinion

The Rise of the New Groupthink

By SUSAN CAIN / Published: January 13, 2012

15CAINCOVER-popup-v2.jpg

This article decrying the contemporary trend called "Groupthink" makes some good points.

SOLITUDE is out of fashion. Our companies, our schools and our culture are in thrall to an idea I call the New Groupthink, which holds that creativity and achievement come from an oddly gregarious place. Most of us now work in teams, in offices without walls, for managers who prize people skills above all. Lone geniuses are out. Collaboration is in

That is a dismal assessment but it rings true from what I've seen in years of schooling and office work.

But there’s a problem with this view. Research strongly suggests that people are more creative when they enjoy privacy and freedom from interruption. And the most spectacularly creative people in many fields are often introverted, according to studies by the psychologists Mihaly Csikszentmihalyi and Gregory Feist. They’re extroverted enough to exchange and advance ideas, but see themselves as independent and individualistic. They’re not joiners by nature.

One explanation for these findings is that introverts are comfortable working alone — and solitude is a catalyst to innovation.

Not many reading this who are surprised by this finding.

Solitude has long been associated with creativity and transcendence. “Without great solitude, no serious work is possible,” Picasso said. A central narrative of many religions is the seeker — Moses, Jesus, Buddha — who goes off by himself and brings profound insights back to the community.

15CAINJP1-popup.jpg

The story of Apple’s origin speaks to the power of collaboration. Mr. Wozniak wouldn’t have been catalyzed by the Altair but for the kindred spirits of Homebrew. And he’d never have started Apple without Mr. Jobs. But it’s also a story of solo spirit. If you look at how Mr. Wozniak got the work done — the sheer hard work of creating something from nothing — he did it alone. Late at night, all by himself. Intentionally so. In his memoir, Mr. Wozniak offers this guidance to aspiring inventors: “Most inventors and engineers I’ve met are like me ... they live in their heads. They’re almost like artists. In fact, the very best of them are artists. And artists work best alone .... I’m going to give you some advice that might be hard to take. That advice is: Work alone... Not on a committee. Not on a team.”

But reality continues to get harsher....

The New Groupthink has overtaken our workplaces, our schools and our religious institutions. Anyone who has ever needed noise-canceling headphones in her own office or marked an online calendar with a fake meeting in order to escape yet another real one knows what I’m talking about.

One more reason in favor of remote work?

15CAINJP2-popup.jpg

Our schools have also been transformed by the New Groupthink. Today, elementary school classrooms are commonly arranged in pods of desks, the better to foster group learning. Even subjects like math and creative writing are often taught as committee projects. In one fourth-grade classroom I visited in New York City, students engaged in group work were forbidden to ask a question unless every member of the group had the very same question.

Sounds like it might be more crowd control than anything else.

The New Groupthink also shapes some of our most influential religious institutions. Many mega-churches feature extracurricular groups organized around every conceivable activity, from parenting to skateboarding to real estate, and expect worshipers to join in. They also emphasize a theatrical style of worship — loving Jesus out loud, for all the congregation to see.

Reflecting the prevailing zeitgeist?

Privacy also makes us productive. In a fascinating study known as the Coding War Games, consultants Tom DeMarco and Timothy Lister compared the work of more than 600 computer programmers at 92 companies. They found that people from the same companies performed at roughly the same level — but that there was an enormous performance gap between organizations. What distinguished programmers at the top-performing companies wasn’t greater experience or better pay. It was how much privacy, personal workspace and freedom from interruption they enjoyed. Sixty-two percent of the best performers said their workspace was sufficiently private compared with only 19 percent of the worst performers. Seventy-six percent of the worst programmers but only 38 percent of the best said that they were often interrupted needlessly.

Brainstorming

Brainstorming, as it turns out, is also very bad because "...decades of research show that individuals almost always perform better than groups in both quality and quantity, and group performance gets worse as group size increases."

In part this is because people in groups tend to be lazy and once you let someone else do the work, you may as well go along with whatever the rest of the group decides to do - losing most of the value of having a diverse group.

The one important exception to this dismal record is electronic brainstorming, where large groups outperform individuals; and the larger the group the better. The protection of the screen mitigates many problems of group work. This is why the Internet has yielded such wondrous collective creations

MY point is not that man is an island. Life is meaningless without love, trust and friendship.

And I’m not suggesting that we abolish teamwork. Indeed, recent studies suggest that influential academic work is increasingly conducted by teams rather than by individuals. (Although teams whose members collaborate remotely, from separate universities, appear to be the most influential of all.) The problems we face in science, economics and many other fields are more complex than ever before, and we’ll need to stand on one another’s shoulders if we can possibly hope to solve them.

This is especially true in computing and thinking about computing.

But even if the problems are different, human nature remains the same. And most humans have two contradictory impulses: we love and need one another, yet we crave privacy and autonomy.

To harness the energy that fuels both these drives, we need to move beyond the New Groupthink and embrace a more nuanced approach to creativity and learning.

Finally, lest we forget real-world anecdotes...

Before Mr. Wozniak started Apple, he designed calculators at Hewlett-Packard, a job he loved partly because HP made it easy to chat with his colleagues. Every day at 10 a.m. and 2 p.m., management wheeled in doughnuts and coffee, and people could socialize and swap ideas. What distinguished these interactions was how low-key they were. For Mr. Wozniak, collaboration meant the ability to share a doughnut and a brainwave with his laid-back, poorly dressed colleagues — who minded not a whit when he disappeared into his cubicle to get the real work done.

Susan Cain is the author of the forthcoming book “Quiet: The Power of Introverts in a World That Can’t Stop Talking.”

A version of this op-ed appeared in print on January 15, 2012, on page SR1 of the New York edition with the headline: The Rise Of the New Groupthink.

Meeting Materials

File:Simple Dice Simulations.pdf

File:Why Functional Code is Shorter.pdf

File:Function Composition and Elegance.pdf

File:RiseOfTheNewGroupthink.pdf