NYCJUG/2013-03-12
Postscript, rolling dice, group-think, functional code, function composition
Location:: ThomasNet
Meeting Agenda for NYCJUG 20130312
1. Beginner's regatta: solving the first of three simple dice problems using simulation in J: see "Simple Dice Simulations.doc". 2. Show-and-tell: Eric Iverson on JDB. John Randall on generating Postscript with J. 3. Advanced topics: the elegance and simplicity of functional code considered obvious - see "Why Functional Code is Shorter.doc" and "Function Composition and Elegance.doc". 4. Learning, teaching and promoting J, et al.: high-leverage tools allow one to do the work of many - see "RiseOfTheNewGroupthink.doc".
Beginner's regatta
We took a look at a problem in a recent issue of the Communications of the ACM - unfortunately, this is behind a paywall, but the first problem is simply this:
Six dice are rolled simultaneously, and the number N of different numbers that appear is determined; for example, if the dice show 3,4,1,6,5,6, then N = 5, and if they show 6,2,2,3,6,2, then N = 3. Clearly, N could be any number from one to six, but these values are not equally likely. What is the probability that N = 4?
We might solve this in J thusly:
ND=: NS=: 6 NB. Number of dice, number of sides rolls=. ND?@$&.>1e6$NS NB. A million rolls 3{.rolls NB. Check result to ensure it looks right +-----------+-----------+-----------+ |1 1 5 4 2 5|3 4 3 2 3 1|0 1 3 2 0 2| +-----------+-----------+-----------+
To count the number of distinct values in a roll, combine tally (#) and nub (~.) with atop (@:).
$Ns=. (#@:~.)&>rolls NB. Count number of distinct items per roll 1000000 $Ns=/>:i.#ND NB. Table of number of possible values 1 to 6 1000000 6 (>:i.#ND),:+/Ns=/>:i.#ND NB. Show values over counts 1 2 3 4 5 6 117 19921 231065 502317 231123 15457
From this sample, we can see the number of values peaks at 4. Assuming the sample size of one million gives us about three digits of precision, the probability that N = 4 is 50.2%.
We'll leave a follow-up problem to the reader:
Alice and Bob roll a single die repeatedly. Alice is waiting until all six of the die's faces appear at least once. Bob is waiting for some face (any face) to appear four times. The winner is the one who gets his or her wish first; for example, if the successive rolls are 2,5,4,5,3,6,6,5,1, then Alice wins, since all numbers have appeared, none more than three times. If the successive rolls instead happen to be 4,6,3,6,6,1,2,2,6, then Bob wins because he has seen four 6s and no 5 (yet). Now answer this easy question: What is the maximum number of rolls needed to determine a winner? And this more difficult question: Which player is more likely to win? This can be worked out with only a little bit of arithmetic, assuming you are clever enough.
Eric Iverson on JDB
Eric Iverson of JSoftware summarized the development of a J-based database system called JDB. This was originally developed for use by our evening's host - ThomasNet - to help them organize the large amounts of data they cull from hits on their website every day. JDB is a fully-functional, basic system that is freely available as an add-on to J.
However, due to the immense size of ThomasNet's data, they soon outgrew this solution, so JSoftware developed a more robust, higher performance version of JDB called Jd.
Advanced Topics
We looked at this essay on Why Functional Code is Shorter (than code in other types of languages). It begins with something relevant to the array-orientation of J:
Was rereading John Carmack's Functional Programming in C++ post. For the most part, it's an excellent discussion of FP and its benefits, but there are a few points I wanted to dissect. Here, Carmack suggests the conciseness of functional programs has more to do with features like pattern matching and list comprehensions...
Also relevant is this comment
Features like pattern matching and list comprehensions can indeed make code shorter, and they are convenient, but they're just constant factors. Like Carmack says, imperative programs could have access to these features too. The real decrease in code size when using FP comes from there being exponentially more code reuse possible due to the compositionality of pure code. I don't mean compositionality in the narrow sense of function composition, I mean it in the sense of being able to assemble complex functionality by sticking together simpler components in various ways. And compositionality dwarfs all other factors affecting our ability to write complex software.
It seems true that J's development with an eye to making function composition a primary consideration is one of the reasons for its conciseness. Also, J primitives do a good job of covering a lot of important, basic operations commonly done with data.
Meeting Materials
File:Simple Dice Simulations.pdf
File:Why Functional Code is Shorter.pdf