NYCJUG/2016-03-08


Beginner's regatta

The UpCase Saga: How I Learn New Programming Languages

Here we take a look at a lengthy essay detailing a novice's experience figuring out how to use J for the fairly simple task of uppercasing the letters in a string, and the problems he encountered along the way.

Introduction

J is so different from other programming languages that I might as well be a complete beginner, even though I've been programming for over 20 years. While my experience writing code doesn't help much with J, I've also accumulated experience about how to learn new languages, and how to solve programming problems in general.

This is the story of how I tackled a small problem in an unfamiliar language.

Note: this article is about a learning process, not about the actual code. J is cryptic, and my "beginner J" code is probably terrible. I will explain the basic ideas as I go along, but try to focus on the general approach to problem solving and don't worry too much about understanding the code.

Problem Statement and Intermediate Solutions

My goal was to implement a J verb (basically a function) to make all the lower-case letters in an ASCII string uppercase, while leaving other characters unchanged. It took me an hour to produce this:

string =. '"Please, Sir, could you make this upper case?" said Pip.'
(chr=.{&a.) s -32&* e.&((ord 'a')+(i.26))  s=.(ord=.a.&i.) string

If you type that into the J terminal, you get:

"PLEASE, SIR, COULD YOU MAKE THIS UPPER CASE?" SAID PIP.

The above code is just an expression, not a verb, but once I got that far, I used one of J's built-in routines to "simplify" it for me:

upcase =. [: ([: {&a. ] - [: 32&* e.&(97+i.26)) a.&i.

I told you J was cryptic.

I can't fully explain this line myself yet, but I can tell you how I came up with it, starting almost from scratch.
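
For readers more at home in a mainstream language, the data flow of the expression above can be sketched in Python. This is only an analogy, not a translation of how J evaluates it: the name `upcase_sketch` is made up for this illustration, and plain Python lists stand in for J arrays.

```python
def upcase_sketch(text):
    """Mirror the J expression step by step (illustrative names only)."""
    codes = [ord(ch) for ch in text]                      # s =. ord string
    lower = range(ord('a'), ord('a') + 26)                # (ord 'a') + i.26
    mask = [1 if c in lower else 0 for c in codes]        # s e. lower
    shifted = [c - 32 * m for c, m in zip(codes, mask)]   # s - 32 * mask
    return ''.join(chr(c) for c in shifted)               # chr


string = '"Please, Sir, could you make this upper case?" said Pip.'
print(upcase_sketch(string))
```

The mask is 1 for lower-case letters and 0 elsewhere, so multiplying it by 32 subtracts the case offset only where it is needed.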

Blank Slate : A Quick Taste of J

Code in general is often unreadable to the untrained eye, and part of learning a language is adjusting to the visual shape of the code.

For me, even python was difficult to read until I got used to looking at it, and I was coming from perl, another notoriously cryptic language.

From a learning-to-learn perspective, J has the advantage of a large vocabulary comprised almost entirely of ASCII punctuation characters, many of which have at least six different meanings – so I have a tendency to forget what I've learned and have to re-learn it. :) For example, the character + can be used to form any of the following three verbs: + , +. and +: , each of which has separate meanings depending on whether it's used in a prefix ("monadic") or infix ("dyadic") context. Dyadic 1 + 1 means one plus one, and monadic + 1 means something like positive one[1]. In J, every verb has two meanings like this. Further, J generalizes each verb to work with multi-dimensional arrays. For example, dyadic x +. y is the greatest common divisor verb. The following transcript of an interactive J session shows how these concepts are combined. Note: J is available for free from http://jsoftware.com/ if you want to follow along.

   NB. Input lines are indented in the J terminal.

   8 +. 5 6 7 8 9 10 11 12     NB. GCD(8,5) GCD(8,6) ... GCD(8,12)
1 2 1 8 1 2 1 4

   12 11 10 9 8 7 6 +. 6 7 8 9 10 11 12  NB. GCD(12,6) GCD(11,7) ...
6 1 2 9 2 1 6

   NB. Ranges can be generated with monadic ' i. ' ('Integers')
   NB. Dyadic ; composes values into an array (so an array of arrays below)
   NB. The symbol '_7' is the literal value 'negative seven'
   NB. So this line shows how to construct 4 different ranges:
   (i. 7)  ;  (i. _7)  ;  (6 + i. 7) ;  (6 + i. _7)
┌─────────────┬─────────────┬────────────────┬────────────────┐
│0 1 2 3 4 5 6│6 5 4 3 2 1 0│6 7 8 9 10 11 12│12 11 10 9 8 7 6│
└─────────────┴─────────────┴────────────────┴────────────────┘

   NB. Now we can simplify our earlier expressions.
   8 +. 5 + i.8   NB. Strict right-to-left evaluation: (8 +. (5 + (i. 8)))
1 2 1 8 1 2 1 4

   (6 + i. _7) +. 6 + i. 7
6 1 2 9 2 1 6

   NB. If that's too readable for you, you can skip the whitespace... :)
   (6+i._7)+.6+i.7
6 1 2 9 2 1 6

That's enough to get the basic idea of J code: very little syntax, just a lot of verbs to make things happen.
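
As a rough comparison, the same item-by-item GCD can be sketched in Python. The helpers `gcd_each` and `integers` are hypothetical names invented here; they only imitate the one-dimensional cases shown in the transcript, not J's general array rules.

```python
from math import gcd


def gcd_each(x, y):
    """Pair up x and y item by item; a lone integer is repeated to fit."""
    if isinstance(x, int) and isinstance(y, int):
        return gcd(x, y)
    if isinstance(x, int):
        x = [x] * len(y)
    if isinstance(y, int):
        y = [y] * len(x)
    if len(x) != len(y):
        raise ValueError("length error")
    return [gcd(a, b) for a, b in zip(x, y)]


def integers(n):
    """Imitate monadic i. for one integer: ascending if n >= 0, else descending."""
    return list(range(n)) if n >= 0 else list(range(-n - 1, -1, -1))


# 8 +. 5 + i.8
print(gcd_each(8, [5 + i for i in integers(8)]))
# (6 + i. _7) +. 6 + i. 7
print(gcd_each([6 + i for i in integers(-7)], [6 + i for i in integers(7)]))
```

The explicit loops and length checks here are what J's verbs do implicitly, which is a large part of why the J version is so short.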

On to upcase

I learned the basic concept of converting an ASCII string to upper case a long time ago. Here's how I first implemented it in turbo pascal, probably around 1992 (again, you don't really need to understand the code):

function upstr( s : string ) : string;
    var count : byte;
  begin
    for count := 1 to length( s ) do
      s[ count ] := upcase( s[ count ] );
    upstr := s;
  end;

Basically, in turbo pascal, upcase(ch:char):char was built-in, so I just had to loop through a string and apply upcase to each character.[2]

If I'd had to write upcase myself, though, it probably would have looked like this:

function upcase( ch : char ) : char;
  begin
    if ch in ['a'..'z'] then
      upcase := chr(ord(ch) + (ord('A') - ord('a')))
    else
      upcase := ch;   { Or in modern-day pascal, 'result := ...' }
  end;

The functions Ord and Chr are primitives.[3] Ord(ch) converts a character into a number and Chr(x) converts a number to the equivalent character.

Eight bit characters are indistinguishable from any other bytes in memory[4], and in languages like C, there is no distinction between a byte and a character. Pascal made the distinction at compile time, for type safety, but there is no actual machine code required to perform the operations.

J, on the other hand, is a dynamically typed language, where numbers and strings have different internal representations (probably differing at least by an extra byte encoding the type) and have to be explicitly converted. My hunch was that if I could figure out how to implement Ord and Chr in J, the rest of the problem would be easy.

a. is for Alphabet

I knew J supported strings, but I didn't know exactly how they worked.

The first rule of learning a programming language is to get familiar with its documentation.

You don't have to memorize every word, or even read it all at once. Just know what resources are available, and remember where to find them.

With J, I often start with the vocabulary page, because it lists all the symbols and their names.[5]

There's nothing on the vocabulary page about strings, but there is an entry for a. Alphabet ⁄ Ace , and following the link confirmed that it was a predefined array representing the character set. I typed it in the J terminal, and saw a bunch of garbage:

   a.    NB. output manually edited to remove many non-ascii characters
\001\002\003\004\005\006\007\010
\013\014  ... \037!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNO
PQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~....

Given this array, chr(n) would just mean retrieving the nth item, and ord(ch) would involve searching through the array to find the index of the given character.

In the languages I'm used to, this would probably mean typing a[65] and a.index('a'), but J has a completely different syntax that I can never remember.
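
The idea of building Chr and Ord out of a lookup table can be sketched in Python. The list `alphabet` is a stand-in for J's a. (assumed here to be the 256 byte values in order), and the two function names are made up for this illustration.

```python
# Stand-in for J's a. : every 8-bit character, in code order.
alphabet = [chr(code) for code in range(256)]


def chr_via_table(n):
    """Chr: fetch the nth item of the table."""
    return alphabet[n]


def ord_via_table(ch):
    """Ord: search the table for the character and return its position."""
    return alphabet.index(ch)


print(chr_via_table(65), ord_via_table('a'))
```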

Often, when you're learning something, the documentation will explain something, but you won't have enough experience to really understand what you're reading.

With J in particular, I have a bad tendency to gloss over examples in the docs because I often don't even understand the mathematical concepts they're trying to illustrate. In this case, the page for a. showed an example for displaying the printable ascii characters:

   1 2 3 { 8 32 $ a.                    NB. From the J docs for a.
 !"#$%&'()*+,-./0123456789:;<=>?
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
`abcdefghijklmnopqrstuvwxyz{|}~?

I actually understand enough J that I should have known what this was doing:

  • Dyadic x $ y (Shape) reshapes array y to the dimensions specified by x.
  • Dyadic x { y (From) extracts the elements specified in x from array y.

So in this case, the 8 32 $ a. arranges the characters of a. into an 8 × 32 grid, and then the 1 2 3 { ... part extracts the second, third, and fourth rows (array indices start at 0).

Had I paid more attention, I would have seen immediately that the way to write chr(n) (or at least a[n]) in J is n{a. .
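
A small Python sketch of the same two steps, reshaping and then selecting by index, may make the example easier to read. The helpers `reshape` and `from_` are hypothetical names; this `reshape` only handles the two-dimensional case where the items exactly fill the grid.

```python
def reshape(rows, cols, items):
    """Lay items out row by row, roughly like `rows cols $ items`."""
    return [items[r * cols:(r + 1) * cols] for r in range(rows)]


def from_(indices, items):
    """Pick out the items at the given positions, roughly like `indices { items`."""
    return [items[i] for i in indices]


alphabet = ''.join(chr(code) for code in range(256))
grid = reshape(8, 32, alphabet)

# 1 2 3 { 8 32 $ a.
for row in from_([1, 2, 3], grid):
    print(row)

# n { a. behaves like chr(n)
print(from_([65], alphabet))
```

The same `from_` helper serves for both whole rows of the grid and single characters of the flat alphabet, which is the point the docs example was quietly making.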

I did look at the example, but the way I mentally chunked it, I just saw "here's a way to arrange the ascii characters" without considering how it worked (or even really noticing which characters were involved). In any case, I was thinking more about Ord than Chr anyway, and I had a few guesses about how I might implement it.

Hunting down Ord

Back on the J Vocabulary page, I did a quick search for the word "index" and saw i. Integers ⁄ Index Of .

I happened to know that A is ASCII character #65 (confirmed by typing ord('A') into a python prompt) so here's what I expected to happen:

   'a' i. a.      NB. What I expected:
65

But instead:

   'a' i. a.      NB. What really happened:
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...

This bears very little correspondence to any notion I have of an index.

The docs explain what it's doing (sort of), but not why:

If rix is the rank of an item of x, then the shape of the result of x i. y is (-rix)}.$y . Each atom of the result is either #x or the index of the first occurrence among the items of x of the corresponding rix-cell of y. The comparison in x i. y is tolerant, and fit can be used to specify the tolerance, as in i. !. t .

I don't know yet what tolerance means or how to interpret (-rix)}.$y – clearly these docs are written for people who are already familiar with array languages, and perhaps when I have more experience dealing with multi-dimensional arrays, this seemingly strange behavior will make perfect sense.

And yet, this operation is called Index Of, and the practical result in this particular case is that it produces an array with a bunch of ones and one zero. I didn't count at the time, but the 0 is in the 98th slot, because

Ord('a') = 97 (lower case).

At this point, I rejected i. as a path to Ord, but I remembered there's a verb called Copy.

Take a left on Nub Street (or how to find Copy when what you want is Select)

Usually, when I play around with J, I find myself searching for that a[n] syntax. We just saw that it's n{a, but I usually forget this, and have to search for it again. What usually happens is that I look through the vocabulary page for a word like "select" or "index", and, after trying:

   0 i. 'abc'     NB. hoping for 'a', but that's not what 'Index Of' means
1 1 1

I usually wind up looking at the definition of ~: Nub Sieve ⁄ Not Equal , because the word "sieve" is the closest thing that matches my idea of selecting items from an array. That page says: ~:y is the boolean list b such that b#y is the nub of y.

Apparently, nub is their word for the unique values in an array:

   ~: 'Mississippi'             NB. the example from the "nub sieve" docs
1 1 1 0 0 0 0 0 1 0 0

   (~: 'Mississippi') # 'Mississippi'   NB. not in docs, but should be. :)
Misp

So a "nub sieve" isn't what I want, but it looks like this # thing is a bit like the generic "select" I'm looking for.
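
The two pieces at work here, a boolean "first occurrence" mask and selection by that mask, can be sketched in Python. The names `nub_sieve` and `select` are chosen for this illustration only.

```python
def nub_sieve(items):
    """1 where an item appears for the first time, 0 where it repeats."""
    seen = set()
    mask = []
    for item in items:
        mask.append(0 if item in seen else 1)
        seen.add(item)
    return mask


def select(mask, items):
    """Keep the items whose mask entry is 1."""
    return ''.join(item for keep, item in zip(mask, items) if keep)


mask = nub_sieve('Mississippi')
print(mask)
print(select(mask, 'Mississippi'))
```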

This line of searching is something I went through several times when I experimented casually with J in the past. How did I keep missing # when looking for my hypothetical Select operator? I kept missing it because # is named # Tally ⁄ Copy .

If the arguments have an equal number of items, then x#y copies +/x items from y, with i{x repetitions of item i{y . Otherwise, if one is an atom it is repeated to make the item count of the arguments equal. The complex left argument a j. b copies a items followed by b fills. The fit conjunction provides specified fills, as in #!.f

Not knowing what i{x meant (again, x[i], the thing I was usually searching for when I wound up here), the text above didn't usually make sense to me.

As I sit here writing now, most of the documentation I read makes sense, but when I'm in "problem solving" mode, there just isn't time or room in my head to carefully analyze each page.

Instead, I'm doing a broad search, attempting to find the pages that are most likely to answer my question, and glossing over anything that doesn't immediately match.

This may seem like an inefficient and error-prone process compared to just working through a tutorial, but it works.

Reading a tutorial is a bit like taking a guided tour of a city. You get to see some interesting things and travel in comfort, but everything you encounter has been prepared for you in advance.

My approach is more like going to a new city and picking an arbitrary goal: find the library or find a nice park…. and then just heading out to explore. I'll probably get lost a few times, and completely miss out on a few popular attractions at first, but eventually I get to know my way around the place, in a way a tourist probably never will.

Anyway, in my wanderings through the "City of J", I kept setting out to find the Select verb and instead found myself over on Nub Sieve avenue, which of course eventually would bring me back to Copy. So why on earth is Select called Copy?

Well of course, it isn't. What I thought of as Select is just From in J, and it's just the { symbol that keeps showing up:

   0 2 4 { 'abcdefg'
ace

But Copy can do the same thing in a pinch, if you also happen to find dyadic e. (called Member (In) in the J docs, but I would have called it Element Of).

   NB. Selecting values directly with 'Copy':
   1 0 1 0 1 0 0 # 'abcdefg'
ace

   NB. For each integer 0..6, is it an element of the array 0 2 4 ?
   (i.7) e. 0 2 4
1 0 1 0 1 0 0

   NB. Combining those two ideas:
   ((i.7) e. 0 2 4) # 'abcdefg'
ace

Obviously this so-called Copy thing is pretty terrible compared to { , because { doesn't require you to know the length or to fill in all those zeros, but at least we know "the long way" to get to Select. But why on earth is it called Copy?!?

   0 1 2 3 # 'abcd'
bccddd

Oh.
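
The reason for the name is easier to see in a Python model where the left argument is treated as a list of repeat counts. The helpers `copy` and `member` are hypothetical and cover only the equal-length, non-negative case from the quoted documentation.

```python
def copy(counts, items):
    """Repeat each item by its count, roughly like `counts # items`."""
    if len(counts) != len(items):
        raise ValueError("length error")
    return ''.join(item * count for count, item in zip(counts, items))


def member(xs, ys):
    """1 where an item of xs occurs in ys, else 0, roughly like `xs e. ys`."""
    return [1 if x in ys else 0 for x in xs]


print(copy([0, 1, 2, 3], 'abcd'))
print(copy(member(range(7), [0, 2, 4]), 'abcdefg'))
```

A boolean mask is just the special case where every count is 0 or 1, which is why Copy can double as a filter.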

From Copy to Chr

So there I was, looking at:

   'a' i. a.
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...

I remember my earlier discovery of Copy and think to myself that if I could swap the zeros and ones, then perhaps it would bring me closer to Ord or Chr. I look up how to do Not in the vocabulary. It's spelled -. in J, and it's really 1 - x.[6]

So now I can do:

   NB. Just using (8 32 $) here to reformat the results:

   8 32 $        'a' i. a.      NB. My original data.
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

   8 32 $      -.'a' i. a.      NB. After applying not.
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

I didn't really know where I was going with this, yet, but I figured I could use Copy and the above pattern to get the letter 'a' back out of the alphabet. The first thing I tried happened to work:

   NB. ~ swaps the arguments of a verb so that  a. #~ x   becomes  x # a.
   a. #~  -. 'a' i. a.
a

So at this point, I'd essentially implemented an identity function that was somewhere in the ballpark of chr(ord(ch)), but I needed to figure out how to extract the individual components.

Rummaging

My next few attempts didn't work at all. I figured if I could put an 'a' in and get an 'a' out, maybe I could do the same thing for an entire string. First I tried just replacing the string, and I got an error. I tried the exact same thing again and got an error.

   a.#~-.'apples'i.a.
|domain error
|   a.    #~-.'apples'i.a.

   a.#~-.'apples'i.a.
|domain error
|   a.    #~-.'apples'i.a.

Next I tried some other stuff and some stuff happened.

   'apples'i.a       NB. I forgot the . in a. but failed to notice
'apples' i. a        NB. J is giving me back a symbolic expression,
                     NB. presumably because 'a' is not defined.
                     NB. I wonder if maybe '' is for characters and
                     NB. '"' is for strings, so try:
   "apples"i.a       NB. but " is a verb in J, not part of a string.
|syntax error        NB. 'a' is a string. there is no "character" type
|       "apples"i.a

   'apples'i.a       NB. The J terminal makes it easy to duplicate input.
'apples' i. a        NB. I cursored up and pressed enter to duplicate the
                     NB. line, and probably just automatically ran it to
                     NB. make sure it acted the same, in case I had
                     NB. accidentally changed the history.
   'apples'i./a      NB. / inserts a function between each element of
'apples' i./ a       NB. an array (so +/1 2 3 -> 1+2+3). I have no idea
                     NB. what I was thinking about here that would have
   'apples'i.~/a     NB. caused me to type it.
'apples' i.~/ a

It's hard to express my thought process at this point.

The side comments I wrote above are things I'm observing as I write this document, but in the moment, I probably wasn't aware of any of it.

In the moment, I had no idea why things weren't working, and was just trying lots of different things.

This is basically rummaging. I didn't have a clear idea what I was looking for, but I had a sense that something was wrong, and was mostly typing on autopilot. I probably issued all of the above commands within the span of 30 seconds or so.

Eventually, my brain caught up to my fingers, I noticed the missing period in a., and typed what I really wanted to type:

   'apples' i.a.
6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 0 6 6 6 4 6 6 6 6 6 6 3 6 6 6 1 6
6 5 6 6 6 6 6 6 6 6 6 6 6 6 ...

This result is also completely unexpected to me.

Once again, here is the description of dyadic i. (Index Of):

If rix is the rank of an item of x, then the shape of the result of x i. y is (-rix)}.$y . Each atom of the result is either #x or the index of the first occurrence among the items of x of the corresponding rix-cell of y. The comparison in x i. y is tolerant, and fit can be used to specify the tolerance, as in i. !. t .

I still don't understand why any of this would be useful, but I know #x means length (Tally) and 6 is the length of the string 'apples'.

Sitting here writing this, I can see that the indices of the string 'apples' are 0 1 2 3 4 5 and so 6 is a perfectly logical "not found".

If you look closely, you can see that the numbers 0 1 3 4 5 all appear in that field of sixes, but 2 does not. That makes sense because:

   1 2 { 'apples'
pp

What's happening with the sixes is that given each ascii character, J is searching for its first position in the string 'apples'.

In other words, I had the parameters backwards, and what I should have typed was this:

   a.i.'apples'
97 112 112 108 101 115

And in fact, that is exactly the definition I eventually came up with for Ord:

ord =. a. & i.    NB. The '&' is an operator that transforms dyadic i. into
                  NB. a new verb (which I assigned to the variable 'ord')
                  NB. Now: (ord x) = ((a.&i.) x) = (a. i. x)
                  NB. This transformation is called "partial application".
   ord 'apples'
97 112 112 108 101 115
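
The behaviour of dyadic i. described above can be modelled in a few lines of Python. This is a sketch for flat lists of characters only, with `index_of` as an invented name; it ignores rank and tolerance entirely.

```python
def index_of(x, y):
    """For each item of y: its first position in x, or len(x) if absent."""
    return [x.index(item) if item in x else len(x) for item in y]


alphabet = ''.join(chr(code) for code in range(256))

# a. i. 'apples'  (the order that gives Ord)
print(index_of(alphabet, 'apples'))

# 'apples' i. a.  (the order typed by mistake)
backwards = index_of('apples', alphabet)
print(backwards[97], backwards[112], backwards[65])

# Applying "not" as 1 - y to that result
print(sorted({1 - v for v in backwards}))
```

The last line prints negative numbers. If Copy rejects negative repeat counts, as seems to be the case, that would account for the domain error from the earlier a.#~-.'apples'i.a. attempt, whereas the single-character 'a' version only ever produced zeros and ones.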

However, in the moment, the giant pile of sixes made no sense to me, and I wasn't able to follow this train of thought.

Instead, my failed attempt to use double quotes for a string reminded me of the concept of rank, and so I took a wrong turn.

(to be continued…)

Epilogue: a built-in function

   load 'convert'
   toupper
3 : 0
x=. I. 26 > n=. ((97+i.26){a.) i. t=. ,y
($y) $ ((x{n) { (65+i.26){a.) x}t
)

Footnotes

  1. The monadic verb + x actually produces the complex conjugate of x.
  2. In turbo pascal, strings were always 256 bytes, with the first byte representing a length. This upstr function takes a string by value, meaning all 256 bytes are copied onto the stack. Changing the signature to procedure upstr( var s : string ); would have modified the string in place without making a copy, but I generally preferred the functional style, even back then.
  3. Pascal is case-insensitive, so ord, Ord, ORD, etc. all refer to the same thing.
  4. Characters at the time were 8 bits. ASCII only specifies 128 characters, 32 of which are invisible control codes. If you were an American writing code for DOS back in the day, you probably had several printouts of the CP437 character set lying around.
  5. Looking back now, I wonder if the J phrasebook might have been a more helpful starting point.
  6. Over and over in J, I find that the verb I want is just a specific instance of some more general operation. I introduced +. earlier as greatest common divisor, but it's also the logical OR operator. This might seem like some crazy operator overloading, but that's not the case: logical OR just happens to be a special case of GCD.
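
The claim in the last footnote, that logical OR is a special case of GCD, can be checked over all four boolean pairs. The check below uses Python's `math.gcd` as a stand-in for J's +. and is meant only as a quick sanity test.

```python
from math import gcd

# GCD restricted to the values 0 and 1, next to Python's own "or".
table = {(a, b): gcd(a, b) for a in (0, 1) for b in (0, 1)}

for (a, b), g in table.items():
    print(a, b, g, int(bool(a or b)))
```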

Coda

Here is the verb in the standard library that accomplishes what this novice J programmer set out to do.

   toupper
3 : 0
x=. I. 26 > n=. ((97+i.26){a.) i. t=. ,y
($y) $ ((x{n) { (65+i.26){a.) x}t
)
   whereDefined 'toupper'
C:\Users\devon_mccormick\j64-804\system\main\stdlib.ijs
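
For comparison with the novice's arithmetic approach, here is one reading of the library verb as a Python sketch. It is an interpretation, not a definitive account of the J code: the name `toupper_sketch` is ours, the comments map each step to the J fragment we believe it corresponds to, and the final reshape to the original shape is omitted because a Python string is already flat.

```python
import string


def toupper_sketch(text):
    """Upper-case by table lookup and in-place amendment (illustrative)."""
    lower, upper = string.ascii_lowercase, string.ascii_uppercase
    t = list(text)                                           # t =. ,y
    n = [lower.index(c) if c in lower else 26 for c in t]    # n =. lower i. t
    x = [i for i, v in enumerate(n) if v < 26]               # x =. I. 26 > n
    for i in x:                                              # (...) x} t
        t[i] = upper[n[i]]
    return ''.join(t)


print(toupper_sketch('"Please, Sir," said Pip.'))
```

Instead of subtracting 32, this version finds each lower-case letter's position in a..z and replaces it with the letter at the same position in A..Z, touching only the positions that need changing.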

Show-and-tell

Here is the start of an exposition showing how to develop J code modeling (part of) the exercise outlined here.

Downloading Data to a Local Database

We start with this article which ends with over a hundred lines of Python. Sorry about that but we would like to compare it to the J in the following section.

15 years of forex tick data to MongoDB using Python. Part One

by Talaikis | Feb 12, 2016 | Investing

Here's another idea of mine. I've decided to download, process and make available through my brand new MongoDB 15 years of forex tick data from GAIN (Also possible from TrueFX/ Pepperstone). The procedure, all automatic (I'm too lazy to download by hand):

  1. Walk through pages and get all files to download.
  2. Download.
  3. Unzip, read csv files, process and put into database.

What we'll need:

  • requests
  • BeautifulSoup
  • specify where to put your files

What you'll get:

  • The whole lot of data files (thousands, exact number will be known in next part).
  • Happiness you saved a lot of boring time from processing this by hand.

Example of data:

lTid	cDealable	CurrencyPair	RateDateTime	RateBid	RateAsk
1139121860	D	USD/NOK	2010-05-09 17:04:50	6.1288	6.1378
1139121874	D	USD/NOK	2010-05-09 17:04:51	6.1295	6.1385
1139121886	D	USD/NOK	2010-05-09 17:04:51	6.1293	6.1383
1139121910	D	USD/NOK	2010-05-09 17:04:53	6.129	6.138

Next I'm going to walk the directory, unzip, read and put all this into MongoDB.

Possible improvements:

  • Save list to download to the file.
  • Better handling of sessions and exceptions.
  • Better walk function.
  • Making download of TrueFX/Pepperstone data work.

Python code:

import requests
import shutil
from bs4 import BeautifulSoup
import warnings
import re
import time

#holds the raw response stream between download_file and save_file
dump = None

#download file magic
def download_file(url_):
    global dump
    file = requests.get(url_, stream=True)
    dump = file.raw

#save file magic
def save_file(path_):
    global dump
    with open(path_, 'wb') as location:
        shutil.copyfileobj(dump, location)
    dump = None

#script body
if __name__ == "__main__":
    scriptStart = time.time()

    #suppress security warnings
    warnings.filterwarnings("ignore")

    #starting point
    src_ = "http://ratedata.gaincapital.com/"

    #initialize session and get something
    s = requests.Session()

    #request www
    r = s.get(src_, verify=False)

    #get html object
    html_ = r.text
    soup = BeautifulSoup(html_, "html.parser")

    #get first level links
    first_level = []
    for link in soup.find_all('a'):
        l = link.get('href')
        #skip anchors without an href
        if l and "0" in l:
            first_level.append("http://ratedata.gaincapital.com/"+l[2:])
            #print(l[2:])

    #get second level
    next_level = []
    to_down = []

    for i in range(0, len(first_level)):
        print(first_level[i])

        r2 = s.get(first_level[i], verify=False)
        html_2 = r2.text
        soup = BeautifulSoup(html_2, "html.parser")
        for link in soup.find_all('a'):
            l = link.get('href')
            if not l:
                continue

            #find new data format strings
            f = re.findall(r'(?<!\d)\d{1,2}\s', l)
            if len(f) > 0:
                next_level.append(first_level[i]+"/"+l[2:])

            #if zip add to final list
            if ".zip" in l:
                to_down.append(first_level[i]+"/"+l[2:])

    #make unique list of next level
    next_level = list(set(next_level))

    #get urls for new data format
    for i in range(0, len(next_level)):
        print(next_level[i])

        r3 = s.get(next_level[i], verify=False)
        html_3 = r3.text
        soup = BeautifulSoup(html_3, "html.parser")
        for link in soup.find_all('a'):
            l = link.get('href')

            #if this is zip file add to list to download
            if l and ".zip" in l:
                to_down.append(next_level[i]+"/"+l[2:])

    s.close()

    #make list unique
    to_down = list(set(to_down))

    for g in range(0, len(to_down)):
        #open session
        s = requests.Session()

        print("Downloading "+to_down[g])

        #make name of file and path
        file_name = to_down[g][31:].replace("/", "").replace(" ", "")
        print(file_name)
        path_ = "C:\\data\\fx\\"+file_name

        #do magic
        download_file(to_down[g])
        save_file(path_)

        #close session
        s.close()

    timeused = (time.time()-scriptStart)/60
    print("Done in ", timeused, " minutes")

(Start of) Download of Files Referenced by Multi-level HTML

Here is the start of some J code to replicate the file download portion of the Python code in “Downloading Data to a Local Database”. The initial noun attempts to illustrate the process of developing this code. The final verb “getReferences” is an incomplete version of what we have so far, after getting to the point of downloading a single file. It looks like this verb should have at least two other verbs, corresponding to the latter two commented sections, broken out of it and applied to each of the links extracted above it.

Here’s what a few levels of the HTML pages look like:

Top Level: GAINhistoricRateData.jpg
Second Level with Data File Names: GAINhistoricRateData-level2.jpg

Finding the Data

So, we have to read the “level 1” page to find the links to the files on a “level 2” page. Here’s what we have so far. First, we have three J statements followed by the response generated from the wget call.

NB.* extractFromHTML.ijs: extract data files referenced under multi-leveled HTML pages.

figuringItOut=: 0 : 0
   src=. 'http://ratedata.gaincapital.com/'  NB. Starting point
   1!:44 'C:\amisc\J\NYCJUG\201603\'
   shell 'wget -O level1Links.html ',src   NB. Arbitrary file name->generalize.
--08:40:31--  http://ratedata.gaincapital.com/
           => `level1Links.html'
Resolving corp-hts-proxy.mhc... done.
Connecting to corp-hts-proxy.mhc[152.159.215.21]:8080... connected.
Proxy request sent, awaiting response... 200 OK
Length: 6,371 [text/html]

    0K ......                                                100%  222.20 KB/s

08:40:31 (222.20 KB/s) - `level1Links.html' saved [6371/6371]

Now that we have the top-level (level 1) page, we can extract the links to the data from it by searching for the HTML "<a href=" statements preceding the URL for each link.

   whlnks=. '<a href=' E. level1=. fread 'level1Links.html'  NB. Get links from level 1…
   $&.>whlnks;level1
+----+----+
|6371|6371|
+----+----+
   +/whlnks
21
   ]whlnks=. I. whlnks
691 924 1157 1391 1626 1861 2095 2329 2563 2798 3033 3268 3503 3738 3973 4208 4443 4678 4913 5151 5507

So, whlnks is a vector of the starting locations of 21 "href" statements from which we want to extract URLs. First we figure out how to find the end of each "href", then how to extract what is in between the start of the "href" and its end.

   '>' i.~ level1}.~{.whlnks
16
   16{.level1}.~{.whlnks
<a href=".\2000"
   <;._1 '=',16{.level1}.~{.whlnks
+-------+--------+
|<a href|".\2000"|
+-------+--------+
   '"'-.~>_1{<;._1 '=',16{.level1}.~{.whlnks
.\2000
   src,'"'-.~>_1{<;._1 '=',16{.level1}.~{.whlnks
http://ratedata.gaincapital.com/.\2000
   src,2}.'"'-.~>_1{<;._1 '=',16{.level1}.~{.whlnks   NB. Extract the name of the next level page…
http://ratedata.gaincapital.com/2000
   r2=. src,2}.'"'-.~>_1{<;._1 '=',16{.level1}.~{.whlnks
   r2=. src,newlvl=. 2}.'"'-.~>_1{<;._1 '=',16{.level1}.~{.whlnks
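
For comparison with the Python article, the extraction steps just shown can be modelled with plain string operations. This is a sketch under assumptions: the `page` text is a made-up two-link sample rather than the real level 1 page, and `find_all` and `link_target` are names invented here. The prefix test in `link_target` is conditional, so it is closer to the generalized phrase developed later than to the unconditional 2}. used above.

```python
def find_all(needle, haystack):
    """Start positions of needle in haystack, roughly like `I. needle E. haystack`."""
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        start = haystack.find(needle, start + 1)
    return positions


def link_target(page, start):
    """Extract the href value of the tag that begins at position start."""
    tag = page[start:page.index('>', start)]       # text up to, not including, '>'
    target = tag.split('=')[-1].replace('"', '')   # last '='-separated field, quotes removed
    # Drop a leading ".\" that refers to the current level.
    return target[2:] if target[:2] == '.\\' else target


src = 'http://ratedata.gaincapital.com/'
# Hypothetical sample standing in for the contents of level1Links.html.
page = '<td><a href=".\\2000">2000</a></td>\n<td><a href=".\\2001">2001</a></td>'

links = [src + link_target(page, p) for p in find_all('<a href=', page)]
print(links)
```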

Now that we've found these second-level links, we can build wget statements to download the data from each link.

   shell 'wget -O ',(nm=. 'level2-',newlvl,'.html'),' ',r2
--08:53:48--  http://ratedata.gaincapital.com/2000
           => `level2-2000.html'
Resolving corp-hts-proxy.mhc... done.
Connecting to corp-hts-proxy.mhc[152.159.215.21]:8080... connected.
Proxy request sent, awaiting response... 301 Moved Permanently
Location: http://ratedata.gaincapital.com/2000/ [following]
--08:53:48--  http://ratedata.gaincapital.com/2000/
           => `level2-2000.html'
Connecting to corp-hts-proxy.mhc[152.159.215.21]:8080... connected.
Proxy request sent, awaiting response... 200 OK
Length: 3,329 [text/html]

    0K ...                                                   100%  130.04 KB/s

08:53:48 (130.04 KB/s) - `level2-2000.html' saved [3329/3329]

   ]r2=. src,newlvl=. 2}.'"'-.~>_1{<;._1 '=',whEnd{.level1}.~{.whlnks
http://ratedata.gaincapital.com/2000
   shell 'wget -O ',(nm=. 'level2-',newlvl,'.html'),' ',r2   NB. Less-arbitrary 2nd level name…
--16:36:47--  http://ratedata.gaincapital.com/2000
           => `level2-2000.html'
Resolving corp-hts-proxy.mhc... done.
Connecting to corp-hts-proxy.mhc[152.159.215.21]:8080... connected.
Proxy request sent, awaiting response... 301 Moved Permanently
Location: http://ratedata.gaincapital.com/2000/ [following]
--16:36:47--  http://ratedata.gaincapital.com/2000/
           => `level2-2000.html'
Connecting to corp-hts-proxy.mhc[152.159.215.21]:8080... connected.
Proxy request sent, awaiting response... 200 OK
Length: 3,329 [text/html]

    0K ...                                                   100%  130.04 KB/s

16:36:47 (130.04 KB/s) - `level2-2000.html' saved [3329/3329]

The initial comment below is to remind us of what the line we're parsing looks like so we can figure out how to extract the link we want.

   NB. 		<td align="left" ><img src="/images/dir_misc.gif" width="16" height="16" border="0"> <a href=".\USD_JPY_2000.zip"><font color="#ffffff">USD_JPY_2000.zip</a></font></td>

   #whlnks2=. I. '<a href=' E. level2=. fread nm
7
   ]whEnd2=. '>' i.~ level2}.~{.whlnks2
28
   '"'-.~>_1{<;._1 '=',whEnd2{.level2}.~{.whlnks2
.\USD_JPY_2000.zip
   xx=. '"'-.~>_1{<;._1 '=',whEnd2{.level2}.~{.whlnks2
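The extraction phrase above is dense, so here is a rough step-by-step breakdown of the same idea. The sample line is shortened for illustration, and the names line, end, tag, pieces, quoted and name are hypothetical; they do not appear in the original session.

```j
NB. A shortened sample of the kind of line shown in the comment above
line=: '<a href=".\USD_JPY_2000.zip">USD_JPY_2000.zip</a>'
end=: line i. '>'                NB. 28: position of the first '>'
tag=: end {. line                NB. <a href=".\USD_JPY_2000.zip"
pieces=: <;._1 '=',tag           NB. cut on '=' -> 2 boxes: '<a href' and the quoted name
quoted=: >_1{pieces              NB. ".\USD_JPY_2000.zip"  (last piece, opened)
name=: '"' -.~ quoted            NB. .\USD_JPY_2000.zip    (quotes removed)
```

Prepending '=' to the text makes it the fret for the cut, so every '=' in the tag starts a new boxed piece and the link target is always the last one.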

Now that we know what we want to do, let's select the handy phrases above so we can more simply use them.

   13 : '(2*x-:2{.y)}.y'  NB. Generalize above code to only drop 2 characters if they refer to the present
] }.~ 2 * [ -: 2 {. ]     NB. level: this should be better generalized to handle relative and absolute specifications.
   '.\' (] }.~ 2 * [ -: 2 {. ]) '"'-.~>_1{<;._1 '=',whEnd2{.level2}.~{.whlnks2
USD_JPY_2000.zip
   nxtfl=. '.\' (] }.~ 2 * [ -: 2 {. ]) '"'-.~>_1{<;._1 '=',whEnd2{.level2}.~{.whlnks2
   src,newlvl
http://ratedata.gaincapital.com/2000
   src,newlvl,'/',nxtfl
http://ratedata.gaincapital.com/2000/USD_JPY_2000.zip
   shell 'wget -O ',nxtfl,' ',src,newlvl,'/',nxtfl
--16:43:33--  http://ratedata.gaincapital.com/2000/USD_JPY_2000.zip
           => `USD_JPY_2000.zip'
Resolving corp-hts-proxy.mhc... done.
Connecting to corp-hts-proxy.mhc[152.159.215.21]:8080... connected.
Proxy request sent, awaiting response... 200 OK
Length: 3,160,728 [application/x-zip-compressed]

    0K .......... .......... .......... .......... ..........  1%  476.19 KB/s
   50K .......... .......... .......... .......... ..........  3%    1.88 MB/s
  100K .......... .......... .......... .......... ..........  4%    1.88 MB/s
...
 3000K .......... .......... .......... .......... .......... 98%    3.26 MB/s
 3050K .......... .......... .......... ......               100%    2.75 MB/s

16:43:35 (2.16 MB/s) - `USD_JPY_2000.zip' saved [3160728/3160728]

So, we have successfully downloaded the first data file. We can get the remaining ones in the same way.
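As a minimal sketch of how the remaining links might be collected in one pass, the verb below returns every link target on a page rather than only the first. The name hrefs is hypothetical, and the sketch assumes every link is written exactly as <a href="...">, which may not hold for other pages.

```j
NB. hrefs: hypothetical helper (not part of the original session).
NB. Returns a boxed list of every href value found in the HTML text y,
NB. assuming each link is written exactly as <a href="...">.
hrefs=: 3 : 0
   tag=. '<a href="'
   starts=. (#tag) + I. tag E. y       NB. index just past each opening quote
   starts ({.~ i.&'"')@}.&.> <y        NB. from each start, take up to the closing quote
)

page=: '<a href=".\A_2000.zip">A</a> <a href=".\B_2000.zip">B</a>'
hrefs page                             NB. two boxes: .\A_2000.zip and .\B_2000.zip
```

Each boxed name could then be passed through the prefix-dropping phrase and handed to wget in turn, just as was done for the single file above.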

Summary of Steps So Far

Gathering together the working phrases above and packaging them into a handy verb gives us this.

getReferences=: 3 : 0
NB.   src=. 'http://ratedata.gaincapital.com/'  NB. Starting point
   src=. y
   shell 'wget -O level1Links.html ',src
   whlnks=. I. '<a href=' E. level1=. fread 'level1Links.html'

NB. Next (one) level...
   whEnd=. '>' i.~ level1}.~{.whlnks
   r2=. src,newlvl=. 2}.'"'-.~>_1{<;._1 '=',whEnd{.level1}.~{.whlnks
   shell 'wget -O ',(nm=. 'level2-',newlvl,'.html'),' ',r2
   whlnks2=. I. '<a href=' E. level2=. fread nm
   whEnd2=. '>' i.~ level2}.~{.whlnks2

NB. Get one file from this 2nd level...
   nxtfl=. '.\' (] }.~2*[-:2{.]) '"'-.~>_1{<;._1 '=',whEnd2{.level2}.~{.whlnks2
   shell 'wget -O ',nxtfl,' ',src,newlvl,'/',nxtfl
)
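The comment on the prefix-dropping phrase notes that it should be generalized. One small step in that direction, offered only as a sketch, is to drop a prefix of any length instead of hard-coding two characters. The name dropPrefix is hypothetical.

```j
NB. dropPrefix: hypothetical name for a generalized form of the phrase used for nxtfl.
NB. Drops x from the front of y only when y actually starts with x.
dropPrefix=: ] }.~ #@[ * [ -: #@[ {. ]

'.\' dropPrefix '.\USD_JPY_2000.zip'    NB. USD_JPY_2000.zip
'.\' dropPrefix 'USD_JPY_2000.zip'      NB. USD_JPY_2000.zip (unchanged)
'./' dropPrefix './2000'                NB. 2000
```

This still does not distinguish relative from absolute link specifications; it only removes the dependence on a two-character prefix.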

(Part of an) Introduction to Bayesian Statistics

Here is one of the main slides from a talk I gave recently about Bayesian statistics.

The Problem with Priors.jpg

Some J Code for the “Box” or “Urn” Problem

Here is code to do the "urn" calculation outlined on the bottom left of this slide.

whichBox=: 4 : 0"(0 1) NB. Return prob of each box y given draw of item # x
   pH=. (]$%) #y                   NB. Assume equal probs for each box
   pDH=. x { &> (] %&.> +/&.>) y   NB. Conditional probs
   pHpDH=. pH * pDH                NB. Numerators
   pD=. +/pHpDH                    NB. p(D) for item x
   pHD=. pHpDH%pD                  NB. Normalized->posterior for each box
)
   0 whichBox 6 6; 4 2             NB. Problem above
0.428571 0.571429
   0 whichBox 30 10; 20 20         NB. Example from Downey “Think Bayes”
0.6 0.4
   0 whichBox 6 6; 9 7; 4 2        NB. Multiple boxes
0.289157 0.325301 0.385542

   0 whichBox 6 6; 0 7; 0 2        NB. Edge conditions: sanity checks
1 0 0
   0 whichBox 6 6; 1 7; 0 2
0.8 0.2 0
   0 whichBox 6 6; 1 7; 1 2
0.521739 0.130435 0.347826
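To see how whichBox arrives at the first result above, the same phrases can be run one at a time outside the verb. This is only a trace of the existing code for the case 0 whichBox 6 6; 4 2; the names boxes and shares are introduced here for illustration.

```j
boxes=: 6 6; 4 2                      NB. two boxes, two item types each
pH=: (]$%) #boxes                     NB. 0.5 0.5            equal priors
shares=: (] %&.> +/&.>) boxes         NB. (0.5 0.5);(0.666667 0.333333)
pDH=: 0 {&> shares                    NB. 0.5 0.666667       p(item 0 | box)
pHpDH=: pH * pDH                      NB. 0.25 0.333333      numerators
pD=: +/pHpDH                          NB. 0.583333           p(item 0)
pHD=: pHpDH % pD                      NB. 0.428571 0.571429  posterior
```

The posterior is just each numerator divided by their sum, which is why the results in every row of the examples add up to 1.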

The power of J: all possible single draws for multiple boxes with multiple colors.

   0 1 2 whichBox/ 30 10 5;20 20 20;10 20 30
0.571429 0.285714 0.142857
    0.25    0.375    0.375
0.117647 0.352941 0.529412
   0 1 2 whichBox/ 30 10 5;20 20 20;10 20 30;1 2 3
      0.5     0.25    0.125    0.125
 0.181818 0.272727 0.272727 0.272727
0.0769231 0.230769 0.346154 0.346154

Advanced topics

How Best to Modify Existing Code?

The following is a J adverb, “doSomething”, that I wrote to process a large file in pieces using a user-supplied verb. This version of the adverb embeds a number of assumptions about the type of large file to which we would apply the user verb: that it is a tab-delimited file in which the first row holds the column headings, and that we want this first row available throughout the file processing. That is, this row, as an enclosed vector of the column headers, is passed to each invocation of the user verb.

This latter feature, passing the headers to each invocation of the verb, is something I have yet to use, though it seemed like a good idea at the time. However, as I’ve had many occasions to use this method to work on large files, I’ve developed a couple of different versions of this original code that will be detailed below.

The question to ponder as we look at these different versions is: how might we better write – or rewrite – the original code to accommodate the subsequent versions? Or, is the method I chose, writing new, somewhat similar versions of the original, a better way to achieve this even though it eschews re-use?

The original version is this:

NB.* doSomething: apply verb to sequential blocks of file - assuming
NB. field-delimited file - by whole lines.  Args: file current location
NB. pointer, # bytes in each chunk read, size and name of file, [any
NB. partial chunk from previous call, file header, result of previous
NB. call to be passed on to next one].
doSomething=: 1 : 0
   'curptr chsz max flnm leftover hdr passedOn'=. 7{.y
   if. curptr>:max do. ch=. curptr;chsz;max;flnm
   else. if. 0=curptr do. ch=. readChunk curptr;chsz;max;flnm
           chunk=. leftover,CR-.~>_1{ch                NB. Drop CRs: LF-delimited lines.
           'chunk leftover'=. (>:chunk i: LF) split chunk   NB. Through last complete line; rest is "leftover".
           'hdr body'=. (>:chunk i. LF) split chunk    NB. Assume 1st line is header.
           hdr=. }:hdr                                 NB. Drop trailing LF from header.
       else. chunk=. leftover,CR-.~>_1{ch=. readChunk curptr;chsz;max;flnm
           'body leftover'=. (>:chunk i: LF) split chunk
       end.
       passedOn=. u body;hdr;<passedOn  NB. Pass u's previous work to next invocation
   end.
   (4{.ch),leftover;hdr;<passedOn
NB.EG ((10{a.)&(4 : '(>_1{y) + x +/ . = >0{y')) doSomething ^:_ ] 0x;1e6;(fsize 'bigFile.txt');'bigFile.txt';'';'';0  NB. Count LFs in file.
)

It uses the verb “readChunk” to return successive pieces of the file, and the standard “split” verb as well:

readChunk=: 3 : 0
   'curptr chsz max flnm'=. 4{.y
   if. 0<chsz2=. chsz<.0>.max-curptr do. chunk=. fread flnm;curptr,chsz2
   else. chunk=. '' end.
   (curptr+chsz2);chsz2;max;flnm;chunk
NB.EG chunk=. >_1{ch0=. readChunk 0;1e6;(fsize 'bigFile.txt');'bigFile.txt'
)

split=: {. ,&< }.  
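As a small illustration of how doSomething uses split to avoid handing partial lines to the user verb, consider a chunk that ends in the middle of a line. The sample data here is made up.

```j
split=: {. ,&< }.                     NB. as defined above

chunk=: 'ab',LF,'cd',LF,'ef'          NB. two complete lines plus a partial one
'body leftover'=: (>:chunk i: LF) split chunk
body                                  NB. 'ab',LF,'cd',LF : everything through the last LF
leftover                              NB. ef : carried over to the front of the next chunk
```

Because i: searches from the end, the split point always falls just after the last complete line in the chunk.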

Two Simpler Versions

The first new version is simpler in that it makes no particular assumptions about the format of the file. Unlike doSomething, which assumes LF-delimited lines and takes care not to return partial lines, this new version, “doSomethingSimple”, merely passes successive pieces of the file to the user-supplied verb.

It looks like this:

NB.* doSomethingSimple: apply verb to file making minimal assumptions about
NB. file structure.
doSomethingSimple=: 1 : 0
   'curptr chsz max flnm passedOn'=. 5{.y
   if. curptr>:max do. ch=. curptr;chsz;max;flnm
   else. ch=. readChunk curptr;chsz;max;flnm
       passedOn=. u (_1{ch),<passedOn  NB. Pass u's previous work on to next invocation
   end.
   (4{.ch),<passedOn
NB.EG ([:~.;) doSomethingSimple ^:_ ] 0x;1e6;(fsize 'bigFile.txt');'bigFile.txt';<'' NB. Return unique characters in file.
)

This does use the same readChunk verb as its ancestor. Also, the requirement of passing on the previous result of the user-supplied verb to the subsequent invocation led to changes in this ancestral code.

So, there’s a little code re-use here, and the subsequent requirements led to an improvement on the original code, but the question remains: is there a better way to incorporate a new version like this into the existing adverb?

Finally, we have a third version of this, based on doSomethingSimple, that was created to use information internal to it in order to keep track of absolute positions in the file by adding the current file pointer curptr to the result of the user-supplied verb. This incorporates a major assumption about the result of this verb that is not present in the other two versions.

NB.* trackAbsoluteLocation: return locations in file resulting from
NB. "u" applied to successive pieces of it with result offset by
NB. current file pointer -> absolute location numbers.
trackAbsoluteLocation=: 1 : 0
   'curptr chsz max flnm passedOn'=. 5{.y
   if. curptr>:max do. ch=. curptr;chsz;max;flnm
   else. ch=. readChunk curptr;chsz;max;flnm
       new=. curptr+u >_1{ch       NB. Assume numeric array of relative locations
       passedOn=. passedOn,new     NB. Allow u's work to be passed on to next invocation
   end.
   (4{.ch),<passedOn
NB.EG ('benchmarkName'&([: I. E.)) trackAbsoluteLocation ^:_ ] 0x;1e6;(fsize 'bigFile.txt');'bigFile.txt';<'' NB. Find all starting locations of a string.
)

Does there appear to be any neat way to incorporate these modifications to the original version, or is the time-honored cut-and-paste-and-modify method still the best way to have handled these changes?
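One possible direction, offered only as a sketch and not as the author's method, is a single adverb that hands the user verb everything it might need (the current pointer, the chunk, and the previous result) and leaves it to that verb to decide what to keep. The names doChunks, uniq, locs and the demo file name are hypothetical; readChunk is repeated from above so that the sketch stands alone.

```j
NB. readChunk: copied from above so this sketch stands alone.
readChunk=: 3 : 0
   'curptr chsz max flnm'=. 4{.y
   if. 0<chsz2=. chsz<.0>.max-curptr do. chunk=. fread flnm;curptr,chsz2
   else. chunk=. '' end.
   (curptr+chsz2);chsz2;max;flnm;chunk
)

NB. doChunks: hypothetical unified adverb (an assumption, not the original code).
NB. u is called as  u curptr;chunk;<passedOn  and returns the new passedOn.
doChunks=: 1 : 0
   'curptr chsz max flnm passedOn'=. 5{.y
   if. curptr>:max do. ch=. curptr;chsz;max;flnm
   else. ch=. readChunk curptr;chsz;max;flnm
       passedOn=. u curptr;(_1{ch),<passedOn
   end.
   (4{.ch),<passedOn
)

NB. In the spirit of doSomethingSimple's example: accumulate unique characters.
uniq=: 3 : '~. ; 2 1{y'
NB. In the spirit of trackAbsoluteLocation's example: absolute positions of a string.
locs=: 3 : '(>2{y), (>0{y) + I. ''bc'' E. >1{y'

'abcabc' fwrite 'doChunksDemo.txt'     NB. tiny demo file, read 4 bytes at a time
st=: 0;4;(fsize 'doChunksDemo.txt');'doChunksDemo.txt';<''
>_1{ uniq doChunks ^:_ ] st            NB. abc
>_1{ locs doChunks ^:_ ] st            NB. 1 4
```

The cost of this design is that every user verb must unpack a three-item boxed argument even when it needs only the chunk. As with the versions above, a string that straddles a chunk boundary would be missed.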

Learning and Teaching J

We look at an interview of dubious provenance, and some other arguments against OO.

Interview with The Devil

Re:C++ (Score:4, Interesting) by cerberusss (660701) on Saturday January 23, 2016 @07:52AM (#51356507)

Parent refers to the following (joke) interview that's been going around on the internet for ages:

On the 1st of January, 1998, Bjarne Stroustrup gave an interview to the IEEE's Computer magazine. Naturally, the editors thought he would be giving a retrospective view of seven years of object-oriented design, using the language he created. By the end of the interview, the interviewer got more than he had bargained for and, subsequently, the editor decided to suppress its contents, 'for the good of the industry' but, as with many of these things, there was a leak. Here is a complete transcript of what was said, unedited, and unrehearsed, so it isn't as neat as planned interviews. You will find it interesting...

Interviewer: Well, it's been a few years since you changed the world of software design, how does it feel, looking back?

Stroustrup: Actually, I was thinking about those days, just before you arrived. Do you remember? Everyone was writing 'C' and, the trouble was, they were pretty damn good at it. Universities got pretty good at teaching it, too. They were turning out competent - I stress the word 'competent' - graduates at a phenomenal rate. That's what caused the problem.

Interviewer: Problem?

Stroustrup: Yes, problem. Remember when everyone wrote Cobol?

Interviewer: Of course, I did too.

Stroustrup: Well, in the beginning, these guys were like demi-gods. Their salaries were high, and they were treated like royalty.

Interviewer: Those were the days, eh?

Stroustrup: Right. So what happened? IBM got sick of it, and invested millions in training programmers, till they were a dime a dozen.

Interviewer: That's why I got out. Salaries dropped within a year, to the point where being a journalist actually paid better.

Stroustrup: Exactly. Well, the same happened with 'C' programmers.

Interviewer: I see, but what's the point?

Stroustrup: Well, one day, when I was sitting in my office, I thought of this little scheme, which would redress the balance a little. I thought 'I wonder what would happen, if there were a language so complicated, so difficult to learn, that nobody would ever be able to swamp the market with programmers?' Actually, I got some of the ideas from X10, you know, X windows. That was such a bitch of a graphics system, that it only just ran on those Sun 3/60 things. They had all the ingredients for what I wanted. A really ridiculously complex syntax, obscure functions, and pseudo-OO structure. Even now, nobody writes raw X-windows code. Motif is the only way to go if you want to retain your sanity.

Interviewer: You're kidding...?

Stroustrup: Not a bit of it. In fact, there was another problem. Unix was written in 'C', which meant that any 'C' programmer could very easily become a systems programmer. Remember what a mainframe systems programmer used to earn?

Interviewer: You bet I do, that's what I used to do.

Stroustrup: OK, so this new language had to divorce itself from Unix, by hiding all the system calls that bound the two together so nicely. This would enable guys who only knew about DOS to earn a decent living too.

Interviewer: I don't believe you said that...

Stroustrup: Well, it's been long enough, now, and I believe most people have figured out for themselves that C++ is a waste of time but, I must say, it's taken them a lot longer than I thought it would.

Interviewer: So how exactly did you do it?

Stroustrup: It was only supposed to be a joke, I never thought people would take the book seriously. Anyone with half a brain can see that object-oriented programming is counter-intuitive, illogical and inefficient.

Interviewer: What?

Stroustrup: And as for 're-useable code' - when did you ever hear of a company re-using its code?

Interviewer: Well, never, actually, but...

Stroustrup: There you are then. Mind you, a few tried, in the early days. There was this Oregon company - Mentor Graphics, I think they were called - really caught a cold trying to rewrite everything in C++ in about '90 or '91. I felt sorry for them really, but I thought people would learn from their mistakes.

Interviewer: Obviously, they didn't?

Stroustrup: Not in the slightest. Trouble is, most companies hush-up all their major blunders, and explaining a $30 million loss to the shareholders would have been difficult. Give them their due, though, they made it work in the end.

Interviewer: They did? Well, there you are then, it proves O-O works.

Stroustrup: Well, almost. The executable was so huge, it took five minutes to load, on an HP workstation, with 128MB of RAM. Then it ran like treacle. Actually, I thought this would be a major stumbling-block, and I'd get found out within a week, but nobody cared. Sun and HP were only too glad to sell enormously powerful boxes, with huge resources just to run trivial programs. You know, when we had our first C++ compiler, at AT&T, I compiled 'Hello World', and couldn't believe the size of the executable. 2.1MB.

Interviewer: What? Well, compilers have come a long way, since then.

Stroustrup: They have? Try it on the latest version of g++ - you won't get much change out of half a megabyte. Also, there are several quite recent examples for you, from all over the world. British Telecom had a major disaster on their hands but, luckily, managed to scrap the whole thing and start again. They were luckier than Australian Telecom. Now I hear that Siemens is building a dinosaur, and getting more and more worried as the size of the hardware gets bigger, to accommodate the executables. Isn't multiple inheritance a joy?

Interviewer: Yes, but C++ is basically a sound language.

Stroustrup: You really believe that, don't you? Have you ever sat down and worked on a C++ project? Here's what happens: First, I've put in enough pitfalls to make sure that only the most trivial projects will work first time. Take operator overloading. At the end of the project, almost every module has it, usually, because guys feel they really should do it, as it was in their training course. The same operator then means something totally different in every module. Try pulling that lot together, when you have a hundred or so modules. And as for data hiding. God, I sometimes can't help laughing when I hear about the problems companies have making their modules talk to each other. I think the word 'synergistic' was specially invented to twist the knife in a project manager's ribs.

Interviewer: I have to say, I'm beginning to be quite appalled at all this. You say you did it to raise programmers' salaries? That's obscene.

Stroustrup: Not really. Everyone has a choice. I didn't expect the thing to get so much out of hand. Anyway, I basically succeeded. C++ is dying off now, but programmers still get high salaries - especially those poor devils who have to maintain all this crap. You do realise, it's impossible to maintain a large C++ software module if you didn't actually write it?

Interviewer: How come?

Stroustrup: You are out of touch, aren't you? Remember the typedef?

Interviewer: Yes, of course.

Stroustrup: Remember how long it took to grope through the header files only to find that 'RoofRaised' was a double precision number? Well, imagine how long it takes to find all the implicit typedefs in all the Classes in a major project.

Interviewer: So how do you reckon you've succeeded?

Stroustrup: Remember the length of the average-sized 'C' project? About 6 months. Not nearly long enough for a guy with a wife and kids to earn enough to have a decent standard of living. Take the same project, design it in C++ and what do you get? I'll tell you. One to two years. Isn't that great? All that job security, just through one mistake of judgement. And another thing. The universities haven't been teaching 'C' for such a long time, there's now a shortage of decent 'C' programmers. Especially those who know anything about Unix systems programming. How many guys would know what to do with 'malloc', when they've used 'new' all these years - and never bothered to check the return code. In fact, most C++ programmers throw away their return codes. Whatever happened to good ol' '-1'? At least you knew you had an error, without bogging the thing down in all that 'throw' 'catch' 'try' stuff.

Interviewer: But, surely, inheritance does save a lot of time?

Stroustrup: Does it? Have you ever noticed the difference between a 'C' project plan, and a C++ project plan? The planning stage for a C++ project is three times as long. Precisely to make sure that everything which should be inherited is, and what shouldn't isn't. Then, they still get it wrong. Whoever heard of memory leaks in a 'C' program? Now finding them is a major industry. Most companies give up, and send the product out, knowing it leaks like a sieve, simply to avoid the expense of tracking them all down.

Interviewer: There are tools...

Stroustrup: Most of which were written in C++.

Interviewer: If we publish this, you'll probably get lynched, you do realise that?

Stroustrup: I doubt it. As I said, C++ is way past its peak now, and no company in its right mind would start a C++ project without a pilot trial. That should convince them that it's the road to disaster. If not, they deserve all they get. You know, I tried to convince Dennis Ritchie to rewrite Unix in C++.

Interviewer: Oh my God. What did he say?

Stroustrup: Well, luckily, he has a good sense of humor. I think both he and Brian figured out what I was doing, in the early days, but never let on. He said he'd help me write a C++ version of DOS, if I was interested.

Interviewer: Were you?

Stroustrup: Actually, I did write DOS in C++, I'll give you a demo when we're through. I have it running on a Sparc 20 in the computer room. Goes like a rocket on 4 CPU's, and only takes up 70 megs of disk.

Interviewer: What's it like on a PC?

Stroustrup: Now you're kidding. Haven't you ever seen Windows '95? I think of that as my biggest success. Nearly blew the game before I was ready, though.

Interviewer: You know, that idea of a Unix++ has really got me thinking. Somewhere out there, there's a guy going to try it.

Stroustrup: Not after they read this interview.

Interviewer: I'm sorry, but I don't see us being able to publish any of this.

Stroustrup: But it's the story of the century. I only want to be remembered by my fellow programmers, for what I've done for them. You know how much a C++ guy can get these days?

Interviewer: Last I heard, a really top guy is worth $70 - $80 an hour.

Stroustrup: See? And I bet he earns it. Keeping track of all the gotchas I put into C++ is no easy job. And, as I said before, every C++ programmer feels bound by some mystic promise to use every damn element of the language on every project. Actually, that really annoys me sometimes, even though it serves my original purpose. I almost like the language after all this time.

Interviewer: You mean you didn't before?

Stroustrup: Hated it. It even looks clumsy, don't you agree? But when the book royalties started to come in... well, you get the picture.

Interviewer: Just a minute. What about references? You must admit, you improved on 'C' pointers.

Stroustrup: Hmm. I've always wondered about that. Originally, I thought I had. Then, one day I was discussing this with a guy who'd written C++ from the beginning. He said he could never remember whether his variables were referenced or dereferenced, so he always used pointers. He said the little asterisk always reminded him.

Interviewer: Well, at this point, I usually say 'thank you very much' but it hardly seems adequate.

Stroustrup: Promise me you'll publish this. My conscience is getting the better of me these days.

Interviewer: I'll let you know, but I think I know what my editor will say.

Stroustrup: Who'd believe it anyway? Although, can you send me a copy of that tape?

Interviewer: I can do that.

Object-Orientation: Arguments Against

The following are selected screenshots from this YouTube video.

OO-a disaster.jpg

Why do some languages hang around.jpg

Braces Considered Loopy

The following was printed in “The Communications of the ACM”.

Braces Considered Loopy

The "naked braces" discussion, beginning with A. Frank Ackerman's letter to the editor "Ban 'Naked' Braces!" (Oct. 2015), perhaps misses the forest for the trees, as a major reason for deeply nested expressions is the inability of most programming languages to handle arrays without looping. This shortcoming further compounds itself by contributing to the verbosity of the boilerplate required for such looping (and multi-conditional) constructs.

Jamie Hale's proposed solution in his letter to the editor "Hold the Braces and Simplify Your Code" (Jan. 2016)—including "... small and minimally nested blocks ..."—to the issue first raised by Ackerman pointed in a good direction but may remain lost in the forest of intrinsically scalar languages. Small blocks of code are good, but, in most languages, doing so merely results in a plethora of small blocks, pushing the complexity to a higher level without necessarily reducing it.

A more functional, array-based way of looking at problems can, however, reduce that apparent complexity by treating collections of objects en masse at a higher level. Given most programmers' lack of familiarity with array-oriented programming, it is difficult for anyone, including me, to provide a widely comprehensible pseudocode example of what I mean by this, but consider the following attempt, based on the problem of invoking different code based on transaction size breakpoints (where "transactions" is a vector of transaction sizes)

Ignoring the length discrepancy between the number of functions provided and the ostensible shape of the Boolean condition on which their selection is based, such a construct could easily be extended to additional breakpoints with something like this

For anyone interested in wrestling with a specific example of an array-based functional notation guiding my thoughts on this example, see http://code.jsoftware.com/wiki/Vocabulary/atdot.

Devon McCormick, New York, NY
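As a rough J illustration of the idea in the letter, the agenda conjunction (@.) documented at the page cited there selects among several verbs according to a computed index. This is not the pseudocode from the letter. The fee verbs and breakpoints below are hypothetical stand-ins for real transaction processing.

```j
NB. Hypothetical handlers standing in for real transaction processing.
tinyFee=:  0"_                  NB. no fee
smallFee=: 0.01&*               NB. 1% of the amount
largeFee=: 5 + 0.005&*          NB. flat 5 plus 0.5%

NB. One breakpoint: choose a handler per transaction by a Boolean test.
fee2=: smallFee`largeFee@.(1000&<)"0
fee2 200 1500 1000              NB. 2 12.5 10

NB. More breakpoints: interval index (dyadic I.) picks among three handlers.
fee3=: tinyFee`smallFee`largeFee@.(100 1000&I.)"0
fee3 50 500 1500                NB. 0 5 12.5
```

Adding a further breakpoint means adding one number to the list of breakpoints and one verb to the gerund, with no additional nesting.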