NYCJUG/2016-01-12

From J Wiki
Jump to navigation Jump to search

Maximum drawdown, genetic algorithms, parallelize J verbs, intuition in programming, sins of programming language design, naked braces

Location:: Heartland

Beginner's Regatta

Calculating Maximum Drawdown

First, a definition of maximum drawdown from Investopedia:

DEFINITION of 'Maximum Drawdown (MDD)'

The maximum loss from a peak to a trough of a portfolio, before a new peak is attained. Maximum Drawdown (MDD) is an indicator of downside risk over a specified time period. It can be used both as a stand-alone measure or as an input into other metrics such as "Return over Maximum Drawdown" and Calmar Ratio. Maximum Drawdown is expressed in percentage terms and computed as: (Trough Value – Peak Value) ÷ Peak Value

BREAKING DOWN 'Maximum Drawdown (MDD)'

Consider an example to understand the concept of maximum drawdown. Assume an investment portfolio has an initial value of $500,000. The portfolio increases to $750,000 over a period of time, before plunging to $400,000 in a ferocious bear market. It then rebounds to $600,000, before dropping again to $350,000. Subsequently, it more than doubles to $800,000. What is the maximum drawdown?

The maximum drawdown in this case is = ($350,000 – 750,000) / $750,000 =  –53.33%

Note the following points:

The initial peak of $750,000 is used in the MDD calculation. The interim peak of $600,000 is not used, since it does not represent a new high. The new peak of $800,000 is also not used since the original drawdown began from the $750,000 peak.

The MDD calculation takes into consideration the lowest portfolio value ($350,000 in this case) before a new peak is made, and not just the first drop to $400,000. MDD should be used in the right perspective to derive the maximum benefit from it. In this regard, particular attention should be paid to the time period being considered. For instance, a hypothetical long-only U.S. fund Gamma has been in existence since 2000, and had a maximum drawdown of -30% in the period ending 2010. While this may seem like a huge loss, note that the S&P 500 had plunged more than 55% from its peak in October 2007 to its trough in March 2009. While other metrics would need to be considered to assess Gamma fund's overall performance, from the viewpoint of MDD, it has outperformed its benchmark by a huge margin.

J Calculation of Maximum Drawdown

How might we calculate this in J in an array-oriented fashion? Let’s start in the middle: assume we have the peak and trough values:

   ((>./-<./)%>./) 2108.63 1867.61 NB. Max drawdown
0.114302

That’s the easy part. The harder part is finding the correct peak and trough for a given set of numbers. Taking the numbers from the example above:

   vals=. 1000* 500 750 400 600 350 800 
   (<./,>./) vals
350000 800000

We know these aren’t the right numbers. We need to find the peak at each point in the series. This will get us close:

   >./\vals
500000 750000 750000 750000 750000 800000

Now we can find the location of each new peak:

   ]whNewPeak=. 2</\>./\vals
1 0 0 0 1
   #whNewPeak
5
   #vals
6

There’s a problem: due to the nature of comparing pairs of items, our result will always be one shorter than our starting vector but we’d like them to line up so we have to make a decision. Fortunately, the choice seems clear: we’ll assume that the first value counts as a peak under the reasoning that it is the highest value “so far”. So,

   #whNewPeak=. 1,whNewPeak
6

It seems natural, with an eye to the next step of finding the minimum in each intra-peak section, to use whNewPeak as a partition vector:

   whNewPeak <;.1 vals
+------+---------------------------+------+
|500000|750000 400000 600000 350000|800000|
+------+---------------------------+------+

This allows us to find the minimum of each interval along with its corresponding peak:

   <./&>whNewPeak<;.1 vals
500000 350000 800000
   whNewPeak#vals
500000 750000 800000

So, finding the maximum drawdown should be relatively straightforward: we simply find the maximum difference:

   >./(whNewPeak#vals) - <./&>whNewPeak<;.1 vals
400000

However, there is a complication: we have to associate this number with its corresponding peak. Also, upon reflection, it would be useful to keep track of the date associated with each of these values so we can show our work.

maxDrawdown=: 3 : 0
   whNewPeak=. 1,2</\>./\y
   md=. (whNewPeak#y) - <./&>whNewPeak<;.1 ] y
   nfp=. 0={:whNewPeak            NB. Not final peak (no peak at end)
   md0=. (-nfp)}.md%whNewPeak#y   NB. Drop last if didn't end on new peak
   wsp=. md0 i. >./md0            NB. Where's starting peak of max drawdown?
   spix=. (wsp+0 1){I. whNewPeak  NB. Start, end index of max drawdown
   span=. y{~(<./ + [: i. [: >: [: | -/) spix
   wmin=. (<./spix)+span i. <./span NB. Where minimum was in span
   md=. (>./md0);spix;wmin
NB.EG 'md whsp whmin'=. maxDrawdown 500 750 400 600 350 800
NB.EG 'md whsp whmin'=. maxDrawdown vals=. 100 150 90 120 80 200
)

   vals{~/:~whsp, whmin
750000 350000 800000
   ]'md whsp whmin'=. maxDrawdown 500 750 400 600 350 800
+--------+---+-+
|0.533333|1 5|4|
+--------+---+-+
   vals{~/:~whsp, whmin
750000 350000 800000

A fuller example of usage:

'tit1 sp500ix'=: split <;._1&>TAB,&.><;._2 ] LF (] , [ #~ [ ~: [: {: ]) CR-.~0 : 0
Date	S&P 500 Index - Index Levels - Index Value - USD
01/06/2015	2002.613587
01/07/2015	2025.90105
01/08/2015	2062.143554
01/09/2015	2044.8099
…
01/04/2016	2012.659371
01/05/2016	2016.714426
01/06/2016	1990.262292
)

   tit1
+----+------------------------------------------------+
|Date|S&P 500 Index - Index Levels - Index Value - USD|
+----+------------------------------------------------+
   'dts vals'=. <"1 |:sp500ix
   vals=. ".&>vals
   ]'md whsp whmin'=. maxDrawdown vals
+---------+-----+--+
|0.0364402|37 75|44|
+---------+-----+--+
   sp500ix{~/:whsp,whmin
+----------+-----------+
|01/06/2015|2002.613587|
+----------+-----------+
|01/08/2015|2062.143554|
+----------+-----------+
|01/07/2015|2025.90105 |
+----------+-----------+

Show-and-Tell

AI - Simple Genetic Algorithm to Solve a Card Problem

We looked some work using a genetic algorithm (GA) to solve a kind of knapsack problem. The article solves this problem:

For this example, I have used a simple card splitting exercise, which is as detailed here:

  • You have 10 cards numbered 1 to 10
  • You have to divide them into two piles so that:
    • The sum of the first pile is as close as possible to 36.
    • And the product of all in the second pile is as close as possible to 360.


The solution consists of about 200 lines of C# code. However, this particular problem is small enough to solve by brute force without using a GA, as we see below.

A Non-GA J Solution

We eventually put together a J rendition of the genetic algorithm here but a simpler, brute-force solution to the problem follows.

First we make the set of cards, then we experiment with all combinations of five cards, assuming that about half of them should be enough to satisfy each of the two target conditions.

   set=. >:i.10
   $cmb5=. {5$<set
10 10 10 10 10
   (<5$0){cmb5
+---------+
|1 1 1 1 1|
+---------+
   $cmb5=. ,cmb5
100000
   (#@:~.)&.>5{.cmb5
+-+-+-+-+-+
|1|2|2|2|2|
+-+-+-+-+-+
   $cmb5=. cmb5#~(#&>cmb5)=(#@:~.)&>cmb5
30240
   5{.cmb5
+---------+---------+---------+---------+---------+
|1 2 3 4 5|1 2 3 4 6|1 2 3 4 7|1 2 3 4 8|1 2 3 4 9|
+---------+---------+---------+---------+---------+
   cmb5=. ~./:~&.>cmb5
   $cmb5
252
   5{.cmb5
+---------+---------+---------+---------+---------+
|1 2 3 4 5|1 2 3 4 6|1 2 3 4 7|1 2 3 4 8|1 2 3 4 9|
+---------+---------+---------+---------+---------+

Next we take the sums and products of all combinations, looking for the desired answers as given above of 36 for the sum and 360 for the product.

   wh36sum=. 36=+/&>cmb5
   wh360prod=. 360=*/&>cmb5
   I. wh360prod*.wh36sum

   +/approx36sum=. (37&>:*.35&<:)&>(+/&>cmb5)
15
   ccmb=. (<>:i.10)-.&.>cmb5   NB. Complement of cmb5
   +/approx360prod=. (370&>:*.350&<:)&>(*/&>ccmb)
3
   +/approx360prod*.approx36sum
1
   I. approx360prod*.approx36sum
195
   195{&.>(<cmb5),<ccmb
+------------+-----------+
|+----------+|+---------+|
||2 7 8 9 10|||1 3 4 5 6||
|+----------+|+---------+|
+------------+-----------+
   >>195{&.>(<cmb5),<ccmb
2 7 8 9 10
1 3 4 5  6
   +/2 7 8 9 10
36
   */1 3 4 5  6
360

Final (non-GA) J Code

Summarizing the work above in a J routine, instead of our brute-force method of finding combinations, we use the comb verb from the combinatorial package in the stats add-on.

SPsolution=: 3 : 0
   'set sumtarg prodtarg nps'=. y  NB. Set of numbers, targets for sum and product, #/set
   set=. ,set
NB.   cmb=. ,{nps$<set                     NB. Brute-force all combinations
NB.   cmb=. cmb#~(#&>cmb)=(#@:~.)&>cmb     NB. No repetitions
NB.   cmb=. ~./:~&.>cmb                    NB. Unique combos
   cmb=. <"1 set{~nps comb #set    NB. All combinations
   ccmb=. (<set)-.&.>cmb                NB. Complement of cmb
   approxSum=. ((>:sumtarg)&>:*.(<:sumtarg)&<:)&>(+/&>cmb)
   approxProd=. ((10+prodtarg)&>:*.(prodtarg-10)&<:)&>(*/&>ccmb)
   whsoln=. I. approxProd*.approxSum
   whsoln&{&>(<cmb),<ccmb
)
   SPsolution (>:i.10);36;360;5
+----------+
|2 7 8 9 10|
+----------+
|1 3 4 5 6 |
+----------+

This code is somewhat fragile as we were lucky with our initial guess that there should be about 5 items for each of the solutions. If we run with a number other than 5, we may get no solution

   $SPsolution (>:i.10);36;360;4
2 0

or more than one approximate solution:

   $SPsolution (>:i.10);36;360;6
2 2
   SPsolution (>:i.10);36;360;6
+------------+------------+
|1 2 7 8 9 10|1 3 6 7 8 10|
+------------+------------+
|3 4 5 6     |2 4 5 9     |
+------------+------------+

Evaluating the two answers here shows them to be slightly incorrect for one of the two conditions. This is the result of forcing the size of the answer to have six items for one side of it.

   soln6=. SPsolution (>:i.10);36;360;6

   (+/&>0{soln6) ,&.> */&>1{soln6
+------+------+
|37 360|35 360|
+------+------+

Advanced Topics

Parallelize J Verbs

This code from Marshall Lochbaum gives a relatively clean way to parallelize "embarassingly parallel" problems--that is, those which are represented by verbs u with u-:u"_1. It can be run simply as u parallelize n y, and will fork off n tasks to run equally spaced sections of y. If one of these returns an error, the error is spat back using debug foreigns, and another thread is spawned to kill off the remaining processes as they terminate.

If it is necessary to load scripts before execution of u, they can be set with setloadfiles, which has the same arguments as load.

Tested on Linux and previous versions worked on Mac and Windows. Windows execution will give console windows that pop up for the duration of the computation. This method currently does not support dyadic evaluation.

load 'strings files task'

fork=: ([: 2!:1 '(',,&')&')`(fork_jtask_)@.IFWIN
JEXE=: jpath IFWIN{:: '~bin/jconsole';'~bin\jconsole.exe'
WAITTIME=: 0.05

NB. read, write a boxed J noun
oread  =: [: <@(3!:2)@:(1!:1)"0 boxopen
owrite =: (3!:1@>@[ 1!:2 ])"0 boxopen

getfname =: ((jpath '~user/')&,) : (((jpath '~user/') , ({.~i.&'.')@[) , ":@] , (}.~i.&'.')@[) &.>

NB. (u parallelize n) y
NB. assumes (u -: u"_1) y
NB. divides y into n parts and executes u on them in parallel.
parallelize=:2 :0
  parts=. (</.~ [:<.n* (%~i.)@#) y
  'inputs flags outputs kernels'=. <"_1|: files=. |: ('input';'flag';'output';'kernel.ijs') getfname/ i.n
  parts owrite inputs
  '0' 1!:2"0 flags

  (u buildtemplate) runkernels files

  result=.$0
  while. +./ '1'~: read=. {.@fread"0 flags do.
    if. '2'e.read do.
      err =. ; oread (read i.'2'){outputs
      ferase (read i.'2'){files
      cleanup (read='0') # files
      13!:8&>/ err
    end.
    6!:3 WAITTIME
  end.
  result=. ; oread outputs
  ferase files

  result
)

NB. run first y kernels
runkernels=:4 :0"_ 1
  'input flag output kernel'=. y
  sentence=. x rplc  '{input}';input;'{flag}';flag;'{output}';output
  sentence fwrite kernel
  fork JEXE,' "',kernel,'"'
)

NB. built a template string to use for runkernels.
NB. assumes u is monadic.
buildtemplate=:1 :0
  if. _1= 4!:0 <'loadfiles' do. loadfiles=:'' end.
  if. 3 = 4!:0 <'u' do. u=.5!:5<'u' end.
  template rplc '{loadfiles}';loadfiles;'{verb}';u
)
template =: ('\)';')') rplc~ 0 :0
{loadfiles}
f=: {verb}
3 :0 ''
try.   '{flag}' (1!:2<)~ '1'[ '{output}' (1!:2<)~ f&.(3!:2) 1!:1 <'{input}'
catch. '{flag}' (1!:2<)~ '2'[ '{output}' (1!:2<)~ 3!:1 (13!:12'') ; (13!:11'')
end.
\)
2!:55 ]0
)

cleanup =: 3 :0
  c =. >getfname <'cleanup.ijs'
  c fwrite~ cleanupscript rplc '{files}';(5!:5<'y');'{WAITTIME}';(5!:5<'WAITTIME');'{c}';c
  fork JEXE,' "',c,'"'
)
cleanupscript =: ('\)';')') rplc~ 0 :0
(3 :0) {files}
try.
  while. #y do.
    for_i. y do.
      'input flag output kernel'=.i
      if. '0'~:{.1!:1 <flag do.
        1!:55 :: _1:"0 i
        y=.y-.i
      end.
    end.
    (6!:3) {WAITTIME}
  end.
catch.
  1!:55 :: _1:"0 y
end.
\)
1!:55 :: _1: <'{c}'
2!:55 ]0
)

NB. y is list of files to load (arguments to the verb load).
setloadfiles=:3 :0
  loadfiles=: 'load ''',y,''''
  i.0 0
)

Some Timings

   mat=. <:+:1000 1000?@$0
   (*.mat)-:*."_1 mat         NB. Verb “*:” meets the entry condition.
1
   6!:2 '*.mat'
0.557813
   6!:2 '(*. parallelize 2) mat'
0.955172
   6!:2 '(*. parallelize 4) mat'
0.728528
   6!:2 '(*. parallelize 8) mat'
0.427645
   6!:2 '(*. parallelize 16) mat'
0.53171
   6!:2 '(*. parallelize 6) mat'
0.43335

So we see that parallelization does not always give us a performance boost for this case but timing multiple iterations shows us that it is on average about 23% faster for for four to six parallel processes.

   (10) 6!:2 '*.mat'
0.564276
   (10) 6!:2 '(*. parallelize 6) mat'
0.437365
   0.437365%0.564276
0.775091
   (10) 6!:2 '(*. parallelize 8) mat'
0.451844
   (10) 6!:2 '(*. parallelize 16) mat'
0.533056
   (10) 6!:2 '(*. parallelize 5) mat'
0.438182
   (10) 6!:2 '(*. parallelize 4) mat'
0.434795
   (10) 6!:2 '(*. parallelize 32) mat'
0.715866

For a larger argument, we get better speed-ups, up to about 40%, for more processes.

   mat=. <:+:4000 4000?@$0               NB. Try with a larger matrix…
   (10) 6!:2 '*.mat'
2.24517
   (10) 6!:2 '(*. parallelize 6) mat'
3.94603
   (10) 6!:2 '(*. parallelize 4) mat'
4.85044
   (10) 6!:2 '(*. parallelize 8) mat'
3.15778
   (10) 6!:2 '(*. parallelize 16) mat'
1.36534
   (10) 6!:2 '(*. parallelize 32) mat'
1.50212
   (10) 6!:2 '(*. parallelize 64) mat'
1.85384
   1.36534 % 2.24517
0.608123

Learning and Teaching J

We present in full this essay from http://amasad.me/2016/01/03/overcoming-intuition-in-programming/ which explores the question "Does making things harder help us think more deeply about them?"

Overcoming Intuition in Programming

In a series of experiments, researchers set out to discover the relationship between difficulty or “disfluency” and cognition. They presented the same test to two groups, one in an easy to read (intuitive) format and the other in a difficult (disfluent) format. And in all the experiments they carried out, the disfluency group scored substantially higher. The theory behind this is that people will default to relying on the automatic, effortless, and primitive system for reasoning. But if things are counter-intuitive or harder to understand we switch to the deeper, deliberate and analytical mode of thinking.

I’ve been thinking about how this translates to programming. Programming is an intellectually challenging task, but luckily we invent tools to make it manageable. I find that up to a certain point, the intuitive and easy properties of a given language, framework, or library might start to have negative effects. From personal experience and from mentoring beginners I noticed that when using tools that allow us to reason within our intuition, anytime we’re faced with some difficulty we feel that we’ve done something wrong. And although we might have the necessary skills to overcome the difficulty, we often start questioning and revising our work. Asking questions about best practices relative to the framework instead of programming our way out. The quintessential example of this is the Stack overflow questions for “how do I use jQuery to do X?” or the answers “use jQuery [plugin] to do X” where X could be anything from basic arithmetic to websockets.

The framework negative space

When using a framework, a certain class of problems are made easy to solve. Programming feels intuitive if we stay within that space created by the framework. We may refer to this as the framework intuitive space. On the other hand we may refer to the rest of the space that framework doesn’t solve or have an opinion on as the framework negative space. The negative space is not necessarily a defect of the framework, it’s just not in the space the framework was built to solve. However, having put the programmer in the intuitive space for a long stretch of time, it makes it feel out of place when finding oneself in the negative space.

When the beginner programmer find themselves in the negative space, they often look to the library authors to put them back in the intuitive space. That’s why for any popular framework you find that there is an entire ecosystem of plugins and addons that extends the framework’s intuitive space to cover an increasingly growing surface area. It doesn’t seem to be inherently wrong if it makes programmers more productive. However, it may have unintended negative consequences:

1. Increased reliance from the programmer on the ecosystem’s library authors 2. Offloading of architectural decisions to the libraries all the while incurring technical debt 3. Enabling the false belief that programming should always feel intuitive

The developer and library author codependency

I should start by saying that this is technically a false dichotomy. All programmers take on both those roles in any programming session. You maybe coding the product business logic and switch to building a general purpose abstraction to help you in multiple places in your codebase. However, I’ve noticed that in open-source, people tend to act in a manner that makes this dichotomy seem true. The easiest way I’ve found to succeed in open source is to pave the negative framework space to become an intuitive space. In other words, writing the plugins and extensions. As a framework becomes more popular, a growing number of developers (usually beginners) will start complaining about how it’s hard to do X in this framework (and as we’ve seen X might be totally unrelated). Now, as in the business world, open-source is extremely competitive and as soon as there is an opening to solve a perceived problem for a lot of people, many would rise up to the occasion. This becomes an enabler to the false belief that a programmer can spend all of their time programming in the intuitive space.

Conclusion

I think fixing this problem ultimately comes down to education. Very early on when someone is learning programming our culture tends to emphasize an obsession with tooling. I get a lot of questions from aspiring programmers on what’s the best tool or languages to learn. It’s almost always a premature question to ask. I used to come up with answers like “depending on what you’re building” or “pick a beginner friendly community” or “invest in a growing language”. I think all of these are good answers, but it doesn’t really matter that early on in a programmer’s learning journey. It’s all the same when you’re essentially learning how to compute. Furthermore, these sort of answers enable the culture of tooling obsession.

Code reuse, libraries, sharing, and open-source are very important to software engineering, but we should be careful to not enable the belief that programming should be as easy as gluing things together. In fact, these days I’m often skeptical when things feel a little bit too easy. If programming was as easy as this then it would’ve already been automated away.

Seven Deadly Sins of Introductory Programming Language Design

We look at this paper (PDF). These sins and some suggested remedies are summarized here but the paper is worth reading in its entirety.

However we might also bear in mind the aphorism attributed to Roger Hui which asserts that "Novice mistakes are irrelevant to language design." This takes into account a longer view of programming language usage than the initial introduction to it, though the latter is the focus of this paper.

Sin #1: Less is more

The gripe here is that having very few datatypes is detrimental to the learning process because, for the examples of LISP and Scheme, it results in code that is hard to read because of the plethora of parentheses and lack of other structuring punctuation; it also seems to require a lot of inbuilt functions. This is difficult for students because they are not used to working under a single, pure paradigm.

Sin #2: More is more

The objection here are the problems when one goes to the other extreme by providing too many ways of doing things. A prime example is C++ which has many language capabilities as well as many libraries, too many for a beginner to easily grasp.

Sin #3: Grammatical traps

This sin at first seems very closely related to the preceding one but the problem is that having many different ways to do the same thing muddies the underlying concept for a beginner. The example given is variety of ways to access the second element of an array in C "...some of which are legal only in certain contexts", e.g.

array[1]       *(array+1)       1[array]       *++array

The common mistake of using = when == is meant in C may also be an example of the problem of syntactic homonyms also mentioned in this section.

Sin #4: Hardware dependence

This refers to things like the ambiguity of the underlying representation of types like int in different languages; perhaps the ongoing difficulty people have with floating point imprecision falls under this sin as well. The essay mentions the presence of 32 distinct numerical data types in C/C++ as one example.

Sin #5: Backwards compatibility

The problem seen here is broken into two types of backward compatibility: genetic and mimetic. The former is exemplified by the relationship between similar languages like C and C++ or Scheme and LISP. In this case it is thought that retaining a similarity of "look-and-feel" between a language and its ancestor leads to compounding the "more-is-more" problem.

Sin #6: Excessive Cleverness

The discussion here mentions the trouble students often have distinguishing between declaration and usage of variables in C and C++. It also touches on languages which use indentation to specify scope.

Sin #7: Violation of Expectations

This is seen as the worst of the pedagogical sins for an introductory programming language. The examples here are a bit idiosyncratic to the different languages used as bad examples but the initial one is this C example:

   if (x=0 || -10<y<10)  { /* WHATEVER */ }

This always evaluates to true and silently resets the value of x to one.

The Concealed Pornography of Naked Braces

We looked at this letter to the editor from Communications of the ACM, October, 2015.

LETTERS TO THE EDITOR
Ban 'Naked' Braces!
By CACM Staff / Communications of the ACM, Vol. 58 No. 10, Pages 10-11 / 10.1145/2816943

One fine business afternoon early in 1990, when we still used wires and microwave towers to make phone calls, and almost all long-distance calls went through big AT&T switches, one of the 100 or so 4ESS switches that handled U.S. long-distance traffic at the time hit a glitch and executed some untested recovery code. The switch went down briefly. No biggie, since traffic automatically took other routes, but in the process the initial switch that hit the glitch dragged its neighboring switches down, and the process cascaded across the country, as all the switches that handled long-distance traffic began to repeatedly crash and auto-recover. The result was that hardly any public telephone customer in the U.S. could make a long-distance phone call that afternoon, along with millions of dollars of time-sensitive business lost.

AT&T tried to contain the damage by rebooting the misbehaving switches, but as soon as a switch was brought back up, a neighboring switch would tell it to go down. The engineers at AT&T's R&D arm, Bell Labs, who wrote the switch programs, were called in, and, by the end of the day, network normality was restored by reducing the network message load.

An investigation was launched immediately, and after digging through a few hundred lines of code, word-of-mouth within Bell Labs was that the culprit was a closing brace (}) that terminated a selection construct—but the wrong one. The lawyers at Bell Labs quickly claimed such a lapse of human frailty could never be avoided entirely, and so dodged any potential lawsuits.

The lawyers were right; the intrinsic nature of software is such that the total absence of bugs is never guaranteed. But the simple practice of tagging all closing braces (or end in some languages) with a brief comment that indicates which construct they are closing would go far toward eliminating such an error; for example, instead of just writing '}' all by its naked self, write } //for, or } //if, or whatever.

Tagging construct terminators can be done without changing existing compilers, and since such construct terminators usually appear on a line of code by themselves, the structure of the code is not affected. All this does is make the code easier to understand and helps prevent bugs like the one just described. This practice is especially helpful when code must be moved about, which happens often. In addition, if coders want to go one step further in making their code understandable, a brief comment can be added after the tag, like this

}//for all transactions over a thousand dollars

This would also eliminate the usefulness of putting the opening brace on a line by itself where it would be separated, from a syntactic viewpoint, from the construct it is punctuating, while creating an almost blank line that could better serve to separate logically distinct parts of a program. I thus propose adoption of this practice by all software engineers and coders forthwith, as well as taught to all beginners from the get-go.

A. Frank Ackerman, Butte, MT
[From http://cacm.acm.org/magazines/2016/1/195719-let-the-liable-pay/fulltext]

Hold the Braces and Simplify Your Code

A. Frank Ackerman's letter to the editor "Ban 'Naked' Braces!" (Oct. 2015) actually frightened me. The problem it described—code blocks so long and complex it is difficult for even expert programmers to determine which brace closes them—is certainly an issue, but the proposed solution would not leave the code any less complex.

When dealing with legacy code, maintainers can add annotations to a long method's closing brace, as described, to help them navigate, but this should never be necessary in new code. Modern code-writing techniques encourage small and only minimally nested blocks. See Robert C. Martin'sClean Code chapter 3 (Prentice Hall, 2009) for a good explanation and justification. According to Martin, and others, a function or method spanning more than a few lines is too much for an average programmer to absorb and understand quickly, especially in the context of a larger system, and should be refactored for simplicity. Blocks nested more than two levels will not fit in a short, readable function and should thus be refactored to decrease depth. Programmers who follow these practices cannot possibly lose track of closing braces and will not then need to mark them up.

I make a living reading and trying to understand unnecessarily complex code, and follow Martin's philosophy. If one writes code complex enough to require annotated closing braces, annotating those braces will not make the code less complex.

If you are a programmer, please write small, simple blocks of code and save yourself having to worry about naked braces. And if your habits run too deep, please do not encourage them in others, especially beginners.

Jamie Hale, Markham, Ontario, Canada
A. Frank Ackerman's letter to the editor (Oct. 2015) recommended tagging each construct terminator with a comment indicating which construct it terminates. This wise practice was carried further in the design technique described in the book Principles of Program Design (Academic Press, 1975) and the paper "Constructive Methods of Program Design" (LNCS 44, Springer, 1976). Each sequence, selection, and iteration control construct is named. The name is prefixed to start and end markers and, in a selection, to each alternative. Ackerman's coding example might thus be expanded to the following skeleton:

account seq
   ...
   activity itr (...)
     transaction sel (> $1000)
        ...
     transaction alt (<= $1000)
        ...
     transaction end
   activity end
account end

Such a discipline not only avoids simple errors. As Ackerman hinted, it also improves the program by tying its code closely to its design.

Michael Jackson, Milton Keynes, England

First Draft of a Response

The "naked braces" discussion perhaps misses seeing the forest for the trees: a major reason for deeply nested expressions is the inability of most programming languages to handle arrays without looping. This shortcoming further compounds itself by contributing to the vacuous verbosity of the boilerplate required for such looping (and conditional) constructs. Jamie Hale's proposed solution - "small and minimally nested blocks" - to the original issue raised by A. Frank Ackerman's letter points in a good direction but may remain lost in the forest of intrinsically scalar languages. Small blocks of code are good, but overly-verbose languages will end up with very many of these small blocks, pushing the complexity problem up to a higher level but not reducing it.

A more functional, array-based way of looking at problems does reduce the patent complexity by treating collections of objects en-masse at a higher level. Given the very small mind-share occupied by array-oriented languages, it's difficult to provide even a pseudo-code example of what I mean by this, but here's an attempt based on the problem of invoking different code based on transaction size breakpoints (where "transactions" below is an array of transaction sizes):

Perform ( smallT(), largeT() ) BasedOn transactions > 1000

Ignoring the length discrepancy between the number of functions provided and the ostensible shape of the Boolean condition on which their selection is based, such a construct could easily be extended to additional breakpoints something like this:

Perform ( smallT(), largeT(), veryLargeT() ) BasedOn transactions > ( 1e3, 1e5 )

For anyone interested in wrestling with a specific example of an array-based, functional notation guiding my thoughts on this example, here is one: http://code.jsoftware.com/wiki/Vocabulary/atdot.

Materials