Beginner's Regatta

Converting Numbers To and From Character Representation

We explore a basic function that converts the character representation of numbers between J notation and "standard" notation, in either direction.

```
J2StdCvtNums=: 3 : 0
NB.* J2StdCvtNums: convert char rep of num from J to "Standard", or
NB. vice-versa if left arg is 'J' or '2J'; optional 2x2 left argument allows
NB. 2 arbitrary conversions: col 0 is "from", col 1 is "to" char.
NB. Monadic case changes Standard numeric representation to J representation.
(2 2$'-_Ee') J2StdCvtNums y
:
if. 'S'=x do. if. ' '={.,y do. y return. end. end.  NB. Already done?
if. 0=#y do. '' return. end.
pw16=. 0j16                          NB. Precision width: 16 digits.
diffChars=. 2 2$'-_Ee'               NB. Convert '-'->'_' & 'E'->'e'
toStd=. -.'J'-:''$'2'-.~,x           NB. Flag conversion J->Standard
if. 2 2-:$x do. diffChars=. x        NB. if explicit conversion.
elseif. toStd do. diffChars=. |."1 diffChars end.   NB. Convert other way
if. 0=1{.0$y do.                     NB. Numeric to character
    fmts=. (8=>(3!:0)&.>y){0,pw16    NB. Full-precision floats only
    y=. fmts":y                      NB. If this is too slow, go back
end.

y=. y-.'+'                           NB. EG 1.23e+11 is ill-formed & the
wh=. y=0{0{diffChars                 NB.  '+' is unnecessary.
cn=. (wh#1{0{diffChars) (wh#i. $y)}y NB. Translate chars that need it
wh=. y=0{1{diffChars                 NB.  but leave others alone.
cn=. (wh#1{1{diffChars) (wh#i. $cn)}cn
if. -.toStd do.                      NB. Special handling -> J nums
    if. '%'e. cn do.                 NB. Convert nn% -> 0.nn
        cn=. pw16":0.01*".cn-. '%'
    end.
    cn=. cn-.','                     NB. No ',' in J numbers
end.
cn
NB.EG 'S' J2StdCvtNums _3.14 6.02e_23   NB. Convert J numbers to std rep
)
```

With a little testing this looks OK, but more examples reveal some issues.

```   'S' J2StdCvtNums _3.14 6.02e_23      NB. Convert J numbers to std rep
-3.1400000000000001 0.0000000000000000  NB. Didn't stop - why?
```

It looks like we need to insert some breakpoints and look at intermediate values to track down what's causing this behavior.

```   13!:3 'J2StdCvtNums : 0 9 17 21 22 24 26 28'  NB. Dyadic stops
13!:0]1                                       NB. Debug on
'S' J2StdCvtNums _3.14 6.02e_23   NB. Convert J numbers to std rep
|stop: J2StdCvtNums
|   'S'=x
|J2StdCvtNums[:0]
y
_3.14 6.02e_23
x
S
```

This looks OK so far, so proceed to the next stop.

```      13!:4''
|stop
|   ' '={.,y
|J2StdCvtNums[:0]                     NB. Same line, multiple statements
13!:4''
|stop
|   fmts=.(8=>(3!:0)&.>y){0,pw16
|J2StdCvtNums[:9]
y
_3.14 6.02e_23
```

Still looking good, proceed.

```      13!:4''
|stop
|   y=.y-.'+'
|J2StdCvtNums[:17]
y
_3.1400000000000001 0.0000000000000000
```

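This is where the problem starts: the fixed format 0j16 forces sixteen digits after the decimal point, which exposes floating-point noise in _3.14 and rounds 6.02e_23 to zero. If we proceed from here, we see that these extra trailing digits don't get fixed.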

```      13!:4''
|stop
|   cn=.(wh#1{1{diffChars)(wh#i.$cn)}cn
|J2StdCvtNums[:21]
cn
-3.1400000000000001 0.0000000000000000

13!:4''
|stop
|   cn
|J2StdCvtNums[:28]
cn
-3.1400000000000001 0.0000000000000000
13!:4''
-3.1400000000000001 0.0000000000000000
```

Let's define a couple of functions: one to remove trailing zeros after the decimal point and another to remove any trailing decimal point.

```   rmTrailing0sAfterPoint=: ] #~ [: -. ([: *./\&.|. '0' = ]) *. [: +./\ '.' = ]
rmTrailingPoint=: ] }.~ [: - '.' = {:
13!:0]0                           NB. Turn off debugging
'S' J2StdCvtNums _3.14 6.02e_23   NB. Convert J numbers to std rep
-3.1400000000000001 0.0000000000000000
```

Now try our new utilities on this result.

```   rmTrailing0sAfterPoint 'S' J2StdCvtNums _3.14 6.02e_23
-3.1400000000000001 0.
rmTrailingPoint rmTrailing0sAfterPoint 'S' J2StdCvtNums _3.14 6.02e_23
-3.1400000000000001 0
```

It's still not clear exactly what to do with the trailing almost-zeros "00000000000001". We could drop the final digit when there are more than a certain number of digits past the decimal point, but numbers ending in strings like "9999999999999" pose a corresponding problem that this does not solve.
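One possible way to sidestep both the trailing zeros and the almost-zero tails, offered only as a sketch and not as part of J2StdCvtNums, is to let J choose the number of digits by formatting with a print precision of 15 significant digits (`":!.15`) instead of the fixed `0j16` width. The name `fmt15` is made up for this illustration.

```j
NB. Hypothetical helper (not in the original code): default formatting,
NB. but with the print precision fitted to 15 significant digits.
fmt15=: ":!.15

fmt15 _3.14 6.02e_23                           NB. _3.14 6.02e_23
(fmt15 _3.14 6.02e_23) rplc '_';'-';'e';'E'    NB. -3.14 6.02E-23
```

The trade-off is that 15 significant digits may not reproduce every double exactly (17 digits are needed for a guaranteed round trip), so this suits display better than lossless storage.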

Show and Tell

Words with Letters Typed by Alternating Hands

This exercise was inspired by the idea that there are some words potentially faster to type because adjacent letters are typed with the fingers on opposite hands. For instance, the word sigh is typed left (s), right (i), left (g), right (h). These would seem like good words to form the basis of something like a password, making it easier to type more quickly with fewer errors.

Getting a List of Words

To do this, first we need a large list of words to evaluate. [Since the following URL is no longer valid, you could look here for a good, long list of English words.]

The following site has/used to have lists arbitrarily broken into separate files, so let’s get all of them.

```   urlTemplate=. 'http://www.manythings.org/vocabulary/lists/l/words.php?f=noll{n}'

+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|01|02|03|04|05|06|07|08|09|10|11|12|13|14|15|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
```

There are 15 separate files. We can build a list of wget commands to retrieve the URLs we generate from this template appended with the 15 suffixes.
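The definition of the suffix list `nn` does not appear in the transcript above. One way it might be built (an assumption, not necessarily the original expression) is to format the numbers 101 through 115 as rows of a character table and drop the leading `1` from each row, which leaves zero-padded two-digit strings.

```j
NB. Assumed construction of nn; only the name and its displayed value come from the text.
nn=: <"1 }."1 ": ,. 101 + i. 15     NB. '01';'02';...;'15'

NB. Substituting one suffix into the template from the text:
urlTemplate=: 'http://www.manythings.org/vocabulary/lists/l/words.php?f=noll{n}'
urlTemplate rplc '{n}' ; > {. nn    NB. ...words.php?f=noll01
```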

```   cmds=. (<'wget -O '),&.>(nn,~&.><'TopAmericanEnglish'),&.>(<'.htm '),&.>(<urlTemplate) rplc&.><"1 (<'{n}'),.nn
```

Check two of the commands to see if they look right:

```   >2{.cmds
wget -O TopAmericanEnglish01.htm http://www.manythings.org/vocabulary/lists/l/words.php?f=noll01
wget -O TopAmericanEnglish02.htm http://www.manythings.org/vocabulary/lists/l/words.php?f=noll02
```

Now run them and check how long it takes to retrieve them all.

```   6!:2 'shell&.>cmds'
5.25957
```

Check that the files we're expecting are in our local directory.

```
   $dir 'Top*.htm'
15 5
0{dir 'Top*.htm'
+------------------------+-----------------+----+---+------+
|TopAmericanEnglish01.htm|2018 3 9 10 15 51|8416|rw-|-----a|
+------------------------+-----------------+----+---+------+
```

Now extract the words we want from among the HTML tags by keying on strings preceding and following the payload:

```   locStr=. '<div class="wrapco"><div class="co"><ul>'
endStr=. '</ul></div><br />'
flnms=. 0{"1 dir 'Top*.htm'
```

First get our extraction working with a single file:

```   +/locStr E. fread >0{flnms
1
+/endStr E. fread >0{flnms
1
I. ((locStr E. ])+.endStr E. ])fread >0{flnms
4974 7518
```

Now that we have the start and end locations, we can work out the length of the text between them.

```   ss=. I. ((locStr E. ])+.endStr E. ])fread >0{flnms
-~/ss+(#locStr),0
2504
]len=. -~/ss+(#locStr),0
2504
words=. (len{.({.ss+#locStr)}.]) fread >0{flnms
50{.words
```

The words we want are wrapped in list tags, so remove those.

```   +/'</li><li>' E. '</li>',words,'<li>'
185
words=. '</li>',words,'<li>'
ptnStr=. '</li><li>'
(ptnStr E. words)<;.1 words
+--------------+--------------+--------------+------------+------------+---...
+--------------+--------------+--------------+------------+------------+---...
_3{.(ptnStr E. words)<;.1 words
+-------------+------------+---------+
|</li><li>your|</li><li>was|</li><li>|
+-------------+------------+---------+
(#ptnStr)}.&.>}:(ptnStr E. words)<;.1 words
+-----+-----+-----+---+---+-----+----+--+---+-------+---+---+------+--+--+-...
+-----+-----+-----+---+---+-----+----+--+---+-------+---+---+------+--+--+-...
$words rplc '</li></ul></div><div class="co"><ul><li>';ptnStr
2420
words=. words rplc '</li></ul></div><div class="co"><ul><li>';ptnStr
(#ptnStr)}.&.>}:(ptnStr E. words)<;.1 words
+-----+-----+-----+---+---+-----+----+--+---+-------+---+---+------+--+--+-...
+-----+-----+-----+---+---+-----+----+--+---+-------+---+---+------+--+--+-...
#(#ptnStr)}.&.>}:(ptnStr E. words)<;.1 words
187
```
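The locate, cut, and split steps above can be tried end to end on a small made-up string. Everything in this sketch (the sample `html`, the shortened markers, and the name `sample`) is hypothetical; only the technique mirrors the transcript.

```j
NB. Made-up sample page; the real files are much longer and use longer markers.
html=: 'xx<ul><li>ab</li><li>cde</li></ul>yy'
locStr=: '<ul>'
endStr=: '</ul>'
ptnStr=: '</li><li>'

ss=: I. (locStr E. html) +. endStr E. html     NB. 2 29: where each marker starts
len=: -~/ ss + (#locStr),0                     NB. 23: length of the payload
words=: len {. ({. ss + #locStr) }. html       NB. text between the two markers
words=: '</li>',words,'<li>'                   NB. now every word is preceded by ptnStr
sample=: (#ptnStr) }.&.> }: (ptnStr E. words) <;.1 words   NB. 'ab';'cde'
```

Padding the payload with a closing tag in front and an opening tag behind means a single pattern marks the start of every piece, at the cost of one empty trailing piece that `}:` discards.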

Now that we have some working code, we can combine the good lines from above to write our function:

```
extractWords=: 3 : 0
locStr=. '<div class="wrapco"><div class="co"><ul>'
endStr=. '</ul></div><br />'
fl=. fread y                              NB. y is the name of one downloaded file
ss=. I. ((locStr E. ])+.endStr E. ]) fl
len=. -~/ss+(#locStr),0
words=. (len{.({.ss+#locStr)}.]) fl
words=. '</li>',words,'<li>'
ptnStr=. '</li><li>'
words=. words rplc '</li></ul></div><div class="co"><ul><li>';ptnStr
(#ptnStr)}.&.>}:(ptnStr E. words)<;.1 words
)
$allww=. ;extractWords &.> flnms
2123
```

Let's take a look at some of the words.

```   10{.allww
+-----+-----+-----+---+---+-----+----+--+---+-------+
+-----+-----+-----+---+---+-----+----+--+---+-------+
_10{.allww
+---+----+-------+----+----------+-----+-------+------+----+---+
|toy|trap|treated|tune|University|vapor|vessels|wealth|wolf|zoo|
+---+----+-------+----+----------+-----+-------+------+----+---+
```

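Take some statistical measures of the word lengths. Judging by the output, usus (from the author's mystats script) reports the minimum, maximum, mean, and standard deviation.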

```   load 'mystats'
allww=. tolower&.>allww
allww=. ,&.>allww          NB. Don't want single characters to be scalars.
usus szs=. #&>allww
1 14 5.5822 1.87302
I. szs = 14
1735 1769
```
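The mystats script is not shown, so as a rough stand-in here is a sketch of a verb that returns the minimum, maximum, mean, and standard deviation of a list. The names `mean`, `stddev`, and `usus2` are defined here for illustration, and the use of the sample (n-1) standard deviation is an assumption about what usus computes.

```j
mean=: +/ % #
stddev=: [: %: (+/@:*:@:(- mean)) % <:@#    NB. sample standard deviation (assumed)
usus2=: <./ , >./ , mean , stddev           NB. hypothetical stand-in for usus

usus2 3 5 7                                 NB. 3 7 5 2
```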

The longest is 14 letters and there are two of these. What are they?

```   allww{~I. szs = 14
+--------------+--------------+
|transportation|characteristic|
+--------------+--------------+
```

Categorize Letters by the Hand with Which Each is Typed

Write and test our indicator that a word is composed of letters typed with alternating hands.

```
   13 : '*./0 1*./ . =&>({.&> /:~ ]) @: (] (# ; ] #~ [: -. [)~ 1 0 $~ #) y'
[: *./ 0 1 *./ .=&> ({.&> /:~ ])@:(] (# ; ] #~ [: -. [)~ 1 0 $~ #)
isAlternating=: [: *./ 0 1 *./ .=&> ({.&> /:~ ])@:(] (# ; ] #~ [: -. [)~ 1 0 $~ #)
isAlternating&.>(1 0 1 0);(0 1 0 1);(1 0 1);(0 1 0);(0 1 1 0);(1 0 0 1) NB. First 4 are good, last 2 are bad.
+-+-+-+-+-+-+
|1|1|1|1|0|0|
+-+-+-+-+-+-+
isAlternating&>(1 0 1 0);(0 1 0 1);(1 0 1);(0 1 0);(0 1 1 0);(1 0 0 1) NB. First 4 are good, last 2 are bad.
1 1 1 1 0 0
```

So isAlternating appears to detect vectors that strictly alternate between ones and zeros. Next, let's build the two lists of letters, one for each hand.
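As a cross-check, the same property can be stated more directly: a Boolean vector alternates exactly when every adjacent pair of items differs. The verb below is a sketch under that reading, and `isAlt2` is a made-up name rather than part of the original code.

```j
isAlt2=: [: *./ 2 ~:/\ ]              NB. 1 if every adjacent pair differs

isAlt2&> (1 0 1 0);(0 1 0 1);(1 0 1);(0 1 0);(0 1 1 0);(1 0 0 1)   NB. 1 1 1 1 0 0
isAlt2 'sigh' e. 'qwertasdfgzxcvb'    NB. 1: left, right, left, right
```

A one-item vector has no adjacent pairs, so this version reports 1 for it.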

```   LRsets=. 'qwertasdfgzxcvb';'yuiophjklnm'
+/whAlt=. isAlternating&>(list=. <;._2 CR-.~fread 'wordsEn.txt') e.&.>{.LRsets
1530
new1s=. whAlt#list
]whAlt=. isAlternating&>(list=. <;._2 CR-.~fread 'corncob_lowercase.txt') e.&.>{.LRsets
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
+/whAlt
909
new1s=. new1s,whAlt#list
$new1s=. /:~~.new1s
1578
new1s#~2=#&>new1s           NB. Look at two-letter words: worth keeping?
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+-...
|ah|ai|al|am|an|ay|by|cl|co|do|dp|eh|el|em|en|go|ha|he|hr|ic|id|ie|if|is|i...
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+-...
#new1s=. new1s#~2<#&>new1s  NB. No.
1506
usus szs=. #&>lra
2 13 5.1105 1.7776
lra#~szs=13
+-------------+-------------+
|dismantlement|ichthyosiform|
+-------------+-------------+
lra#~szs=10
+----------+----------+----------+----------+----------+----------+-------...
|antisocial|auditorial|cochairman|cochairmen|dickensian|enamelwork|ichthyi...
+----------+----------+----------+----------+----------+----------+-------...
usus szs=. #&>new1s
3 13 5.249 1.76259
new1s#~szs=13
+-------------+-------------+
|dismantlement|neurotoxicity|
+-------------+-------------+
new1s#~szs=10
+----------+----------+----------+----------+----------+----------+-------...
|antisocial|auditorial|cochairman|cochairmen|dickensian|enamelwork|malaysi...
+----------+----------+----------+----------+----------+----------+-------...
LRsets=. 'qwertasdfgzxcvb~`12345!@#$%';'yuiophjklnm67890-_=+^&*();:''",./<>?'
```

We have extended the two sets to include numerals and punctuation. LRsets is a two-element boxed vector: the first element holds the characters typed with the left hand, the second those typed with the right.

```   ]whAlt=. isAlternating&>list e.&.>{.LRsets
0 0 0 1 0 0 0 0 1 0 1 1
+/whAlt
4
whAlt#list
+------+-----+----+------+
|biform|boric|both|bushel|
+------+-----+----+------+
```

Take a look at the larger list new1s from above.

```   #new1s
1506
3{.new1s
+---+----+-----+
|aha|ahem|ahems|
+---+----+-----+
_3{.new1s
+-----+------+-------+
|zowie|zurich|zygotic|
+-----+------+-------+
```

Sort primarily by length.

```   byLen=. new1s /: #&>new1s
5{.byLen
+---+---+---+---+---+
|aha|ahs|aid|air|ala|
+---+---+---+---+---+
_5{.byLen
+-----------+------------+------------+-------------+-------------+
|proficiency|authenticity|proamendment|dismantlement|neurotoxicity|
+-----------+------------+------------+-------------+-------------+
hdr=. 'Words whose letters are entered by alternating use of the left and right hands on a QWERTY keyboard. All have more than two letters and are sorted alphbetically within ascending word length.'
(hdr,LF,;LF,~&.>byLen) fwrite '../txt/LRalternatingHandsWordsBySize.txt'
9602
```

An Enhancement

Since this restriction of alternating hands is a very stringent filter, giving us fewer than 2,000 words, we subsequently loosened the requirement to allow words with duplicate adjacent letters, giving us a list of more than 3,000 words. The complete code is attached in the Materials section below.
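The attached code is not reproduced here, but one way the loosened test might work, assuming "duplicate adjacent letters" means doubled letters such as the "ee" in "sleek", is to collapse each run of repeated letters to a single letter before checking for alternation. The names `left`, `rmAdjDups`, and `isAltLoose` are hypothetical.

```j
left=: 'qwertasdfgzxcvb'               NB. left-hand letters, as in LRsets
rmAdjDups=: #~ 1 , 2 ~:/\ ]            NB. keep only the first letter of each run
isAltLoose=: [: *./ 2 ~:/\ left e.~ rmAdjDups

rmAdjDups 'sleek'                      NB. slek
isAltLoose 'sleek'                     NB. 1: s(L) l(R) e(L) k(R) after collapsing
isAltLoose 'letter'                    NB. 0: e and t are both left-hand letters
```

This sketch assumes non-empty words, as the copy step has nothing to pair with the leading 1 when the argument is empty.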

Calculating and Showing Maximum Drawdown

Maximum drawdown is a concept in money management that quantifies the history of an investment by looking at the worst loss one would have sustained over an historical period.

As defined at http://www.investopedia.com/terms/m/maximum-drawdown-mdd.asp:

DEFINITION of 'Maximum Drawdown (MDD)'
The maximum loss from a peak to a trough of a portfolio, before a new peak is attained. Maximum Drawdown (MDD) is an indicator of downside risk over a specified time period. It can be used both as a stand-alone measure or as an input into other metrics such as "Return over Maximum Drawdown" and Calmar Ratio. Maximum Drawdown is expressed in percentage terms and computed as:

(Trough Value – Peak Value) ÷ Peak Value

BREAKING DOWN 'Maximum Drawdown (MDD)'

Consider an example to understand the concept of maximum drawdown.

Assume an investment portfolio has an initial value of \$500,000. The portfolio increases to \$750,000 over a period of time, before plunging to \$400,000 in a ferocious bear market. It then rebounds to \$600,000, before dropping again to \$350,000. Subsequently, it more than doubles to \$800,000. What is the maximum drawdown?

The maximum drawdown in this case is = (\$350,000 – 750,000) / \$750,000 = –53.33%

Note the following points:
• The initial peak of \$750,000 is used in the MDD calculation. The interim peak of \$600,000 is not used, since it does not represent a new high. The new peak of \$800,000 is also not used since the original drawdown began from the \$750,000 peak.
• The MDD calculation takes into consideration the lowest portfolio value (\$350,000 in this case) before a new peak is made, and not just the first drop to \$400,000.

MDD should be used in the right perspective to derive the maximum benefit from it. In this regard, particular attention should be paid to the time period being considered. For instance, a hypothetical long-only U.S. fund Gamma has been in existence since 2000, and had a maximum drawdown of -30% in the period ending 2010. While this may seem like a huge loss, note that the S&P 500 had plunged more than 55% from its peak in October 2007 to its trough in March 2009. While other metrics would need to be considered to assess Gamma fund's overall performance, from the viewpoint of MDD, it has outperformed its benchmark by a huge margin.

Calculating Maximum Drawdown in J

How might we calculate this in J in an array-oriented fashion? Let’s start in the middle: assume we have the peak and trough values:

```   ((>./-<./)%>./) 2108.63 1867.61 NB. Max drawdown
0.114302
```

That’s the easy part. The harder part is finding the correct peak and trough for a given set of numbers.

Taking the numbers from the example above:

```   vals=. 10000* 50 75 40 60 35 80
(<./,>./) vals
350000 800000
```

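We know these aren’t the right numbers: the overall maximum (800000) comes after the overall minimum (350000), so the pair does not describe a decline from a peak to a later trough. We need to find the peak at each point in the series. This will get us close: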

```   >./\vals
500000 750000 750000 750000 750000 800000
```

Now we can find the location of each new peak:

```   ]whNewPeak=. 2</\>./\vals
1 0 0 0 1
#whNewPeak
5
#vals
6
```

There’s a problem: because we compare adjacent pairs of items, the result is always one item shorter than the starting vector, but we’d like the two to line up, so we have to make a decision. Fortunately, the choice seems clear: we’ll count the first value as a peak, on the reasoning that it is the highest value “so far”. So,

```   #whNewPeak=. 1,whNewPeak
6
```
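An alternative that avoids the length mismatch altogether, sketched here as an assumption rather than the author's method, compares the running maximum with a copy of itself shifted right by one place and filled with negative infinity (`|.!.__`). The first item is then always greater than its filled-in predecessor, so it counts as a peak without a separate decision.

```j
vals=: 10000 * 50 75 40 60 35 80
pk=: >./\ vals                  NB. running maximum
pk > _1 |.!.__ pk               NB. 1 1 0 0 0 1
```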

It seems natural, with an eye to the next step of finding the minimum in each intra-peak section, to use whNewPeak as a partition vector:

```   whNewPeak <;.1 vals
+------+---------------------------+------+
|500000|750000 400000 600000 350000|800000|
+------+---------------------------+------+
```

This allows us to find the minimum of each interval along with its corresponding peak:

```   <./&>whNewPeak<;.1 vals
500000 350000 800000
whNewPeak#vals
500000 750000 800000
```

So, finding the maximum drawdown should be relatively straightforward: we simply find the maximum difference:

```   >./(whNewPeak#vals) - <./&>whNewPeak<;.1 vals
400000
```
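The definition quoted earlier expresses drawdown as a fraction of the peak, so each difference still has to be divided by its own peak. The sketch below does that with the same partition, and then shows a second formulation based on the running maximum; the names `peaks`, `troughs`, and `pk` are introduced here for illustration.

```j
vals=: 10000 * 50 75 40 60 35 80
whNewPeak=: 1 , 2 </\ >./\ vals
peaks=: whNewPeak # vals
troughs=: <./&> whNewPeak <;.1 vals

(peaks - troughs) % peaks             NB. 0 0.533333 0
>./ (peaks - troughs) % peaks         NB. 0.533333

NB. Alternative: loss relative to the running maximum at every point.
>./ (pk - vals) % pk=: >./\ vals      NB. 0.533333
```

The two agree on this example, but they can differ in general: the running-maximum version also counts a decline that is still in progress at the end of the series, whereas a definition that requires a new peak to be attained would leave that last section out.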

The Code

However, there is a complication: we have to associate this number with its corresponding peak. Also, upon reflection, it would be useful to keep track of the date associated with each of these values so we can show our work.

```
maxDrawdown=: 3 : 0
whNewPeak=. 1,2</\>./\y
md=. (whNewPeak#y) - <./&>whNewPeak<;.1 ] y
nfp=. 0={:whNewPeak            NB. Not final peak (no peak at end)
md0=. (-nfp)}.md%whNewPeak#y   NB. Drop last if didn't end on new peak
wsp=. md0 i. >./md0            NB. Where's starting peak of max drawdown?
spix=. (wsp+0 1){I. whNewPeak  NB. Start, end index of max drawdown
span=. y{~(<./ + [: i. [: >: [: | -/) spix
wmin=. (<./spix)+span i. <./span NB. Where minimum was in span
md=. (>./md0);spix;wmin
NB.EG 'md whsp whmin'=. maxDrawdown 500 750 400 600 350 800
NB.EG 'md whsp whmin'=. maxDrawdown vals=. 100 150 90 120 80 200
)
```

Checking our work:

```
]'md whsp whmin'=. maxDrawdown 500 750 400 600 350 800
+--------+---+-+
|0.533333|1 5|4|
+--------+---+-+
vals{~/:~whsp, whmin
750000 350000 800000
```

For a fuller example of usage, let's pull in about a year's worth of price data for the S&P 500 index.

```
'tit1 sp500ix'=: split <;._1&>TAB,&.><;._2 ] LF (] , [ #~ [ ~: [: {: ]) CR-.~0 : 0
Date	S&P 500 Index - Index Levels - Index Value - USD
01/06/2015	2002.613587
01/07/2015	2025.90105
01/08/2015	2062.143554
01/09/2015	2044.8099
…
01/04/2016	2012.659371
01/05/2016	2016.714426
01/06/2016	1990.262292
)

tit1
+----+------------------------------------------------+
|Date|S&P 500 Index - Index Levels - Index Value - USD|
+----+------------------------------------------------+
'dts vals'=. <"1 |:sp500ix
vals=. ".&>vals
]'md whsp whmin'=. maxDrawdown vals
+---------+-----+--+
|0.0364402|37 75|44|
+---------+-----+--+
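sp500ix{~/:whsp,whmin   NB. Note: /: (grade) of 37 75 44 is 0 2 1, so rows 0 2 1 are shown;
NB. sp500ix{~/:~whsp,whmin (sort) would select rows 37 44 75 instead.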
+----------+-----------+
|01/06/2015|2002.613587|
+----------+-----------+
|01/08/2015|2062.143554|
+----------+-----------+
|01/07/2015|2025.90105 |
+----------+-----------+
```

Graphing Drawdown

This is a concept that lends itself well to graphical display.

First, let’s get even more real-life data.

```   load 'dsv'                              NB. Delimiter-Separated Values
3411 7
3{.googPxs
+----------+---------+---------+---------+---------+---------+--------+
|Date      |Open     |High     |Low      |Close    |Adj Close|Volume  |
+----------+---------+---------+---------+---------+---------+--------+
|2004-08-19|49.676899|51.693783|47.669952|49.845802|49.845802|44994500|
+----------+---------+---------+---------+---------+---------+--------+
|2004-08-20|50.178635|54.187561|49.925285|53.805050|53.805050|23005800|
+----------+---------+---------+---------+---------+---------+--------+
'tit gpxs'=. split googPxs
tit
+----+----+----+---+-----+---------+------+
|Date|Open|High|Low|Close|Adj Close|Volume|
+----+----+----+---+-----+---------+------+
clsPxs=. ".&>gpxs{"1~tit i. <'Close'
$clsPxs
3410
clsPxs
49.8458 53.8051 54.3465 52.0962 52.6575 53.6063 52.732 50.6754 50.8542 49.8011 50.427 49.6819 50.4618 50.8195 50.8244 52.3247 53.4027 55.3848 55.6381 56.6168 58.3654 59.2943 58.5393 58.8075 60.0196 59.5278 58.7479 63.0201 65.1165 64.3813 65.8616 67.0936 68...
```

Now that we have these thousands of closing prices, let's apply the drawdown calculation and graph it in a useful fashion.

```   load 'plot'
clsPxs=. ".&>gpxs{"1~tit i. <'Close'
'md whsp whmin'=. maxDrawdown clsPxs

md;whsp;whmin
+--------+--------+----+
|0.652948|810 2040|1075|
+--------+--------+----+

tit,gpxs{~/:~whmin,whsp
+----------+----------+----------+----------+----------+----------+--------+
|Date      |Open      |High      |Low       |Close     |Adj Close |Volume  |
+----------+----------+----------+----------+----------+----------+--------+
|2007-11-06|366.396942|368.498260|360.157532|368.498260|368.498260|16982200|
+----------+----------+----------+----------+----------+----------+--------+
|2008-11-24|133.760025|134.102783|123.700447|127.888214|127.888214|20240100|
+----------+----------+----------+----------+----------+----------+--------+
|2012-09-24|363.138123|372.596619|362.765564|372.268738|372.268738|7173800 |
+----------+----------+----------+----------+----------+----------+--------+
127.888214%368.498260     NB. Lowest in drawdown as portion of starting point.
0.347052
-.127.888214%368.498260   NB. Drawdown is how far down this took us at that point.
0.652948
'pensize 2' plot clsPxs,:>./\clsPxs
```

After drawing and writing on the result of this plot, we get the following.