NYCJUG/2012-03-14/QuickAndDirtyAnalysiswJ

From J Wiki

[This example formed the basis of a discussion at NYCJUG about teaching J by presenting examples of simple, practical things we can do with the language.]

In this example of using J for simple analysis of financial data, we start with a newspaper article, get data about its subject (the VIX, a volatility index) into a form amenable to analysis, and check some of the article's assertions.

Quick and Dirty Data Analysis with J

WSJ Article

[from http://blogs.wsj.com/marketbeat/2012/03/13/vix-flirts-with-nearly-five-year-low/?mod=WSJ_markets_liveupdate]

March 13, 2012, 10:36 AM

VIX Flirts With Nearly Five-Year Low

By Steven Russolillo

Yesterday we pointed out the CBOE’s volatility index, VIX, had slumped to its lowest close since April. Today, its free-fall continues and is now flirting with its lowest level in nearly five years. VIX ..was recently down 6.2% at 14.67 and earlier dropped as low as 13.99. Any move below 14.62 would mark the lowest level since June 2007. The VIX moved back above 20 — roughly its long-term average — as recently as early last week when stocks notched their biggest one-day loss of the year. But the pickup didn’t last long; VIX is down 30% since last Tuesday.

...


Discussion

Looking for data on the VIX and related tradable instruments, we see, on the left, the symbol for the index as well as numerous ETFs through which we can gain exposure to it. Since all these ETFs are related to the index, let’s look at its data first.

[Screenshots: FindVIXandRelatedItemsOnYahooFinance.png, VIXandRelatedItemsInfo.png]

Clicking on “^VIX” shown above, then on the “Historical Prices” selection in the left pane, brings us to a screen like the one below.

[Screenshot: LookingAtVixData.png]

The examples below show how we might use J to look at the data in this table after we’ve downloaded it from the Yahoo Finance site as a .csv file. First, we’ll read the file and assign the columns in which we’re interested to variable names. Then we’ll examine a few items of interest.

Starting Analysis in J

   load 'tables/csv'              NB. Utilities for reading in delimited files…
   'vxt vxp'=. split readcsv 'pxVIX19900102-20120312.csv'
   $vxp
5592 7
   vxt                            NB. Titles label columns
+----+----+----+---+-----+------+---------+
|Date|Open|High|Low|Close|Volume|Adj Close|
+----+----+----+---+-----+------+---------+
   vxt=. -.&' ' &.>  vxt          NB. drop spaces from Titles
   (vxt)=. <"1 |: vxp             NB. each column assigned to column label as variable name

   datatype &.>  AdjClose         NB. Check that closing price is character
+-------+-------+-------+-------+-------+-------+-------...
|literal|literal|literal|literal|literal|literal|literal...
+-------+-------+-------+-------+-------+-------+-------...

   $AdjClose=. _ ". > AdjClose    NB. Turn into simple numeric vector.
5592
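
The same steps can be tried without the downloaded file. The following is only a sketch on a made-up table: the names hdr and dat, the column titles "Day" and "Last Px", and the placeholder rows are all hypothetical, chosen so that nothing defined above is overwritten.

   'hdr dat'=. split ('Day';'Last Px') , ('2012-03-12';'15.64') ,: '2012-03-09';'17.11'
   hdr=. -.&' ' &.> hdr           NB. 'Last Px' becomes the legal name LastPx
   (hdr)=. <"1 |: dat             NB. one variable per column
   Day
+----------+----------+
|2012-03-12|2012-03-09|
+----------+----------+
   _ ". > LastPx                  NB. boxed text -> character table -> numbers
15.64 17.11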

We can verify that these first few prices match those in the "Adj Close" column of the table shown above.

   10{.AdjClose                      NB. Look at some values.
15.64 17.11 18.02 19.07 20.84 18.05 17.29 17.26 18.43 17.95

We use "grade up" (/:) to get the indexes into our price vector ordered from the lowest value to the highest, so we can see the lowest this price has ever been and when that was.

   /:AdjClose                        NB. Indexes of lowest prices
4586 4585 4584 4583 1293 1334 4560 1318 1335 1276 1317 1336 1286 4587...

These indexes seem to fall roughly into three groups: about 4580, 1290, and 1330. Take a sample from each of these three and see to which dates they correspond.

   0 4 5 { /:AdjClose                NB. Pick a few from different groups.
4586 1293 1334
   Date {~ 0 4 5 { /:AdjClose        NB. See dates for these low points.
+----------+----------+----------+
|1993-12-22|2007-01-24|2006-11-21|
+----------+----------+----------+
   AdjClose {~ 0 4 5 { /:AdjClose    NB. Prices on those dates
9.31 9.89 9.9
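
Grade up returns positions rather than values, which is what lets us index a second vector (here the dates) with its result. A minimal sketch on made-up data, where the names p and d are hypothetical:

   p=. 20.5 9.9 15 9.31 30
   d=. 'day0';'day1';'day2';'day3';'day4'
   /: p                           NB. positions of the values, lowest first
3 1 2 0 4
   p {~ /: p                      NB. indexing by the grade sorts the values
9.31 9.9 15 20.5 30
   d {~ 2 {. /: p                 NB. labels of the two lowest values
+----+----+
|day3|day1|
+----+----+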

Let's compare the most recent price to all the others to see how many it exceeds and how many exceed it.

   ({.AdjClose)+/ . < }.AdjClose       NB. Most recent is less than how many?
3892
   ({.AdjClose)+/ . >: }.AdjClose      NB. Most recent is greater than (or =) how many?
1699
   14.59 ((+/ . <) , +/ . >) AdjClose  NB. Similar comparisons for price today (strict > here)
4212 1374
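
The form +/ . < used above applies the comparison and then sums the resulting ones. A small sketch on made-up numbers (the name p is hypothetical) shows the counts, and how >: and > differ when ties are present:

   p=. 15 20 14 15 12
   ({. p) < }. p                  NB. first item compared with each of the rest
1 0 0 0
   ({. p) +/ . < }. p             NB. how many of the rest exceed the first item
1
   ({. p) +/ . >: }. p            NB. how many it equals or exceeds
3
   ({. p) +/ . > }. p             NB. strict comparison leaves out the tie
2

Because > is strict, the two counts reported for 14.59 need not add up to the number of prices: any values exactly equal to 14.59 fall into neither count.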

Side-tracking for a moment from the actual analysis, we look at a few ways to re-write the last J expression here to remove the apparent redundancy of the repeated summations (+/).

   +/ &> 14.59 (< ; >) AdjClose      NB. Examples of removing redundancy
4212 1374                            NB. from the preceding expression.
   +/ 14.59 (< ,. >) AdjClose        NB. Nicer because shorter, does not enclose
4212 1374

A tacit version is one that does not name its arguments explicitly; it is built only from verbs and the adverbs and conjunctions that combine them.

   13 : '+/ x (< ,. >) y'            NB. Have J generate the tacit equivalent.
[: +/ < ,. >
   14.59 ([: +/ < ,. >) AdjClose
4212 1374
   14.59 (< ,&(+/) >) AdjClose       NB. Another tacit alternative
4212 1374
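
One way to convince ourselves that these rewrites are interchangeable is to name them and compare their results with match (-:). This is a sketch on made-up data; the names cntA, cntB and p are hypothetical.

   cntA=. [: +/ < ,. >
   cntB=. < ,&(+/) >
   p=. 15 20 14 15 12
   14.5 (< ,. >) p                NB. two boolean columns, one row per price
1 0
1 0
0 1
1 0
0 1
   14.5 cntA p                    NB. column sums: how many above, how many below
3 2
   (14.5 cntA p) -: 14.5 cntB p   NB. 1 means the two results match
1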

Now, check some of the claims in the article. First, we see how many times the adjusted closing price has exactly equaled the value of 14.62 mentioned in the article.

   AdjClose +/ . = 14.62             NB. Check the article’s assertion about a price
6                                    NB. below 14.62 marking the lowest level since
                                     NB. June 2007.

Since this has happened six times, we can't find the first point less than this number by looking for the number itself: we need a slightly more complex set of instructions.

   >Date{~AdjClose i. 14.62          NB. Date on which price last equalled 14.62.
2011-04-28                           NB. This is when it was last equal to 14.62 but
   >Date{~ 1 i.~ AdjClose<14.62      NB. he said “below” – so when is the most
2007-06-21                           NB. recent time it was less than 14.62?

This last expression, used to find the first instance in our series less than 14.62, deserves a more detailed explanation.

Our prices are in date order with the most recent date first, so searching the vector of prices from start to end (the usual direction implicit in array operations in J) starts at the present and moves into the past. The first price below 14.62 that we encounter is therefore the most recent one.

The expression AdjClose<14.62 generates a boolean vector with zeros where the comparison is false and ones where it is true. The part of the expression 1 i.~ looks up (i.) the first occurrence of a one. The tilde swaps the order of the arguments, which lets us avoid the parentheses we would need if we wrote (AdjClose<14.62) i. 1 instead. Similarly, Date {~ uses the position returned by this part of the expression to extract the corresponding member of the date vector, again using the tilde to avoid parentheses.

Finally, we disclose (>) the contents of the boxed date to show it more simply, without the box drawn around it.
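
The pieces of this expression can be tried one at a time. A minimal sketch on made-up data, ordered most recent first like our price vector (the names p and d are hypothetical):

   p=. 15 14.62 16 14.2 14.62 13
   d=. 'day0';'day1';'day2';'day3';'day4';'day5'
   p i. 14.62                     NB. first position equal to 14.62
1
   p < 14.62                      NB. boolean: where is the value below 14.62?
0 0 0 1 0 1
   1 i.~ p < 14.62                NB. position of the first 1
3
   > d {~ 1 i.~ p < 14.62         NB. its label, with the box removed
day3

If no item satisfies the test, i. returns the length of the vector (here 6), which is not a valid index, so the final {~ would signal an index error. A more careful version would check for that case first.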

We can also check the article's claim that the long-term average is roughly 20.

   mean AdjClose                     NB. Claimed long-term average “about” 20
20.5552
   mean                              NB. Entering a name with no argument shows its
+/ % #                               NB. definition, here, a classic tacit expression
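
If mean is not already defined in a session (whether it is depends on which scripts have been loaded, so treat its availability as an assumption), the same fork can be defined under another name. A sketch using the hypothetical name avg:

   avg=. +/ % #                   NB. sum divided by count
   avg 15.64 17.11 18.02
16.9233
   (+/ % #) 15.64 17.11 18.02     NB. the fork also works without a name
16.9233
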
 -- Devon McCormick