NYCJUG/2008-10-14


JDB, interactive graphical interface

Location: Heartland Brewery, 34th and 5th, NYC


Meeting Summary

We started off by talking about figures on the biggest daily changes in the Dow Jones Industrial Average, how some graphs of them might be improved, and how those improvements might be incorporated into the interactive graphics tool we're planning. We also talked about the new J database, JDB, and how we might influence and aid this effort. Finally, we talked about how these two efforts might relate to each other.

Agenda for NYCJUG of 20081014

1. Introducing J: publicizing the many packages available.

2. Show-and-tell: JDB introduction - what would be useful to have?
What would help support time-series data?

3. Advanced topics: Interactive Graphics: what features should this have?
See "Samples of Some Existing Plotting Packages.doc" for examples of
what is already out there.

4. Learning and teaching J: frustrations in finding things that you
know are there.

            +.--------------------+.

To sum up: it is wrong always, everywhere, and for anyone,
to believe anything upon insufficient evidence.
  - William Kingdon Clifford, "The Ethics of Belief"

Proceedings

To start the meeting off, we considered a timely topic much on everyone's mind these days: a list I'd prepared of the biggest daily moves in the Dow Jones Industrial Average.

The DJI is not the most widely used index these days, but people are familiar with it because it's been around a long time. In fact, the data series I downloaded from Yahoo! Finance begins in October of 1928. This makes it almost exactly 80 years old, which is a nice number of years to consider for a number of reasons. One thing I was looking at was how the number of big "up" days compares to the number of big "down" days, and whether this relation has changed over time in any easily characterized way.

Largest Daily Moves in the Dow Jones Industrial Average as of 10/13/2008

    Losses                   Gains
 #  Date        % Decline    Date        % Gain
 1  10/19/1987      -22.6    3/15/1933     15.3
 2  10/28/1929      -13.5    10/6/1931     14.9
 3  10/29/1929      -11.7    10/30/1929    12.3
 4  10/5/1931       -10.7    6/22/1931     11.9
 5  11/6/1929        -9.9    9/21/1932     11.4
 6  8/12/1932        -8.4    10/13/2008    11.1
 7  1/4/1932         -8.1    10/21/1987    10.1
 8  10/26/1987       -8.0    8/3/1932       9.5
 9  6/16/1930        -7.9    9/5/1939       9.5
10  7/21/1933        -7.8    2/11/1932      9.5
11  10/9/2008        -7.3    11/14/1929     9.4
12  10/18/1937       -7.2    12/18/1931     9.4
13  10/27/1997       -7.2    5/6/1932       9.1
14  10/5/1932        -7.2    4/19/1933      9.0
15  9/17/2001        -7.1    10/8/1931      8.7
16  9/24/1931        -7.1    8/8/1932       8.2
17  7/20/1933        -7.1    6/10/1932      8.0
18  9/29/2008        -7.0    6/19/1933      7.6
19  10/13/1989       -6.9    6/3/1931       7.1
20  1/8/1988         -6.9    1/6/1932       7.1
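
A list like this can be derived from a series of daily closing prices. The following is only a minimal sketch in Python of one way to do it, not the code used to build the table above. The function names daily_pct_changes and top_moves are hypothetical, and the prices are made up for illustration rather than taken from the Yahoo! Finance download.

```python
def daily_pct_changes(dates, closes):
    """Pair each date after the first with its percent change from the prior close."""
    return [
        (dates[i], 100.0 * (closes[i] - closes[i - 1]) / closes[i - 1])
        for i in range(1, len(closes))
    ]


def top_moves(changes, n):
    """Return the n largest losses and the n largest gains, most extreme first."""
    losses = sorted((c for c in changes if c[1] < 0), key=lambda c: c[1])[:n]
    gains = sorted((c for c in changes if c[1] > 0), key=lambda c: c[1], reverse=True)[:n]
    return losses, gains


# Made-up closing prices, purely illustrative (not DJIA data).
dates = ["d0", "d1", "d2", "d3", "d4"]
closes = [100.0, 110.0, 99.0, 99.0, 108.0]

losses, gains = top_moves(daily_pct_changes(dates, closes), 2)
for date, pct in losses + gains:
    print(f"{date} {pct:+.1f}")
```

Days with no change are dropped from both lists, so either list can come back shorter than n when the series is short.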

One of the convenient things about this 80-year period is that it divides evenly into four 20-year periods which more-or-less coincide with important eras in the investing world. The first 20 years, from 1928 through late 1948, cover the Great Crash, the Great Depression, and World War II. The second period covers the post-war era through the culturally seminal year of 1968. The third period covers the great bear market of the early 1970s and the great crash of 1987 (which is at the very top of the list for a single day's move). The most recent period runs from the aftermath of the 1987 crash, through the dot-com boom and bust, to the recent turbulence.

One interesting thing to note is that the years 1929-1933 still dominate the top twenty. Another thing to notice is that there are no days in the top twenty for the years between 1939 and 1987.
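
Both observations can be checked directly against the table. This is a small Python sketch, with the dates copied from the table above. The helper names year_of and count_in_years are made up for this example.

```python
# Dates copied from the table of largest daily moves (month/day/year).
LOSS_DATES = [
    "10/19/1987", "10/28/1929", "10/29/1929", "10/5/1931", "11/6/1929",
    "8/12/1932", "1/4/1932", "10/26/1987", "6/16/1930", "7/21/1933",
    "10/9/2008", "10/18/1937", "10/27/1997", "10/5/1932", "9/17/2001",
    "9/24/1931", "7/20/1933", "9/29/2008", "10/13/1989", "1/8/1988",
]
GAIN_DATES = [
    "3/15/1933", "10/6/1931", "10/30/1929", "6/22/1931", "9/21/1932",
    "10/13/2008", "10/21/1987", "8/3/1932", "9/5/1939", "2/11/1932",
    "11/14/1929", "12/18/1931", "5/6/1932", "4/19/1933", "10/8/1931",
    "8/8/1932", "6/10/1932", "6/19/1933", "6/3/1931", "1/6/1932",
]


def year_of(date_text):
    """Extract the year from a month/day/year string."""
    return int(date_text.split("/")[2])


def count_in_years(dates, first, last):
    """Count the dates whose year falls in first..last, inclusive."""
    return sum(first <= year_of(d) <= last for d in dates)


crash_losses = count_in_years(LOSS_DATES, 1929, 1933)
crash_gains = count_in_years(GAIN_DATES, 1929, 1933)
# Years strictly between 1939 and 1987.
quiet_years = count_in_years(LOSS_DATES + GAIN_DATES, 1940, 1986)

print(crash_losses, crash_gains, quiet_years)
```

By this count, 11 of the 20 largest losses and 17 of the 20 largest gains fall in 1929-1933, and none of the 40 entries falls in 1940-1986.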

Here we show the distribution of the largest daily changes upward (darker bar) versus those downward (light bar).

DJITopGainsLosses-NonUniformScaling.png

They all look somewhat similar until you pay close attention to the scales along the bottom of the graphs, which are quite different. Because each histogram is scaled according to its own data, the graphs of these four periods cannot simply be compared to each other by eye - they differ more than first appears.

Here we see a crude attempt to use a common scale across all four by forcing the same minimum and maximum X-value onto all the charts.

DJITopGainsLosses-MoreUniformScaling.png

This attempt highlights why doing this well is difficult. For instance, though the X-scale is the same across all four periods, the Y-scale is not. Even more importantly, the way I achieved even the minimal commonality of the X-scale was by cheating: I added spurious minimum and maximum values to the three series lacking the true minimums and maximums (from the 1948-1968 and 1968-1988 graphs, respectively), then manually erased the very small spurious bars from each graph after it had been rendered as a picture.
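
One way to avoid this cheat is to compute the bin edges once from all of the series together, bin each series against those shared edges, and take a single Y maximum from the largest count in any bin. The following is only a sketch of that idea in plain Python, not a feature of any package mentioned here. The function names common_edges, bin_counts, and shared_histograms are hypothetical, and the sample values are made up.

```python
def common_edges(series_list, n_bins):
    """Equal-width bin edges spanning the min and max over all series."""
    lo = min(min(s) for s in series_list)
    hi = max(max(s) for s in series_list)
    if hi == lo:
        raise ValueError("all values are identical; cannot form bins")
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def bin_counts(values, edges):
    """Count values per bin; the overall maximum is placed in the last bin."""
    n_bins = len(edges) - 1
    lo, hi = edges[0], edges[-1]
    counts = [0] * n_bins
    for v in values:
        i = min(int((v - lo) / (hi - lo) * n_bins), n_bins - 1)
        counts[i] += 1
    return counts


def shared_histograms(series_list, n_bins):
    """Return shared edges, per-series counts, and one common Y maximum."""
    edges = common_edges(series_list, n_bins)
    all_counts = [bin_counts(s, edges) for s in series_list]
    y_max = max(max(c) for c in all_counts)
    return edges, all_counts, y_max


# Made-up percent changes for three periods, purely illustrative.
periods = [
    [-12.0, -8.0, 9.0, 15.0],
    [-3.0, 2.0, 3.0],
    [-22.0, -6.0, 5.0, 10.0],
]
edges, all_counts, y_max = shared_histograms(periods, 4)
print(edges)
print(all_counts, y_max)
```

With the edges and y_max fixed up front, every chart can be drawn on identical axes, so no spurious values need to be added and no bars need to be erased afterward.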

All of this points to some fairly obvious ways of better graphing that are very hard to accomplish with existing packages. I first noticed this problem when generating graphs of multiple, related series with S-Plus, a language with highly-regarded graphing capabilities. This language is virtually the same as the freely available "R" language, to which J has an interface. Thomas, who started this interactive graphics initiative, had mentioned that they use this interface specifically to generate graphs from J that J cannot produce well on its own.

You may notice that these charts are slightly out of sync with the table of numbers. I re-ran the numbers after the big market moves in October but have not re-done the charts, because it's easy to re-run the numbers but time-consuming to re-do the charts. This difficulty of updating charts was another motivation for Thomas to start work on the interactive graphics project, and it's a common problem if you work with a lot of charts.

Beginner's regatta

J now has quite a few packages available. Here is the current list:

arc/zip Zip file utilities based on zlib 1.2.3 and minizip libraries.
arc/ziptrees Zips and Unzips directory trees
base library base library scripts and labs
convert/misc miscellaneous scripts
data/dbman Database manager
data/jdb JDB
data/sqlite sqlite enhanced API for J
docs/wikihtml Offline browsing of wiki sections for Grid, Plot and Project Manager
finance/actuarial Actuarial functions
finance/interest Compound interest functions
format/publish builds pdf reports from markup
games/nurikabe Nurikabe
general/dirtrees Copy and delete directory trees
general/dirutils Additional directory utilities
general/inifiles Platform neutral interface for INI files
general/jayscript J Language Active Script Connector
general/jod JOD J Object Dictionary
general/jodsource JOD Object Dictionary Source
general/pcall Pointer call to a DLL function
general/sfl Standard Function Library from iMatix, a portable function library for C/C++ programs
graphics/fvj3 Materials for Fractals, Visualization and J, 3rd edition including scripts for visualization.
graphics/gnuplot Create gnuplot graphics
graphics/graphviz Graph Visualization
graphics/jturtle Turtle graphics
graphics/treemap Displays a treemap
gui/gtk GTK API
gui/jobs Application framework to host analysis jobs
gui/monthview Displays the Microsoft Monthview calendar control
gui/util GUI utilities
math/deoptim Differential Evolution for optimization of multidimensional functions
math/fftw FFTW
math/lapack LAPACK
math/lbfgs LBFGS for unconstrained nonlinear optimization
math/misc miscellaneous scripts
media/animate Animation Utility
media/gdiplus GDI+ Library
media/image3 Utilities for accessing 24-bit jpeg, png, bmp, tga and portable anymaps in J.
media/ming Flash SWF file generator based on Ming
media/paint Bitmap image-editing application
media/platimg Platform neutral image I/O utilities
media/wav Windows WAV file creation and play
stats/base Basic statistics package
stats/dendrite Dendrite cluster analysis method
stats/r Interfaces to R statistical package
stats/rlibrary R library using Rserve interface
tables/csv Read and write CSV files and strings
tables/csvedit Grid based editor for CSV files
tables/dsv Convert delimiter-separated strings and files to/from boxed arrays
tables/excel Reads Excel files using OLE
tables/tara Platform independent system for reading and writing Excel files
web/jhp J Hypertext Processor
xml/loose Loose XML parser based on regex
xml/sax XML parser based on Expat library
xml/xslt XSL Transform tool

We need to work on using some of these, giving feedback to their creators to improve them, and publicizing them. This brings us to our next topic, which is one of these packages: JDB.

Show-and-tell

I've done some preliminary work with JDB. The first thing to do is to load up some different kinds of datasets to see how well it handles them. I have three datasets in mind for this preliminary investigation: the Netflix Challenge data, some options data, and some data on commodities.

These reflect my own interests and the data I have readily available. Each of the three should test a different facet of JDB. The Netflix dataset is fairly large and should test JDB's capacity. The options data is a fairly complicated example of time-series data. Even today, the major databases do not have time-series handling built in - each user has to cobble it together ad hoc.

A summary of my experiences to date with JDB can be found here: User:Devon McCormick/JDBWithNetflixChallengeData.

Advanced topics

Learning and teaching J

Scan of Meeting Notes

NYCJUG20081014MeetNotes 40.jpg