JDB, interactive graphical interface
Location:: Heartland Brewery, 34th and 5th, NYC
We started off by discussing some figures on the biggest daily changes in the Dow Jones Industrial Average, how some graphs of them might be improved, and how such improvements might be incorporated into the interactive graphics tool we're planning. We also talked about JDB, the new J database, and how we might influence and aid that effort. Finally, we talked about how these two efforts might relate to each other.
Agenda for NYCJUG of 20081014
1. Introducing J: publicizing the many packages available.
2. Show-and-tell: JDB introduction - what would be useful to have? What would help support time-series data?
3. Advanced topics: Interactive Graphics: what features should this have? See "Samples of Some Existing Plotting Packages.doc" for examples of what is already out there.
4. Learning and teaching J: frustrations in finding things that you know are there.

+.--------------------+.

To sum up: it is wrong always, everywhere, and for anyone, to believe anything upon insufficient evidence.
- William Kingdon Clifford, "The Ethics of Belief"
To start the meeting off, we considered a timely topic much on everyone's mind these days by looking at a list I'd prepared of the biggest daily moves in the Dow Jones Industrial Average.
The DJI is not the most widely used index these days but people are familiar with it because it's been around a long time. In fact, the data series I downloaded from Yahoo! Finance begins in October of 1928. This makes it almost exactly 80 years old, which is a nice number of years to consider for a number of reasons. One thing that I was looking at was how the number of big "up" days compares to the number of big "down" days and if this relation has changed over time in any easily-characterizable way.
Largest Daily Moves in the Dow Jones Industrial Average as of 10/13/2008

    Losses                  Gains
 #  Date        % Decline   Date        % Gain
 1  10/19/1987      -22.6   3/15/1933     15.3
 2  10/28/1929      -13.5   10/6/1931     14.9
 3  10/29/1929      -11.7   10/30/1929    12.3
 4  10/5/1931       -10.7   6/22/1931     11.9
 5  11/6/1929        -9.9   9/21/1932     11.4
 6  8/12/1932        -8.4   10/13/2008    11.1
 7  1/4/1932         -8.1   10/21/1987    10.1
 8  10/26/1987       -8.0   8/3/1932       9.5
 9  6/16/1930        -7.9   9/5/1939       9.5
10  7/21/1933        -7.8   2/11/1932      9.5
11  10/9/2008        -7.3   11/14/1929     9.4
12  10/18/1937       -7.2   12/18/1931     9.4
13  10/27/1997       -7.2   5/6/1932       9.1
14  10/5/1932        -7.2   4/19/1933      9.0
15  9/17/2001        -7.1   10/8/1931      8.7
16  9/24/1931        -7.1   8/8/1932       8.2
17  7/20/1933        -7.1   6/10/1932      8.0
18  9/29/2008        -7.0   6/19/1933      7.6
19  10/13/1989       -6.9   6/3/1931       7.1
20  1/8/1988         -6.9   1/6/1932       7.1
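A table like this one can be derived from a series of daily closing values by computing each day's percentage change from the previous close and sorting. The following is only a rough sketch in Python, not the code actually used for these notes; the function names and the tiny sample series are made up for illustration.

```python
from datetime import date

def daily_pct_changes(closes):
    """Return (date, percent change) pairs for each day after the first.

    `closes` is a list of (date, closing value) pairs in date order.
    """
    changes = []
    for (_, prev), (day, cur) in zip(closes, closes[1:]):
        changes.append((day, 100.0 * (cur - prev) / prev))
    return changes

def largest_moves(closes, n):
    """Return the n largest declines and the n largest gains, biggest first."""
    changes = daily_pct_changes(closes)
    losses = sorted((c for c in changes if c[1] < 0), key=lambda c: c[1])[:n]
    gains = sorted((c for c in changes if c[1] > 0), key=lambda c: -c[1])[:n]
    return losses, gains

# Made-up closing values, for illustration only (not real DJIA data).
sample = [
    (date(2008, 1, 2), 100.0),
    (date(2008, 1, 3), 110.0),
    (date(2008, 1, 4), 99.0),
    (date(2008, 1, 7), 103.95),
    (date(2008, 1, 8), 101.871),
]
losses, gains = largest_moves(sample, 2)
for (ld, lp), (gd, gp) in zip(losses, gains):
    print(f"{ld}  {lp:6.1f}    {gd}  {gp:6.1f}")
```

Losses and gains are ranked separately so that the two halves of the table can be laid side by side, as above.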
One of the convenient things about this 80-year period is that it divides evenly into four 20-year periods which more-or-less coincide with important eras in the investing world. The first 20 years, from 1928 through late 1948, cover the Great Crash, the Great Depression, and World War II. The second period covers the post-war era through the culturally seminal year of 1968. The third period covers the great bear market of the early 1970s and the great crash of 1987 (which is at the very top of the list for a single day's move). The most recent period runs from the aftermath of the '87 crash through the dot-com boom and bust to the recent turbulence.
One interesting thing to note is that the years 1929-1933 still dominate the top twenty. Another thing to notice is that there are no days in the top twenty for the years between 1939 and 1987.
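These observations can be checked by tallying the dates in the table of largest daily moves by 20-year period. The sketch below, in Python, does this; the exact period boundaries are an assumption on my part (October 1 of 1948, 1968, and 1988), since the notes say only that the series begins in October 1928.

```python
from collections import Counter
from datetime import date

# Dates copied from the table of largest daily moves (M/D/YYYY).
LOSS_DATES = """10/19/1987 10/28/1929 10/29/1929 10/5/1931 11/6/1929
8/12/1932 1/4/1932 10/26/1987 6/16/1930 7/21/1933 10/9/2008 10/18/1937
10/27/1997 10/5/1932 9/17/2001 9/24/1931 7/20/1933 9/29/2008 10/13/1989
1/8/1988""".split()
GAIN_DATES = """3/15/1933 10/6/1931 10/30/1929 6/22/1931 9/21/1932
10/13/2008 10/21/1987 8/3/1932 9/5/1939 2/11/1932 11/14/1929 12/18/1931
5/6/1932 4/19/1933 10/8/1931 8/8/1932 6/10/1932 6/19/1933 6/3/1931
1/6/1932""".split()

# Assumed boundaries between the four periods (not stated in the notes).
BOUNDARIES = [date(1948, 10, 1), date(1968, 10, 1), date(1988, 10, 1)]
LABELS = ["1928-1948", "1948-1968", "1968-1988", "1988-2008"]

def parse(text):
    """Turn 'M/D/YYYY' into a date."""
    month, day, year = (int(part) for part in text.split("/"))
    return date(year, month, day)

def period_of(day):
    """Return the label of the period containing `day`."""
    for boundary, label in zip(BOUNDARIES, LABELS):
        if day < boundary:
            return label
    return LABELS[-1]

def tally(date_strings):
    """Count dates per period, in LABELS order."""
    counts = Counter(period_of(parse(s)) for s in date_strings)
    return [counts[label] for label in LABELS]

loss_counts = tally(LOSS_DATES)
gain_counts = tally(GAIN_DATES)
for label, down, up in zip(LABELS, loss_counts, gain_counts):
    print(f"{label}: {down:2d} losses, {up:2d} gains")
```

With these assumed boundaries, the first period accounts for 12 of the 20 largest losses and 18 of the 20 largest gains, and the second period for none of either.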
The histograms for the four periods all look somewhat similar until you pay close attention to the scales along the bottom of the graphs, which are quite different. Because each individual histogram is scaled according to its own data, the graphs of these four periods cannot simply be compared with one another: the periods differ more than first appears.
Here we see a crude attempt to use a common scale across all four by forcing the same minimum and maximum X-value onto all the charts.
This attempt highlights how difficult it is to do this well. For instance, though the X-scale is the same across all four periods, the Y-scale is not. Even more importantly, I achieved even this minimal commonality of the X-scale only by cheating: I added spurious minimum and maximum values to the three series lacking the true minimums and maximums (from the 1948-1968 and 1968-1988 graphs, respectively), then manually erased the very small spurious bars from each graph after it had been rendered as a picture.
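One way to avoid this kind of cheating is to compute the bin edges once from all the series together, count each series against those shared edges, and take a single Y-axis limit from the largest count. The following Python sketch shows the idea only; it is not tied to any particular plotting package, the function names are hypothetical, and the sample values are made up.

```python
def common_bins(series_list, n_bins):
    """Return n_bins + 1 equally spaced bin edges spanning every series.

    Assumes the series are non-empty and not all one identical value.
    """
    lo = min(min(s) for s in series_list)
    hi = max(max(s) for s in series_list)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]

def histogram(values, edges):
    """Count values per bin; the last bin includes its right edge."""
    n_bins = len(edges) - 1
    lo, hi = edges[0], edges[-1]
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lo) / (hi - lo) * n_bins)
        counts[min(idx, n_bins - 1)] += 1
    return counts

# Made-up daily % changes for two periods, for illustration only.
calm = [-1.0, -0.5, 0.0, 0.5, 1.0]
wild = [-4.0, -1.0, 0.0, 2.0, 4.0]

edges = common_bins([calm, wild], n_bins=4)
calm_counts = histogram(calm, edges)
wild_counts = histogram(wild, edges)
y_max = max(calm_counts + wild_counts)  # shared Y-axis limit for all charts
```

Because every series is binned against the same edges, no spurious values need to be added and nothing has to be erased afterwards; the quieter periods simply show empty bins at the extremes.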
All of this points to some fairly obvious improvements in graphing that are very hard to accomplish with existing packages. I first noticed this problem when generating graphs of multiple, related series with S-Plus, a language with highly-regarded graphing capabilities. S-Plus is virtually the same as the freely available "R" language, to which J has an interface. Thomas, who started this interactive graphics initiative, mentioned that they use this interface specifically to generate, from J, graphs that J cannot do well on its own.
You may notice that these charts are slightly out of sync with the table of numbers. I re-ran the numbers after the big market moves in October but have not redone the charts, because it's easy to re-run the numbers but time-consuming to redo the charts. This difficulty of updating charts was another motivation for Thomas to start work on the interactive graphics project, and it's a common problem if you work with a lot of charts.
Quite a few packages are now available in J. Here is the current list:
arc/zip            Zip file utilities based on zlib 1.2.3 and minizip libraries.
arc/ziptrees       Zips and Unzips directory trees
base library       base library scripts and labs
convert/misc       miscellaneous scripts
data/dbman         Database manager
data/jdb           JDB
data/sqlite        sqlite enhanced API for J
docs/wikihtml      Offline browsing of wiki sections for Grid, Plot and Project Manager
finance/actuarial  Actuarial functions
finance/interest   Compound interest functions
format/publish     builds pdf reports from markup
games/nurikabe     Nurikabe
general/dirtrees   Copy and delete directory trees
general/dirutils   Additional directory utilities
general/inifiles   Platform neutral interface for INI files
general/jayscript  J Language Active Script Connector
general/jod        JOD J Object Dictionary
general/jodsource  JOD Object Dictionary Source
general/pcall      Pointer call to a DLL function
general/sfl        Standard Function Library from iMatix, a portable function library for C/C++ programs
graphics/fvj3      Materials for Fractals, Visualization and J, 3rd edition including scripts for visualization.
graphics/gnuplot   Create gnuplot graphics
graphics/graphviz  Graph Visualization
graphics/jturtle   Turtle graphics
graphics/treemap   Displays a treemap
gui/gtk            GTK API
gui/jobs           Application framework to host analysis jobs
gui/monthview      Displays the Microsoft Monthview calendar control
gui/util           GUI utilities
math/deoptim       Differential Evolution for optimization of multidimensional functions
math/fftw          FFTW
math/lapack        LAPACK
math/lbfgs         LBFGS for unconstrained nonlinear optimization
math/misc          miscellaneous scripts
media/animate      Animation Utility
media/gdiplus      GDI+ Library
media/image3       Utilities for accessing 24-bit jpeg, png, bmp, tga and portable anymaps in J.
media/ming         Flash SWF file generator based on Ming
media/paint        Bitmap image-editing application
media/platimg      Platform neutral image I/O utilities
media/wav          Windows WAV file creation and play
stats/base         Basic statistics package
stats/dendrite     Dendrite cluster analysis method
stats/r            Interfaces to R statistical package
stats/rlibrary     R library using Rserve interface
tables/csv         Read and write CSV files and strings
tables/csvedit     Grid based editor for CSV files
tables/dsv         Convert delimiter-separated strings and files to/from boxed arrays
tables/excel       Reads Excel files using OLE
tables/tara        Platform independent system for reading and writing Excel files
web/jhp            J Hypertext Processor
xml/loose          Loose XML parser based on regex
xml/sax            XML parser based on Expat library
xml/xslt           XSL Transform tool
We need to work on using some of these, giving feedback to their creators to improve them, and publicizing them. This brings us to our next topic, which is one of these packages: JDB.
I've done some preliminary work with JDB. The first thing to do is to load up some different kinds of datasets to see how well it handles them. I have three datasets in mind for this preliminary investigation: the Netflix Challenge data, some options data, and some data on commodities.
These reflect my own interests and the data I have readily available. Each of the three should test a different facet of JDB. The Netflix dataset is fairly large and should test JDB's capacity. The options data is a fairly complicated example of time-series data. Even today, the major databases do not have time-series handling built in; each user has to cobble it together ad hoc.
A summary of my experiences to date with JDB can be found here: User:Devon McCormick/JDBWithNetflixChallengeData.
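As an illustration of the kind of time-series operation that usually has to be hand-built, here is a sketch in Python of an "as-of" lookup: finding the most recent observation on or before a given date. This is my own example, not something JDB provides or that was shown at the meeting; the function name and the sample observations are made up.

```python
from bisect import bisect_right
from datetime import date

def as_of(dates, values, when):
    """Return the value in effect on `when`: the latest observation
    dated on or before it, or None if there is none.

    `dates` must be sorted ascending and parallel to `values`.
    """
    pos = bisect_right(dates, when)
    return values[pos - 1] if pos else None

# Made-up observations, for illustration only.
dates = [date(2008, 10, 1), date(2008, 10, 3), date(2008, 10, 8)]
prices = [10.0, 10.5, 9.75]

print(as_of(dates, prices, date(2008, 10, 2)))
```

Built-in support for time series would mean that lookups like this, along with alignment of series observed on different dates, need not be rewritten by each user.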