high-dimensional graph, prime numbers, visual explanation, symbolic notation
Summary of Meeting
We talked about promoting J among those interested in both mathematics and programming. We also heard about the JOD tool for organizing J code. Finally, we looked at a number of examples of how to handle high-dimensional data visually.
Meeting Agenda for NYC JUG 20081111
1. Beginner's Regatta: MathWeb entry? See MathWeb Wiki.doc. 2. Show-and-tell: Primality tests - latest advance in J - see "Primality Tests.doc". 3. Advanced topics: Graphing high-dimensional data. We'll look at the features of a graphing package - see "Intro to VisuMap high-dimensional visualization software.doc". Also, some types of graphs and references on visualization and graphics taken from "Multidimensional Data Visualization Techniques for Financial Performance Data_TR810.pdf" - see "Examples of Graphing High-Dimensional Data.doc". 4. Learning and teaching J: "Little Big" example on teaching the dyadic transpose of a multi-dimensional array.
We discussed actions to interest people in J and what groups would be most open to what the language has to offer.
We discussed whether computer science or math people are more open to what J has to offer. One idea is to frequent mathematically-oriented sites that have a strong programming inclincation. An example of this is MathWeb. We already have a basic J page up there. [[File:MathWebJPage.jpg]
Ken was very enthusiastic about John Baker's JOD (J Object Dictionary) addon. He's been using it to assemble a dictionary of J words to address specific problem domains. Ken has found it convenient for organizing sets of functions. There's more information about it here on Googlepages; there's also an introductory slideshow.
We also discussed some of the sophistication underlying J's built-in primality testing, based on an essay on the J Wiki which describes some of the major tests and presents J functions to replicate what J presumably does internally.
VisuMap: a High-Dimensional Visualizer
To continue a popular ongoing topic, graphing high-dimensional data, we looked at some packages and methods for doing this. The first of these, the VisuMap High-Dimensional Data Visualizer, shows some colorful examples on its homepage.
It lists the following features:
- Relational Perspective Map (RPM)
- Multidimensional Scaling (Sammon Map)
- Curvilinear Component Analysis (CCA)
- Principal Component Analysis (PCA)
- MDS by SMACOF Algorithm
- Correspondence Analysis
- Stochastic Neighbor Embedding (t-SNE)
- Self-Organizing Map (Kohonen Net)
- K-Mean Clustering
- Metric Sampling
- Agglomerative Clustering
- Affinity Propagation
- Relational DB & Spreadsheet
- JPEG, GIF, SVG etc. (10 image formats)
- Scripting interface for automation
- DotNet plug-in interface for extension
- 13 built-in metrics: euclidean, mahalanobis, pearson correlation, speaman ranking, wedge- hedge, etc.
- Many interactive views: bar, curve, spectrum, parallel coordinates, table, tree, shepeard diagram, MS Offices charts, etc.
- Attribute map analysis
- Hardware accelerated 3D animation
Methods for High-Dimensional Visualization
We looked at various techniques for visualizing high-dimensional data. One of these is the "Relational Perspective Map", described as follows.
A Brief Introduction to Relational Perspective Map
Relational perspective map (RPM), developed by James X. Li, is a general purpose method to visualize distance information of data points in high dimensional spaces.
The starting point of the RPM algorithm is a set of data point s[sub "i"], i=1,...,N, and a distance matrix [delta][sub "ij"]. The matrix [delta][sub "ij"], called the relational distance, is the numeric representation of a relationship between the data points. The goal of the RPM algorithm is to map the data points s[sub "i"] into a two or three dimensional map so that Euclidean distances d[sub "ij"] between the image points visually approaches [delta][sub "ij"] as much as possible. The resulting lower dimensional map is called relational perspective map, the matrix d[sub "ij"] is called the image distance matrix. From geometric point of view, a RPM map attempts to preserve as much as possible distance information of the original dataset.
The following picture shows the RPM algorithm works to create 2D maps: it first maps data points to the surface of a torus, then unfolds the torus surface by a vertical and a horizontal cut. The second step is more or less straightforward (see here for a short video showing the transformation between rectangle and torus), so the RPM algorithm focus on how to map the data points to the torus surface so that the distances between the image points resembles the distances between data points. In order the find the best mapping RPM algorithm defines an energy function as follows:
where p is an algorithm parameter called the rigidity, dij is the geodesic block distance between two image points on the torus surface. The RPM algorithm then uses gradient descent optimization method to find a configuration with minimum energy. The rigidity parameter, which is normally a value between -1 and +1, alters the energy landscape in a global manner, so that the resulting RPM maps have different characteristics.
To better understand the RPM algorithm it is helpful to consider the image points on the torus as a force directed multi-particle system with mutual repulsive forces between them; and consider the energy Ep as a kind of total potential energy. According to physical law the repulsive force is characterized by following form:
Above form says that the repulsive force between two points is proportional to their relational distance. Thus the process to minimize the energy Ep is actually a process that simulates the dynamic system directed by the force defined by above form. Since points with larger relational distances between them correspond to larger repulsive force on the torus, their image points on the torus should be further apart from each other.
The key idea of RPM algorithm, that distinguish it from other known algorithms like those listed in the next section, is that RPM successfully exploited the property of closed manifold (the torus) to keep the configuration in balance. Whereas other non-linear methods apply, directly or indirectly, attractive force to map closely related points to closely located positions, RPM algorithm maps closely related points to closely location area by the collective repulsive force of all points. This characteristics make RPM the true, and the only (as far as we know), global mapping algorithm.
It should be noted here that RPM algorithm made a significant relaxation to the original problem setting: the resulting map is not a normal rectangle map, but a map on the torus. That means the opposite edges of the map should be considered as stuck with each other.
Visualizing high-dimensional data has been a major topic for many decades. A large group of methods target high-dimensional data with sophisticated rendering techniques like 3D landscape, special glyphs, colors, and graphics etc. Other methods target the problem by reducing the dimensionality in a generic way with little assumption about data type. The RPM algorithm belongs to the latter group. The following is a short list of methods which directly or indirectly reduce data dimensionality:
- Multidimensional Scaling: Sammon Mapping, Curvilinear Component Analysis.
- Self-Organizing Map.
- Principal Component Analysis/Singular Value Decomposition.
- Non-linear Projections: Local Linear Embedding, Isomap.
- Methods based on physical models: spring embedding model; force directed placement.
1. James Xinzhi Li: Visualization of High Dimensional Data with Relational Perspective Map. Information Visualization 2004, Vol. 3, No. 1. 49-59.
Examples of Graphing High-Dimensional Data
Parallel Coordinate Plot
Principal Component Analysis
An Example of Using High-Dimensional Data
Learning and Teaching J
Here's an example of using J to illustrate the difference between understanding things visually versus symbolically.
Scan of Meeting Notes