From J Wiki

Questions from a Beginner

Here's (the start of) a collection of questions that beginning learners of J have asked.

date	Mar 30, 2008 10:56 PM	
subject	[Jgeneral] A few general questions from a wannabe J-er

Hi all,

I am an economist and I discovered J a few days ago. I haven't been this excited since I was 13 and Santa brought me an 8-bit Nintendo Entertainment System. Yet before taking a week off from work to study J (just kidding), I would like to be sure it does everything I need. Here are questions on four main topics: data management, performance, actual computation, and learning. Any answer to any question is very welcome. Answers to questions marked with a (*) are particularly important to me.

Thank you in advance!

Data Management

- I import data from several sources, and they are not always in straightforward formats. Are there libraries or built-in functions to import text (e.g. .csv, .tab, fixed format) and non-text (e.g. Excel, 1-2-3) data?

- (*) I often merge datasets (a sort of SQL join). The other day I saw that it is possible to embed a database (SQLite) through a library. Are there interfaces to other databases? I usually use MySQL (last time I checked, SQLite did not implement enough SQL for my purposes, but that was probably 2 years ago). Are there built-in functions to perform similar operations? (I'd be very happy to do all the merging in SQL, though.)
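
To make the "sort of SQL join" concrete, here is a minimal J sketch of a key lookup done with the primitives i. (index of) and { (from). The tables and the names ids, names, fid and amount are hypothetical, and the sketch assumes every key in the fact table also occurs in the lookup table; a key with no match would get the index #ids and { would then signal an index error. It is an illustration only, not a statement about which database interfaces exist.

```j
NB. Hypothetical lookup table: ids and the name that goes with each id
ids    =: 101 102 103
names  =: 'ann';'bob';'cy'

NB. Hypothetical fact table: one id and one amount per row
fid    =: 103 101 103 102
amount =: 5 7 11 13

NB. For each fact row, the position of its id in the lookup table
idx    =: ids i. fid

NB. Pick the matching name for each row and stitch it to the boxed amount,
NB. giving one row per fact row, as an inner join on id would
joined =: (idx { names) ,. <"0 amount
```

Here idx is 2 0 2 1 and joined is a 4-by-2 boxed table whose first row holds 'cy' and 5.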


Performance

- (***) How does J deal with very large datasets? Currently I am dealing with a 65 GB dataset. So far the only software I can use is SAS. Performing an SQL query [SELECT, GROUP BY] in SAS on a dedicated server takes me six hours, a large part of which is network I/O (I guess SAS's computing time would be an hour, perhaps two). The data is divided into 7 chunks of 7 to 13 GB each. With the same amount of data on a good computer, would I be able to perform the same operations with J? Assume plentiful RAM and a speedy processor: what's the order of magnitude of the time it would take?

- I read something about memory mapping in past posts and I intuitively understand what it means, but I have never done it. What are the limits of memory mapping? In general, what are the techniques for dealing with large datasets?
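
One common way to run a SELECT ... GROUP BY over data that arrives in chunks is to aggregate each chunk separately and then aggregate the partial results. The following J sketch shows the idea with the key adverb /. on two tiny, hypothetical chunks (the names k1, v1, k2, v2 and the values are made up for illustration). It assumes the aggregate is a sum, which can be combined this way; it says nothing about how long 65 GB would take.

```j
NB. Hypothetical chunks: grouping keys and values for two pieces of one table
k1 =: 1 2 1 3
v1 =: 10 20 30 40
k2 =: 2 3 3
v2 =: 5 6 7

NB. Per-chunk step: distinct keys, and the sum of the values for each key.
NB. x +//. y sums the items of y that share a key in x, in the order of ~. x
pk1 =: ~. k1
ps1 =: k1 +//. v1
pk2 =: ~. k2
ps2 =: k2 +//. v2

NB. Combine step: group the partial sums by key and sum them again
K          =: pk1 , pk2
total_keys =: ~. K
total_sums =: K +//. ps1 , ps2
```

With these inputs total_keys is 1 2 3 and total_sums is 40 25 53, the same result as grouping the concatenated chunks in one pass.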


Actual Computation

- Is there a numerical optimizer/solver (e.g., given a function, find its local maxima and minima; given an equation, find its zeros)? I could program this myself, but is there one already?

- Is there a sufficiently painless interface to Maxima (symbolic calculus toolbox)?
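
For a sense of what "programming this one" might look like, here is a minimal hand-rolled Newton iteration in J for finding a zero. The function f, its derivative df and the starting guess are hypothetical choices made for illustration, and the derivative is supplied by hand. This is a sketch of the technique only, not a description of any existing J solver library.

```j
NB. Hypothetical example: find a zero of f(y) = y^2 - 2
f  =: 3 : '(y*y) - 2'
df =: 3 : '2*y'          NB. derivative of f, written out by hand

NB. One Newton step: y - f(y) / f'(y)
newton =: 3 : 'y - (f y) % df y'

NB. ^:_ repeats the step until the result stops changing
root =: newton^:_ ] 1
```

Starting from 1, root converges to the square root of 2. Newton's method can fail to converge from a poor starting guess or where the derivative is zero.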


Learning

- What's the fastest way to learn the basics for a greedy person who learns the average C-like programming language in a week? Normally I learn "what can be done" and then start programming right away with a reference at hand. Here it does not seem so simple... right?