User:Daniel Gregoire/CSV by Hand

From J Wiki

I had a large CSV to load into J.

I reached for the tables/csv addon and its readcsv verb. It worked, but it took a long time to run. I suspect the cost is all the boxing, since readcsv puts every field in its own box.

So I wrote a few sentences to do the parsing "by hand". I had generated the source CSVs on a different system, so I was confident they were simple and uniform in structure: almost entirely numbers, no quotes, commas as the only field separator, and newlines as the only row separator.

J Code

NB. === Verbs
timeIt=.(6!:2)                            NB. Execute a sentence; return elapsed seconds
parseHeader=.<;._1@:(','&,)@:(#~ ' '&~:)  NB. Strip spaces, cut header row into boxes
parseHeader=.','&splitstring@:(#~ ' '&~:) NB. Equivalent, using stdlib splitstring (overrides the line above)
simpleCsv=.{.@:(".;._1)&(','&,)           NB. Execute each data row with ". (J's , appends the numbers)

NB. === Nouns
f=.'/tmp/big.csv'
raw=.'m'freads f                          NB. 'm' avoids boxing of contents
]parsedTime=.timeIt'd=. simpleCsv }.raw'  NB. Skip header row, parse CSV; capture elapsed time
$d                                        NB. check shape of data table
]headers=. parseHeader {.raw              NB. parse the header row from the raw matrix
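Here is a small self-contained check of the verbs above, with a hand-built matrix standing in for the file. The sample data is made up, and using `>` on boxed strings in place of `'m'freads` is an assumption: both give a character matrix whose short rows are padded with spaces. Tracing simpleCsv on this input: prefixing a row of commas and cutting with `;._1` yields a single piece holding all the data rows; `".` (rank 1) then executes each row as a J sentence, and because `,` is Append, a row such as `2,10` evaluates to the list `2 10`; `{.` drops the extra axis the cut adds.

```j
NB. Toy run of the verbs above, no file needed (sample data is made up).
NB. Assumption: > on a list of boxed strings stands in for 'm'freads,
NB. since both give a character matrix padded with trailing spaces.
parseHeader=. <;._1@:(','&,)@:(#~ ' '&~:)
simpleCsv=. {.@:(".;._1)&(','&,)

raw=. >'started,completed';'1,5';'2,10';'3,4'
$raw                    NB. 4 17: every row padded to the longest line
d=. simpleCsv }.raw     NB. 3 by 2 table: 1 5, 2 10, 3 4
$d                      NB. 3 2
headers=. parseHeader {.raw
headers                 NB. two boxes: started and completed
```

One caveat follows from executing rows as J: a field such as `-5` is read as Negate applied to everything to its right, so `". '1,-5,3'` gives `1 _5 _3`, not `1 _5 3`. The shortcut is only safe for unsigned numbers, or negatives already written in J's `_5` form.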

The definition of parseHeader deserves a little attention.

A left argument of 'm' makes freads return the file as a character matrix, one row per line. A matrix is rectangular (not necessarily square), so every line shorter than the longest one is padded with fill, which for characters is the space.

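{.raw is a single row of that matrix, so it still carries the fill. Before parseHeader cuts it, the hook (#~ ' '&~:) keeps only the non-space characters, which gets rid of the trailing spaces. Strictly, it removes every space, not just the trailing fill; that is safe here only because none of my header names contain spaces.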
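The padding and the space-stripping hook can be seen on a couple of made-up rows. The trimTrailing verb below is a hypothetical alternative, not part of the original: it keeps a character only if some non-space occurs at or after it, so it drops trailing fill while preserving spaces inside header names.

```j
NB. 'm'freads pads short rows; > on boxed strings shows the same effect.
m=. >'a,b';'aa,bb,c'
$m                      NB. 2 7
{.m                     NB. 'a,b' followed by 4 spaces of fill

NB. The hook (#~ ' '&~:) y is y #~ (' '&~: y): keep every non-space.
row=. 'id,start time,end   '   NB. hypothetical header with a space in a name
(#~ ' '&~:) row         NB. 'id,starttime,end': the inner space goes too

NB. Hypothetical alternative: keep a character only if some non-space
NB. occurs at or after it (suffix OR scan), dropping trailing fill only.
trimTrailing=. (#~ +./\.@:(' '&~:))
trimTrailing row        NB. 'id,start time,end'
```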

Note to myself and to other junior J programmers: hooks and # (Copy) go really well together. I keep running into this pattern.
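A few illustrations of the (#~ test) pattern on made-up data: the hook applies the test to y and then uses the resulting boolean list to copy the matching items of y. Because # copies items, the same shape works on the rows of a table.

```j
NB. Pattern: (#~ test) y is y #~ (test y), keeping items where test is 1.
(#~ 0&<) 3 _1 4 _1 5       NB. 3 4 5 (keep positives)
(#~ 2&|) i. 10             NB. 1 3 5 7 9 (keep odd numbers)

NB. With a row-wise test it filters rows of a table (hypothetical data).
t=. 3 2 $ 1 5 2 10 3 4
(#~ 5 < {:"1) t            NB. the single row 2 10 (last column above 5)
```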

Moving on.

Once the CSV is parsed, I create simple accessor verbs for the columns I want to analyze, which lets me write short, elegant verbs on top of them:

]started=.   (headers i. <'started')&{"1   NB. fn: get started column
]completed=. (headers i. <'completed')&{"1 NB. fn: get completed column
cycleTime=. completed-started              NB. fork: (completed y) - (started y)
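To see the accessors and the fork at work without the real file, here is a sketch on a small made-up table. The headers and d values are hypothetical stand-ins for what parseHeader and simpleCsv produce. Note that `headers i. <'started'` is evaluated when the verb is defined, so started is simply `1&{"1` here.

```j
NB. Hypothetical parsed data standing in for parseHeader/simpleCsv output.
headers=. 'id';'started';'completed'
d=. 3 3 $ 1 1 5  2 2 10  3 3 4

started=.   (headers i. <'started')&{"1     NB. here: 1&{"1
completed=. (headers i. <'completed')&{"1   NB. here: 2&{"1
cycleTime=. completed-started               NB. fork of three verbs

started d       NB. 1 2 3
completed d     NB. 5 10 4
cycleTime d     NB. 4 8 1
```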

Stick a fork in it! (Sorry, I am a dad.)