Addons/tables/csv

From J Wiki

tables/csv - CSV utilities

  • Provides verbs to read from and write to comma-separated-value (CSV) files or strings.
  • Supports appending arrays to an existing CSV file.
  • Can convert fields to numeric type where possible.
  • Old code that uses the base library csv script should not need any modification (apart from loading) to use this addon instead.
  • CSV is a specific case of the delimiter-separated-value (DSV) format, and the verbs in this addon are covers of those in the tables/dsv addon.

Browse history, source and examples in SVN.


Verbs available

appendcsv  v  Appends an array to a CSV file
fixcsv     v  Converts CSV data into a J array
makecsv    v  Makes a CSV string from an array
makenum    v  Converts cells in an array of boxed literals to numeric where possible
enclose    v  Encloses a string in quotes
readcsv    v  Reads a CSV file into a boxed array
writecsv   v  Writes an array to a CSV file

Installation

Use JAL/Package Manager to install both the tables/csv and tables/dsv addons.

If you wish to replace the use of the base library csv script with the tables/csv addon, add the following lines to your ~config/startup.ijs script:

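NB. drop the existing 'csv' entry from the table of public script names
PUBLIC_j_=: (<<<({."1 PUBLIC_j_) i. <'csv'){PUBLIC_j_
NB. map the name 'csv' to the addon script instead
buildpublic_j_ 0 : 0
csv       ~addons/tables/csv/csv
)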

If you do this, then require 'csv' and load 'csv' will target the csv addon rather than the base library csv script.

Usage

Load the csv addon with the following line:

   load 'tables/csv'

Verbs are documented in the csv.ijs script.

   ]dat=: (34;'45';'hello';_5.34),: 12;'32';'goodbye';1.23
┌──┬──┬───────┬─────┐
│34│45│hello  │_5.34│
├──┼──┼───────┼─────┤
│12│32│goodbye│1.23 │
└──┴──┴───────┴─────┘
   datatype each dat
┌───────┬───────┬───────┬────────┐
│integer│literal│literal│floating│
├───────┼───────┼───────┼────────┤
│integer│literal│literal│floating│
└───────┴───────┴───────┴────────┘
   makecsv dat
34,"45","hello",-5.34
12,"32","goodbye",1.23

   dat writecsv jpath '~temp/test.csv'
47
   ]datcsv=: freads jpath '~temp/test.csv'
34,"45","hello",-5.34
12,"32","goodbye",1.23

   fixcsv datcsv
┌──┬──┬───────┬─────┐
│34│45│hello  │-5.34│
├──┼──┼───────┼─────┤
│12│32│goodbye│1.23 │
└──┴──┴───────┴─────┘
   readcsv jpath '~temp/test.csv'
┌──┬──┬───────┬─────┐
│34│45│hello  │-5.34│
├──┼──┼───────┼─────┤
│12│32│goodbye│1.23 │
└──┴──┴───────┴─────┘
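appendcsv is not exercised in the session above. The following is a minimal sketch, assuming appendcsv takes the same arguments as writecsv (the array on the left, the file name on the right); the file name append_test.csv and the appended row are hypothetical.

```j
load 'tables/csv'
NB. hypothetical scratch file; writecsv (re)creates it so the sketch is repeatable
tstfile=: jpath '~temp/append_test.csv'
dat=: (34;'45';'hello';_5.34),: 12;'32';'goodbye';1.23
dat writecsv tstfile
NB. assumed: appendcsv takes the array on the left and the file name on the right
(1 4 $ 7;'8';'again';0.5) appendcsv tstfile
NB. the original two rows followed by the appended row
]res=: readcsv tstfile
```

Reading the file back with readcsv should give three rows of four fields each.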

To use custom field and/or string delimiters, see the tables/dsv addon. The tables/csv addon is a special case of tables/dsv with the field delimiter set to ',' and the string delimiter set to '"'.

To see more samples of usage, open and inspect the test_csv.ijs script.

Comparison with the `csv.ijs` script in the base library

The tables/csv addon is no longer as concise (or as clean) as the original csv script in the base library. However, it supports more features, fixes what appear to be some bugs and, in most cases, performs better than the original.

Most of the verbs from the base library csv script are unchanged. The structural changes can be summarised as follows:

  • The algorithm used by chopcsv to convert a line from a csv string into a list of boxed fields has been replaced.
  • The portion of writecsv used to make a csv string from a J array has been factored out into a separate verb, makecsv.
  • The algorithm used by makecsv to make a csv string from a J array has been replaced. The new algorithm now depends on the type of the J array.
  • appendcsv was added to allow a J array to be converted to a csv string and appended to an existing file.
  • makenum was added to convert cells of arrays created with fixcsv to numeric types where possible.
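In the usage session earlier on this page, readcsv returns every field as a literal, including 34 and 1.23. A minimal sketch of restoring numeric cells with makenum, assuming its monadic form leaves fields that cannot be converted (such as 'hello') unchanged; the sample string is hypothetical.

```j
load 'tables/csv'
NB. a CSV string with mixed numeric and text fields
csvstr=: '34,"45","hello",1.5',LF,'12,"32","goodbye",1.23',LF
raw=: fixcsv csvstr          NB. every field is a boxed literal
num=: makenum raw            NB. numeric-looking fields converted where possible
datatype each num
```

Note that makenum only sees the unquoted text, so a field such as "45" that was quoted in the CSV may also be converted.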

Features

Feature changes from the base library csv script:

    • Supports appending arrays to an existing csv file.
    • Optional user-defined field delimiter and string delimiter(s) - see Addons/tables/dsv
    • Only literal cells of a J array are enclosed by string delimiters.
    • writecsv/makecsv can handle boxed arrays with cells containing numeric arrays, boxed data or complex data.

   ]tstarry=: ((34j3;2;<<4),:2;3 6;3)
┌────┬───┬───┐
│34j3│2  │┌─┐│
│    │   ││4││
│    │   │└─┘│
├────┼───┼───┤
│2   │3 6│3  │
└────┴───┴───┘
   load '~system/packages/files/csv.ijs'
   tstarry writecsv jpath '~temp/tstcsv.csv'
|domain error: writecsv
|   dat=.,each     8!:2 each x
   load 'tables/csv'
   tstarry writecsv jpath '~temp/tstcsv.csv'
19
   freads jpath '~temp/tstcsv.csv'
34j3,2,4
2,3 6,3
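Because only literal cells are enclosed in string delimiters, numbers written by makecsv come out bare while text is quoted. A minimal check, assuming makecsv ends each row with a line terminator; the terminator is removed with -. so the comparison does not depend on whether it is LF or CRLF.

```j
load 'tables/csv'
row=: 1 3 $ 34;'45';1.5     NB. integer, literal and floating cells
str=: makecsv row
str -. CRLF                 NB. drop line-ending characters before comparing
```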

Fixed bugs?

    • writecsv/makecsv no longer append LF to an empty string.
    • fixcsv correctly unescapes quotes embedded in fields.

   tstcsv=: '"Symbol "" is Rank",38,"abc"',LF,'"Hello world",56,"efg"',LF
   load '~system/packages/files/csv.ijs'
   fixcsv tstcsv
┌─────────────────┬──┬───┐
│Symbol "" is Rank│38│abc│
├─────────────────┼──┼───┤
│Hello world      │56│efg│
└─────────────────┴──┴───┘
   load 'tables/csv'
   fixcsv tstcsv
┌────────────────┬──┬───┐
│Symbol " is Rank│38│abc│
├────────────────┼──┼───┤
│Hello world     │56│efg│
└────────────────┴──┴───┘
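The difference above also gives a quick way to check which csv script is currently loaded: with the addon, the doubled "" in the first field is reduced to a single ". A minimal sketch:

```j
load 'tables/csv'
tstcsv=: '"Symbol "" is Rank",38,"abc"',LF,'"Hello world",56,"efg"',LF
res=: fixcsv tstcsv
NB. 1 if the embedded quote was unescaped, as the addon does
'Symbol " is Rank' -: > (<0 0) { res
```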


Performance

  • Performance of fixcsv is essentially unchanged (if anything, slightly faster).
  • The new algorithms in makecsv generally use 3-9 times less space and are, in most cases, faster.
  • Large arrays of a single type, or whose columns each contain a single type, are processed at least as fast as by the old version, and simple numeric arrays are over 4 times faster.
  • For small arrays containing different datatypes, the new version can be up to twice as slow as the old version, but because the total time taken is small, this is rarely significant in practice.
  • Large arrays with multiple types within a column run at about 87% of the speed of the old version, but use about one eighth of the space. See the table below.

Times are mean seconds per iteration and space is in bytes. "Lib" refers to the base library csv.ijs script and "Addon" to the addon's csv.ijs; each ratio is library/addon, so values above 1 favour the addon.

Data type                     Iter. Code                    Lib time  Lib space  Addon time  Addon space  Time ratio  Space ratio
Simple numeric                100   makecsv i. 50 70        0.0153    2913090    0.0035      417344       4.422       6.980
Simple numeric (big)          1     makecsv i.5000 70       2.3214    293626000  0.5485      45474900     4.232       6.457
Boxed numeric                 100   makecsv <"0 i. 50 70    0.0148    2913340    0.0092      850624       1.602       3.425
Boxed numeric (big)           1     makecsv <"0 i.5000 70   2.3212    293626000  1.9981      87621100     1.162       3.351
Simple literal (big)          1     makecsv 5000 70$'abcd'  4.5013    644609000  4.0594      645135000    1.109       0.999
Columns of single type        100   makecsv simpcol         0.0002    38272      0.0003      9792         0.619       3.908
Columns of single type (big)  1     makecsv 5000$simpcol    0.3163    45443200   0.0904      5180220      3.499       8.772
Columns of mixed type         100   makecsv mixcol          0.0003    33536      0.0004      11648        0.589       2.879
Columns of mixed type (big)   1     makecsv 5000$mixcol     0.2862    38818700   0.3302      4959490      0.867       7.827
String (small)                100   fixcsv ssimpcol         0.0002    10624      0.0002      10496        1.029       1.012
String (big)                  1     fixcsv 171250$ssimpcol  0.2588    4530620    0.2400      4530690      1.078       1.000
The test arrays used in the benchmarks are shown below.

   simpcol
┌──┬────────────────┬─┬─┬────┬────┬──┐
│12│The black dog   │1│E│9.32│54  │XL│
├──┼────────────────┼─┼─┼────┼────┼──┤
│15│likes to        │0│R│4.45│5.24│  │
├──┼────────────────┼─┼─┼────┼────┼──┤
│22│eat             │1│E│    │455 │XS│
├──┼────────────────┼─┼─┼────┼────┼──┤
│96│juicy, red bones│1│W│5.45│924 │M │
└──┴────────────────┴─┴─┴────┴────┴──┘
   mixcol
┌────┬─────────────┬────────┬───┬────────────────┬────┐
│12  │The black dog│1       │E  │9.32            │54  │
├────┼─────────────┼────────┼───┼────────────────┼────┤
│XL  │15           │likes to│0  │R               │4.45│
├────┼─────────────┼────────┼───┼────────────────┼────┤
│5.24│             │22      │eat│1               │E   │
├────┼─────────────┼────────┼───┼────────────────┼────┤
│    │455          │XS      │96 │juicy, red bones│1   │
└────┴─────────────┴────────┴───┴────────────────┴────┘
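Figures like those in the table can be measured with J's timing foreigns: x (6!:2) y runs the sentence y x times and returns the mean time in seconds, and 7!:2 y returns the space used in bytes. A minimal sketch for the "Simple numeric" row; the harness actually used to produce the table is not given on this page, so this is an illustration rather than the original benchmark script, and the name sentence is hypothetical.

```j
load 'tables/csv'
sentence=: 'makecsv i. 50 70'    NB. the code from the "Simple numeric" row
t=: 100 (6!:2) sentence          NB. mean seconds over 100 runs
s=: 7!:2 sentence                NB. bytes used by one run
t , s
```

Running the same sentences after loading the base library csv script instead gives the other half of each comparison.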

Authors

Adapted by Ric Sherlock from the base library csv script.

Suggestions and/or SVN improvements to the addon are welcome.

See Also

  • csvedit addon - GUI application for creating and editing CSV files.
  • dsv addon - general utility for any delimiter-separated-value formatted string.