
weighted averages, empty arrays, randomly-generated J phrases, large file processing

Location: The Heartland


             Meeting Agenda for NYCJUG 20131112
1. Beginner's regatta: comparing averaging methods: see "Weighted Moving

2. Show-and-tell: see "Adventures in Random J.doc".

Iterating through a dataset too large to process at once: see

3. Advanced topics: J Conference 2014: see "J Conference 2014.doc".

See "Why Empty Arrays of Different Types are the Same.doc".

4. Learning, teaching and promoting J, et al.: report on d3.js workshop and
FinTech Hackathon: where J might aim.

See "Old schooled: You never stop learning like
a child" and "People in their 90s are Getting Smarter".

Beginner's regatta


Working with Large Files in Pieces

To work with a file too large to fit into memory in one piece, we develop a verb to break it into pieces and an adverb to apply an arbitrary verb across the file. In this case, our objective is to break a large file into small pieces to make it easier to transmit, then re-assemble the pieces on the target machine to re-create the original, large file.
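
As a minimal in-memory sketch of this objective, cutting a list into fixed-size pieces and razing them back together should reproduce the original. The names data, pieces and rejoined are hypothetical stand-ins; no files are involved here.

```j
NB. In-memory sketch: cut a character list into fixed-size pieces, then rejoin them.
data=: 'abcdefghij'        NB. hypothetical stand-in for the file contents
pieces=: _4 <\ data        NB. non-overlapping pieces of at most 4 items: 'abcd';'efgh';'ij'
rejoined=: ; pieces        NB. raze the pieces back together, in order
ok=: data -: rejoined      NB. 1 if the round trip reproduces the original
```

The same check, comparing the original file with the razed contents of the pieces, is a reasonable sanity test on the target machine.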

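NB.* breakUpFile: inner verb to break apart file into smaller pieces.
NB. x: prefix;suffix of piece names.  y: state curptr;chsz;max;flnm;ctr.
breakUpFile=: 4 : 0
   'curptr chsz max flnm ctr'=. 5{.y
   if. curptr>:max do. ch=. curptr;chsz;max;flnm;ctr   NB. Done: state unchanged, so ^:_ stops.
   else. ch=. readChunk curptr;chsz;max;flnm
       x writeFilePiece (>{:ch);ctr        NB. Last item of ch is the chunk just read.
       ch=. (4{.ch),<>:ctr                 NB. Keep file state, drop chunk, bump counter.
   end.
NB.EG ('pfx';'.suf')&breakUpFile ^:_ ] 0;1e6;(fsize 'big.dat');'big.dat';0
)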
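
The ^:_ in the example applies the verb repeatedly until its result stops changing, so the verb has to hand back its state unchanged once the pointer reaches the end of the file. A small sketch of that iteration, with a hypothetical verb step standing in for breakUpFile and a made-up state of pointer, chunk size and total size:

```j
NB. step: hypothetical stand-in for breakUpFile; y is pointer;chunk size;total size.
step=: 3 : 0
   'ptr sz max'=. y
   if. ptr>:max do. y                  NB. finished: return the state unchanged
   else. (max<.ptr+sz);sz;max end.     NB. otherwise advance the pointer by one chunk
)

final=: step^:_ ] 0;4;10     NB. apply step until the state stops changing
trace=: step^:a: ] 0;4;10    NB. same iteration, keeping every intermediate state
```

Using ^:a: in place of ^:_ while testing is a convenient way to see each state the iteration passes through.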

NB.* writeFilePiece: write chunk to file named prefix,counter,suffix.
writeFilePiece=: 4 : 0
   'pfx suff'=. x [ 'ch ctr'=. y
   ch fwrite pfx,(":ctr),suff
)

NB.* doSomething: do something to a large file in sequential blocks.
NB. u: verb applied to (complete lines of block);<header.  y: state curptr;chsz;max;flnm[;leftover;hdr].
doSomething=: 1 : 0
   'curptr chsz max flnm leftover hdr'=. 6{.y
   if. curptr>:max do. ch=. curptr;chsz;max;flnm
   else. if. 0=curptr do. ch=. readChunk curptr;chsz;max;flnm
           chunk=. leftover,CR-.~>_1{ch
           'chunk leftover'=. (>:chunk i: LF) split chunk   NB. Hold back partial last line.
           'hdr body'=. (>:chunk i. LF) split chunk         NB. First line is the header.
           hdr=. }:hdr
       else. chunk=. leftover,CR-.~>_1{ch=. readChunk curptr;chsz;max;flnm
           'body leftover'=. (>:chunk i: LF) split chunk
       end.
       u body;<hdr
   end.
   (4{.ch),leftover;<hdr   NB. Assumed return value: state for next iteration; unchanged at end so ^:_ stops.
NB.EG (('PRCCD - Price - Close - Daily - USD';'$issue_id';'IDsDateRanges-Daily.txt')&accumDts2File) doSomething ^:_ ] 0;1e6;(fsize 'gvkeyIID-USD.txt');'gvkeyIID-USD.txt'
NB.EG (('PRCCD - Price - Close - Daily';'IDsDateRanges.txt')&accumDts2File) doSomething ^:_ ] 0;1e6;(fsize 'GvkeyIID.txt');'GvkeyIID.txt'
)
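
The leftover handling above exists because a chunk boundary rarely falls on a line boundary: the partial line at the end of one chunk has to be prefixed to the next chunk. Here is a self-contained sketch of that idea, working on a boxed list of in-memory chunks instead of a file. The verb linesFromChunks and the sample data are hypothetical, and unlike doSomething this sketch also allows for a chunk that contains no LF at all.

```j
NB. linesFromChunks: y is a boxed list of raw chunks in file order.
NB. Returns (boxed complete lines);(trailing partial line).
linesFromChunks=: 3 : 0
   lines=. 0$a:
   leftover=. ''
   for_c. y do.
     text=. leftover,(>c)-.CR          NB. prefix the held-back partial line, drop CRs
     n=. >: {: _1 , I. text=LF         NB. length through the last LF; 0 if there is none
     lines=. lines , <;._2 n{.text     NB. box each complete line
     leftover=. n}.text                NB. hold back the incomplete tail
   end.
   lines;leftover
)

chunks=: ('id,px',LF,'A,1');('0',CR,LF,'B,');('20',LF)
'lines leftover'=: linesFromChunks chunks   NB. lines: 'id,px';'A,10';'B,20'  leftover: ''
```

Note that the line "A,10" is split across the first two chunks and "B,20" across the last two; both come out whole.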

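NB.* readChunk: read next chunk of file; return updated state with the chunk as last item.
readChunk=: 3 : 0
   'curptr chsz max flnm'=. 4{.y
   if. 0<chsz2=. chsz<.0>.max-curptr do. chunk=. fread flnm;curptr,chsz2
   else. chunk=. '' end.
   (curptr+chsz2);chsz;max;flnm;chunk   NB. Assumed return value: callers take the chunk from the last item.
NB.EG chunk=. >_1{ch0=. readChunk 0;1e6;(fsize 'GvkeyIID.txt');'GvkeyIID.txt'
)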
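
The expression chsz<.0>.max-curptr clamps the read length so that the last chunk stops at the end of the file and nothing is read once the pointer is at or past the end. A sketch with a hypothetical helper chunkLen, a chunk size of 4 and a 10-byte file:

```j
NB. chunkLen: bytes to read at pointer y, given x = chunk size, total size (hypothetical helper).
chunkLen=: 4 : 0
   'chsz max'=. x
   chsz <. 0 >. max - y
)

sizes=: 4 10 chunkLen 0 4 8 12     NB. 4 4 2 0
```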

readChunk_egUse_=: 0 : 0
   ch0=. readChunk 0;1e6;(fsize 'GvkeyIID.txt');'GvkeyIID.txt'
   chunk=. CR-.~>_1{ch0
   'chunk leftover'=. (>:chunk i: LF) split chunk
   'hdr body'=. split <;._1&> TAB,&.><;._2 chunk
   body=. body#~-.a: e.~ body{"1~hdr i. <'PRCCD - Price - Close - Daily'
   unqids=. ~.ids=. ;&.><"1 body{"1~ hdr i. '$gvkey';'$iid'
   dts=. MDY2ymdNum&>0{"1 body
   (unqids textLine ids (<./,>./) /. dts) fappend 'IDsDateRanges.txt'
)

Still to Do

We need to create a batch file with the commands to re-assemble the pieces into the original file. Here's an example of doing this manually.

First, we group the assembly of the smallest pieces into intermediate files, in order.

   $nmlst=. 0{"1 dir 'Bridge*.dat'
   nmlst=. nmlst /: ".&>6}.&.>_4}.&.>nmlst   NB. Order by numeric portion
   #&>nmlst                                  NB. Length of each name
11 11 11 11 11 11 11 11 11 11 12 12 12 12 12 12 12 12 12 12 12 12 12 12...

We need to account for the length of the start of the command, the name of its result file, and each of the small file names separated by plus signs; "+" is the concatenation symbol for the DOS "copy" command.

   #st=. 'copy /b ',end=. 'Br000.tmp'
   255-17             NB. Maximum line length is 255
   12%~255-17         NB. How many intermediate joins can we do per line?
   +/ptn=. (#nmlst)$19{.1
   bb=. ptn<;.1 nmlst
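
A hedged sketch of this accounting, using hypothetical names perLine, names and groups. A line with n names costs the 17 fixed characters plus, for each name, its length and one separator (a "+" between names, or the space before the output name). Counting the separator gives a slightly more conservative figure than dividing by the name length alone. The second half shows the same partitioning idiom on a short list, with groups of 3 instead of 19.

```j
NB. perLine: how many names fit on one command line (hypothetical helper).
NB. x: line limit.  y: fixed overhead, longest name length.
perLine=: 4 : 0
   'overhead longest'=. y
   <. (x - overhead) % longest + 1     NB. each name also costs one separator character
)
n=: 255 perLine 17 12                  NB. 18

NB. Partition a list into groups of at most 3, as ptn<;.1 nmlst does with 19.
names=: 'f0';'f1';'f2';'f3';'f4';'f5';'f6';'f7'
ptn=: (#names) $ 3 {. 1                NB. 1 0 0 1 0 0 1 0: each 1 starts a new group
groups=: ptn <;.1 names                NB. 3 boxes holding 3, 3 and 2 names
```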

NB.* join1: build one "copy /b" command joining names into a numbered output file.
join1=: 3 : 0
   'nms outnm ctr'=. y
   nms=. >nms
   ('copy /b '),(}.;'+',&.>nms),' ',(outnm{.~outnm i. '.'),(":ctr),outnm}.~outnm i. '.'
)
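
The two idioms inside join1 can be tried separately on a short list. This is only a sketch: plusJoin and numbered are hypothetical helper names, not part of the code above.

```j
NB. plusJoin: join boxed names with '+' between them (hypothetical helper).
plusJoin=: 3 : 0
   }. ; '+' ,&.> y                 NB. prefix each name with '+', raze, drop the leading '+'
)

NB. numbered: put counter x before the extension of file name y (hypothetical helper).
numbered=: 4 : 0
   dot=. y i. '.'
   (dot {. y),(": x),dot }. y
)

src=: plusJoin 'Bridge0.dat';'Bridge1.dat';'Bridge2.dat'
out=: 0 numbered 'Br.tmp'
cmd=: 'copy /b ',src,' ',out       NB. copy /b Bridge0.dat+Bridge1.dat+Bridge2.dat Br0.tmp
```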

Check that this works as we think it ought to.

   join1 (0{bb);'Br.tmp';0
copy /b Bridge0.dat+Bridge1.dat+Bridge2.dat+Bridge3.dat+Bridge4.dat+Bridge5…
   ;LF,~&.>join1 &.> (<"0 bb);&.>(<'Br.tmp');&.>i.+/ptn
copy /b Bridge0.dat+Bridge1.dat+Bridge2.dat+Bridge3.dat+…+Bridge18.dat Br0.tmp
copy /b Bridge19.dat+Bridge20.dat+Bridge21.dat+Bridge22.dat+...
copy /b Bridge95.dat+Bridge96.dat+Bridge97.dat+Bridge98.dat+Bridge99.dat Br5.tmp
   (;LF,~&.>join1 &.> (<"0 bb);&.>(<'Br.tmp');&.>i.+/ptn) fwrite 'Assemble529.bat'

Now we have to do the same thing at the next level: join together the intermediate files that are the aggregates of the smallest pieces.

   (LF,~'copy /b ',(}.;'+',&.>(<'.tmp'),~&.>(<'Br'),&.>":&.>i.6),' 5.2.9_Clarifi_BridgeInstaller.exe') fappend 'Assemble529.bat'

Here is another example, from the top, for a different file. First, we break the large file named by finalNm into two-million-byte pieces with names of the form "PatchN.dat", where "N" is a sequence number.

   finalNm=. '5.2.9_Clarifi_PatchInstaller.exe'
   ('Patch';'.dat')&breakUpFile ^:_ ] 0;2e6;(fsize finalNm);finalNm;0

Now, get the list of names of the small pieces.

   nmlst=. 0{"1 dir 'Patch*.dat'

Put the file names in numeric order (by the number embedded in the file name) and check the first and last few.

   11{.nmlst=. nmlst /: ".&>5}.&.>_4}.&.>nmlst
   _11{.nmlst=. nmlst /: ".&>5}.&.>_4}.&.>nmlst
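
The numeric sort matters because a name such as "Patch10.dat" would otherwise not follow "Patch9.dat" when names are compared as text. A small sketch of the same expression on three made-up names:

```j
names=: 'Patch10.dat';'Patch2.dat';'Patch1.dat'   NB. hypothetical, deliberately out of order
nums=: ".&> 5 }.&.> _4 }.&.> names                NB. drop '.dat' and 'Patch', convert: 10 2 1
sorted=: names /: nums                            NB. 'Patch1.dat';'Patch2.dat';'Patch10.dat'
```

The counts dropped from each end (5 and 4 here, 6 and 4 for the "Bridge" names) depend on the prefix and suffix used when the pieces were written.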

Check the sizes of these names and use the longest to calculate how many we can group to assemble the intermediate pieces.

   #&>nmlst
10 10 10 10 10 10 10 10 10 10 11 11 11 11 11 11 11 11 11 11 11 11 11 ...

Partition the name list into sufficiently short groups so we can build the commands within the 255-character limit.

   +/ptn=. (#nmlst)$20{.1
   bb=. ptn<;.1 nmlst

Generate the first level of commands to assemble the smallest files into intermediate, larger files.

   (;LF,~&.>join1 &.> (<"0 bb);&.>(<'Pa.tmp');&.>i.+/ptn) fappend 'Assemble529.bat'

Generate the final assembly of the intermediate pieces into the original file.

   (LF,~'copy /b ',(}.;'+',&.>(<'.tmp'),~&.>(<'Pa'),&.>":&.>i.+/ptn),' ',finalNm) fappend 'Assemble529.bat'

Remember to put together a "send.ftp" file to transmit all the pieces to the target machine.

Advanced Topics

Learning, teaching and promoting J


-- Devon McCormick