From J Wiki

The following code supports the effort outlined here to create arbitrarily large data sets for realistic testing of parallel processing.

NB.* parallelProbSets.ijs: generate large random datasets for testing parallel programs.

load 'files dates'

NB. Handle TSV (Tab-separated values) files; 'TAB LF CR'=. 9 10 13{a.
NB.* readTSVFl: read tab-delimited file into variable.
readTSVFl=: ([:<;._1&> TAB ,&.> [:<;._2 [:(],LF#~LF~:_1{]) CR-.~fread)

NB.* getTSVInfo: adverb: apply verb u to the table read from .tsv file y.
getTSVInfo=: 1 : 'u readTSVFl y'
NB.EG lnkey=: (0&{"1) getTSVInfo&.>rrmlnms

NB.* getFlsInfo: conjunction: apply verb u to the result of reading file y with verb v.
getFlsInfo=: 2 : 0
   if. nameExists 'SHOWGFI' do. if. SHOWGFI do. smoutput y,': ',":qts'' end. end.
   u v y
)
NB.EG lnkey=: ((0&{"1) getFlsInfo readTSVFl)&.>rrmlnms

NB.* appendTSVFl: append boxed table x to tab-delimited file y.
appendTSVFl=: 4 : '(x,~readTSVFl y) writeTSVFl y'
NB.* writeTSVFl: write boxed table x to tab-delimited file y.
writeTSVFl=: 4 : '(enc2TSV x) fwrite y'
NB.* enc2TSV: encode boxed table as TAB-separated, LF-terminated text.
enc2TSV=: 13 : ';(LF,~[:}:[:; TAB,&.>~])&.><"1 y'
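As a quick sanity check of the reader/writer pair (an assumed session example, not part of the original script; the file name demo.tsv is hypothetical), a small boxed table should round-trip, provided no cell contains an embedded TAB or LF:

```j
   tbl=. 2 2$'a';'1';'b';'2'     NB. small boxed table of strings
   tbl writeTSVFl 'demo.tsv'     NB. write as tab-delimited file
   tbl -: readTSVFl 'demo.tsv'   NB. 1: reading recovers the written table
```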

NB. Case 0: present-value cashflows along different interest-rate paths.
NB.* genCFs: y cashflow records of 360 monthly amounts between 1000 and 9999.99.
genCFs=: 13 : '|:/:~"1]1000+100%~<.900000*(360,y)?@$0'
NB.EG cf0=. genCFs 1e4                       NB. 10,000 30-year cashflows

elimNeg=: 3 : '(100%~>:?0)+y-<./y'"1   NB. Shift each row so its minimum is a small positive value
maxRng=: 3 : 'y*(0.10+10%~?0)%>./y'"1  NB. Scale each row so its maximum lies between 0.10 and 0.20
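The two adjustments can be spot-checked in a session (an assumed example, not from the original): after elimNeg and maxRng, every value in a row is positive and each row's maximum falls between 10% and 20%:

```j
   r=. maxRng elimNeg ?4 360$0   NB. 4 random rows, adjusted
   *./0 < ,r                     NB. 1: all rates positive
   *./0.2 >: >./"1 r             NB. 1: each row's maximum is at most 20%
```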

genIRs=: 3 : 0
   irp=. ([:(+/\"1)1000%~[: <:[:+:0?@$~360,~]) y NB. Each row: 360-step random walk ("1 sums within a row, not across rows)
   irp=. maxRng elimNeg irp                  NB. Rates > 0%; each row's max between 10% and 20%
   irp=. irp/:*/"1 >:irp                     NB. Order for neatness
)
NB.EG ir0=. genIRs 1e4                       NB. 10,000 30-year paths

wrCFIRFls=: 4 : 0
   (":&.>genCFs x) writeTSVFl '.tsv',~'CF0_',":y
   (":&.>genIRs x) writeTSVFl '.tsv',~'IR0_',":y
   >:y                           NB. Increment counter so ^: numbers each file set distinctly
)
NB.EG 1e4 wrCFIRFls^:10]0     NB. Write 10 file sets w/10,000 records each

NB. Case 1: sort many records by date, movie, or user.
NB.* genDMURRecs: y records of date (YYYYMMDD), movie ID, user ID, rating.
genDMURRecs=: 3 : '(100#.todate 70476+?y$6264),.(y,3)?@$20000 1e6 10'
NB.EG dmur0=. genDMURRecs 1e6
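Each generated record has four columns (an assumed session check, not part of the original script):

```j
   $ genDMURRecs 3     NB. 3 records: date YYYYMMDD, movie ID, user ID, rating
3 4
```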

wrDMURFl=: 4 : '>:y[(":&.>genDMURRecs x) writeTSVFl ''.tsv'',~''DMUR0_'',":y'
NB.EG 1e6 wrDMURFl^:10]0      NB. Make 10 sets of 1 million records each