User:Matthew Brand/RAMDisk


A problem I ran into when a program reads and writes tens of thousands of files is that the system spent most of its time in the Unix pdflush and kjournald processes and not much time actually doing the calculations! I do not think that is a J issue; it is an OS issue. Apparently I could have tuned the OS, but who in their right mind wants to do that, or ask a client to do it on each new machine? I wrote this RAMDisk utility to compress and cache file reads and writes. It compresses the data before writing and decompresses it on reading.

It works for me. When my program runs with the RAMDisk, the CPU is at 100% and the program takes a relatively short time to complete. Without it, the CPU is at around 10%, the disk is crunching itself into oblivion, and the program does not complete in a reasonable amount of time.

Because the RAMDisk writes many files in one block at the end of the program (or in chunks during the run, depending on maxsize_RAMDisk_), with no reads in between, flushing the files is much quicker than writing them one at a time during program execution.

To compare the two methods you can set:

comp_RAMDisk_ =: comp_utils_
ucomp_RAMDisk_ =: ucomp_utils_

which will bypass the caching and do the I/O directly against the disk.
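
A rough way to time the two modes against each other might look like the sketch below. It assumes the RAMDisk script further down has already been loaded and that the directory /tmp/ramdisk_test exists and is writable. The verb writeMany, the file count of 1000 and the directory name are made up for illustration, and no particular timings are claimed.

```j
NB. Sketch only: writeMany and /tmp/ramdisk_test are illustrative names.
writeMany =: 3 : 0
for_i. i. y do.
  ('/tmp/ramdisk_test/f' , (": i) , '.cmp') comp_RAMDisk_ i. 100
end.
y
)

NB. 1. Cached mode: writes land in the cache, then one bulk flush to disk.
init_RAMDisk_ ''
tCached =: 6!:2 'writeMany 1000'
maxsize_RAMDisk_ =: 0              NB. threshold 0 makes the next flush write everything
tFlush  =: 6!:2 'flush_RAMDisk_ 0'

NB. 2. Direct mode: swap in the uncached verbs and repeat the same writes.
comp_RAMDisk_  =: comp_utils_
ucomp_RAMDisk_ =: ucomp_utils_
tDirect =: 6!:2 'writeMany 1000'

tDirect , tCached + tFlush         NB. seconds: direct versus cached plus flush
```

The cached run comes first because the two assignments replace comp_RAMDisk_ and ucomp_RAMDisk_; reload the RAMDisk script afterwards to get the caching versions back.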

There is a one-off task: run the following code in a fresh J session. It creates a file containing an empty symbol table:

load 'arc/zip/zbuffer'
typecheck =: =3!:0
isboxed =: 32&typecheck
boxifopen =: <^:(-.@isboxed)
compressData =: 3!:1@(#;1&zput)@(3!:1)
comp  =: compressData : ( comp@] (1!:2) (boxifopen@[) ) NB. compress to disk
(jpath,'~user/classes/emptySymbolTable.cmp') comp 0 s: 10

The RAMDisk program:

NB. Some utilities
cocurrent 'utils'
load 'arc/zip/zbuffer'
typecheck =: =3!:0
isboxed =: 32&typecheck
boxifopen =: <^:(-.@isboxed_utils_)
NB. compress anything. Use level 1 for speed (level 9 is similar compression ratio.)
compressData =: 3!:1@(#;1&zput)@(3!:1)
unCompressData =: 3!:2@(0&{:: zget (1&{::))@(3!:2)`(ucomp@])@.(32&=@(3!:0))
comp  =: compressData : ( comp@] (1!:2) (boxifopen@[) ) NB. compress to disk
ucomp =: (unCompressData@(1!:1)@boxifopen) : [: NB. decompress from disk
exists  =: 0&<@#@(1!:0)@boxifopen NB. does file or directory y exist
newdir  =: 1!:5@boxifopen NB. create new directory y
createPath =: newdir_utils_ ^:(-.@:exists_utils_@:(_1&}.)) NB. create a path if it does not exist
NB. Create entire tree if required without bitching. :: 0: required because
NB. it does what it should then outputs an error ... :: 0: ignores that.
createPathTree =:  (  (createPath_utils_ f.)@ ,&'/' @ ; @ ]\ @:(_1&}.) @: (<;.1) ) :: 0:



NB. The RAMDisk
cocurrent 'RAMDisk'

NB. <User parameters>
SOLESYMUSER =: 0
maxsize =: 2^27
NB. </User parameters>

instructions =: 0 : 0
You need to create an "emptySymbolTable" file.
Start a fresh J session and execute these lines:
load 'arc/zip/zbuffer'
typecheck =: =3!:0
isboxed =: 32&typecheck
boxifopen =: <^:(-.@isboxed)
compressData =: 3!:1@(#;1&zput)@(3!:1)
comp  =: compressData : ( comp@] (1!:2) (boxifopen@[) ) NB. compress to disk
(jpath,'~user/classes/emptySymbolTable.cmp') comp 0 s: 10
)

ace =: a: "_
fromsym =: 5&s:

init =: 3 : 0
data =: '' [ y
keys =: ''
keylu =: keys&i.
resetSymbolTable ''
size =: 0
)

resetSymbolTable =: 3 : 0
if. SOLESYMUSER do.
       try.
               10 s: ucomp_utils_  jpath,'~user/classes/emptySymbolTable.cmp'
       catch.
               smoutput instructions
       end.
end.
)

ucomp =: 3 : 0
try.
       data =. (unCompressData_utils_@:>@:fromsym@:({&data)@:keylu@:s:)@:boxifopen_utils_ y
catch.
       try.
               NB. if it is not on disk then throw
               data =. ucomp_utils_@:boxifopen_utils_ y
       catch.
               throw.
       end.
end.
)

comp =: 4 : 0
ii =. keylu key =. s: boxifopen_utils_ x
'dsym dlen' =.  ((s:@:<);#) compressData_utils_ y
if. ii = # keys do.
       data =: data, dsym
       keys =: keys , key
       keylu =: keys&i.
else.
       data =: dsym ii} data
end.
size =: size + dlen
flush ''
size
)

flush =: 3 : 0
if. maxsize <: size do.
       ks =. fromsym keys
       for_i. ks do.
               createPathTree_utils_ > fpath =. i
               fpath 1!:2~ > fromsym i_index { data
       end.
       init ''
end.
)

report =: 3 : 0
( <"0 keys) ,. $&.> (5&s:) data [ y
)

cocurrent 'base'
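
The cache in comp keeps each compressed payload as a symbol (dsym) and gets the bytes back with 5&s: (fromsym). Here is a minimal sketch of that storage round trip, leaving out the zlib compression step so that it should run in a bare J session. The names payload, sym and back are illustrative and not part of RAMDisk.

```j
NB. Sketch only: payload, sym and back are illustrative names.
payload =: 3!:1 i. 10          NB. binary representation (a byte string) of the noun
sym     =: s: < payload        NB. store the bytes as a symbol, as comp does for dsym
back    =: 3!:2 > 5 s: sym     NB. symbol -> boxed bytes -> bytes -> original noun
back -: i. 10                  NB. should be 1 if the round trip is lossless
```

In the real code the byte string passed to s: is the result of compressData_utils_, which wraps the original length and the compressed bytes in a second 3!:1 layer.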

A simple example:

NB. Simple EXAMPLE:


NB. set parameters
SOLESYMUSER_RAMDisk_ =: 1 NB. allow program to clear the symbol table
maxsize_RAMDisk_ =: 10000 NB. flush to disk when the cache reaches 10000 bytes (use a larger value in practice)
init_RAMDisk_ '' NB. clear the RAMDisk

'/tmp/fileA.cmp' comp_RAMDisk_ i.10 NB. cache i.10 in file '/tmp/fileA.cmp', output is size of the buffer
report_RAMDisk_ ''

'/tmp/fileB.cmp' comp_RAMDisk_ 5 4 3 NB. store 5 4 3 in '/tmp/fileB.cmp'
'/tmp/fileC.cmp' comp_RAMDisk_ 'some text' NB. more data...
'/tmp/e1/e2/e4/fileB.cmp' comp_RAMDisk_ <"0 i. 10 NB. more data...
report_RAMDisk_ ''

ucomp_RAMDisk_ '/tmp/fileC.cmp' NB. retrieve data (from the cache, else from disk; if it is in neither, ucomp throws)

'/tmp/e1/e2/fileA.cmp' comp_RAMDisk_ 'some of this is text';0;0;1 NB. more data
'/tmp/e1/e2/fileA.cmp' comp_RAMDisk_ i.10000 NB. enough data to trigger a flush to disk
report_RAMDisk_ ''

ucomp_RAMDisk_ '/tmp/e1/e2/fileA.cmp' NB. retrieve some data

maxsize_RAMDisk_ =: 0 NB. force any remaining data to the disk
flush_RAMDisk_ ''
init_RAMDisk_ '' NB. clear the symbol table (and RAMDisk)

<coming soon> An example with thousands of io and comparison.