User:Devon McCormick/Code/MemoryMappedFiles

From J Wiki

Memory-mapping a file lets the operating system's paging mechanism handle large user files. It is very effective, but it must be handled carefully: a small error can lock up your machine and force a reboot. Here is an example of using a memory-mapped file, showing its impact on memory usage.

   load 'jmf'
   flnm=. 'C:\Temp\Big1e6x1e2File.csv'
   JCHAR map_jmf_ 'base';flnm    NB. "base" is the mapped variable.
   $base
1061694252

Here we loaded the memory-mapping library "jmf" and mapped the variable "base" to a file. The shape of the variable, 1061694252, is the size of the file in bytes.

Next we'll convert the mapped CSV file into a boxed matrix:

   6!:2 'bp=. <;._1&>'','',&.><;._2 ] LF (] , [ #~ [ ~: [: {: ]) CR-.~base'
24.016
   $bp
1068792 100
   3 3{.bp
+---+---+---+
|0  |2  |4  |
+---+---+---+
|360|364|368|
+---+---+---+
|760|764|768|
+---+---+---+

This takes about 24 seconds to create the boxed matrix "bp" on a 2933 MHz machine with 64 GB RAM.
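
The one-line parse above packs several steps together. As a minimal sketch, here are the same steps taken one at a time on a small literal instead of the mapped file. The names "txt", "fixed", "lines" and "cells" are made up for this illustration.

```j
   NB. Stand-in for the mapped file: two CSV rows, CRLF line ends, no final LF.
   txt=. '0,2,4',CR,LF,'360,364,368'

   NB. Drop every CR, then append LF only if the last character is not already LF.
   fixed=. LF (] , [ #~ [ ~: [: {: ]) txt -. CR

   NB. Cut on the final character (LF): one box per line, LF removed.
   lines=. <;._2 fixed

   NB. Prefix each line with a comma, then cut each line on its first character.
   cells=. <;._1&> ',' ,&.> lines

   $cells
2 3
```

Because the intermediate results get their own names, the mapped variable itself is only ever read, never reassigned.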

Memory Usage

Now we look at the shape and size of this new boxed array, as well as the memory our J session is consuming after defining the matrix:

   $bp
1068792 100
   7!:5<'bp'    NB. How many bytes does the array use?
1.47543e10
   7!:1''       NB. How much RAM is this session using?
15206583712 19671133312
   10^.7!:1''
10.182 10.2938

Here we notice that the boxed matrix takes over 14 GB (1.47543e10 bytes), and our memory consumption reflects this. The foreign "7!:1" returns a two-element vector giving the memory currently used and the maximum memory used in this session, respectively.

J's resource consumption over and above the size of the large array looks minimal: the session's current usage exceeds the size of "bp" by only about 0.45 GB.
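
As a rough cross-check, we can divide the byte counts reported above by the number of cells to compare the per-cell cost of the boxed form with that of the raw CSV text. This is only arithmetic on figures already shown on this page, not a new measurement, and "ncells" is a name made up for the illustration.

```j
   ncells=. 1068792 * 100     NB. number of boxes in bp
   1.47543e10 % ncells        NB. bytes per box in bp: about 138
   1061694252 % ncells        NB. bytes per field in the CSV file: about 9.9
```

So the boxed matrix is roughly 14 times the size of the text it was built from, presumably because each box holds a separate small array with its own header.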

Danger of Mapped Files

Treating the contents of a large file as a J variable is useful, but be aware that changes to the variable change the file, and that some ordinary J operations are not handled well. For instance, suppose we box the lines of the large file and assign the result directly back to the mapped variable:

[Screenshot: DontBoxMappedFile-flnm.jpg]

Here we see that trying to box the lines of the file directly crashed J. Instead, we need to assign the boxed lines to a new variable, as we did above.

   load 'jmf'
   JCHAR map_jmf_ 'Holt';flnm
   6!:2 'hl=. <;._2 ] LF (] , [ #~ [ ~: [: {: ]) CR-.~Holt'
3.37912
   $hl
4357860
   >3{.hl
11/11/2016	#N/A	CFROICHG	-1.9
11/11/2016	#N/A	CFROIKM	2.52 
11/11/2016	#N/A	PERTOBST	-50

This completes quickly, in about 3.4 seconds. Even though the result is fairly large (about 625 MB, as measured below), the overhead memory used by J remains minimal.

The allocation for the boxed lines, shown below, is much smaller than the one for the boxed matrix "bp" above.

   7!:5 <'hl'
6.25543e8

As mentioned earlier, using a different variable name works in this case, but had the file been large enough to test the limits of our available RAM, we would have risked not only failing to box its lines but also freezing the machine so badly as to require a "hard" reboot.
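
One extra safeguard is to map the file read-only so that a stray assignment cannot alter it. The sketch below assumes that "map_jmf_" accepts two further items after the file name, a shared name and a read-only flag. That argument form does not appear elsewhere on this page and should be checked against the jmf documentation for your J version. The file "rotest.txt" and the names "ro" and "lines" are made up for the example.

```j
   load 'jmf'
   ('line one',LF,'line two',LF) fwrite 'rotest.txt'   NB. small scratch file, 18 bytes
   JCHAR map_jmf_ 'ro';'rotest.txt';'';1   NB. 4th item 1: assumed read-only flag
   lines=. <;._2 ro                        NB. boxed lines go to a NEW name
   #lines                                  NB. 2
   unmap_jmf_ 'ro'                         NB. release the mapping when done
```

This protects the file's contents only. It does nothing about running out of RAM, so the size of the result still needs to be considered before boxing a very large file.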

Changing Mapped Variable Size

Interesting things also happen if we perform operations on a mapped variable that (try to) change its size.

In this next section, we try to shrink the file by truncating the variable.

   Alpha_j_
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
   Alpha_j_ fwrite 'testmap.txt'
52

   JCHAR map_jmf_ 'test';'testmap.txt'
   #test
52
   test
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
   test=. 26{.test
   $test
26
   test
ABCDEFGHIJKLMNOPQRSTUVWXYZ
   JCHAR unmap_jmf_ 'test'
0

   fsize 'testmap.txt'
52
   JCHAR map_jmf_ 'test';'testmap.txt'
   #test
52

This does not work as we might have hoped: the file is still 52 bytes long, and remapping it brings back all 52 characters. If we try to make the freshly mapped variable larger, we get an error, as seen here.

   test=. test,'0123456789'
|allocation error
|   test    =.test,'0123456789'

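However, if we alter the variable in a way that keeps it the same size as the file or smaller, the change appears to work as far as the variable is concerned, but the file may not end up as we expect. Here we truncate to 26 characters first, so the 10 appended characters still fit within the original 52.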

   #test=. 26{.test
26
   test=. test,'0123456789'
   test
ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789
   JCHAR unmap_jmf_ 'test'
0
   fsize 'testmap.txt'
52
   fread 'testmap.txt'
ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789lmnopqrstuvwxyz

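The file keeps its original 52-byte length. The characters added after the truncation show up in the middle of the file, overwriting what followed the truncation point, and the tail of the original content is left in place.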
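
A simple way to picture this result is as an in-place overwrite of the leading bytes of the file. The sketch below models that in plain J with no mapping involved. It is an assumed model of the behavior, not a description of how jmf is implemented, and "old" and "new" are names made up for the illustration.

```j
   old=. Alpha_j_                  NB. the original 52 bytes of the file
   new=. (26{.old),'0123456789'    NB. the 36 characters left in the variable
   new (i.#new)} old               NB. overwrite the first 36 bytes, keep the rest
ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789klmnopqrstuvwxyz
```

This model keeps the "k" at position 36, whereas the fread output above does not show it although the file is still 52 bytes long. The real behavior therefore seems to differ from this model by at least that one byte, possibly a non-printing character written just after the appended text.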

If we really want to truncate the data, we have to put the result somewhere else, as seen here.

   JCHAR map_jmf_ 'test';'testmap.txt'
   test=. 26{.test
   (test,'0123456789') fwrite 'newtestmap.txt'
36
   JCHAR unmap_jmf_ 'test'
0
   JCHAR map_jmf_ 'new';'newtestmap.txt'
   $new
36
   new
ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789

The good news here is that this sort of thing does not seem to blow out J's memory allocation.

   fsize flnm=. 'C:\Temp\BigFile.txt'
2086023107
   10^.fsize flnm
9.31932
   JCHAR map_jmf_ 'big';flnm
   $big
2086023107

So, if we want to cut down the size of this 2 GB file, we have to write a new file.

   ((1e9{.big),' This is only about 1 GB...') fwrite nsbflnm=. 'C:\Temp\NotSoBigFile.txt'
1000000027