From J Wiki

An Example of Converting J Code to Use Multiple Cores

There's been a lot of talk over the past few years about the tremendous possibilities inherent in the increasingly available multi-core PCs in common use. Here we'll see an example of both how simple it is to take advantage of multiple cores and how much extra complication this can add to code.

The application I wanted to change to make full use of the dual cores on my main home laptop is one I use every day, and each run takes five to twenty minutes. It's a very simple program to rotate photos I've taken during the day to orient them topside up; most of the pictures I take are “from the hip” and so are 90 degrees from upright. The original code looks like this: first we load some standard libraries for writing .CSV files, general file utilities, and image processing.

load 'csv task filefns images'
coinsert 'fldir'
PHOTOP=: 'c:\amisc\pix\Photos\'     NB. Usual top-level directory for photos.
jpeg_quality_ima3_=: 99             NB. Maintain high quality for JPGs.

The basic routine rotates a photo:

NB.* regflipphoto: flip a .jpg pic 1/4 counterclockwise as is most common.
regflipphoto=: 3 : 0
   (1 0 2|:|."2 a.{~read_image y) write_image y
)

This gets called for all the .JPG files in a directory:

NB.* regflipphotos: flip all .jpg in dir 1/4 counterclockwise.
regflipphotos=: 3 : 0
   y=. endSlash y
   regflipphoto&.>(<y),&.>{."1 dir y,'*.jpg'
NB.EG 6!:2 'regflipphotos ''c:\pix\'''
)

endSlash=: ],'\'-.{:      NB.* endSlash: ensure terminal dir separator.
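
The rotation expression and endSlash can both be tried on toy data without the images addon. The following is only a sketch: the 2-by-3 single-channel "image" of letters is made up for illustration, and it assumes the array returned by read_image is laid out as rows by columns by channels.

```j
NB. Toy 2x3 "image" with one channel; letters stand in for pixel values.
img=: 2 3 1 $ 'abcdef'

NB. Same expression as in regflipphoto, minus the file I/O:
NB. |."2 reverses the pixels within each row, then 1 0 2|: swaps rows and columns.
rot=: 1 0 2 |: |."2 img

echo $ rot          NB. 3 2 1: height and width have traded places
echo {."1 rot       NB. drop the channel axis to view the letters as a 3x2 table

NB. endSlash as defined above: appends '\' only when it is missing.
endSlash=: ],'\'-.{:
echo endSlash 'c:\pix'
echo endSlash 'c:\pix\'
```

The table printed for rot reads cf, be, ad from top to bottom: the original top row abc now runs up the left edge, which is a quarter turn counterclockwise. Both endSlash calls give c:\pix\.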

So, the only argument I supply is the name of the directory containing the photos to be “flipped” – this is an important detail as it speaks to one of the most important considerations for parallelizing code: the amount of data movement.

The parallel version of this code adds considerable complexity to this simple set of routines. The new design requires arguments more complicated than the bare directory name the original version takes. To prove the concept, I manually divided a group of files between two directories and ran the original code on each partial group simultaneously, in two separate sessions. This changed the performance profile from this:

[Figure: EgNonParallelCPUusageStartFinish1.png]

(here we see full CPU usage, on the right, of one core and slight usage, on the left, of the other) to this:

[Figure: EgParallel2xCPUusageStartFinish1.png]

where we see both cores being fully used.

However, moving the files into separate directories to minimize changes to the existing code would incur a substantial penalty in file IO. To keep this IO to a minimum, it's necessary to add complexity to the function's arguments. If we specify the file names explicitly in addition to the directory name, we need to move only a very small amount of data to each independent thread.

NB.* parcelOutFlipping: run x separate threads to flip photos in dir y.
parcelOutFlipping=: 4 : 0
   fls=. 0{"1 dir '*.jpg',~y=. endSlash y
   fls=. x evenlyPartition fls     NB. Put filenames into evenly-divided lists.
   scrfls=. (<'.ijs'),~&.>(<'FlipScript'),&.>":&.>i.x
   1!:44 y [ svdir=. 1!:43 ''      NB. Move to target dir to flip photos.
   (<fread jpath '~Code/sampFlipPhotos.ijs') fwrite&.>scrfls
   (<y) appendArgsToScript&.>scrfls;&.><"0 fls
   exe=. (<'"',(jpath '~bin'),'\j.exe" -jijx "'),&.><endSlash 1!:43 ''
   fork_jtask_&.>exe,&.>scrfls,&.>'"'   NB. Assumed step, absent from the listing: start one J per script.
   1!:44 svdir
NB.EG 2 parcelOutFlipping '"'-.~>0{info
)

NB.* evenlyPartition: evenly partition y into x pieces w/smaller at end.
evenlyPartition=: 4 : '(x(([:i.]) e. ([:i.[) * [:>.%~)#y)<;.1 y'
NB.* delFlIfExist: delete file if it exists.
delFlIfExist=: 3 : 'if. fexist y do. ferase y end.'
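
To see what evenlyPartition hands to each script, here is a small sketch using made-up file names; the script-name expression is the one from parcelOutFlipping with x fixed at 2.

```j
evenlyPartition=: 4 : '(x(([:i.]) e. ([:i.[) * [:>.%~)#y)<;.1 y'

NB. Seven hypothetical file names split two ways.
fls=: 'a.jpg';'b.jpg';'c.jpg';'d.jpg';'e.jpg';'f.jpg';'g.jpg'
parts=: 2 evenlyPartition fls
echo #&> parts                    NB. 4 3

NB. Pieces start at multiples of the ceiling of (#y)%x, so the short one is last.
echo #&> 3 evenlyPartition i.10   NB. 4 4 2

NB. One script name per piece, built as in parcelOutFlipping.
scrfls=: (<'.ijs'),~&.>(<'FlipScript'),&.>":&.>i.2
echo scrfls
```

One thing to watch: when there are few files relative to x, fewer than x pieces can come back (4 evenlyPartition i.5 gives three pieces of sizes 2 2 1), so the number of scripts and the number of file lists would no longer match.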

NB.* appendArgsToScript: for photo directory x, add filename info to scripts.
appendArgsToScript=: 4 : 0
   'scrflnm flnms'=. y   NB. Script file names, photo file names
   (LF,~LF,~'IPD=: <''',x,'''') fappend scrflnm   NB. Photo dir as global
   (')',~;LF,~&.>'PHFLS=: <;._2]0 : 0';flnms) fappend scrflnm
   (LF,LF,~'onlyRuntime ''''') fappend scrflnm    NB. Run code if standalone.
)

NB.EG Append code like this, with different "PHFLS", to multiple script files.
NB. IPD=: <'c:\amisc\pix\Photos\2010Q1\20100311\'
NB. PHFLS=: <;._2]0 : 0
NB. DSCF3837.jpg
NB. DSCF3838.jpg
NB. DSCF3839.jpg
NB. DSCF3966.jpg
NB. DSCF3967.jpg
NB. DSCF3968.jpg
NB. )
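
The text that appendArgsToScript builds for the file list can be inspected without writing any files. This sketch uses two hypothetical file names; txt and body are names introduced here only for illustration.

```j
NB. Hypothetical file names for one partition.
flnms=: 'DSCF3837.jpg';'DSCF3838.jpg'

NB. Same expression as in appendArgsToScript: the header and each name get a
NB. trailing LF, are razed into one string, and a closing ) is appended.
txt=: ')',~;LF,~&.>'PHFLS=: <;._2]0 : 0';flnms
echo txt

NB. When the script is loaded, 0 : 0 yields the lines between the header and
NB. the ) as one LF-terminated string; <;._2 cuts it at each LF into boxes.
body=: 'DSCF3837.jpg',LF,'DSCF3838.jpg',LF
echo <;._2 body
```

So each generated script ends up with PHFLS as a list of boxed file names, which is the form regflipphoto&.> expects once the boxed directory IPD has been prepended to each.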

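NB. fork_jtask_ as displayed from the task library: the monadic case calls
NB. the dyadic one with a zero timeout, i.e. it does not wait for the process.
fork_jtask_=: 3 : 0
0 fork y
:
ph=. CreateProcess y
if. x do. Wait ph;x end.
CloseHandle ph
)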

parcelOutFlipping copies the following file as a template for each script, then appends the specific global information it has determined for that script.

NB.* sampFlipPhotos.ijs: sample dedicated photo flipper.

load 'task images filefns'
coinsert 'fldir'
jpeg_quality_ima3_=: 99

NB.* regflipphoto: flip a .jpg pic 1/4 counterclockwise as is most common.
regflipphoto=: 3 : 0
   (1 0 2|:|."2 a.{~read_image y) write_image y
)

onlyRuntime=: 3 : 0
NB.* onlyRuntime: only invoke if not loaded via interactive session.
   if. (5{.&.>'-jijx';<'-rt') +./ . e. 5&{.&.>tolower&.>ARGV_z_ do.
       1!:44 >IPD [ arg=. 1|.'""',}.;' ',&.>ARGV_z_ [ flout=. 'time.out'
       (LF,~'Start flips (for ',arg,') @ ',":qts'') fappend flout
       tm=. ":6!:2 'regflipphoto&.>IPD,&.>PHFLS'
       (LF,~'Finished flips (for ',arg,') in ',tm,' seconds @ ',":qts'') fappend flout
       2!:55 ''
   end.
)
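
The command-line test in onlyRuntime can be exercised on its own. In this sketch, isStandalone is a hypothetical name: its body is the condition from onlyRuntime, but it takes the argument list as y instead of reading ARGV_z_, and the argument lists are invented.

```j
NB. isStandalone (hypothetical): 1 if any argument, lower-cased and cut or
NB. padded to 5 characters, matches '-jijx' or '-rt'.
isStandalone=: 3 : 0
   (5{.&.>'-jijx';<'-rt') +./ . e. 5&{.&.>tolower&.>y
)

echo isStandalone 'j.exe';'-jijx';'FlipScript0.ijs'   NB. 1: launched with -jijx
echo isStandalone 'j.exe';'-RT';'FlipScript0.ijs'     NB. 1: case folded, '-rt' padded to 5
echo isStandalone 'j.exe';'FlipScript0.ijs'           NB. 0: no recognized flag
```

Because the final 2!:55 '' ends the J session, this guard is what keeps the script from shutting down an interactive session that merely loads it.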

Sample Run

Here we see the set-up in which the pictures are moved from the camera chip to a directory named based on the dates of the pictures on the chip.

   info,<6!:2 'info=: getFrom1Drive ''E:\'''         NB. or
|"c:\amisc\pix\Photos\2010Q2\20100413"|1 0 321|1251988446|159.29974|

Here are two estimates, in seconds, of how long this would take using the original code: the first based on the total size of the files and the second on their number.

   dd=. endSlash '"'-.~>0{info
   7.8503603e_7 3.0415917*jpgSzs dd
982.85604 976.35094
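
These two figures can be reproduced from the numbers in info. The sketch below assumes, as the arithmetic suggests, that jpgSzs returns the total byte count followed by the number of files, making the coefficients seconds per byte and seconds per file; szs and est are names introduced here.

```j
NB. Total bytes and file count as shown in info above (assumed jpgSzs result).
szs=: 1251988446 321
NB. Seconds per byte and seconds per file: the two coefficients used above.
est=: 7.8503603e_7 3.0415917 * szs
echo est
```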

The code has to be run differently than the code it replaces. Here we see an invocation followed by a look at the timing file appended to by the distinct invocations. However, note that though both threads are running, they interfered with each other by failing to append serially to the output file: there's only a single entry.

   2 parcelOutFlipping dd   NB.  2 parallel flippers

   fsize dd,'time.out'
   fread dd,'time.out'
Start flips (for "j.exe -jijx c:\amisc\pix\Photos\2010Q2\20100413\FlipScript0.ijs") @ 2010 4 13 22 25 44.031

Once this has completed, we can see how long each thread took:

   fsize dd,'time.out'
   fread dd,'time.out'
Start flips (for "j.exe -jijx c:\amisc\pix\Photos\2010Q2\20100413\FlipScript0.ijs") @ 2010 4 13 22 25 44.031
Finished flips (for "j.exe -jijx c:\amisc\pix\Photos\2010Q2\20100413\FlipScript1.ijs") in 447.78272 seconds @ 2010 4 13 22 33 11.812
Finished flips (for "j.exe -jijx c:\amisc\pix\Photos\2010Q2\20100413\FlipScript0.ijs") in 449.97146 seconds @ 2010 4 13 22 33 14.093
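
As a rough check on the gain, the single-process estimates can be divided by the time of the slower of the two flippers. This uses only numbers already shown; est and par are names introduced here.

```j
est=: 982.85604 976.35094     NB. single-process estimates, in seconds
par=: 449.97146               NB. slower of the two parallel flippers
echo est % par                NB. both ratios come out a little under 2.2
```

The ratio is slightly better than two, which may only mean that the estimates run somewhat high.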

From this, we see how much more complicated the new code is to use.