User:Devon McCormick/Code/DirectoryParsing

From J Wiki

This is code I use daily for backing up whatever work I've done most recently. It parses the contents of my entire drive into six vectors:

  • FLNMS - file names. The following three vectors correspond to this one.
  • FLDTS - file dates.
  • FLSZS - file sizes in bytes.
  • FLPARENT - index into DIRNMS giving the name of the directory in which a file resides.
  • DIRNMS - list of full directory path names.
  • DIRDEP - predecessor tree corresponding to DIRNMS giving index of parent of each entry.
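
To make the relationship between these vectors concrete, here is a minimal sketch using made-up toy values (not real output). It assumes directory names carry no trailing separator, so a "/" has to be inserted when joining a directory to a file name.

```j
NB. Toy values (hypothetical), shaped like the vectors described above.
DIRNMS=: 'C:';'C:/a';'C:/a/b'
DIRDEP=: _1 0 1                 NB. parent index of each directory; _1 = no parent
FLNMS=: 'x.txt';'y.txt';'z.txt'
FLPARENT=: 0 2 1                NB. index into DIRNMS for each file

NB. Full path of each file: directory name, separator, file name.
fullpaths=: (FLPARENT{DIRNMS) ,&.> (<'/') ,&.> FLNMS
NB. fullpaths is 'C:/x.txt';'C:/a/b/y.txt';'C:/a/z.txt'

NB. Indexes of the immediate subdirectories of directory 0 ('C:').
kids=: I. DIRDEP = 0
NB. kids is the one-element list containing 1
```

Storing each directory name once and pointing to it from FLPARENT avoids repeating the full path for every file.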

The top-level invocation looks like this:

'FLNMS FLDTS FLSZS FLPARENT DIRNMS DIRDEP'=. PllDirInfoEG 'C:\'

In this example, we're parsing the contents of the C: drive, in parallel, into the six vectors.
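
Since the vectors exist to support backing up recent work, one way they might be queried is sketched below. This is an assumption for illustration, not the author's actual selection code: it supposes FLDTS holds numbers of the form YYYYMMDD.fraction (as cvtTS21Num produces), and the toy values and the cutoff date are hypothetical.

```j
NB. Toy values (hypothetical) standing in for three of the parsed vectors.
FLNMS=: 'a.txt';'b.ijs';'c.doc'
FLDTS=: 20120314.25 20120315.5 20120316.75
FLSZS=: 120 4096 33

NB. Indexes of files dated on or after 2012-03-15.
recent=: I. FLDTS >: 20120315

recentNames=: recent { FLNMS     NB. 'b.ijs';'c.doc'
recentBytes=: +/ recent { FLSZS  NB. 4129
```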

---

Code

PllDirInfoEG=: 3 : 0
NB.* PllDirInfoEG: gather all directories' info accumulated in parallel.
   'tmpd srcd flnms fldts flszs'=. pllParseDir exeID=. 'J7Pll';y=. endSlash jpathsep y
   flszs=. ;flszs [ fldts=. cvtTS21Num&>fldts
   flparent=. 0$~#flszs
   dirdep=. _1,0$~#srcd [ dirnms=. y;srcd
   vnms=. <;._1 ' FLNMS FLDTS FLSZS FLPARENT DIRNMS DIRDEP'
   for_ii. i. #tmpd do.
       wait 2 [ checkIfDonePPD >{.exeID  NB. Win7 oddness requires these...
       vals=. (3!:2)&.>fread&.>(<endSlash >ii{tmpd),&.>vnms,&.><'.DAT'
       'flnms fldts flszs'=. (flnms;fldts;<flszs),&.>3{.vals
       flparent=. flparent,(#dirnms)+>3{vals
       dirnms=. dirnms,>4{vals
       ddep=. (#dirnms)|dirnms i. (]{.~PATHSEP_j_ i:~])&.>dirnms
       dirdep=. (ddep=i.#ddep)}ddep,:_1
   end.
   cleanupTempDirs 'PllDTmp'
   'dirnms flparent'=. adjustDirInfo dirnms;<flparent
   dirdep=. dirDependencies dirnms
   flnms;fldts;flszs;flparent;dirnms;<dirdep
NB.EG 'flnms fldts flszs flparent dirnms dirdep'=. PllDirInfoEG 'D:/'
)

The following code spins off a separate copy of the interpreter for each top-level directory on the disk. Each process runs from its own uniquely named copy of the executable, although the copies are identical and a single executable could serve for all of them. The distinct names serve only to tell the processes apart, which may be unnecessary.

For example, the following creates 100 copies of the J console executable with names beginning with "J7Pll": "J7Pll0.exe", "J7Pll1.exe", and so on.

   bindir=. 'c:/program files (x86)/j64-701/bin/'
   (fread bindir,'jconsole.exe') fwrite (<bindir,'J7Pll'),&.>(":i.100),&.><'.exe'

The global variable "EXCLUDIRS" is a list of directories to exclude from the backup; pllParseDir removes them from the list of directories it hands to the sub-tasks.

pllParseDir=: 3 : 0
NB.* pllParseDir: launch sub-tasks to parse each sub-directory under y.
   'exeID0 y'=. y
   if. 0=#y do. y=. BASEDSK,'/' end.
   srcDirs=. jd dir '*',~y=. endSlash y
   srcDirs=. srcDirs-.EXCLUDIRS
   'flnms fldts flszs'=. <"1|:0 1 2{"1 jfi dir '*',~y
   cleanupTempDirs tmpd=. 'PllDTmp'
   tmpDirs=. (<BASEDSK,'/Temp/',tmpd),&.>":&.>i.#srcDirs   NB. Don't end w/slash->escapes "
   args=. SUBTASK,&.>(<' SRCDIR '),&.>dq&.>(<y),&.>srcDirs  NB. One command/dir
   args=. args,&.>(<' RESULTDIR '),&.>dq&.>tmpDirs
   args=. (<'//';'/') rplc~&.>jpathsep&.>args
   exeIDs=. buildExe exeID0;#args
   IsFORKED=. 1
   fork&>exeIDs,&.>(<' ',IFJ6#'-jijx '),&.>args
NB.   watchTillDone exeID0;1;'';_1
NB.   wait 2 [ checkIfDonePPD exeID0 NB. cleanupTempDirs tmpd
   tmpDirs;srcDirs;flnms;fldts;<flszs
NB.EG 'tmpd srcd flnms fldts flszs'=. pllParseDir 'J7Pll';'D:/'
)

cleanupTempDirs=: 3 : 0
   if. 0<#dd=. dir ttop,y,'*' [ ttop=. BASEDSK,'/Temp/' do.
       tmpdirs=. dospathsep&.>(<ttop),&.>jd dd
       shell&>(<'echo Y|del '),&.>tmpdirs,&.><'\*'
       1[shell&>(<'rmdir '),&.>tmpdirs
   end.
)

buildExe=: 3 : 0
   'pfx nn'=. y
   exe=. (<'.exe" '),~&.>(<'/',pfx),&.>":&.>i.nn
   exe=. jpathsep&.>('"',&.><jpath '~bin'),&.>exe
   assert. *./fexist&>'"'-.~&.>exe      NB. Ensure executables are there.
NB.EG exes=. buildExe 'J7Pll';2  NB. 2 strings -> execute distinctly-named J exes.
)

NB.* cvtTS21Num: convert timestamp (Y M D h m s ms, padded to 7 elements) to a
NB. single number YYYYMMDD.f, where f is the fraction of the day elapsed.
cvtTS21Num=: ((100 #. 3 {. ]) + 86400000 %~ 24 60 60 1000 #. 3 }. 7 {. ])"1
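
As a worked example of this encoding: the first three elements are packed base-100 into YYYYMMDD, and the time of day is converted to milliseconds and divided by the 86400000 milliseconds in a day. The timestamp below is a made-up value, and the definition is repeated only so the snippet stands alone.

```j
NB. Same definition as above, repeated so this example is self-contained.
cvtTS21Num=: ((100 #. 3 {. ]) + 86400000 %~ 24 60 60 1000 #. 3 }. 7 {. ])"1

ts=: 2012 3 15 12 0 0     NB. hypothetical timestamp: noon on 2012-03-15
n=: cvtTS21Num ts         NB. 6 elements are padded to 7 (0 milliseconds)
0j1 ": n                  NB. formats as 20120315.5
```

The explicit format is used because the default print precision would display the value as 2.01203e7. Numbers encoded this way sort in chronological order, so file dates can be compared directly.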

NB.* checkIfDonePPD: check if named processes have finished; requires (Windows) "pslist"
NB. to list running processes.
checkIfDonePPD=: 3 : 'while. -.+./''not found'' E. shell ''pslist '',y do. wait 1 end.'

NB.* adjustDirInfo: fix duplication in path separator character.
adjustDirInfo=: 3 : 0
   'dirnms flparent'=. y
   dirnms=. jpathsep&.>dirnms
   whdd=. dirnms e.~({.dirnms),&.>dirnms
   flparent=. flparent-(flparent>0)*+/whdd
   dirnms=. (<'//';'/') rplc~&.>dirnms#~-.whdd
   dirnms;<flparent
)

jpathsep=: '/'&(('\' I.@:= ])})

dirDependencies=: 3 : 0
NB.* dirDependencies: convert list of full paths to index vector form of tree
NB. showing directory and subdirectories as parent-child relations.
   ps=. '/' [ y=. jpathsep&.>y
   y=. y}.~&.>-ps={:&>y           NB. No terminal separators
   dd=. (] i. (]{.~ps i:~])&.>) y NB. Drop part of path after last separator
   dd=. (_1) (I. dd=i.#dd)}dd     NB. Dirs' dependency tree: parent indexes
   dd=. (_1) (I. dd>:#dd)}dd      NB. "_1" is root (no parent).
NB.EG dirDependencies 'D:';'D:\a1';'D:\a1\b2';'D:\a2';'D:\a2\b3';'D:\a1\b4'
NB. _1 0 1 0 3 1         NB. Parent index for each subdir; _1 for no parent.
)
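
A parent-index vector like the one in the example above can be walked upward to list the ancestors of a directory. The verb below is a hypothetical helper, not part of the original code; it is a sketch that assumes _1 marks a directory with no parent.

```j
NB. ancestors: x is a parent-index vector, y is a starting index.
NB. Returns the indexes of y's parent, grandparent, ... up to the root.
ancestors=: 4 : 0
  r=. i. 0                 NB. empty result to start
  p=. y { x                NB. parent of the starting directory
  while. p >: 0 do.        NB. stop at _1 (no parent)
    r=. r , p
    p=. p { x
  end.
  r
)

dd=: _1 0 1 0 3 1          NB. result from the dirDependencies example above
dd ancestors 2             NB. 1 0   (D:\a1\b2 -> D:\a1 -> D:)
dd ancestors 4             NB. 3 0   (D:\a2\b3 -> D:\a2 -> D:)
```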

The following should be customized to exclude particular files and sub-directories from being considered for backing up. These names are all included in the parse vectors but will be removed from consideration when deciding which files to copy for backup purposes.

NB.* ExcludeUsual: list of usual files and directories to exclude from backup.
ExcludeUsual=: <;._2 CR-.~jpathsep 0 : 0
[ExcludeDirs]
.emacs.d
.eshell
foo\bar
example\deeper\structure

[ExcludeFiles]   NB. Files to exclude from any directory
~
*.bz2
.history
.bash_history
.emacs~
AUTOEXEC.BAT
CONFIG.SYS
IO.SYS
MSDOS.SYS
NTDETECT.COM
NTLDR
SECURITY
SOFTWARE
SYSTEM
SYSTEM.ALT
Setup.log
Thumbs.db
USER0000.log
boot.ini
default.dlf
eventlog.log
fold0000.frm
git_shell_ext_debug.txt
hiberfil.sys
installer_debug.txt
pagefile.sys
pspbrwse.jbf
quote.flag
sysiclog.txt
)
NB.*** Need to actually use the wildcards in the file list above!!!
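
One possible starting point for the wildcard handling noted above is sketched here. It is an assumption for illustration, not existing code: the verb name isExcluded is hypothetical, and only a leading "*" is treated as a wildcard (a suffix match); any other pattern must match the file name exactly.

```j
NB. isExcluded: 1 if file name y matches pattern x, else 0.
NB. Only a leading '*' is treated as a wildcard; this is an assumption.
isExcluded=: 4 : 0
  if. '*' = {. x do.
    (}. x) -: (- <: # x) {. y   NB. compare the tail of y with the suffix
  else.
    x -: y                      NB. no wildcard: require an exact match
  end.
)

'*.bz2' isExcluded 'archive.tar.bz2'   NB. 1
'*.bz2' isExcluded 'notes.txt'         NB. 0
'Thumbs.db' isExcluded 'Thumbs.db'     NB. 1
```

A name shorter than the suffix is padded with blanks by the take and so fails to match.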