User:Devon McCormick/Code/DirectoryParsing
This is code I use daily for backing up whatever work I've done most recently. It parses the contents of my entire drive into six vectors:
- FLNMS - file names. The following three vectors correspond to this one.
- FLDTS - file dates.
- FLSZS - file sizes in bytes.
- FLPARENT - index into DIRNMS giving the name of the directory in which a file resides.
- DIRNMS - list of full directory path names.
- DIRDEP - predecessor tree corresponding to DIRNMS giving index of parent of each entry.
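How the vectors relate can be seen with a small sketch. The data below is invented for illustration, and addSep is a hypothetical helper that is not part of the code on this page; it assumes that directory names in DIRNMS may or may not end with a separator.

```j
NB. Hypothetical toy data, not taken from a real drive.
DIRNMS=: 'C:/';'C:/work';'C:/work/j'
DIRDEP=: _1 0 1                      NB. parent index of each directory; _1 marks the root
FLNMS=: 'notes.txt';'parse.ijs';'readme.md'
FLPARENT=: 1 2 0                     NB. index into DIRNMS for each file

NB. Hypothetical helper: append a separator only where the name lacks one.
addSep=: 3 : 'y,(''/''~:{:y)#''/'''

NB. Full path of each file: its directory name, a separator, then the file name.
fullPaths=: (addSep&.>FLPARENT{DIRNMS),&.>FLNMS
```

With this data, fullPaths holds 'C:/work/notes.txt', 'C:/work/j/parse.ijs' and 'C:/readme.md', and DIRDEP says that 'C:/work/j' (index 2) sits under 'C:/work' (index 1), which sits under the root (index 0).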
The top-level invocation looks like this:
'FLNMS FLDTS FLSZS FLPARENT DIRNMS DIRDEP'=. PllDirInfoEG 'C:\'
In this example, we're parsing the contents of the C: drive, in parallel, into the six vectors.
---
Code
PllDirInfoEG=: 3 : 0
NB.* PllDirInfoEG: gather all dirs'' info accumulated in parallel.
    'tmpd srcd flnms fldts flszs'=. pllParseDir exeID=. 'J7Pll';y=. endSlash jpathsep y
    flszs=. ;flszs [ fldts=. cvtTS21Num&>fldts
    flparent=. 0$~#flszs
    dirdep=. _1,0$~#srcd [ dirnms=. y;srcd
    vnms=. <;._1 ' FLNMS FLDTS FLSZS FLPARENT DIRNMS DIRDEP'
    for_ii. i. #tmpd do.
        wait 2 [ checkIfDonePPD >{.exeID   NB. Win7 oddness requires these...
        vals=. (3!:2)&.>fread&.>(<endSlash >ii{tmpd),&.>vnms,&.><'.DAT'
        'flnms fldts flszs'=. (flnms;fldts;<flszs),&.>3{.vals
        flparent=. flparent,(#dirnms)+>3{vals
        dirnms=. dirnms,>4{vals
        ddep=. (#dirnms)|dirnms i. (]{.~PATHSEP_j_ i:~])&.>dirnms
        dirdep=. (ddep=i.#ddep)}ddep,:_1
    end.
    cleanupTempDirs 'PllDTmp'
    'dirnms flparent'=. adjustDirInfo dirnms;<flparent
    dirdep=. dirDependencies dirnms
    flnms;fldts;flszs;flparent;dirnms;<dirdep
NB.EG 'flnms fldts flszs flparent dirnms dirdep'=. PllDirInfoEG 'D:/'
)
The following code spins off a separate copy of the interpreter for each top-level directory on the disk. Each copy runs from its own uniquely named executable, although these could all be the same file. The distinct names are there only to tell the processes apart and may be unnecessary.
For example, we could do the following to create 100 copies of the J executable prefixed by "J7Pll", e.g. "J7Pll0.exe", "J7Pll1.exe", etc.
bindir=. 'c:/program files (x86)/j64-701/bin/'
(<fread bindir,'jconsole.exe') fwrite&> (<bindir,'J7Pll'),&.>(":&.>i.100),&.><'.exe'
The global variable "EXCLUDIRS" is a list of directories to exclude from the backup.
pllParseDir=: 3 : 0
NB.* pllParseDir: launch sub-tasks to parse each sub-directory under y.
    'exeID0 y'=. y
    if. 0=#y do. y=. BASEDSK,'/' end.
    srcDirs=. jd dir '*',~y=. endSlash y
    srcDirs=. srcDirs-.EXCLUDIRS
    'flnms fldts flszs'=. <"1|:0 1 2{"1 jfi dir '*',~y
    cleanupTempDirs tmpd=. 'PllDTmp'
    tmpDirs=. (<BASEDSK,'/Temp/',tmpd),&.>":&.>i.#srcDirs   NB. Don't end w/slash->escapes "
    args=. SUBTASK,&.>(<' SRCDIR '),&.>dq&.>(<y),&.>srcDirs   NB. One command/dir
    args=. args,&.>(<' RESULTDIR '),&.>dq&.>tmpDirs
    args=. (<'//';'/') rplc~&.>jpathsep&.>args
    exeIDs=. buildExe exeID0;#args
    IsFORKED=. 1 fork&>exeIDs,&.>(<' ',IFJ6#'-jijx '),&.>args
NB.    watchTillDone exeID0;1;'';_1
NB.    wait 2 [ checkIfDonePPD exeID0
NB.    cleanupTempDirs tmpd
    tmpDirs;srcDirs;flnms;fldts;<flszs
NB.EG 'tmpd srcd flnms fldts flszs'=. pllParseDir 'D:/'
)

cleanupTempDirs=: 3 : 0
    if. 0<#dd=. dir ttop,y,'*' [ ttop=. BASEDSK,'/Temp/' do.
        tmpdirs=. dospathsep&.>(<ttop),&.>jd dd
        shell&>(<'echo Y|del '),&.>tmpdirs,&.><'\*'
        1[shell&>(<'rmdir '),&.>tmpdirs
    end.
)

buildExe=: 3 : 0
    'pfx nn'=. y
    exe=. (<'.exe" '),~&.>(<'/',pfx),&.>":&.>i.nn
    exe=. jpathsep&.>('"',&.><jpath '~bin'),&.>exe
    assert. *./fexist&>'"'-.~&.>exe    NB. Ensure executables are there.
NB.EG exes=. buildExe 'J7Pll';2   NB. 2 strings -> execute distinctly-named J exes.
)

NB.* cvtTS21Num: convert 7-element timestamp to single number.
cvtTS21Num=: ((100 #. 3 {. ]) + 86400000 %~ 24 60 60 1000 #. 3 }. 7 {. ])"1

NB.* checkIfDonePPD: check if named processes have finished; requires (Windows) "pslist"
NB. to list running processes.
checkIfDonePPD=: 3 : 'while. -.+./''not found'' E. shell ''pslist '',y do. wait 1 end.'

NB.* adjustDirInfo: fix duplication in path separator character.
adjustDirInfo=: 3 : 0
    'dirnms flparent'=. y
    dirnms=. jpathsep&.>dirnms
    whdd=. dirnms e.~({.dirnms),&.>dirnms
    flparent=. flparent-(flparent>0)*+/whdd
    dirnms=. (<'//';'/') rplc~&.>dirnms#~-.whdd
    dirnms;<flparent
)

jpathsep=: '/'&(('\' I.@:= ])})

dirDependencies=: 3 : 0
NB.* dirDependencies: convert list of full paths to index vector form of tree
NB. showing directory and subdirectories as parent-child relations.
    ps=. '/' [ y=. jpathsep&.>y
    y=. y}.~&.>-ps={:&>y               NB. No terminal separators
    dd=. (] i. (]{.~ps i:~])&.>) y     NB. Drop part of path after last separator
    dd=. (_1) (I. dd=i.#dd)}dd         NB. Dirs' dependency tree: parent indexes
    dd=. (_1) (I. dd>:#dd)}dd          NB. "_1" is root (no parent).
NB.EG dirDependencies 'D:';'D:\a1';'D:\a1\b2';'D:\a2';'D:\a2\b3';'D:\a1\b4'
NB. _1 0 1 0 3 1   NB. Parent index for each subdir; _1 for no parent.
)
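As a worked example of the timestamp conversion, the sketch below applies cvtTS21Num (copied from above) to an invented timestamp. The date becomes the integer part in yyyymmdd form and the time of day becomes the fraction of the day elapsed; a timestamp with fewer than seven elements is padded with zeros by the 7 {. in the definition.

```j
cvtTS21Num=: ((100 #. 3 {. ]) + 86400000 %~ 24 60 60 1000 #. 3 }. 7 {. ])"1

ts=: 2012 3 14 12 0 0       NB. hypothetical timestamp: noon on 14 March 2012
num=: cvtTS21Num ts         NB. 20120314 + 0.5, since 12 hours is half of a day
```

Here 100 #. 2012 3 14 gives 20120314 and 24 60 60 1000 #. 12 0 0 0 gives 43200000 milliseconds, which is half of 86400000, so the value is 20120314.5 (the session may show fewer digits, depending on print precision). Numbers built this way sort in chronological order, which makes it easy to compare file dates.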
The following should be customized to exclude particular files and sub-directories from being considered for backing up. These names are all included in the parse vectors but will be removed from consideration when deciding which files to copy for backup purposes.
NB.* ExcludeUsual: list of usual files and directories to exclude from backup.
ExcludeUsual=: <;._2 CR-.~jpathsep 0 : 0
[ExcludeDirs]
.emacs.d
.eshell
foo\bar
example\deeper\structure
[ExcludeFiles]  NB. Files to exclude from any directory
~
*.bz2
.history
.bash_history
.emacs~
AUTOEXEC.BAT
CONFIG.SYS
IO.SYS
MSDOS.SYS
NTDETECT.COM
NTLDR
SECURITY
SOFTWARE
SYSTEM
SYSTEM.ALT
Setup.log
Thumbs.db
USER0000.log
boot.ini
default.dlf
eventlog.log
fold0000.frm
git_shell_ext_debug.txt
hiberfil.sys
installer_debug.txt
pagefile.sys
pspbrwse.jbf
quote.flag
sysiclog.txt
)
NB.*** Need to actually use the wildcards in the file list above!!!
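One possible way to use such a list is sketched below. It is not taken from the backup code itself: the names excl, isHdr, sections, exDirs, exFiles, flnms and keep are all hypothetical, the list is a shortened stand-in for ExcludeUsual, and only exact name matches are handled, so the wildcard entries would still need separate treatment.

```j
NB. Hypothetical shortened list in the same layout as ExcludeUsual.
excl=: '[ExcludeDirs]';'.emacs.d';'.eshell';'[ExcludeFiles]';'Thumbs.db';'pagefile.sys'

isHdr=: ('['={.)&>excl          NB. 1 where an entry starts a section, e.g. [ExcludeDirs]
sections=: isHdr <;.1 excl      NB. cut at each header; each box starts with its header
exDirs=: }.>0{sections          NB. entries under the first header
exFiles=: }.>1{sections         NB. entries under the second header

NB. Drop excluded names from a (hypothetical) vector of file names.
flnms=: 'notes.txt';'Thumbs.db';'parse.ijs'
keep=: flnms-.exFiles
```

With this data, exDirs holds '.emacs.d' and '.eshell', exFiles holds 'Thumbs.db' and 'pagefile.sys', and keep retains 'notes.txt' and 'parse.ijs'.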