ShareMyScreen/AdventOfCode/2022/07/NoSpaceLeftOnDeviceJP

Entire solution

This one is quite a bit longer than the previous days:

i7=: ];._2 freads '07.txt'
NB. parse directory tree; returns par-child list
tree=:{{
tr=. 2 1$a: NB. tree; empty box is root
cwd=. a:    NB. current working directory is root at the moment
sz=. 1$0    NB. sizes; dirs initially empty
for_l. }.y do. NB. all lines but initial "cd /"
  l=.deb l  NB. remove extra blanks from line
  if. '$ cd ..'-:7{.l do. NB. up
    cwd=. tr ({.@[ {~ {:@[ i. ]) cwd
  elseif. '$ cd'-:4{.l do. NB. down (only relative, depth 1)
    NB. add new dir nn if not in tree
    if. ({:tr) -.@e.~ nn =. cwd ,&.(>`a:)'/',5}.l do.
      tr =. tr,. cwd,nn NB. add and cd down
      sz =. sz, 0 NB. dirsize is 0
    end.
    cwd=.nn NB. set new dir as current dir
  elseif. ('dir ',:'$ ls') -.@e.~ 4{.l do. 
    NB. nop for dir and ls
    NB. handle files
    'szf nnf'=.<;._1 ' ',l
    if. ({:tr) -.@e.~ nn=. cwd ,&.(>`a:)'/',nnf do.
      tr=. tr,. cwd,nn
      sz=. sz,".szf
    end.
  end.
end.
tr;sz       NB. return tree & sizes
}}
NB. complete sizes for directories; y tree;sz
dirsz=: {{
  NB. x tree; y sizes
  srt=. \: depth=. +/"1@:=&'/' >{:x NB. depth of children
  'par chd'=. >tr=. srt {"1 x       NB. depth sorted parent-child
  sz=.osz=.srt{y                    NB. sirted sizes+backup
  depth=. srt{depth                 NB. sorted item depths
  for_d. ~. depth do.  NB. for each unique depth
    di =. I. depth = d NB. indices of children @ depth rows
    NB.  sum sz per parent@d    update parent record in next level, i.e. where the parent is child.
    sz =. (par +//.&(di&{) sz) (chd i. ~.par{~di)} sz
  end.
  NB. sizes of only dirs (orig sz = 0)
  sz #~ 0=osz
}}&>/@tree

part1=:+/@(#~ 100000&>:)@dirsz
NB. Find smallest dir to be deleted to get 3e7 free space
'tot req'=: 7e7 3e7
NB. sortUP tk 1st + dif  req -(tot-used)
NB. note that sizes are ordered from low to high depth
part2=: (/:~ ([{~1 i.~ >:) req-tot-{:)@dirsz
(part1;part2)i7

Data exploration

After reading part 1, it's clear this is a parsing-heavy day, where the important part is, at least for part 1, the directory sizes.

Let's have a look at the data! We're given the following as testing data (as for the actual input, I just split it by line using cut):

tst=: ];._2 {{)n
$ cd /
$ ls
dir a
14848514 b.txt
8504156 c.dat
dir d
$ cd a
$ ls
dir e
29116 f
2557 g
62596 h.lst
$ cd e
$ ls
584 i
$ cd ..
$ cd ..
$ cd d
$ ls
4060174 j
8033020 d.log
5626152 d.ext
7214296 k
}}

I use the direct definition with the ")n" switch to form a noun. As described we have four types of lines in the input:

cd lines, starting with $ cd changing current working directory
ls lines, starting with $ ls that indicate the following lines list a directory
lines listing directory names
lines listing file sizes and file names

Going through each type, we can formulate some assumptions, and verify whether they hold for our input as well (important, as we will see...). Looking at the test data, we start at the root folder / and it looks like cd lines are always one step up or down, using only bare directory names (i.e. not containing any /, except for the very first cd /). When keeping only lines starting with $ cd from tst and i7, the count of '/' is one (Note that J makes writing such assumption tests REALLY fast due to its conciseness):

   '/' +/@:= , (#~ ('$ cd'-:4&{.)"1) tst
1
   '/' +/@:= , (#~ ('$ cd'-:4&{.)"1) i7
1

Knowing this, we know we don't have to complicate our code with multilevel steps or absolute paths.

One check I did not do, and which could have saved me a lot of time is whether cd operation lines going down are unique. Based on the test data it would seem so (showing number of cd lines and unique cd lines, when excluding $ cd .. lines):

   (,&#~.)(#~ (('$ cd ..'-.@-:7&{.)*.('$ cd'-:4&{.))"1) tst
4 4

However, our input looks different:

    (,&#~.)(#~ (('$ cd ..'-.@-:7&{.)*.('$ cd'-:4&{.))"1) i7
204 152

It would thus be wise to check for double entries and listings before adding them. Having missed this, my counts came out far too high initially.

Lastly, based on the test data, it looks like ls lines do not take directory names, that is, only the current working directory is listed so we don't have to care about $ ls <dirname>:

   ~. (#~ ('$ ls'-:4&{.)"1) tst 
$ ls
   ~. (#~ ('$ ls'-:4&{.)"1) i7 
$ ls

Part 1

Part 1 asks for the sum of the sizes of all directories that have a size of at most 100000. It seems directory sizes are the important bit, but part of the fun is not knowing what part 2 will ask, so one has to balance between effort and generality. This is the reason I chose for building the entire tree (verb tree) and calculating directory sizes (dirsz) separately.

Tree parsing

tree=:{{
tr=. 2 1$a: NB. tree; empty box is root
cwd=. a:    NB. current working directory is root at the moment
sz=. 1$0    NB. sizes; dirs initially empty
for_l. }.y do. NB. all lines but initial "cd /"
  l=.deb l  NB. remove extra blanks from line
  NB. cd UP in the tree
  if. '$ cd ..'-:7{.l do.
    NB. Find the parent of cwd, and make it cwd
    cwd=. tr ({.@[ {~ {:@[ i. ]) cwd
  NB. cd DOWN in the tree
  elseif. '$ cd'-:4{.l do.
    NB. nn being the absolute dir name to be added
    NB. add new dir nn if not in tree
    if. ({:tr) -.@e.~ nn =. cwd ,&.(>`a:)'/',5}.l do.
      tr =. tr,. cwd,nn NB. add and cd down
      sz =. sz, 0 NB. dirsize is 0
    end.
    cwd=.nn NB. set new dir as current dir
  elseif. ('dir ',:'$ ls') -.@e.~ 4{.l do. 
    NB. if not dir and ls: handle files
    'szf nnf'=.<;._1 ' ',l NB. size and name of file
    NB. add to tree and sizes if not present
    if. ({:tr) -.@e.~ nn=. cwd ,&.(>`a:)'/',nnf do.
      tr=. tr,. cwd,nn
      sz=. sz,".szf
    end.
  end.
end.
tr;sz       NB. return tree & sizes
}}

I'll go step by step through the tree verb shown above. This verb takes the input lines, and outputs two boxes (see line 30): the tree (as a 2 row matrix with boxed absolute names for parent/child pairs) and the corresponding sizes of the children. The reason I chose absolute file/dir names is that it's easy to see the depth, and prevents problems if directories with the same name occur in different places of the tree, as each will have a unique absolute name even if relative names would be the same. I'll leave the determination of sizes for directories for later, so in the output of tree, directory sizes will be 0. Lines 2-4 do the initialization of the tree and size array. Root will be represented as an empty boxed string (so it's easy to generate names for its children by appending '/',name), and has itself as child (turns out handy later on). The root is also set as the current working directory cwd.

Lines 5-29 iterate over all rows to parse the filesystem, except the first (since we already have set / as current directory). As we asserted before, lines starting with $ ls don't contain information, and can be ignored. Lines starting with dir can also be ignored: as long as they are not entered and their content listed, they are empty, and won't contribute to the directory sizes anyhow (initially I had included them, in case they would matter for part 2). Lines starting with $ cd are important as they change the current working directory. I treat the two cases, up (l. 7-10) and down navigation (l. 11-19) separately, as the latter prompts adding a directory to the tree if not yet present. If the line is not an ls or a dir line, it must be a file line, which is handled in lines 20-27.

Clearly, I didn't get everything right at once. A good way of debugging such explicit function is through the... debugger. As I did most of AoC on Android, I used the command line debugger, that can be started using dbr 1. Now, in a language like J that aims to allow a lot of functionality (e.g. non-scalar nouns as result of T-blocks of if statements), a bug does not always trigger an error, or does so in a spot where the actual bug can no longer be seen. A way that I find handy to set a quick debug stop is trigger an error myself, e.g. by adding a line reading +'a' at a place where I'd like to inspect state. More details on the debugger can be found on this page or, for the command line version only by entering dbhelp on the J prompt.

After iterating over all lines, all directories and files in the filesystem should be known, and the tree and sizes (of the children) are returned.

In retrospect, I could have used string representation of only the children, given that absolute names are used, but as this is Advent of Code, and the solution eventually works in a fraction of a second, I'm not rewriting it.

Directory sizes

The heavy lifting being over, we now have the tree and file sizes (all files being children). The directory sizes can be calculated knowing all that is (hierarchically) below them. I took a simple bottom-up approach (but one could imagine a recursive alternative):

Determine the sort order required to sort the tree and sizes by depth of the children using grade down on the number of slashes in the children's absolute names;
Then split the sorted tree in parent (par) and child (chd) variables using Multiple Assignment;
Sort the sizes and keep the originals (for selecting directories before output; remember size was 0 for directories?);
Sort the depths just calculated (glanced over the definition? Look at the first code line: definitions in J can occur in the middle of a sentence)
As the tree, sizes and depths have been sorted from high to low, i.e. the leaf nodes of the tree coming first, loop over each depth level, finding indices of children at each depth, and add their sizes, summed by parent to their parent's entry in the size array;
Lastly, only sizes of directories (i.e. where osz = 0) are selected to be returned.

dirsz=: {{
  NB. x tree; y sizes
  srt=. \: depth=. +/"1@:=&'/' >{:x NB. depth of children
  'par chd'=. >tr=. srt {"1 x       NB. depth sorted parent-child
  sz=.osz=.srt{y                    NB. sorted sizes+backup
  depth=. srt{depth                 NB. sorted item depths
  for_d. ~. depth do.  NB. for each unique depth
    di =. I. depth = d NB. indices of children @ depth rows
    NB.  sum sz per parent@d    update parent record in next level, i.e. where the parent is child.
    sz =. (par +//.&(di&{) sz) (chd i. ~.par{~di)} sz
  end.
  NB. sizes of only dirs (orig sz = 0)
  sz #~ 0=osz
}}&>/@tree NB. note: tree executed first; tr dirsz sz afterwards

The last line in the loop body needs some explanation: In the tree, files are children only. However, directories are both in the children as well as in the parents. For calculating the size of a directory, we need to sum the sizes of its children, and then store that size in the sz array for doing the same for the parent of this directory, i.e., at the index where the directory is a child. All this is done in the last line of the loop body: it's an amend updating the sz equivaltently to:

sz=. what where}sz

where is an array of indices where the parents of the children at depth d (i.e. indexed by di) are children themselves (with ~. ensuring every parent has one size): chd i. ~. par{~di;
what is an array of each directory having children at depth d, calculated by par +//.&(di&{) sz. This uses key, which applies, in this case, +/ to each set of sizes having the same parent. Doing this only for children at the current depth level can be done by selecting only these from both parents and sizes, using compose: &(di&{).

Now that the dirsz verb is defined, we can piece together the solution: the sum of directory sizes, selecting those of at most 100000, which translates literally to J as:

part1=:+/@(#~ 100000&>:)@dirsz

Part 2

Luckily, part 2 uses the same directory sizes as part 1. The question is: Knowing the total size of the device and the required free space for the update, what is the size of the smallest directory to be removed that would free up at least the required space?

The free space is the total space minus the space occupied by the root node in our tree. By construction, the size of / is the last item in the dirsz result (because of the sorting from high depth to low depth). The minimum space to be deleted is thus req-tot-{:dirsz i7. We're asked to delete the smallest possible directory, so we sort them up, and select the first that is larger or equal to our minimum space, and select it from the sorted sizes.

Putting this in a fork gives the following solution for part 2:

'tot req'=: 7e7 3e7 NB. total an. required space
part2=: (/:~ ([{~1 i.~ >:) req-tot-{:)@dirsz