ShareMyScreen/AdventOfCode/2022/07/NoSpaceLeftOnDevice

From J Wiki
Jump to navigation Jump to search

The Problem << >>

Now a problem of substance. I will have to write a program for this one, so I use File>New temp in the menu bar to open a script. I start with the commentary:

NB. AOC 2022 Problem 7
proclog =: {{   NB. x is the threshold, y is the boxed lines from the log
NB. Process lines one by one, producing list of path;filename;length

NB. Remove duplicate files

NB. Calculate size of each directory

NB. Return the total of sizes below threshold
}}

To begin with I want info on the individual files. I can't hope to execute the lines from the log, because they might not even be valid J words. I will have to process them as text. The only lines that matter are the cd lines and the lines with size and name. I will store the path as a list of boxes, updated by cd.

First I'll read the lines in and box them:

   ] lines =. < onaoclines
+------+----+-----+--------------+-------------+--
|$ cd /|$ ls|dir a|14848514 b.txt|8504156 c.dat|di...
+------+----+-----+--------------+-------------+--

I'll write a loop to process the lines. How dirty is the input? I will remove extra blanks. Suppose there is no cd to start with? I'll make that an error. Could a directory be listed twice? Just in case I'll keep all the info so I can delete duplicates.

require 'strings'
NB. AOC 2022 Problem 7
proclog =: {{   NB. x is the threshold, y is the boxed lines from the log
NB. Process lines one by one, producing list of path;filename;length
flist =. 0 3$a:  NB. will be path;filename;length
for_line. y do.
  ltext =. deb > line  NB. text of line with surplus blanks removed
  NB. Handle the forms we recognize: $ cd dir   and  number filename
  if. '$ cd' -: 4 {. ltext do.
    dir =. 5 }. ltext  NB. the directory string
    if. dir -: ,'/' do. currdir =. 0$a:  NB. reset to top level
    elseif. dir -: '..' do. currdir =. }: currdir  NB. back up one level
    else. currdir =. currdir , <dir  NB. new dir: down one level
    end.
  elseif. '0123456789' e.~ {. ltext do.  NB. size filename
    'num name' =. (ltext i. ' ') ({. ; }.) ltext  NB. size and filename (with leading space)
    flist =. flist , currdir ; name ; 0 ". num   NB. add entry for file
  end.  NB. other forms ignored
end.
NB. Remove duplicate files

NB. Calculate size of each directory

NB. Return the total of sizes below threshold
flist
}}

I'll test that.

   ] f =. proclog lines
+-----+------+--------+
|     | b.txt|14848514|
+-----+------+--------+
|     | c.dat|8504156 |
+-----+------+--------+
|+-+  | f    |29116   |
||a|  |      |        |
|+-+  |      |        |
+-----+------+--------+
|+-+  | g    |2557    |
||a|  |      |        |
|+-+  |      |        |
+-----+------+--------+
|+-+  | h.lst|62596   |
||a|  |      |        |
|+-+  |      |        |
+-----+------+--------+
|+-+-+| i    |584     |
||a|e||      |        |
|+-+-+|      |        |
+-----+------+--------+
|+-+  | j    |4060174 |
||d|  |      |        |
|+-+  |      |        |
+-----+------+--------+
|+-+  | d.log|8033020 |
||d|  |      |        |
|+-+  |      |        |
+-----+------+--------+
|+-+  | d.ext|5626152 |
||d|  |      |        |
|+-+  |      |        |
+-----+------+--------+
|+-+  | k    |7214296 |
||d|  |      |        |
|+-+  |      |        |
+-----+------+--------+

That matches the problem's result, so I will move on to calculating the size of each directory.

I need to combine the sizes of all the files in each directory, then move up one level and combine those into the higher-level directory, and so on. I'll have to be careful not to apply a file more than once. Maybe I'll remove files after they are processed, or perhaps loop, processing only the files at the bottom of the directory tree...

No, no, there's a better way. Every file counts in each of its containing directories. I'll simply create a record for each containing directory and combine all the files. Each file will create many records, but on this size of problem that's OK.

The list of directories for a file is all the prefixes of the file directories: that's a job for prefix: (u/. y). Let me refresh myself about that:

   <\ ;:'a b c'
+---+-----+-------+
|+-+|+-+-+|+-+-+-+|
||a|||a|b|||a|b|c||
|+-+|+-+-+|+-+-+-+|
+---+-----+-------+

[Good that I checked: prefix is u\, not u/. as I had misremembered.] I will need to add on an empty list too, to include the top directory. I'll bench-test the lines that add up the directories:

   f =. 0 2 {"1 ~. f   NB. discard duplicates and lose the filename
   ] fdirs =. (a: , <\)&.> {."1 f  NB. all directories for each name
+--+--+------+------+------+------------+------+------+------+------+
|++|++|++---+|++---+|++---+|++---+-----+|++---+|++---+|++---+|++---+|
|||||||||+-+||||+-+||||+-+||||+-+|+-+-+||||+-+||||+-+||||+-+||||+-+||
|++|++||||a||||||a||||||a||||||a|||a|e||||||d||||||d||||||d||||||d|||
|  |  |||+-+||||+-+||||+-+||||+-+|+-+-+||||+-+||||+-+||||+-+||||+-+||
|  |  |++---+|++---+|++---+|++---+-----+|++---+|++---+|++---+|++---+|
+--+--+------+------+------+------------+------+------+------+------+

There they are, the multiple paths for each file. They will be easy to combine: I'll use the workhorse key x u/. y where x is the directories (i. e. ; fdirs) and y is the list of file lengths, repeated for each directory. The verb u will be +/ y, which will add up all the lengths for each directory separately.

   (; fdirs) +//. (#@> fdirs) # > {:"1 f
48381165 94853 584 24933642

That's the right answer and I can finish the script:

require 'strings'
NB. AOC 2022 Problem 7
proclog =: {{   NB. x is the threshold, y is the boxed lines from the log
NB. Process lines one by one, producing list of path;filename;length
flist =. 0 3$a:  NB. will be path;filename;length
for_line. y do.
  ltext =. deb > line  NB. text of line with surplus blanks removed
  NB. Handle the forms we recognize: $ cd dir   and  number filename
  if. '$ cd' -: 4 {. ltext do.
    dir =. 5 }. ltext  NB. the directory string
    if. dir -: ,'/' do. currdir =. 0$a:  NB. reset to top level
    elseif. dir -: '..' do. currdir =. }: currdir  NB. back up one level
    else. currdir =. currdir , <dir  NB. new dir: down one level
    end.
  elseif. '0123456789' e.~ {. ltext do.  NB. size filename
    'num name' =. (ltext i. ' ') ({. ; }.) ltext  NB. size and filename (with leading space)
    flist =. flist , currdir ; name ; 0 ". num   NB. add entry for file
  end.  NB. other forms ignored
end.
NB. Remove duplicate files
flist =. 0 2 {"1 ~. flist
NB. Calculate size of each directory
fdirs =. (a: , <\)&.> {."1 flist  NB. all directories for each name
dsizes =. (; fdirs) +//. (#@> fdirs) # > {:"1 flist  NB. total filesize for each directory
NB. Return the total of sizes below threshold
+/ (x >: dsizes) # dsizes  NB. Result: total size of small directories
}}

For Part 2 I will need to modify the script to return the sizes of all directories. That's trivial and I won't write it out. Once I get the value, I need to find the smallest directory with large-enough size. I'll just check that against the example data:

   sz =. 48381165 94853 584 24933642
   sz =. /:~ sz  NB. sort into order
   need =. 30000000 - 70000000 - {: sz  NB. number of additional bytes needed
   index =. sz I. need  NB. index of first file big enough
   index { sz  NB. size of first file big enough
24933642

It took me a time or two to get need right.