Scripts/JavascriptCruncher

From J Wiki

I recently had occasion to "crunch" some JavaScript files, that is, to strip unnecessary content (comments and the like) out of the files to reduce their size.

Unfortunately, every existing JavaScript "cruncher" I could find had problems. Many of them did not deal properly with // inside strings or regular expressions. Some also mishandled otherwise legal uses of end-of-line.

So, I wrote my own.

My first effort, crunch, was a bit quirky. For example, while it collapsed most sequences of blank lines to a single newline character, it left in newlines that were preceded by a // single-line comment. This would be easy to eliminate (crunch^:_), but I decided I liked the effect.

More generally, I decided that keeping some newlines in the code was important to readability (and thus to the maintainability of the crunched code). I decided to leave a single blank line in place wherever the // effect left one: those tend to be heavily commented places in the code and, almost coincidentally, the blank lines help make the crunched code more readable. I also decided that instead of removing newlines I'd delete semicolons which precede newlines. Again, this helps maintain the readability of the crunched code.

Here's what the code currently looks like:

cl=:~:~a.
class=:4 :'cl=:cl>.x*a.e.y'
 1 class LF NB. end of line
 2 class 9 11 12 13 32{a. NB. white space
 3 class '''' NB. single quote
 4 class '"' NB. double quote
 5 class '\' NB. escape character
 6 class '/' NB. beginning of comment
 7 class '*' NB. multi-line comment indicator
 8 class '[' NB. beginning of regexp character class
 9 class ']' NB. end of regexp character class
10 class '0123456789abcdefghijklmnopqrstuvwxyz' NB. word forming
10 class 'ABCDEFGHIJKLMNOPQRSTUVWXYZ_$' NB. more word forming
10 class 128}.a. NB. treat unicode as word forming

Note'states'
 0: skipping redundant white space
 1: saw a significant token
 2: in a single quoted string
 3: \ in a single quoted string
 4: in a double quoted string
 5: \ in a double quoted string
 6: found a /
 7: found a //, waiting for end of line
 8: found a /*, waiting for *
 9: found /*...*
10: found a / waiting for closing /
...

character classes:
    +,  LF,    ,   ',   ",   \,   /,   *,   [,   ],  wordforming
)

states=:0 10#:10*".;._2(0 :0)
  1.1  0    0    2.1  4.1  1.1  6.1  1.1  1.1  1.1 14.1   NB. 0:-space after LF
  1.2 17.2 18.3  2.2  4.2  1.2  6.2  1.2  1.2  1.2 14.2   NB. 1: +
  2.2 17.2  2.2  1.2  2.2  3.2  2.2  2.2  2.2  2.2  2.2   NB. 2: '
  2.2 17.2  2.2  2.2  2.2  2.2  2.2  2.2  2.2  2.2  2.2   NB. 3: '\
  4.2 17.2  4.2  4.2  1.2  5.2  4.2  4.2  4.2  4.2  4.2   NB. 4: "
  4.2 17.2  4.2  4.2  4.2  4.2  4.2  4.2  4.2  4.2  4.2   NB. 5: "\
 10.2 17.2 10.2 10.2 10.2 10.2  7    8   11.2 10.2 10.2   NB. 6: /
  7   17.1  7    7    7    7    7    7    7    7    7     NB. 7: //
  8    8    8    8    8    8    8    9    8    8    8     NB. 8: /*
  8    8    8    8    8    8    0    9    8    8    8     NB. 9: /*..*
 10.2 17.2 10.2 10.2 10.2 12.2  1.2 10.2 11.2 10.2 10.2   NB.10: /.
 11.2 17.2 11.2 11.2 11.2 13.2 11.2 11.2 11.2 10.2 11.2   NB.11: /[
 10.2 10.2 10.2 10.2 10.2 10.2 10.2 10.2 10.2 10.2 10.2   NB.12: /\
 11.2 11.2 11.2 11.2 11.2 11.2 11.2 11.2 11.2 11.2 11.2   NB.13: /[\
  1.2 17.2 15.2  2.2  4.2  1.2  6.2  1.2  1.2  1.2 14.2   NB.14: word
  1.1 17.1 15.1  2.1  4.1  1.1  6.2  1.1  1.1  1.1 14.2   NB.15: space right after word
  1.1 17.1 16    2.1  4.1  1.2  6.1  1.1  1.1  1.1 14.1   NB.16:-further space before LF
  1.2  0.2  0.2  2.2  4.2  1.2  6.2  1.2  1.2  1.2 14.2   NB.17: first LF
  1.1 17.1 18    2.1  4.1  1.1  6.1  1.1  1.1  1.1 14.1   NB.18:-space after token
)
crunch=: (1;states;cl)&;:
ctrace=: (5;states;cl)&;:

fixup=:3 :0
 NB. delete some newlines and semicolons which aren't very significant for readability
 >dellast^:_&.>/ ('{',LF);delfirst^:_&.>/(';',LF);';}';(LF,'}');(3#LF);y
)
delfirst=: -.@E. # ]
dellast=: (-.@#@[ |. -.@E.) # ]
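
The helper definitions above can be checked in a session. The following is a minimal sketch, assuming cl, class, delfirst and dellast have been defined as shown; the sample strings are made up for illustration, and '-' stands in for LF in two of them so that the results print on one line.

```j
   NB. class number of each character; 0 means "anything else"
   cl {~ a. i. 'x=1;//y'
10 0 10 0 6 6 10

   NB. delfirst drops the first character of every match of x in y
   ';}' delfirst 'a=1;}'
a=1}

   NB. dellast drops the last character of every match of x in y
   '{-' dellast 'f(){-x}'
f(){x}

   NB. ^:_ reapplies a rule until the text stops changing
   (3#'-') delfirst^:_ 'a-----b'
a--b
```

The last line mirrors what the (3#LF) rule in fixup does: any run of three or more newlines shrinks to two, so at most one blank line survives.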

But I use this from the command line, so here's the bit that makes that possible:

rd=:1!:1
wr=:1!:2

3 :0^:(4=#ARGV)''
NB. crunch^:_ would get rid of more LineFeeds, but gain is trivial
NB. and it's a bit more readable with them...
 (fixup crunch -.&CR rd _2{2}.ARGV) wr _1{ARGV
 2!:55]0
)

3 :0^:(1 e. 'jconsole' E.;ARGV)''
 1!:2&2 'Usage: c:\j601\jconsole.exe c:\j601\addons\usat\crunch.ijs INFILE OUTFILE'
 2!:55]1
)

Note that the script assumes it is being run from the command line whenever it executes under jconsole; in any other front end it just defines its names. This works rather well for me.
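
The guard used in the script, verb^:(condition), runs the verb once when the condition is 1 and not at all when it is 0, in which case the argument comes back unchanged. A minimal sketch of the idiom, with a hypothetical verb bump standing in for the real work:

```j
bump=: 3 : 'y + 100'   NB. hypothetical stand-in verb

   bump^:(4=4) 3       NB. condition is 1: bump runs once
103
   bump^:(4=5) 3       NB. condition is 0: argument is returned untouched
3
```

In the script the conditions are 4=#ARGV (jconsole, the script, INFILE and OUTFILE were all supplied) and a test for 'jconsole' appearing anywhere in ARGV.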

Additional rules could be added. For example, if I wanted to delete newlines which precede '.' or ')', I'd add a couple more delfirst rules:

(LF,'.');(LF,')');...

But, really -- how many lines like that are there? (And, in fact, all the "fixups" could be skipped with only a minor impact on the ultimate file size.)
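
For illustration, the two extra rules could be spliced into the delfirst list as sketched here. The name fixup2 is hypothetical, and putting the new rules at the left end of the list is an assumption; since the list is processed right to left, they would be applied after the existing delfirst rules.

```j
NB. hypothetical variant of fixup with two extra delfirst rules
NB. (their position in the list is an assumption)
fixup2=: 3 : 0
 >dellast^:_&.>/ ('{',LF);delfirst^:_&.>/(LF,'.');(LF,')');(';',LF);';}';(LF,'}');(3#LF);y
)

   fixup2 'a',LF,'.b()'
a.b()
```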

-- Raul Miller, 2007-03-01T23:05:52Z