User:Raul Miller/RegularExpressions
Jump to navigation
Jump to search
Collected definitions for a forum post I made on implementing regular expressions in J
NB. regular expression: NB. initial state; state transitions NB. initial state is one bit per recognized state NB. state transitions are: NB. dim 0: current state NB. dim 1: character (256) NB. dim 2: next state NB. match will mean last state is present after last character NB. a regular fragment is like a regular expression NB. except that its final state does not yet exist NB. utility dp=: +/ .* checksize=: -: _1 256 0 + (,.1 0 1) dp ] and=: * or=: and&.-. disp=: <@(#&a.)"1@(1&|:)`I.@.(1=#@$)L:0 NB. create regular fragments NB. character fragment NB. y: set of characters this transition allows chf=: 1 0; ,:@,.~&0@e.~&a. NB. sequence fragment NB. x, y: fragments to sequence seqf=:4 :0 'IX TX'=. x assert. TX checksize&$ IX 'IY TY'=. y assert. TY checksize&$ IY I=. (}:IX),({:IX)and IY T=. ((}:"1 TX),"1({:"1 TX)and/IY),(-($TY)+0 0,<:{:$TX){. TY I;T ) NB. one fragment or the other NB. x, y: fragments to sequence orf=:4 :0 'IX TX'=. x assert. TX checksize&$ IX 'IY TY'=. y assert. TY checksize&$ IY I=. (}:IX),(}:IY),IX or&{: IY T=. TX, (-($TY)+0 0,<:{:$TX){. TY I;T ) NB. fragment is optional optf=:3 :0 'I T'=. y assert. T checksize&$ I (I+.1 {.~-#I);T ) NB. fragment may repeat repf=:3 :0 'I T'=. y assert. T checksize&$ I I;T+.({:"1 T) and/ I ) NB. string to fragment stringf=: [: > [: seqf&.>/ chf&.> NB. turn a regular fragment into a regular expression complete=: ({.~ (>.|.)@$)L:0 match=:4 :0 'STATE NEXT'=. complete y for_CHAR. a.i.x do. STATE=. STATE or/ .and CHAR {"2 NEXT end. {:STATE ) NB. tests (note: if representation changes, some of these will have to change) assert (disp chf'a')-:,L:0]0;<,:'';'a' assert (disp 'ab' seqf&chf 'cd')-:,L:0]0;<('';'ab'),:'';'';'cd' assert (disp 'ab' orf&chf 'cd')-:,L:0]0 1;<('';'ab'),:'';'';'cd' assert (disp optf chf 'ab')-:,L:0]0 1;<,:'';'ab' assert (disp repf chf 'ab')-:,L:0]0;<,:'ab';'ab' assert (stringf 'abcd') -: 'ab' seqf&stringf 'cd' assert 0 1 -: (stringf 'abcd') = 'ab' orf&stringf 'cd' assert 'abc' match stringf 'abc' assert 'aaab' match (repf chf 'a') seqf chf 'b' assert 'aaab' match (repf optf chf 'a') seqf chf 'b' assert 'b' match (repf optf chf 'a') seqf chf 'b' assert -.'aaabb' match (repf optf chf 'a') seqf chf 'b'