User:Raul Miller/RegularExpressions

From J Wiki
Jump to navigation Jump to search

Collected definitions for a forum post I made on implementing regular expressions in J

NB. regular expression:
NB. initial state; state transitions
NB. initial state is one bit per recognized state
NB. state transitions are:
NB.  dim 0: current state
NB.  dim 1: character (256)
NB.  dim 2: next state
NB. match will mean last state is present after last character

NB. a regular fragment is like a regular expression
NB. except that its final state does not yet exist

NB. utility
dp=: +/ .*
checksize=: -: _1 256 0 + (,.1 0 1) dp ]
and=: *
or=: and&.-.
disp=: <@(#&a.)"1@(1&|:)`I.@.(1=#@$)L:0

NB. create regular fragments

NB. character fragment
NB.  y: set of characters this transition allows
chf=: 1 0; ,:@,.~&0@e.~&a.

NB. sequence fragment
NB.  x, y: fragments to sequence
seqf=:4 :0
  'IX TX'=. x assert. TX checksize&$ IX
  'IY TY'=. y assert. TY checksize&$ IY
  I=. (}:IX),({:IX)and IY
  T=. ((}:"1 TX),"1({:"1 TX)and/IY),(-($TY)+0 0,<:{:$TX){. TY
  I;T
)

NB. one fragment or the other
NB. x, y: fragments to sequence
orf=:4 :0
  'IX TX'=. x assert. TX checksize&$ IX
  'IY TY'=. y assert. TY checksize&$ IY
  I=. (}:IX),(}:IY),IX or&{: IY
  T=. TX, (-($TY)+0 0,<:{:$TX){. TY
  I;T
)

NB. fragment is optional
optf=:3 :0
  'I T'=. y assert. T checksize&$ I
  (I+.1 {.~-#I);T
)

NB. fragment may repeat
repf=:3 :0
  'I T'=. y assert. T checksize&$ I
  I;T+.({:"1 T) and/ I
)

NB. string to fragment
stringf=: [: > [: seqf&.>/ chf&.>

NB. turn a regular fragment into a regular expression
complete=: ({.~ (>.|.)@$)L:0

match=:4 :0
  'STATE NEXT'=. complete y
  for_CHAR. a.i.x do.
      STATE=. STATE or/ .and CHAR {"2 NEXT
  end.
  {:STATE
)

NB. tests (note: if representation changes, some of these will have to change)
assert (disp chf'a')-:,L:0]0;<,:'';'a'
assert (disp 'ab' seqf&chf 'cd')-:,L:0]0;<('';'ab'),:'';'';'cd'
assert (disp 'ab' orf&chf 'cd')-:,L:0]0 1;<('';'ab'),:'';'';'cd'
assert (disp optf chf 'ab')-:,L:0]0 1;<,:'';'ab'
assert (disp repf chf 'ab')-:,L:0]0;<,:'ab';'ab'
assert (stringf 'abcd') -: 'ab' seqf&stringf 'cd'
assert 0 1 -: (stringf 'abcd') = 'ab' orf&stringf 'cd'
assert 'abc' match stringf 'abc'
assert 'aaab' match (repf chf 'a') seqf chf 'b'
assert 'aaab' match (repf optf chf 'a') seqf chf 'b'
assert 'b' match (repf optf chf 'a') seqf chf 'b'
assert -.'aaabb' match (repf optf chf 'a') seqf chf 'b'