Essays/Substring Replacement

From J Wiki
Jump to navigation Jump to search

Substring Replacement

x replace p;q replaces all occurrences of p by q in x .

replace=: 4 : 0
 'p q'=. y
 j=. p nosindx x
 if. ''-:j do. x return. end.
 d=. p-&#q
 if. d do.
  b=. 1$~#x
  if. d>0 do.
   b=. 0 (,j+/i.d)} b
  else.
   b=. (1 j. -d) j} b
  end.
  x=. b # x
  j=. j-d*i.#j
 end.
 k=. <^:2"_1 j+/i.#q
 x=. q k} x
)

nosindx=: 4 : 0
 s=. x I.@E. y
 i=. s I. s+#x
 (i.&_1 {. ]) (s,_1) {~ (i,_1) {~^:a: 0
)

replace uses nosindex , adapted from Non-Overlapping Substrings to find the indices of non-overlapping substring occurrences. nosindex is equivalent to I.@E. if substring occurrences do not overlap (i.e. if no non-trivial prefix of p is a suffix of p ).

Examples of replace in use:

   x=: 'of the people, by the people, for the people'

   x replace 'the';'a'
of a people, by a people, for a people

   x replace 'the';'one'
of one people, by one people, for one people

   x replace 'the';'them there'
of them there people, by them there people, for them there people

Program Logic

Let d=. p-&#q be the difference between the length of p (the target string) and the length of q (the replacement string). The processing depends on whether d is positive, zero, or negative:

  • If d is positive ( p is longer than q ), x is first shrinked to drop excessive elements and then amended by q .
  • If d is zero ( p has the same length as q ), x is simply amended by q .
  • If d is negative ( p is shorter than q ), x is first expanded to make room and then amended by q .

Let j=. x nosindex p be the indices of non-overlapping occurrences of p in x . If p and q differ in length, then j should be adjusted: j=. j-d*i.#j . The indices to be amended are k=. j+/i.#q . An additional post-processing to k is required to avoid scatter-modify, so the final expression will be k=. <^:2"_1 j+/i.#q .

The following examples illustrate the three cases:

   x=: 'of the people, by the people, for the people'
   p=: 'the'
   ] j=: p nosindx x
3 18 34

p is longer than q (d>0)

   q=: 'a'
   ] d=: p-&#q
2
   j
3 18 34
   ,j+/i.d
3 4 18 19 34 35
   x ,: '.1' {~ 0 (,j+/i.d)} b=. 1$~#x
of the people, by the people, for the people
111..1111111111111..11111111111111..11111111
   ] x=. (0 (,j+/i.d)} b) # x
of e people, by e people, for e people
   ] j=. j-d*i.#j
3 16 30
   ] k=. <^:2"_1 j +/ i. # q
+---+----+----+
|+-+|+--+|+--+|
||3|||16|||30||
|+-+|+--+|+--+|
+---+----+----+
   ] x=. q k} x
of a people, by a people, for a people

p has the same length as q (d=0)

   q=: 'one'
   ] d=: p-&#q
0
   j
3 18 34
   ] k=. <^:2"_1 j +/ i. # q
+-------+----------+----------+
|+-----+|+--------+|+--------+|
||3 4 5|||18 19 20|||34 35 36||
|+-----+|+--------+|+--------+|
+-------+----------+----------+
   ] x=. q k} x
of one people, by one people, for one people

p is shorter than q (d<0)

   q=: 'them there'
   ] d=: p-&#q
_7
   j
3 18 34
   1 j. -d
1j7
   (t # x) ,: '.1' {~ 1 #~ t=. (1 j. - d) j} b=. 1$~#x
of t       he people, by t       he people, for t       he people
1111.......111111111111111.......1111111111111111.......111111111
   ] x=. t # x
of t       he people, by t       he people, for t       he people
   ] j=. j - d * i. # j
3 25 48
   ] k=. <^:2"_1 j +/ i. # q
+------------------------+-------------------------------+-------------------------------+
|+----------------------+|+-----------------------------+|+-----------------------------+|
||3 4 5 6 7 8 9 10 11 12|||25 26 27 28 29 30 31 32 33 34|||48 49 50 51 52 53 54 55 56 57||
|+----------------------+|+-----------------------------+|+-----------------------------+|
+------------------------+-------------------------------+-------------------------------+
   ] x=. q k} x
of them there people, by them there people, for them there people

Non-Overlapping Occurrences

As stated above, nosindx finds the indices of non-overlapping substring occurrences. It is the same as I.@E. if substring occurrences do not overlap. What if they do overlap?

   ] x=: 40$'abcabcabc_'
abcabcabc_abcabcabc_abcabcabc_abcabcabc_
   x replace 'abcabc';'syzygy'
syzygyabc_syzygyabc_syzygyabc_syzygyabc_

   ] j0=: 'abcabc' I.@E.   x
0 3 10 13 20 23 30 33
   ] j1=: 'abcabc' nosindx x
0 10 20 30

   x
abcabcabc_abcabcabc_abcabcabc_abcabcabc_
   '1' j0} '.'$~#x
1..1......1..1......1..1......1..1......
   '1' j1} '.'$~#x
1.........1.........1.........1.........

   'syzygy' (j0+/i.6)} x
syzsyzygy_syzsyzygy_syzsyzygy_syzsyzygy_
   'syzygy' (j1+/i.6)} x
syzygyabc_syzygyabc_syzygyabc_syzygyabc_
   x replace 'abcabc';'syzygy'
syzygyabc_syzygyabc_syzygyabc_syzygyabc_



Contributed by Roger Hui, with further contribution by Igor Zhuravlov.