Essays/Substring Replacement

From J Wiki
Jump to: navigation, search

Substring Replacement

x replace p;q replaces all occurrences of p by q in x .

replace=: 4 : 0
 'p q'=. y
 j=. p nosindx x
 if. ''-:j do. x return. end.
 d=. p-&#q
 k=. (j+(0>.-d)*i.#j)+/i.#q
 select. *d
  case.  1 do. (0 (j+/(#q)+i.d)}1$~#x) # q k}x
  case.  0 do. q k}x
  case. _1 do. q k} (0 (d{."1 k)}1$~(#x)+(#j)*|d) #^:_1 x
 end.
)

nosindx=: 4 : 0
 s=. x I.@E. y
 i=. s I. s+#x
 (i.&_1 {. ]) (s,_1) {~ (i,_1) {~^:a: 0
)

replace uses nosindex , adapted from Non-Overlapping Substrings to find the indices of non-overlapping substring occurrences. nosindex is equivalent to I.@E. if substring occurrences do not overlap (i.e. if no non-trivial prefix of p is a suffix of p ).

Examples of replace in use:

   x=: 'of the people, by the people, for the people'

   x replace 'the';'a'
of a people, by a people, for a people

   x replace 'the';'one'
of one people, by one people, for one people

   x replace 'the';'them there'
of them there people, by them there people, for them there people

Program Logic

Let d=. p-&#q be the difference between the length of p (the target string) and the length of q (the replacement string). The processing depends on whether d is positive, zero, or negative:

  • If d is positive ( p longer than q ), some characters will be removed from x after it is amended by q .
  • If d is zero ( p has the same length as q ), x is simply amended by by q .
  • If d is negative ( p is shorter than q ), x is first expanded to make room and then amended by q .

If j=. x nosindex p are the indices of non-overlapping occurrences of p in x , then k=. (j+(0>.-d)*i.#j)+/i.#q are the indices to be amended. If d is not negative, i.e. if p is not shorter than q , k simplifies to j+/i.#q ; if d is negative ( p is shorter than q ), the indices of p in the expanded string are j+(-d)*i.#j .

The following examples illustrate the three cases:

   x=: 'of the people, by the people, for the people'
   p=: 'the'
   ] j=: p nosindx x
3 18 34

p is longer than q (d>0)

   q=: 'a'
   ] d=: p-&#q
2
   j
3 18 34
   j+(0>.-d)*i.#j
3 18 34
   ] k=. (j+(0>.-d)*i.#j)+/i.#q
 3
18
34
   q k} x
of ahe people, by ahe people, for ahe people
   '.1' {~ b=. 0 (j+/(#q)+i.d)}1$~#x
1111..1111111111111..11111111111111..1111111
   b # q k} x
of a people, by a people, for a people

p has the same length as q (d=0)

   q=: 'one'
   ] d=: p-&#q
0
   j
3 18 34
   j+(0>.-d)*i.#j
3 18 34
   ] k=. (j+(0>.-d)*i.#j)+/i.#q
 3  4  5
18 19 20
34 35 36
   q k} x
of one people, by one people, for one people

p is shorter than q (d<0)

   q=: 'them there'
   ] d=: p-&#q
_7
   j
3 18 34
   j+(0>.-d)*i.#j
3 25 48
   ] k=. (j+(0>.-d)*i.#j)+/i.#q
 3  4  5  6  7  8  9 10 11 12
25 26 27 28 29 30 31 32 33 34
48 49 50 51 52 53 54 55 56 57
   d{."1 k
 6  7  8  9 10 11 12
28 29 30 31 32 33 34
51 52 53 54 55 56 57
   '.1' {~ b=: 0 (d{."1 k)}1$~(#x)+(#j)*|d
111111.......111111111111111.......1111111111111111.......1111111
   b #^:_1 x
of the        people, by the        people, for the        people
   q k} b#^:_1 x
of them there people, by them there people, for them there people

Non-Overlapping Occurrences

As stated above, nosindx finds the indices of non-overlapping substring occurrences. It is the same as I.@E. if substring occurrences do not overlap. What if they do overlap?

   ] x=: 40$'abcabcabc_'
abcabcabc_abcabcabc_abcabcabc_abcabcabc_
   x replace 'abcabc';'syzygy'
syzygyabc_syzygyabc_syzygyabc_syzygyabc_

   ] j0=: 'abcabc' I.@E.   x
0 3 10 13 20 23 30 33
   ] j1=: 'abcabc' nosindx x
0 10 20 30

   x
abcabcabc_abcabcabc_abcabcabc_abcabcabc_
   '1' j0} '.'$~#x
1..1......1..1......1..1......1..1......
   '1' j1} '.'$~#x
1.........1.........1.........1.........

   'syzygy' (j0+/i.6)} x
syzsyzygy_syzsyzygy_syzsyzygy_syzsyzygy_
   'syzygy' (j1+/i.6)} x
syzygyabc_syzygyabc_syzygyabc_syzygyabc_
   x replace 'abcabc';'syzygy'
syzygyabc_syzygyabc_syzygyabc_syzygyabc_



Contributed by Roger Hui.