Essays/Non-Overlapping Substrings

From J Wiki
Jump to navigation Jump to search

x E. y finds all occurrences of x as a substring in y . For example:

   x=: 'our'
   y=: 'fourscore and ten years ago, our fathers'

   x E. y
0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
   y ,: ' 1' {~ x E. y
fourscore and ten years ago, our fathers
 1                           1

Substrings can overlap if a non-trivial prefix of x occurs as a suffix of x . (A trivial prefix is the empty string or the entire of x .) Suppose only the non-overlapping substrings (NOS) are to be found? In the following example, the only 1s we want are those at indices 2 11 24 33 46 .

   x=: 'abcabcabc'
   y=: '--',60$(20$x),'--'

   (|:":,.i.#y) , y ,: ' 1' {~ x E. y
          1111111111222222222233333333334444444444555555555566
01234567890123456789012345678901234567890123456789012345678901
--abcabcabcabcabcabcab--abcabcabcabcabcabcab--abcabcabcabcabca
  1  1  1  1            1  1  1  1            1  1  1

Let s=: x I.@E. y be the indices of the (possibly overlapping) occurrences of x in y . If j in s is the index of an NOS, the index of the next NOS can be found through the use of the dyad I. (interval index).

   ] s=: x I.@E. y
2 5 8 11 24 27 30 33 46 49 52
   ] i=: s I. s+#x
3 4 4 4 7 8 8 8 11 11 11
   s ,: i{s,_1
 2  5  8 11 24 27 30 33 46 49 52
11 24 24 24 33 46 46 46 _1 _1 _1

If 2 is (the index of) an NOS, then the next NOS is 11; if 5 is an NOS, the next NOS is 24; if 8 is an NOS, the next NOS is 24; etc. Since the first element of s is the index of an NOS, the indices of all the NOSes can be found by following the path that starts there; that is, by computing the transitive closure of that first element.

2 5 8 11 24 27 30 33 46 49 52 _1
11 24 24 24 33 46 46 46 _1 _1 _1 _1
nos=: 4 : 0
 s=. x I.@E. y
 i=. s I. s+#x
 (i.#y) e. (s,_1) {~ (i,_1) {~^:a: 0
)

   y ,: ' 1'{~ x nos y
--abcabcabcabcabcabcab--abcabcabcabcabcabcab--abcabcabcabcabca
  1        1            1        1            1

   t ,: ' 1'{~ 'aaa' nos t=: 59$'a'
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1

   t ,: ' 1'{~ 'no overlaps' nos t=: 59$'_no overlaps '
_no overlaps _no overlaps _no overlaps _no overlaps _no ove
 1            1            1            1

The function can be written tacitly as follows:

tc  =: ,&_1 {~^:a: 0:
nos1=: i.@#@] e. #@[ (tc@(]I.+) { _1,~]) I.@E.



See also



Contributed by Roger Hui.