Essays/String Indexing

From J Wiki

A string is a sequence of characters (or, in some contexts, a sequence of numbers). Strings are a rich field, with too many infinities, but one which includes some fundamental issues that we must deal with when handling text.

In J, we often use segmented strings, which might be described as a sequence of substrings each uniquely prefixed (or suffixed) by a delimiting character.

   example=: '/this/is/a/segmented/string'
   <;.1 example
┌─────┬───┬──┬──────────┬───────┐
│/this│/is│/a│/segmented│/string│
└─────┴───┴──┴──────────┴───────┘

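A string index should be able to select one or more segments from a segmented string. Here, the left argument is a directory: a table with one (offset, length) row for each selected segment. In other words: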

ssndx=: {{ ;y {L:0~ <@(+i.)/"1 x }}
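
As a small illustration of the left argument which ssndx expects, the sketch below builds a directory by hand. The name dir and its values are made up for this example, and the definitions are repeated so that the block stands alone.

```j
NB. definitions repeated from the essay
example=: '/this/is/a/segmented/string'
ssndx=: {{ ;y {L:0~ <@(+i.)/"1 x }}

NB. hypothetical hand-written directory: one (offset, length) row per segment
dir=: 2 2 $ 0 5 20 7

echo dir ssndx example   NB. /this/string
```

Each row expands to a run of consecutive indices, so the row 20 7 selects the seven characters starting at offset 20.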

And, if we had a verb to build a directory of segmented string indices, we could use that to manipulate our string. In other words, with

ssdir=: (={.) ({.,#);.1 i.@#

we can operate on string segment indices and build the resulting string:

   (|.ssdir example) ssndx example
/string/segmented/a/is/this
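
To make the directory concrete, here is a sketch which displays what ssdir produces for example, and then selects some of its rows before indexing. The definitions are repeated so that the block stands alone.

```j
example=: '/this/is/a/segmented/string'
ssndx=: {{ ;y {L:0~ <@(+i.)/"1 x }}
ssdir=: (={.) ({.,#);.1 i.@#

NB. one row per segment: offset of its delimiter, then its length
echo ssdir example
NB.  0  5
NB.  5  3
NB.  8  2
NB. 10 10
NB. 20  7

NB. pick the first and fourth segments by selecting directory rows
echo (0 3 { ssdir example) ssndx example   NB. /this/segmented
```

Because the directory is an ordinary table, any verb which selects or reorders rows (such as |. or {) rearranges the segments.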

We can also perform merge operations: concatenate two strings, combine their directories (adding the length of the first string to what were originally offsets into the second string), select from the combined directory, and then use that selection to index the concatenated string.
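
A minimal sketch of such a merge follows. The strings a and b and the name dir are hypothetical, chosen only for illustration, and interleaving is just one possible way to select from the combined directory.

```j
ssndx=: {{ ;y {L:0~ <@(+i.)/"1 x }}
ssdir=: (={.) ({.,#);.1 i.@#

a=: '/one/two'        NB. hypothetical inputs
b=: '/three/four'

NB. shift the offsets (not the lengths) of b by the length of a
dir=: (ssdir a) , (ssdir b) +"1 (#a),0

NB. interleave the segments: first of a, first of b, second of a, second of b
echo (0 2 1 3 { dir) ssndx a,b   NB. /one/three/two/four
```

Note the use of +"1 here, so that the adjustment (#a),0 is applied to each row of the second directory.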

Note also that while ssdir is useful when working with segmented strings, we might want to work with other kinds of strings. Here, we might use I.@E. to locate occurrences of substrings of interest (appending the length of that substring to form a directory, if that's relevant). Or, we might use I.@:= to locate occurrences of characters of interest, such as newlines (@: rather than @, so that I. sees the whole comparison result rather than one atom at a time). Or, we might use {."2 pattern rxmatches string to build a directory of regular expression matches.
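
The first two of these approaches might be sketched as follows, using a made-up sample string (text is a hypothetical name). The character search is written with @: so that I. is applied to the whole boolean mask.

```j
text=: 'one fish two fish'    NB. hypothetical sample text

NB. offsets at which the substring begins
echo 'fish' I.@E. text        NB. 4 13

NB. append the substring length to form (offset, length) rows
echo ('fish' I.@E. text) ,. #'fish'
NB.  4 4
NB. 13 4

NB. offsets of a single character of interest
echo ' ' I.@:= text           NB. 3 8 12
```

The two-column result has the same shape as the output of ssdir, so it can be used as a left argument for ssndx.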

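When working with substrings in this fashion, we might also wish to find the locations of preceding substrings. Here, since starts I. ends counts the starts which are less than each end (for ascending starts), an expression like starts (<:@I. { [) ends would let us find the location of the nearest substring start which precedes each substring end (assuming every end has some start before it). Other mechanisms are possible.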
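
As a sketch of this kind of lookup, with made-up offsets: for an ascending left argument, x I. y gives the number of elements of x which are less than each element of y, so one less than that count indexes the nearest preceding start. This assumes that every end has at least one start before it.

```j
starts=: 0 5 8 10 20    NB. hypothetical start offsets, ascending
ends=: 4 9 26           NB. hypothetical end offsets

NB. for each end, how many starts are strictly less than it
echo starts I. ends              NB. 1 3 5

NB. one less than that count selects the nearest preceding start
echo starts (<:@I. { [) ends     NB. 0 8 20
```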

With these tools, we might model a search and replace mechanism, such as:

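replacewith=: {{
  NB. box a bare string so that m and n are always lists of boxes
  mlens=. #@> m=. <^:(0=L.) m
  nlens=. #@> n=. <^:(0=L.) n
  NB. need as many replacements as search strings
  assert. mlens -:&# nlens
  NB. per search string: offsets of its occurrences in y
  starts=. m I.@E.L:0 y,{:a.
  NB. directory rows (offset, length) of the text to be replaced
  before=. ;starts,.each mlens
  NB. matching rows for the replacements, which sit after y in y,;n
  after=. ;(}:+/\(#y),nlens),.each (#@>starts) #each nlens
  NB. one-character rows for every position of y outside a match
  keep=. 1,.~(i.#y) -. ;<@(+i.)/"1 before
  NB. order all rows by original position, then index into y,;n
  ((keep,after)/:keep,before) ssndx y,;n
}}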

Which behaves like this:

   'is' replacewith 'are' example
/thare/are/a/segmented/string
   (;:'t is bad') replacewith (;:'X are good') example
/Xhare/are/a/segmenXed/sXring
   ('';'') replacewith('+';'-') 'ab'
+-a+-b+-