Phrases/Strings

From J Wiki

String operations are collected here.


strjoin

Joins the boxed strings of y, placing the string x between consecutive items. Numeric items must be formatted (":) first.

NB. join boxed list y with x; see strsplit
strjoin=: #@[ }. <@[ ;@,. ]
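
The train is easier to follow when its steps are run one at a time. The sketch below does that for a sample separator and list; the names sep, items, paired, razed and joined are introduced here for illustration only.

```j
strjoin=: #@[ }. <@[ ;@,. ]

sep=: ', '
items=: 'ab';'cd';'ef'

paired=: (<sep) ,. items     NB. 3-by-2 table of boxes: the separator beside each item
razed=: ; paired             NB. ', ab, cd, ef'
joined=: (#sep) }. razed     NB. 'ab, cd, ef' (leading separator dropped)
```

The final drop uses the length of the separator, which is why strjoin works for separators of any length.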

Examples

   '/' strjoin ;:'one two three'
one/two/three

   LF strjoin ',' <@strjoin"1 ":each i.2 4
0,1,2,3
4,5,6,7

   ]a=. ']}>',~'<{[',(']}',LF,'{[') strjoin '][' <@strjoin"1 ": each i.2 4
<{[0][1][2][3]}
{[4][5][6][7]}>

   require'strings'

   a rplc cut'[ <td> ] </td> { <tr> } </tr> < <table> > </table> '
<table><tr><td>0</td><td>1</td><td>2</td><td>3</td></tr>
<tr><td>4</td><td>5</td><td>6</td><td>7</td></tr></table>

strsplit

A simpler form of split that cuts at every occurrence of the separator, including overlapping ones.

NB. split y by substring x; see strjoin
strsplit=: #@[ }.each [ (E. <;.1 ]) ,
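
As with strjoin, the definition can be unpacked into separate steps. This is a sketch only; sep, text, padded, frets, pieces and parts are illustrative names that do not appear in the original.

```j
strsplit=: #@[ }.each [ (E. <;.1 ]) ,

sep=: ' of '
text=: 'one of two of three'

padded=: sep , text               NB. prepend a separator so the first piece gets a fret too
frets=: sep E. padded             NB. 1 at each position where sep starts (0, 7 and 14 here)
pieces=: frets <;.1 padded        NB. each piece still begins with the separator
parts=: (#sep) }.each pieces      NB. 'one';'two';'three'
```

Joining the parts again with the same separator, sep strjoin parts, gives back the original text.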

Examples

   ',' strjoin ' ' strsplit '1 2 3 one two three'
1,2,3,one,two,three

   '<' strjoin ' of ' strsplit 'a of b of c'
a<b<c

   '[' }.@strsplit ']' (strjoin ,&a:) '[a';'[b';'[c'
+--+--+--+
|a]|b]|c]|
+--+--+--+

nossplit

Splitting that counts only non-overlapping occurrences of the separator. Good for separators built from a repeated character, such as ||.

NB. Non-overlapping variant of E.
nos=: i.@#@] e. #@[ ({~^:a:&0@(,&_1)@(]I.+) { _1,~]) I.@E.

NB. split y by non-overlapping substrings x
nossplit=: #@[ }.each [ (nos <;.1 ]) ,

Examples

   '||' nos 'abc||def||cd'
0 0 0 1 0 0 0 0 1 0 0 0

   '||' nossplit 'abc||def|||cd'
+---+---+---+
|abc|def||cd|
+---+---+---+
   '||' strsplit 'abc||def|||cd'
+---+---++--+
|abc|def||cd|
+---+---++--+

See Essays/Non-Overlapping Substrings for the detail of nos.
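
To see what nos changes relative to E., the two masks can be compared on a string where occurrences of the separator overlap. The names text, mask_E and mask_nos below are illustrative only.

```j
nos=: i.@#@] e. #@[ ({~^:a:&0@(,&_1)@(]I.+) { _1,~]) I.@E.

text=: 'abc||def|||cd'

mask_E=: '||' E. text       NB. 0 0 0 1 0 0 0 0 1 1 0 0 0 : matches start at 3, 8 and 9
mask_nos=: '||' nos text    NB. 0 0 0 1 0 0 0 0 1 0 0 0 0 : the overlapping match at 9 is dropped
```

The match at 9 overlaps the one at 8, so nos discards it; this is why nossplit leaves a single | at the front of the last piece instead of producing an empty box.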

cut and dltb

Cutting (or splitting) text at delimiters is a common task, and deleting the leading and trailing blanks of each piece is often useful as well. When the delimiter is a single character, you can use cut and dltb from the strings library.

   a=. 'Ken Iverson, Roger Hui, Eric Iverson, Clifford Reiter, Henry Rich'
   ,. /:~  dltb each ',' cut a
+---------------+
|Clifford Reiter|
+---------------+
|Eric Iverson   |
+---------------+
|Henry Rich     |
+---------------+
|Ken Iverson    |
+---------------+
|Roger Hui      |
+---------------+

Of course, you may simply use the cut conjunction ;. directly.

   ,. <@dltb;._1 ',',a
+---------------+
|Ken Iverson    |
+---------------+
|Roger Hui      |
+---------------+
|Eric Iverson   |
+---------------+
|Clifford Reiter|
+---------------+
|Henry Rich     |
+---------------+
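
The two approaches differ when delimiters are adjacent. The sketch below assumes the standard library definition of cut, which discards empty pieces, whereas ;._1 keeps them; t, by_cut and by_conj are illustrative names.

```j
require 'strings'

NB. Assumption: the standard library cut removes empty boxes from its result.
t=: 'ab,,cd'

by_cut=: ',' cut t          NB. 2 boxes: 'ab';'cd'
by_conj=: <;._1 ',' , t     NB. 3 boxes: 'ab';'';'cd'
```

If empty fields carry meaning, as in delimited data files, the ;._1 form is probably the safer choice.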

Alphabets

LATIN_UC=:'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
LATIN_LC=:'abcdefghijklmnopqrstuvwxyz'
RUSSIAN_UC=:'АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ'
RUSSIAN_LC=:'абвгдеёжзийклмнопрстуфхцчшщъыьэюя'
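
One possible use of these constants is case conversion by table lookup. The verb latin_upper below is a hypothetical helper written for this illustration; it is not part of the original page or of any library.

```j
LATIN_UC=:'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
LATIN_LC=:'abcdefghijklmnopqrstuvwxyz'

NB. latin_upper is a hypothetical helper, not part of the original page.
NB. Lowercase letters index into LATIN_UC; every other byte maps to itself via a.
latin_upper=: (LATIN_UC,a.) {~ (LATIN_LC,a.) i. ]

shout=: latin_upper 'Hello, World 42'    NB. 'HELLO, WORLD 42'
```

The Russian constants are UTF-8 byte strings when typed into a script, so the same lookup would presumably need both the alphabets and the argument converted to wide characters first (for example with ucp).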

Slicing with Regex

Using regular expressions to define tokens is very convenient and powerful.

Start with loading the Regex library, and defining additional functions.

   load 'regex'
   rxgroups=: }.@rxmatch rxfrom ]    NB. like rxall but for match groups
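
A step-by-step sketch shows why rxgroups drops the first row of the rxmatch result. The pattern, the sample string and the names pat, str, m, everything and groups are assumptions chosen for illustration.

```j
load 'regex'
rxgroups=: }.@rxmatch rxfrom ]

pat=: '(\w+)=(\w+)'
str=: 'name=value'

m=: pat rxmatch str            NB. rows of (start, length): whole match first, then each group
everything=: m rxfrom str      NB. 'name=value';'name';'value'
groups=: (}. m) rxfrom str     NB. 'name';'value' (whole-match row dropped)
```

Here pat rxgroups str returns the same boxes as groups.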

A set of different position-specific tokens.

A leading run of non-space characters, then whitespace, then the rest of the string.

   '(\S+)\s+(.+)' rxgroups '12    3456 789'
+--+--------+
|12|3456 789|
+--+--------+

A set of same type tokens.

Space separated tokens.

   '\S+' rxall '12    3456 789'
+--+----+---+
|12|3456|789|
+--+----+---+

See Also