Guides/WorkingWithText

From J Wiki

Some suggestions for processing text (character data) in J.

How Do I Enter Text?

Direct (Inline)

The simplest way to do this in a J session is to paste the text into a named noun. Type "txt=. 0 : 0" into the session, copy and paste the Lorem ipsum text into the session, then type a closing ")" alone on its line and press Enter.

txt=. 0 : 0
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
)

Now the noun txt contains every character from the line after "0 : 0" through the line before the lone ")", including the line-feed that ends the last line.
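To see exactly what 0 : 0 produces, here is a minimal sketch in script form. The names t and u are illustrative only, and it assumes the J standard library is loaded (as it is by default), since echo and LF are defined there rather than being primitives.

```j
NB. Build a small two-line noun with 0 : 0 (t is an illustrative name)
t =: 0 : 0
first line
second line
)
NB. The same text spelled out explicitly; LF is the line-feed
NB. character defined in the J standard library
u =: 'first line' , LF , 'second line' , LF
echo t -: u     NB. 1: the nouns match, including the final LF
echo # t        NB. 23: 10 + 1 + 11 + 1 characters
```

Each line of the definition contributes its characters plus one LF, which is why txt above ends with a line-feed.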

What Can I Do with Text?

Simple Things

We can check the shape of this character vector:

   $txt
888
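$ returns a shape, which is always a list; for a character vector it is a one-item list holding the length. The tally verb # returns the count of items directly. The difference shows up with a single quoted character, which J treats as an atom rather than a one-character list. A small sketch, with s as an illustrative name:

```j
NB. Contrast $ (shape) and # (tally) on text
s =: 'Lorem ipsum'
echo $ s        NB. 11: a one-item list holding the length
echo # s        NB. 11: the number of items
echo $ 'L'      NB. prints an empty line: a lone quoted character is an atom
echo # 'L'      NB. 1
echo $ , 'L'    NB. 1: ravel (,) turns the atom into a 1-character list
```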

We can count the number of periods in the text, or the number of spaces:

   +/'.'=txt
9
   +/' '=txt
149
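Comparing against several characters at once gives all the counts in one expression: the table adverb (=/) builds a boolean matrix with one column per character, and +/ sums each column. Key (/.) goes further and counts every distinct character. This sketch uses a short sample string s (an illustrative name) rather than txt, so the results are easy to check by hand:

```j
NB. Count several characters in one pass, then tabulate every
NB. distinct character of the sample string s
s =: 'hello world'
echo +/ s =/ 'lo'   NB. 3 2: occurrences of 'l' and of 'o'
echo ~. s           NB. helo wrd: distinct characters, in order of first appearance
echo #/.~ s         NB. 1 1 3 2 1 1 1 1: how often each of those occurs
```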

We can sort the text (not very interesting in itself), and we can break it into words, as defined by J's word-formation verb ;:, and count how many there are.

   /:~txt

                                                                                                                                                     ,,,,,,,,,,,,.........AAALLLLLLSSSaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbb...
   ;:txt
+-----+-----+-----+---+----+-+----------+----------+-----+-+---+----+------+------+------+--------+--+------+--+------+-----+--------+----+-+---+----+---------+--+----+---+--+-------+--+-----+---+-------+--+--+------+----+-----+----+---------+-+--+---+----...
|Lorem|ipsum|dolor|sit|amet|,|consetetur|sadipscing|elitr|,|sed|diam|nonumy|eirmod|tempor|invidunt|ut|labore|et|dolore|magna|aliquyam|erat|,|sed|diam|voluptua.|At|vero|eos|et|accusam|et|justo|duo|dolores|et|ea|rebum.|Stet|clita|kasd|gubergren|,|no|sea|taki...
+-----+-----+-----+---+----+-+----------+----------+-----+-+---+----+------+------+------+--------+--+------+--+------+-----+--------+----+-+---+----+---------+--+----+---+--+-------+--+-----+---+-------+--+--+------+----+-----+----+---------+-+--+---+----...
   $;:txt
163
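Because ;: follows J's own tokenizing rules, a comma becomes a separate word while a period directly after a word stays attached to it, as the boxed output shows. If blank-separated words are wanted instead, one common approach is to cut on blanks with ;._1. A sketch on an illustrative sample string s:

```j
NB. J words (;:) versus blank-separated words
s =: 'sed diam, voluptua. At vero'
echo # ;: s         NB. 6: ;: makes the comma a word of its own
NB. <;._1 cuts a list at occurrences of its first item;
NB. prefixing a blank makes every blank a word delimiter
w =: <;._1 ' ' , s
echo # w            NB. 5
echo > w            NB. one word per row, padded with blanks
```

Applied to txt, # <;._1 ' ' , txt gives 150: one piece for each of the 149 blanks counted earlier, plus one for the prepended blank. The last piece keeps the trailing LF, and consecutive blanks would produce empty pieces, so this simple split assumes single spacing.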

The sorted output begins with an empty line, followed by a long run of blanks, because the line-feed (LF) that ends txt sorts before every other character and the 149 spaces sort next.
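Characters sort by their position in the alphabet a., so the line-feed (position 10) precedes the blank (32), punctuation, and letters. A quick check, with s as an illustrative name; removing LF first with -. (less) keeps it out of the sorted result:

```j
NB. Why LF comes first when text is sorted
echo a. i. LF , ' ' , ',.AL'   NB. 10 32 44 46 65 76: positions in a.
s =: 'b a' , LF
echo (/:~ s) -: LF , ' ab'     NB. 1: LF first, then the blank, then letters
echo /:~ s -. LF               NB. sorted text with the LF removed
```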