Vocabulary/Words

Strings, Numbers and Names are all Words

The characters used to form words and sentences in J are:

  • alphabetic: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
  • apostrophe: (')
  • graphic:  !"#$%&*+,-./:;<=>?@[\]^_`{|}~
  • numeric: 0123456789
  • period: (.)
  • underscore: (_)
  • whitespace: includes sequences of space (ASCII 32) and/or TAB (ASCII 9), but excludes newline/linefeed/LF (ASCII 10)

Primitive objects in J, listed in NuVoc, are words of 1 to 3 characters, denoting various operations (verbs), quantities and/or texts (nouns), and related actions. Users may assign a name to an object which they create, according to these rules:

  • It is a string of characters,
  • Beginning with an upper or lower-case letter,
  • Possibly followed by other letters and/or digits,
  • And may include embedded single underscores (not consecutive and not trailing).

Each name is defined in one of several computation environments (or namespaces) known in J as Locales, which are identified by similar sorts of names, or by assigned numbers. Objects' names may be qualified by an attached suffix which is bracketed by underscores, designating a particular locale of operation. Absence of a name between the underscores implies the locale 'base'.
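As a minimal sketch of locale-qualified names, the same simple name can hold different values in different locales. The names count and mylocale are made up for illustration; the empty-locale form count__ follows the rule stated above.

```j
NB. Hypothetical names: count, mylocale
count_base_     =: 7     NB. count in the base locale
count_mylocale_ =: 42    NB. count in locale mylocale (the assignment creates the locale)

assert 7  -: count__          NB. nothing between the underscores: locale base
assert 42 -: count_mylocale_  NB. the other definition is untouched
```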

When a sentence is executed, J first converts it into words. Use the verb Words (;:) to separate the words in a given sentence:

   sentence=: 0 : 0
 'abc ''d'' efg' -:<name , 0 1 2+(z-1)  NB. (comment)
)

   ;: sentence
+---------------+--+-+----+-+-----+-+-+-+-+-+-+-------------+
|'abc ''d'' efg'|-:|<|name|,|0 1 2|+|(|z|-|1|)|NB. (comment)|
+---------------+--+-+----+-+-----+-+-+-+-+-+-+-------------+

Summary of word formation:

  • Although J executes a sentence right-to-left (<--), it first splits a code string into words left-to-right (-->).
  • An apostrophe (') starts a string which continues to the next apostrophe. A string becomes a single word, even if it contains a space. After the first apostrophe (i.e. inside a string), two consecutive apostrophes stand for one apostrophe character, as in the example above.
  • A graphic such as (+) is a word in itself, unless it is inflected, e.g. (+:).
  • An alphabetic starts a name, which contains all the ensuing consecutive alphabetics, numerics and underscores, or it may begin an inflected primitive, e.g. (a:).

    A name may not end in a single underscore unless it is a valid locative.

  • A numeric or underscore starts a number, which contains all the ensuing consecutive alphabetics, numerics, underscores and periods.
    • J recognises these character-sequences as valid number-words:  
      _. __: _9: _8: _7: _6: _5: _4: _3: _2: _1: _0: 0: 1: 2: 3: 4: 5: 6: 7: 8: 9: _:
    • Apart from these exceptions, a numeric may not be inflected.
  • A character outside the printable ASCII range (32 thru 127) is a single word.
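The string and inflection rules in this summary can be checked with (;:) itself. This is only a sketch: total is a hypothetical name, and the sentence is never executed, only split into words.

```j
NB. The sentence being split is:   'a b' , +: total
words =: ;: '''a b'' , +: total'

assert 4 = # words                 NB. 'a b'   ,   +:   total
assert '''a b''' -: > {. words     NB. the string is one word and keeps its apostrophes
assert '+:' -: > 2 { words         NB. the inflected graphic +: is one word
```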

There is one tremendously important, counter-intuitive, special case:

  • A sequence of numbers separated by whitespace is treated as a single word (as in the example above).
    • Note that a whitespace-separated sequence including names is not treated as a single word, even if the names have numeric values. J collects consecutive numbers into a single word before it examines the values of any names.
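A small sketch of this special case, counting the words that (;:) produces. The name n is hypothetical and is given a numeric value only to show that its value plays no part in word formation.

```j
n =: 4                          NB. hypothetical name with a numeric value

assert 1 = # ;: '1 2 3'         NB. three numbers, one word
assert 3 = # ;: '1 2 3 + n'     NB. words: 1 2 3   +   n
assert 2 = # ;: '1 2 n'         NB. words: 1 2   n   (n is not absorbed into the list)
assert 5 6 7 -: 1 2 3 + n       NB. the list 1 2 3 is a single noun when executed
```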

The process of recognizing words, summarized above, can be seen in more detail at JDic:d332.


Sentences

The sentence is the executable element in a J script. Basically, it is one line of code.

Sentences cannot span more than one line. There is no "continuation character" as in Basic, and no "statement delimiter" as in Java or C.

The executable line is treated as a single sentence unless it contains control words, in which case the control words and the sections of the line between them are each treated as separate sentences.

if. T do. sentence1 else. sentence2 end.

T (above) is called a T-block. Typically a T-block is a single "test" sentence, e.g.  0=#y , but it can consist of several sentences, i.e. a block of code.
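For illustration, here is a sketch of a one-line control structure inside an explicit definition. The verb name sign is hypothetical; the T-block is the single test sentence y < 0.

```j
NB. Hypothetical verb: the line holds the control words if. do. else. end.
NB. and three ordinary sentences between them.
sign =: 3 : 0
if. y < 0 do. 'negative' else. 'non-negative' end.
)

assert 'negative'     -: sign _3
assert 'non-negative' -: sign 5
```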

The primitive (NB.) starts a comment. The leftmost NB. in a line that is not inside a string causes the rest of the line to be ignored by the interpreter. Whatever stands to the left of it is the executable line.

   1 + 1
2
   1 + 1 NB. The executable line is the first 1 + 1
2
   NB. You can use NB. to "comment out" a sentence inside a script, like this ...
   NB. 1 + 1
   NB. ... J ignored that!
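Because strings are recognized during word formation, the characters NB. inside a string are ordinary text. A brief sketch, using a made-up noun s:

```j
s =: 'NB. inside a string is just text'   NB. only this trailing part is a comment

assert 'NB.' -: 3 {. s                    NB. the string really starts with NB.
assert 2 = # ;: '''NB. x'' NB. y'         NB. one string word, then one comment word
```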

J Code, Strings, Lines, Sentences and Textfiles

Definition:

  • CR stands for the ASCII byte: 13 — Carriage Return
  • LF stands for the ASCII byte: 10 — Linefeed
  • CRLF stands for the pair of ASCII bytes: 13 10 — Carriage Return Linefeed.

(assert CRLF -: CR,LF)
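These definitions can be confirmed against the alphabet a. in a session where the standard library is loaded (which is assumed here):

```j
assert 13 10 -: a. i. CRLF    NB. byte values of CR and LF
assert 2 = # CRLF             NB. CRLF is a two-character string
assert LF -: 10 { a.          NB. LF is the character at index 10 of a.
```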

Textfiles are the most elementary kind of files used by today's computers to store data on disk. But there are many different standards for the simple textfile. Of these, just two are in common use:

  • Windows — separates lines using CRLF
  • UNIX (including Linux and Apple Macintosh) — separates lines using just LF.

CRLF, CR and LF are Standard Library words:

  • They reside in the 'z' locale.
  • They are defined in the factory script  ~system/main/stdlib.ijs
  • To view the definitions in a JQt session, enter:  open '~system/main/stdlib.ijs'

Internally, J uses the UNIX standard (LF-separated).

  • Even if your platform is Windows, you should expect textfiles, in particular J scripts, to have been converted into an LF-separated string of bytes.
  • This ensures that J code, system or user, is cross-platform, and needs no special code to handle CR and CRLF.
  • The files library does, however, have verbs to retain CR when importing a file, in case you actually need to use that.
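As a hedged sketch of the conversion to LF-separated form, the standard library verb toJ can be applied to a string holding Windows-style line endings. The noun names and the sample text are invented for illustration, and the standard library is assumed to be loaded.

```j
windowsText =: 'first line', CRLF, 'second line', CRLF   NB. hypothetical file contents
unixText    =: toJ windowsText                           NB. convert to J's LF convention

assert -. CR e. unixText                                 NB. no CR remains
assert unixText -: 'first line', LF, 'second line', LF
```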

A noun containing J Code is typically a string separated into lines by "newline" ("linefeed") characters (LF). This holds true also for verb definitions which you've asked J to convert into a noun, e.g. by using Foreign (5!:5).

More precisely, J Code is a list of bytes, separated into individual lines by LF.

If you ask J to give you the definition of a verb as a string, e.g. using Foreign (5!:5), the result is an LF-separated string, i.e. a string of bytes with the code lines separated by LF:

out=. 3 : 0
smoutput '--- executing: y'
tryexec y
smoutput '... (datatype y) is: ' , (datatype y)
smoutput '--- executing: uucp y'
tryexec uucp y
)
   ] jcode=: 5!:5 <'out' NB. using the tool defined above as an example
3 : 0
smoutput '--- executing: y'
tryexec y
smoutput '... (datatype y) is: ' , (datatype y)
smoutput '--- executing: uucp y'
tryexec uucp y
)
   $ jcode                  NB. jcode is a string, i.e. a list of rank 1
141
   datatype jcode           NB. The precision of jcode is: byte
literal
   CR e. jcode              NB. (Even on a Windows platform) it doesn't contain CR
0
   LF e. jcode              NB. but it does contain LF
1

Define another tool (showLF) to emphasize the whereabouts of LF in the noun jcode:

   showLF=: verb define
require 'strings'
z=. quote y
z rplc LF ; ''' , LF , '''
)

   showLF jcode
'3 : 0' , LF , 'smoutput ''--- executing: y''' , LF , 'tryexec y' , LF , 'smoutput ''... (datatype y) is: '' , (datatype y)' , LF , 'smoutput ''--- executing: uucp y''' , LF , 'tryexec uucp y' , LF , ')'

Let's summarize the situation with J code, strings, lines, sentences and textfiles:

  • J code is a string of bytes.
  • LF indicates separation of the string into lines.
  • Typically the overall string has a trailing LF (but that's not essential).
  • Each line is a J sentence, which may include, or be, a comment, or be blank or empty.
  • The above is true for code returned by Foreign (5!:5<'verbname').
  • On UNIX-type platforms (including Apple Macintosh) this is precisely the format of a verb definition inside a J script (i.e. a textfile).
  • On Windows platforms, J routinely converts textfiles, especially J scripts, into LF-separated form. This means that CR is routinely eliminated and no special code is required to handle CR and CRLF.
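The relationship between a code string and its lines can be sketched by cutting a string at each LF. The two-line script below is hypothetical, and it carries a trailing LF because the cut used here treats the last character as the delimiter.

```j
code  =: 'a =: 1', LF, 'b =: a + 1', LF   NB. hypothetical two-line script
lines =: <;._2 code                       NB. box each line; the final LF is the delimiter

assert 2 = # lines                        NB. two lines
assert 'b =: a + 1' -: > {: lines         NB. the last line, without its LF
assert 2 = +/ LF = code                   NB. one LF per line
```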

Inflection

An ASCII character or a name can be extended with inflections in any order:

  • one or more period (.) characters,
  • one or more colon (:) characters.

The inflected word is a single word. It is different from, and completely independent of, the uninflected word.

Inflected words are used only to designate primitives (including control words), not user-assigned names.
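To illustrate that an inflected word is independent of the uninflected one, here is a short sketch using the graphic (+) and two of its inflections:

```j
assert 3 = # ~. ;: '+ +. +:'   NB. three distinct words

assert 10 = 4 + 6              NB. +   adds
assert 2  = 4 +. 6             NB. +.  gives the greatest common divisor
assert 8  = +: 4               NB. +:  doubles
```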


Primitives

Primitives are words whose meaning is built into J, naming its pre-defined nouns, verbs, action modifiers, and controls.

All graphics, inflected graphics, and inflected names are reserved for primitives.


Control Words

Some inflected names, such as if., do., and end., are control words. They are used to control program flow. These words are treated as sentences in themselves. They break up the executable line into a number of separate sentences -- the control words, and the phrases between the control words.

Control words are allowed only inside the body of an explicit definition.


Paired Characters

J executes a sentence from right to left. Parentheses: ( and ) are the only words that change this order.

Parentheses are the only paired characters in J.

We exclude the Apostrophe, or single-quote ('), which some would consider to be "paired" with itself.

The characters <, >, {, }, [, ] and their inflected forms are, counter-intuitively, not paired: they are the names of individual primitives.
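A few one-line checks may make this concrete: parentheses change the order of execution, while the bracket-like graphics are ordinary verbs that appear on their own.

```j
assert 14 = 2 * 3 + 4      NB. right to left: 2 * (3 + 4)
assert 10 = (2 * 3) + 4    NB. parentheses force 2 * 3 first

assert 1  = 3 < 5          NB. dyadic < compares
assert 3  = > < 3          NB. monadic < boxes, monadic > opens the box
assert 20 = 1 { 10 20 30   NB. dyadic { selects an item; no closing } is involved
assert 3  = 3 [ 4          NB. dyadic [ returns its left argument
```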


Parts Of Speech

Every word in a J sentence, and every value produced during execution of a sentence, has a part of speech. The primary parts of speech are noun, verb, adverb, and conjunction.


The Result Of A Sentence

The result of a sentence is the result of the last execution within the sentence. Normally this is the execution of a verb, which produces a noun.

If the sentence has been typed-in from the keyboard, J will print the result of the sentence on the console (in a J IDE, this will appear before the next prompt for input) except when the last execution was an assignment.


Defining A Name

A name is defined when it appears in an assignment, i.e. when the name appears just before a copula (=.) or (=:). The act of assignment attaches a value to a name to produce a definition.

When a name is assigned, it takes as its value the result of the sentence to the right of the copula. This value may be any part of speech: it is possible to create named verbs by assigning verb values in exactly the same way that assigning noun values creates named nouns.
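A minimal sketch of the point that the same copula names verbs and nouns alike. The names mean and total are hypothetical; 4!:0 is used only to report the resulting part of speech (3 for a verb, 0 for a noun).

```j
mean  =: +/ % #              NB. a verb value: sum divided by count
total =: +/ 1 2 3 4          NB. a noun value: the result of executing +/ on a list

assert 3 = 4!:0 <'mean'      NB. mean is a verb
assert 0 = 4!:0 <'total'     NB. total is a noun
assert 2.5 = mean 1 2 3 4
assert 10  = total
```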

The phrase  4!:0 <y reports the part of speech of the name y according to this table:

The Result of  4!:0 <y

  Result   Meaning
  _2       invalid name <!>
  _1       name is valid but undefined
   0       noun
   1       adverb
   2       conjunction
   3       verb

<!> An invalid name, in the table above, is a name which

  • does not begin with an alphabetic,
  • contains a character that is not valid within a name, e.g. TAB,
  • or violates the underscore rules.

   4!:0 <'123'           NB. Invalid name, starts with number
_2
   4!:0 <'x123'          NB. Valid name, but undefined
_1
   4!:0 <'x',TAB,'123'   NB. Invalid name, has invalid character
_2
   4!:0 <'x123_'         NB. Invalid name, ends in a single underscore
_2
   a=: 5                 NB. A noun
   4!:0 <'a'
0
   slash=: /             NB. An adverb
   4!:0 <'slash'
1
   atsign=: @            NB. A conjunction
   4!:0 <'atsign'
2
   plus=: +              NB. A verb
   4!:0 <'plus'
3

Illustration

The ;: verb will show you how J breaks a sentence into words. However, in some contexts it might be more convenient to show the sentence with a space between each pair of words. For example, with the definition:

jwords=: 3 : 0
  bchars=: a.{~ 16+i.11
  boxy=. bchars-:9!:6''
  indent=. ' '#~+/*/\' '=y
  9!:7]11#' '
  r=. indent,deb 1{":;:y
  bchars 9!:7@([^:boxy)'+++++++++|-'
  r
)

we get:

   jwords '+/%#'
+ / % #
   
   jwords '/:~'
/: ~
   
   jwords@>LF cut 5!:5<'jwords'
3 : 0                                        
  bchars =: a. { ~ 16 + i. 11                
  boxy =. bchars -: 9 !: 6 ''                
  indent =. ' ' # ~ + / * / \ ' ' = y        
  9 !: 7 ] 11 # ' '                          
  r =. indent , deb 1 { ": ;: y              
  bchars 9 !: 7 @ ( [ ^: boxy ) '+++++++++|-'
  r                                          
)    

One thing to remember, though, when using this routine: number lists are single words which include embedded spaces:

   ;:'1 2 3'
+-----+
|1 2 3|
+-----+