Vocabulary/Words

From J Wiki

Words

Sentences

A sentence is the executable element of a J script. In the simplest case, it is one line of code.

Sentences cannot span more than one line. There is no "continuation character" as in Basic, and no "statement delimiter" as in Java or C.

The executable line is treated as a single sentence, unless it contains control words. If the line contains control words, each part of the line between control words is treated as a separate sentence.

if. T do. sentence1 else. sentence2 end.

T (above) is called a T-block. Typically a T-block is a single sentence, e.g. 0=#y, but it can be several sentences, i.e. a block of code.
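The two shapes of T-block can be seen in a pair of small explicit verbs. describe and size are hypothetical names used only for illustration, and results are shown in NB. comments rather than as session output. In the second verb the T-block holds two sentences, and the result of the last one (n > 2) decides the test.

```j
NB. describe (hypothetical): its T-block is the single sentence  0=#y
describe =: 3 : 0
if. 0=#y do. 'empty' else. 'not empty' end.
)
describe ''        NB. result: empty
describe 1 2 3     NB. result: not empty

NB. size (hypothetical): its T-block is two sentences; the last one decides
size =: 3 : 0
if.
  n =. # y
  n > 2
do.
  'long'
else.
  'short'
end.
)
size 1 2 3 4       NB. result: long
size 7             NB. result: short
```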

The primitive (NB.) starts a comment. The leftmost NB. in a line that is not inside a string causes the rest of the line to be ignored by the interpreter. Whatever stands to the left of it is the executable line.

   1 + 1
2
   1 + 1 NB. The executable line is the first 1 + 1
2
   NB. You can use NB. to "comment out" a sentence inside a script, like this ...
   NB. 1 + 1
   NB. ... J ignored that!
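The characters NB. inside a string are ordinary text and do not start a comment. A short check, where s is a hypothetical name:

```j
s =: 'a NB. b'   NB. only this trailing part is a comment
# s              NB. result: 7, so the string kept all of  a NB. b
```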

Strings, Numbers and Names are all Words

When a sentence is executed, J first converts it into words.

The rules for word formation are complicated, but they are given in full at JDic:d332.

Use Words (;:) to see the words in a given sentence

   sentence=: 0 : 0
 'abc' -:<name , 0 1 2+(z-1)  NB. (comment)
)

   ;: sentence
+-----+--+-+----+-+-----+-+-+-+-+-+-+-------------+
|'abc'|-:|<|name|,|0 1 2|+|(|z|-|1|)|NB. (comment)|
+-----+--+-+----+-+-----+-+-+-+-+-+-+-------------+

Summary of word formation:

  • Although J executes a sentence right-to-left (<--), it splits a code string into words left-to-right (-->)
  • An apostrophe (') starts a string which continues to the next apostrophe. A string becomes a single word, even if it contains a space

After the first apostrophe (i.e. inside a string), two consecutive apostrophes stand for one apostrophe character; for example, 'it''s' is the four-character string it's.

  • A graphic such as (+) is a word in itself. However it may be inflected, e.g. (+:)
  • An alphabetic starts a name, which contains all the ensuing consecutive alphabetics, numerics and underscores. It may be inflected, e.g. (a:)

A name may not end in a single underscore unless it is a valid locative.

  • A numeric or underscore starts a number, which contains all the ensuing consecutive alphabetics, numerics, underscores and periods.
    • J recognises these character-sequences as valid words:  _: _. 0: 1: 2: 3: 4: 5: 6: 7: 8: 9: _0: _1: _2: _3: _4: _5: _6: _7: _8: _9:
    • Apart from these exceptions, a number may not be inflected.
  • A character outside the printable ASCII range (32 through 126) is a single word.
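Several of these rules can be seen at once by applying Words (;:) to one line of text. w and x_y2 are hypothetical names; because the text is written as a J string, each of its apostrophes is doubled once more inside the J source.

```j
NB. The J string below holds the text:  x_y2 +: 2j3 _1 1e3 'it''s'
w =: ;: 'x_y2 +: 2j3 _1 1e3 ''it''''s'''
# w            NB. result: 4
0 { w          NB. the name x_y2 (alphabetics, numerics, underscore)
1 { w          NB. the inflected graphic +:
2 { w          NB. one word holding three numbers: 2j3 _1 1e3
3 { w          NB. the string 'it''s', apostrophes included
```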

There is one tremendously important special case:

  • A sequence of numbers separated by whitespace is treated as a single word (as in the example above).
    • Note that a sequence of names is not treated as a single word, even if the names have numeric values.

J collects consecutive numbers into a single word before it can examine the values of any names.
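The difference between a run of numbers and a run of names shows up directly in the word count. a, b and c are hypothetical names holding numbers:

```j
a =: 1
b =: 2
c =: 3
# ;: '1 2 3'     NB. result: 1  (one word: the list 1 2 3)
# ;: 'a b c'     NB. result: 3  (three separate names)
NB. Executing  a b c  is a syntax error (noun noun noun).
NB. To build the list from the names, join them with , (Append):
a , b , c        NB. result: 1 2 3
```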

The terms used above for characters:

  • alphabetic: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
  • apostrophe: (')
  • graphic:  !"#$%&*+,-./:;<=>?@[\]^_`{|}~
  • numeric: 0123456789
  • period: (.)
  • underscore: (_)
  • whitespace: here includes sequences of space (ASCII 32) and TAB (ASCII 9), but excludes newline/linefeed/LF (ASCII 10)


J Code, Strings, Lines, Sentences and Textfiles

Definition:

  • CR stands for the ASCII byte: 13 — Carriage Return
  • LF stands for the ASCII byte: 10 — Linefeed
  • CRLF stands for the pair of ASCII bytes: 13 10 — Carriage Return Linefeed.

(assert CRLF -: CR,LF)
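Since CR, LF and CRLF are ordinary character nouns, their byte codes can be read back from a. (the alphabet of all 256 bytes) with i. (Index Of). This assumes the Standard Library is loaded, as it is in a normal J session:

```j
a. i. CR       NB. result: 13
a. i. LF       NB. result: 10
a. i. CRLF     NB. result: 13 10
```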

Textfiles are the most elementary kind of file used by today's computers to store data on disk. But there are several conventions for marking the end of a line in a textfile. Of these, just two are in common use:

  • Windows: separates lines using CRLF
  • UNIX (including Linux and macOS): separates lines using just LF.

CRLF, CR, LF are

  • Standard Library word(s) residing in the 'z'-locale.
  • Defined in the factory script stdlib.ijs.
  • View definition(s) by entering in the J session:  open'stdlib'

Internally, J uses the UNIX standard (LF-separated).

  • Even if your platform is Windows, you should expect textfiles, in particular J scripts, to have been converted into LF-separated strings of bytes.
  • This ensures that J code, whether system or user, is cross-platform and needs no special code to handle CR and CRLF.
  • The files library does, however, have verbs that retain CR when importing a file, if you actually need that.
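A minimal sketch of the conversion, assuming lines are separated by CRLF or LF only: deleting every CR with -. (Less) leaves LF-separated text. A lone CR used as a separator would simply be deleted rather than turned into LF. win and unix are hypothetical names, and this is not necessarily how J's own file verbs do the job.

```j
win =: 'line one',CRLF,'line two',CRLF   NB. Windows-style text
unix =: win -. CR                         NB. drop every CR byte
CR e. unix        NB. result: 0
+/ LF = unix      NB. result: 2  (one LF per line)
```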

A noun containing J code is a list of bytes (a string), separated into lines by newline characters (LF). This holds also for verb definitions that you ask J to convert into a noun: if you use Foreign (5!:5) to get the definition of a verb as a string, the result is an LF-separated string of bytes.

   ] jcode =: 5!:5 <'out'   NB. use the tool defined above as an example
3 : 0
smoutput '--- executing: y'
tryexec y
smoutput '... (datatype y) is: ' , (datatype y)
smoutput '--- executing: uucp y'
tryexec uucp y
)
   $ jcode                  NB. jcode is a string, i.e. a list of rank 1
141
   datatype jcode           NB. The precision of jcode is: byte
literal
   CR e. jcode              NB. (Even on a Windows platform) it doesn't contain CR
0
   LF e. jcode              NB. but it does contain LF
1

Define another tool, showLF, to show where LF occurs in the noun jcode:

   showLF=: verb define
require 'strings'
z=. quote y
z rplc LF ; ''' , LF , '''
)

   showLF jcode
'3 : 0' , LF , 'smoutput ''--- executing: y''' , LF , 'tryexec y' , LF , 'smoutput ''... (datatype y) is: '' , (datatype y)' , LF , 'smoutput ''--- executing: uucp y''' , LF , 'tryexec uucp y' , LF , ')'

  • J code is a string of bytes
  • LF separates the string into lines
  • Typically the overall string has a trailing LF (but that's not essential)
  • Each line is a J sentence
  • The above is true for code returned by Foreign (5!:5<'verbname')
  • On UNIX-type platforms (including Linux and macOS) this is precisely the format of a verb definition inside a J script (i.e. a textfile).
  • On Windows platforms, J routinely converts textfiles, especially J scripts, into LF-separated form. This means that CR is routinely eliminated and no special code is required to handle CR and CRLF.
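These points can be checked on a hand-built string. code and lines are hypothetical names; <;._2 (Box with Cut) splits a string at each occurrence of its last item, here the trailing LF, and drops that delimiter.

```j
code =: '3 : 0',LF,'y * y',LF,')',LF   NB. three LF-terminated lines
lines =: <;._2 code                   NB. one box per line, LF removed
# lines                               NB. result: 3
1 { lines                             NB. the middle line: y * y
```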

Inflection

A graphic or a name can be extended with inflections, in any order and combination:

  • one or more period (.) characters
  • one or more colon (:) characters.

The inflected word is a single word. It is different from, and completely independent of, the uninflected word.

Inflected words designate only primitives (including control words), never user-assigned names.
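Each inflected form is a single word, independent of its base. Here i is a hypothetical user name, and assigning to it leaves the primitive i. untouched:

```j
# ;: 'i i. i: + +. +: ::'   NB. result: 7  (each form is its own word)
i =: 99                     NB. a user name, unrelated to i. or i:
i. 3                        NB. result: 0 1 2
```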


Primitives

Primitives are words whose meaning is built into J.

All graphics, inflected graphics, and inflected names are reserved for primitives.


Control Words

Some inflected names, such as if., do., and end., are control words. They are used to control program flow. Each control word is treated as a sentence in itself, so control words break the executable line into separate sentences: the control words themselves, and the phrases between them.

Control words are allowed only inside the body of an explicit definition.
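A minimal loop using control words inside an explicit definition. sumto is a hypothetical verb; for_k. runs its body once for each item of i. >: y, with k set to that item.

```j
sumto =: 3 : 0
total =. 0
for_k. i. >: y do.
  total =. total + k
end.
total
)
sumto 4        NB. result: 10  (0+1+2+3+4)
```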


Paired Characters

J executes a sentence from right to left. Parentheses: ( and ) are the only words that change this order.
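A two-line comparison shows the effect of parentheses on the right-to-left order:

```j
2 * 3 + 4      NB. result: 14  (3 + 4 is executed first)
(2 * 3) + 4    NB. result: 10  (the parentheses force 2 * 3 first)
```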

Parentheses are the only paired characters in J.

We exclude the Apostrophe, or single-quote ('), which some would consider to be "paired" with itself.

The characters <, >, {, }, [, and ] are not paired: they are the names of individual primitives.


Parts Of Speech

Every word in a J sentence, and every value produced during execution of a sentence, has a part of speech. The primary parts of speech are noun, verb, adverb, and conjunction.


The Result Of A Sentence

The result of a sentence is the result of the last execution within the sentence. Normally this is the execution of a verb, which produces a noun.

If the sentence was typed in from the keyboard, J prints its result on the console (in a J IDE, this appears before the next prompt for input), except when the last execution was an assignment.
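Typed into a J session, the sentences below behave as described in their comments (x is a hypothetical name). Placing ] (Same) to the left of an assignment makes it the last execution, so the result is displayed:

```j
x =: 2 + 3       NB. last execution is an assignment: nothing is printed
x                NB. prints 5
] x =: 2 + 3     NB. ] runs after the assignment, so 5 is printed
```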


Defining A Name

A name is defined when it appears in an assignment, i.e. when the name appears just before a copula (=.) or (=:). The act of assignment attaches a value to a name to produce a definition.

When a name is assigned, it takes as its value the result of the sentence to the right of the copula. This value may have any part of speech: it is possible to create named verbs by assigning verb values in exactly the same way that assigning noun values creates named nouns.
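Named verbs are created with the same copula as named nouns. double and sum are hypothetical names:

```j
double =: +:       NB. +: (Double) is a verb, so double is a verb
sum =: +/          NB. the adverb / applied to + gives a verb
double 1 2 3       NB. result: 2 4 6
sum 1 2 3 4        NB. result: 10
```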

The phrase 4!:0 <y, where y is a name given as a string, gives the part of speech of that name according to this table:

Result of 4!:0 <y   Meaning
_2                  invalid name <!>
_1                  name is valid but undefined
 0                  noun
 1                  adverb
 2                  conjunction
 3                  verb

<!> An invalid name, in the table above, is a string that

  • does not begin with an alphabetic
  • contains a character that is not valid within a name, e.g. TAB
  • violates the underscore rules, e.g. ends in a single underscore without being a valid locative.

   4!:0 <'123'           NB. Invalid name, starts with number
_2
   4!:0 <'x123'          NB. Valid name, but undefined
_1
   4!:0 <'x',TAB,'123'   NB. Invalid name, has invalid character
_2
   4!:0 <'x123_'         NB. Invalid name, ends in a single underscore
_2
   a=: 5                 NB. A noun
   4!:0 <'a'
0
   slash=: /             NB. An adverb
   4!:0 <'slash'
1
   atsign=: @            NB. A conjunction
   4!:0 <'atsign'
2
   plus=: +              NB. A verb
   4!:0 <'plus'
3
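4!:0 also accepts a list of boxed names, returning one class per name, and 4!:55 (Erase) removes a definition so that the name becomes valid but undefined. The names repeat those of the session above so the block stands on its own:

```j
a =: 5
slash =: /
atsign =: @
plus =: +
4!:0 ;: 'a slash atsign plus'   NB. result: 0 1 2 3
4!:55 <'a'                      NB. erase a; result: 1
4!:0 <'a'                       NB. result: _1  (valid but now undefined)
```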