Output to the screen

Conversion to a character array

To display a result, J first applies ": y, which produces an array of bytes after making the following conversions:

  • numeric types are converted to displayable ASCII characters (digits, signs, decimal points and separating spaces)
  • character types are converted to bytes using 8 u:"1 y which:
    • leaves bytes unchanged (could be ASCII or non-ASCII)
    • converts unicodes to bytes using UTF-8 encoding
  • symbol type is converted to ` followed by the characters the symbol represents, converted as above
  • boxes first convert the contents of the box as above, then enclose the resulting array in boxing characters

   tohex =: [: ,"2 ' ' (,"1) '0123456789ABCDEF' {~ 16 16 #: a. i. ]  NB. display in hex

   9!:6''     NB. Using the special boxing characters
   tohex 9!:6 ''  NB. This is their hex form
 10 11 12 13 14 15 16 17 18 19 1A

   u: 177   NB. A single unicode character
   tohex ": u: 177  NB. displayable form is 2 bytes
 C2 B1
   tohex ": 5 ; u: 177  NB. displayable form is 3x6 array of bytes
 10 1A 11 1A 1A 12
 19 35 19 C2 B1 19
 16 1A 17 1A 1A 18
   5 ; u: 177   NB. The boxed display, explained below

In the hex display of the converted array, the boxing characters are the bytes with hex codes 10 through 1A, '5' is represented by its ASCII code 35, and the unicode character '±' has become the two-byte UTF-8 sequence C2 B1.
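
The conversions can also be checked piece by piece. The following is a minimal sketch that uses only standard primitives (":, u: and 3!:0); the values in the comments are what the rules above predict.

```j
NB. numbers become a list of ASCII characters
# ": 3 _4 5.5          NB. 8 characters: 3 _4 5.5
3!:0 ": 3 _4 5.5       NB. 2, the datatype code for bytes (literal)

NB. a unicode atom becomes its UTF-8 byte sequence
3!:0 u: 177            NB. 131072, the datatype code for unicode
a. i. 8 u: u: 177      NB. 194 177, i.e. hex C2 B1
a. i. ": u: 177        NB. the same two bytes

NB. a boxed noun becomes a byte table that includes the boxing characters
$ ": 5 ; u: 177        NB. 3 6
```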

Sending the character array to the screen

If the character value calculated above is an atom or a list, it is made into a table by adding leading axes of length 1.

The character array is sent to the screen, one row at a time, starting a new line after each row. If the rank of the array exceeds 2, a blank line is inserted between the last cell of each axis and the following cell 0 for that axis. This leaves one blank row between 2-cells, 2 blank rows between 3-cells, etc.
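
As an illustration only, the promotion to a table can be imitated with a verb (called totable here, a name invented for this sketch) that laminates until the rank is at least 2. The last line formats a rank-3 result, whose two 2-cells are separated by one blank line when displayed.

```j
NB. totable is a hypothetical helper, not part of J: it adds leading
NB. axes of length 1 until its argument has rank 2 or more
totable =: ,:^:(0 >. 2 - #@$)

$ totable 'a'          NB. 1 1
$ totable 'abc'        NB. 1 3
$ totable i. 2 3 4     NB. 2 3 4 (already rank 3, left unchanged)

$ ": i. 2 3 4          NB. 2 3 11: shown as 3 rows, a blank line, 3 rows
```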

If the character array has no rows, nothing is displayed. This is why i. 0 0 is used to produce a totally empty display:

   $ ": i. 0 0
0 0
   $ ": i. 0
0
   ": i. 0 0   NB. no blank line
   ": i. 0

   NB. there was a blank line

As each row is displayed, a set of translations is applied to the characters:

  • NUL characters (0{a.) produce no output.
  • A new line is started after each CR (13{a.), and also after each LF (10{a.) not immediately preceded by CR.
  • Byte indexes above 127 are assumed to indicate UTF-8 encoding. Each multi-byte sequence is consumed and converted to a single Unicode character. If an invalid UTF-8 sequence is encountered (an invalid starting byte, or an incomplete sequence), the invalid byte(s) are replaced by Unicode character U+FFFD (�), and decoding continues with the character after the invalid sequence. NUL characters encountered in the middle of a UTF-8 sequence are ignored.
  • Bytes with indexes 16-26 (hex 10-1A) are replaced according to the following table:
    From (hex)  To        From (hex)  To        From (hex)  To        From (hex)  To
    10          U+250C    13          U+251C    16          U+2514    19          U+2502
    11          U+252C    14          U+253C    17          U+2534    1A          U+2500
    12          U+2510    15          U+2524    18          U+2518
  • Other bytes are sent to the display unchanged, and display according to their ASCII interpretation. ASCII bytes with indexes below 32 are control codes and their display is system-dependent, as is that of DEL (index 127). ASCII indexes in the range 32-126 are graphic characters whose display is familiar.
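
The box-drawing replacement in the table above can be imitated in J itself. This is only a sketch of that one rule, not the display code J actually runs; the names boxbytes, boxuni and translate are invented for the example, and the verb handles a list of bytes only.

```j
boxbytes =: (16 + i. 11) { a.     NB. bytes hex 10 through 1A
NB. replacement code points, in the same order as the table
boxuni =: u: 16b250c 16b252c 16b2510 16b251c 16b253c 16b2524 16b2514 16b2534 16b2518 16b2502 16b2500

translate =: 3 : 0
 idx  =. boxbytes i. y            NB. 11 means "not a boxing byte"
 mask =. idx < # boxbytes         NB. 1 where a replacement applies
 NB. item amend: take from row 0 (unchanged) or row 1 (replaced)
 mask } (u: y) ,: idx { boxuni , u: ' '
)

3 u: translate 16 26 18 { a.      NB. 9484 9472 9496 = U+250C U+2500 U+2518
3 u: translate 'a5'               NB. 97 53, ordinary bytes pass through
```

The result is in unicode precision, so 3 u: is used to inspect it as code points rather than relying on how the session renders it.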


1. The display of boxed nouns containing non-ASCII characters is ragged. This is because the boxing characters are installed before the bytes are converted to Unicode characters, which leaves the boxes too big for the contents:

   5 ; u: 177   NB. ± is converted to 2 bytes, boxed, and then converted to 1 Unicode character for display

2. CR, LF and {.a. (here assigned as NUL) can affect the display of any character noun containing them.

   NUL=: {.a.
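
A short sketch of the effect: the noun below has 8 bytes, but under the translation rules above the NUL produces no output and the LF starts a new line, so it is expected to display as abcd on one line and ef on the next. The name s is used only for this illustration.

```j
s =: 'ab' , (0 { a.) , 'cd' , (10 { a.) , 'ef'   NB. NUL after ab, LF after cd
# s             NB. 8: the NUL and LF are still part of the noun
a. i. s         NB. 97 98 0 99 100 10 101 102
s               NB. typing this in a session shows the translated display
```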

Displaying (a.)

a. contains 256 bytes, but it does not represent 256 characters, because there are only 128 ASCII characters. As a result, displaying it whole is not pretty, so we will display it in sections. The displays all follow the rules given above.

The Control Characters

The display of control characters (indexes 0-31) is system-dependent, but you can be sure that line-breaks will be added after LF and CR (indexes 10 and 13), and that indexes 16-26 will be replaced with the boxing characters.

The Graphic Characters

The graphic characters have well-defined displays.

   _16 ]\ 32 }. 128 {. a.  NB. Display 16 chars per line

The non-ASCII Bytes

The non-ASCII bytes are all invalid UTF-8 sequences, so displaying them directly does not produce a meaningful display:

   _16 ]\ 128}.a.

But you may have documents with non-ASCII characters. Before the advent of Unicode, these superasciis (128}.a.) coded for extended Latin characters. Nowadays this character set is known as "Latin-1". To display them, convert them to unicode precision with u: like this:

(Some characters may not display properly in the font used by your browser to display this page)

   _16 ]\ u: 128}. a.
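
For a single byte the conversion can be followed step by step. A small sketch, assuming the byte is meant as Latin-1: byte 197 (hex C5) is Å in Latin-1, and u: turns it into the unicode character with the same index, which is then converted to UTF-8 for display. The name c is used only for this illustration.

```j
c =: 197 { a.            NB. one Latin-1 byte, hex C5
3!:0 c                   NB. 2: still a byte
3 u: u: c                NB. 197: unicode with the same code point, U+00C5
a. i. 8 u: u: c          NB. 195 133: its UTF-8 form, hex C3 85
# 8 u: u: 128 }. a.      NB. 256: each of the 128 characters needs 2 bytes
```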

Input from the screen

Lines entered from the screen are read as bytes and are not encoded into Unicode characters even if they contain non-ASCII bytes. This means that non-ASCII characters pasted into a J sentence arrive as multiple bytes, not as single unicodes. Such characters are valid only inside comments, quoted strings or explicitly defined nouns.

   wd 'clipcopy *',16bc3 16b85{a.  NB. This puts a Unicode character on the clipboard
   wd 'clippaste'   NB. You can paste it in.  These examples all use <PASTE> to enter the character
   $ 'Å'   NB. It can appear in a string, but it's 2 bytes, not 1 unicode
2
   Å =. 45   NB. It can't appear outside of quotes
|spelling error
      0 : 0  NB. It can appear in a defined noun...
   3 : 0   NB. ...but not a defined verb
|spelling error
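
The byte-versus-unicode distinction can be checked without pasting anything. In this sketch the pasted character is simulated by building its two UTF-8 bytes directly (the same bytes used in the clipcopy example above); the name pasted is invented for the illustration, and 7 u: is used to decode UTF-8 bytes into unicode characters.

```j
pasted =: 195 133 { a.    NB. the bytes a pasted Å arrives as (hex C3 85)
# pasted                  NB. 2 bytes
3!:0 pasted               NB. 2: byte precision, not unicode
# 7 u: pasted             NB. 1 unicode character after decoding
3 u: 7 u: pasted          NB. 197, i.e. U+00C5
```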