Vocabulary/ScreenOutputInput
Output to the screen
Conversion to a character array
To display a result, J first applies ": y, which produces an array of bytes after making the following conversions:
- numeric types are converted to displayable ASCII digits
- character types are converted to bytes using 8 u:"1 y, which:
  - leaves bytes unchanged (they could be ASCII or non-ASCII)
  - converts unicodes to bytes using UTF-8 encoding
- symbol type is converted to ` followed by the characters the symbol represents, converted as above
- boxes first convert the contents of the box as above, then enclose the resulting array in boxing characters
   tohex =: [: ,"2 ' ' (,"1) '0123456789ABCDEF' {~ 16 16 #: a. i. ]   NB. display in hex
   9!:6''   NB. Using the special boxing characters
┌┬┐├┼┤└┴┘│─
   tohex 9!:6 ''   NB. This is their hex form
 10 11 12 13 14 15 16 17 18 19 1A
   u: 177   NB. A single unicode character
±
   tohex ": u: 177   NB. displayable form is 2 bytes
 C2 B1
   tohex ": 5 ; u: 177   NB. displayable form is 3x6 array of bytes
 10 1A 11 1A 1A 12
 19 35 19 C2 B1 19
 16 1A 17 1A 1A 18
   5 ; u: 177   NB. The boxed display, explained below
┌─┬──┐
│5│±│
└─┴──┘
In the hex display of the converted array, the boxing characters are the bytes numbered hex 10-1A, '5' is represented by its ASCII code (hex 35), and the unicode character '±' has become the two-byte UTF-8 sequence C2 B1.
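The byte counts in this hex dump can be cross-checked outside J. The following is a minimal Python sketch, not J's implementation: the helper name tohex simply mirrors the J verb above, and the three rows are typed in by hand from the hex display rather than produced by a formatter.

```python
def tohex(row: bytes) -> str:
    """Render each byte as two uppercase hex digits, separated by spaces."""
    return " ".join(f"{b:02X}" for b in row)

# UTF-8 encoding of U+00B1 (the plus-minus sign): two bytes.
plus_minus = "\u00b1".encode("utf-8")

# The 3x6 byte table for the boxed display of 5 ; u: 177,
# copied from the hex display above (an assumption of this sketch).
rows = [
    bytes([0x10, 0x1A, 0x11, 0x1A, 0x1A, 0x12]),
    bytes([0x19]) + b"5" + bytes([0x19]) + plus_minus + bytes([0x19]),
    bytes([0x16, 0x1A, 0x17, 0x1A, 0x1A, 0x18]),
]

for row in rows:
    print(tohex(row))
```

Every row is six bytes long, which is the width the box was drawn for, even though the middle row holds only five displayable characters.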
Sending the character array to the screen
If the character value calculated above is an atom or a list, it is made into a table by adding leading axes of length 1.
The character array is sent to the screen, one row at a time, starting a new line after each row. If the rank of the array exceeds 2, a blank line is inserted between the last cell of each axis and the following cell 0 for that axis. This leaves one blank row between 2-cells, 2 blank rows between 3-cells, etc.
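The blank-line rule can be modelled with a short recursive function. This is only a sketch in Python under stated assumptions: the name layout is hypothetical, a rank-2 array is represented as a list of row strings, and higher ranks are nested lists of those.

```python
def layout(array, rank):
    """Return the screen lines for a character array of the given rank (>= 2)."""
    if rank == 2:
        # A table is emitted one row per line; an empty table emits nothing.
        return list(array)
    lines = []
    for i, cell in enumerate(array):
        if i > 0:
            # rank-2 blank lines separate consecutive cells along the leading axis:
            # 1 between 2-cells of a rank-3 array, 2 between 3-cells of a rank-4 array.
            lines.extend([""] * (rank - 2))
        lines.extend(layout(cell, rank - 1))
    return lines


# A rank-4 array holding two 3-cells, each made of two 1-row tables.
for line in layout([[["a"], ["b"]], [["c"], ["d"]]], 4):
    print(repr(line))
```

An atom or list would first be wrapped into a one-row table (for example ["abc"]) before being passed to this function.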
If the character array has no rows, nothing is displayed. This is why i. 0 0 is used to produce a totally empty display.
   $ ": i. 0 0
0 0
   $ ": i. 0
0
   ": i. 0 0   NB. no blank line
   ": i. 0   NB. there was a blank line

As each row is displayed, a set of translations is applied to the characters:
- NUL characters (0{a.) produce no output.
- A new line is started after each CR (13{a.), and also after each LF (10{a.) that is not immediately preceded by CR.
- Byte indexes above 127 are assumed to indicate UTF-8 encoding. Each multi-byte sequence is consumed and converted to a single Unicode character. If an invalid UTF-8 sequence is encountered (an invalid starting byte, or an incomplete sequence), the invalid byte(s) are replaced by Unicode character U+FFFD (�), and decoding continues with the character after the invalid sequence. NUL characters encountered in the middle of a UTF-8 sequence are ignored.
- Bytes with indexes 16-26 (hex 10-1A) are replaced according to the following table:
From (hex)  To         From (hex)  To         From (hex)  To         From (hex)  To
10          U+250C ┌   13          U+251C ├   16          U+2514 └   19          U+2502 │
11          U+252C ┬   14          U+253C ┼   17          U+2534 ┴   1A          U+2500 ─
12          U+2510 ┐   15          U+2524 ┤   18          U+2518 ┘
- Other bytes are sent to the display unchanged, and display according to their ASCII interpretation. ASCII bytes with indexes below 32 are control codes and their display is system-dependent. Indexes in the range 32-126 are graphic characters whose display is familiar; index 127 (DEL) is a control code whose display is also system-dependent.
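The per-row translations listed above can be approximated in a few lines of Python. This is a rough model, not J's actual display code: the names BOX_MAP and translate_row are hypothetical, Python's errors="replace" decoder stands in for J's UTF-8 handling (the number of U+FFFD characters emitted for a malformed run may differ), and dropping NULs before any other step is an assumption about ordering.

```python
# Bytes 0x10..0x1A map to the box-drawing characters in the table above.
BOX_MAP = dict(zip(range(0x10, 0x1B), "┌┬┐├┼┤└┴┘│─"))


def translate_row(row: bytes) -> str:
    """Model the translations applied to one row before it is displayed."""
    # NUL produces no output, even in the middle of a UTF-8 sequence.
    data = row.replace(b"\x00", b"")
    # Bytes above 127 are treated as UTF-8; invalid input becomes U+FFFD.
    text = data.decode("utf-8", errors="replace")
    # CR LF, a lone CR, and a lone LF each start exactly one new line.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Replace the eleven boxing bytes; all other characters pass through.
    return text.translate(BOX_MAP)


print(translate_row(bytes([0x19]) + b"5" + bytes([0x19, 0xC2, 0xB1, 0x19])))
```

Control codes other than NUL, CR, LF and the boxing bytes are passed through untouched here, which matches the statement that their display is left to the system.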
Repercussions
1. The display of boxed nouns containing non-ASCII characters is ragged. This is because the boxing characters are installed before the bytes are converted to Unicode characters, which leaves the boxes too big for the contents:
   5 ; u: 177   NB. ± is converted to 2 bytes, boxed, and then converted to 1 Unicode character for display
+-+--+
|5|±|
+-+--+
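The mismatch comes down to counting bytes versus counting characters. A small Python sketch of that comparison follows; the rows are written out by hand from the display above, so this illustrates the arithmetic rather than J's formatter.

```python
# Rows of the boxed display as bytes, using the ASCII boxing characters shown above.
rows = [b"+-+--+", b"|5|" + "\u00b1".encode("utf-8") + b"|", b"+-+--+"]

byte_widths = [len(r) for r in rows]                  # widths used when the box was drawn
char_widths = [len(r.decode("utf-8")) for r in rows]  # widths after decoding for display

print(byte_widths)
print(char_widths)
```

The middle row loses one column when its two-byte sequence collapses to a single character, so its right-hand border no longer lines up with the corners.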
2. CR, LF and {.a. (here assigned as NUL) can affect the display of any character noun containing them.
   'alpha',LF,'bravo'
alpha
bravo
   'alpha',CR,'bravo'
alpha
bravo
   NUL=: {.a.
   'alpha',NUL,'bravo'
alphabravo
Displaying (a.)
a. contains 256 bytes, but these do not represent 256 characters, because there are only 128 ASCII characters. Displayed whole, it isn't pretty to look at, so we will display it in sections. The displays all follow the rules given above.
The Control Characters
The display of control characters (indexes 0-31) is system-dependent, but you can be sure that line-breaks will be added after LF and CR (indexes 10 and 13), and that indexes 16-26 will be replaced with the boxing characters.
The Graphic Characters
The graphic characters have well-defined displays.
   _16 ]\ 32 }. 128 {. a.   NB. Display 16 chars per line
 !"#$%&'()*+,-./
0123456789:;<=>?
@ABCDEFGHIJKLMNO
PQRSTUVWXYZ[\]^_
`abcdefghijklmno
pqrstuvwxyz{|}~?
The non-ASCII Bytes
The non-ASCII bytes are all invalid UTF-8 sequences, so displaying them directly does not produce a meaningful display.
   _16 ]\ 128}.a.
����������������
...
But you may have documents with non-ASCII characters. Before the advent of Unicode, these superasciis (128}.a.) coded for extended Latin characters. Nowadays this character set is known as "Latin-1". To display them, turn them into unicode precision using Unicode u: like this:
(Some characters may not display properly in the font used by your browser to display this page)
   _16 ]\ u: 128}. a.
¡¢£¤¥¦§¨©ª«¬®¯
°±²³´µ¶·¸¹º»¼½¾¿
ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏ
ÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß
àáâãäåæçèéêëìíîï
ðñòóôõö÷øùúûüýþÿ
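The difference between the two displays can be reproduced with Python's codecs. This is a sketch by analogy, not J's implementation: decoding with "latin-1" plays the role of u: (each byte index becomes the code point with the same number), while decoding the raw bytes as UTF-8 with errors="replace" imitates what the screen does with them.

```python
high_bytes = bytes(range(128, 256))  # the same bytes as 128}.a.

# Treated as UTF-8, none of these bytes forms a valid sequence.
as_utf8 = high_bytes.decode("utf-8", errors="replace")
print(set(as_utf8) == {"\ufffd"})

# Treated as Latin-1, byte n becomes the code point numbered n.
as_latin1 = high_bytes.decode("latin-1")
for start in range(32, 128, 16):  # rows beginning at byte 160
    print(as_latin1[start:start + 16])
```

The loop starts at byte 160 because code points 128-159 are control characters with no printable form.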
Input from the screen
Lines entered from the screen are read as bytes and are not encoded into Unicode characters even if they contain non-ASCII bytes. This means that Unicode characters that are pasted into a J sentence are multiple bytes, not single unicodes. Such characters are valid only inside comments, quoted strings or explicitly defined nouns.
   wd 'clipcopy *',16bc3 16b85{a.   NB. This puts a Unicode character on the clipboard
   wd 'clippaste'   NB. You can paste it in.  These examples all use <PASTE> to enter the character
Å
   $ 'Å'   NB. It can appear in a string, but it's 2 bytes, not 1 unicode
2
   Å =. 45   NB. It can't appear outside of quotes
|spelling error
   0 : 0   NB. It can appear in a defined noun...
ÅÅ
)
ÅÅ
   3 : 0   NB. ...but not a defined verb
Å
)
|spelling error