Guides/General FAQ/Puzzling unicode

From J Wiki

Why do some non-ASCII characters misbehave under "i." and "$"?

If I paste a non-ASCII character (e.g. an accented vowel or an APL primitive) into a string, it behaves like several characters.

Thus $'abc⌹e' returns 7, not 5.
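One way to see what is happening is to list the byte value of each element with a. i. (a. is J's 256-character alphabet, so a. i. y gives each element's index in it). This sketch assumes the session stores the pasted text as UTF-8, which the length of 7 suggests:

```j
   a. i. 'abc⌹e'   NB. index of each element in the alphabet a., i.e. its byte value
97 98 99 226 140 185 101
```

The three values 226 140 185 are the UTF-8 encoding of ⌹ (U+2339). Each byte is a separate element of the literal string, so $ counts 7.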

Further puzzling behaviour:

   $z=: 'abc⌹e'
7
   z i. 'ce'
2 6
   z i. '⌹'
3 4 5
   3 5 $z
abc�
�eabc
⌹ea
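The garbled rows follow from the same byte view. Reshaping the byte values instead of the characters shows where ⌹ is cut; this reuses z from the session above:

```j
   3 5 $ a. i. z   NB. the same reshape, showing byte values instead of characters
 97  98  99 226 140
185 101  97  98  99
226 140 185 101  97
```

Row 1 ends with only the first two bytes of ⌹ (226 140) and row 2 starts with its orphaned last byte (185), so each of those incomplete sequences is displayed as a replacement mark (�). Only row 3 holds all three bytes together, which is why it shows ⌹ correctly.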

What must I do to make 'abc⌹e' behave like a string of 5 characters, with '⌹' behaving like a single character occupying position 3?

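The answer involves the u: primitive, Unicode and UTF-8. J stores a literal string one byte per element, and a pasted non-ASCII character arrives as its multi-byte UTF-8 encoding (3 bytes for ⌹). Converting the string to so-called "wide characters" (wchar) gives one element per character.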
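A minimal sketch of that conversion, assuming a J version in which 7 u: decodes a UTF-8 literal into wide characters, 3 u: gives integer code points, and 8 u: re-encodes as UTF-8 (check the u: entry in the J vocabulary for your version):

```j
   z =: 'abc⌹e'      NB. UTF-8 literal: 7 bytes
   w =: 7 u: z       NB. decode UTF-8 into wide (2-byte) characters
   $ w
5
   w i. 7 u: 'c⌹e'   NB. search for wide characters, not raw bytes
2 3 4
   3 { w             NB. position 3 now holds the whole character
⌹
   3 5 $ w
abc⌹e
abc⌹e
abc⌹e
   3 u: w            NB. code point of each character (U+2339 is 9017)
97 98 99 9017 101
   z -: 8 u: w       NB. re-encoding as UTF-8 recovers the original 7 bytes
1
```

Note that the search argument needs the same conversion: w i. '⌹' would still search for the three separate bytes of ⌹ rather than for the single wide character.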

See: Guides/UnicodeGettingStarted for an extremely simple explanation.