Unicode Escapes
Currently, there isn’t a large installed base of Unicode text editors. There’s an even smaller installed base of machines with full Unicode fonts installed. Therefore, it’s essential that all valid Java programs can be written using nothing more than ASCII characters.
All Java keywords and operators, as well as the names of all the classes, methods, and fields in the core API, can be written in pure ASCII. JavaSoft designed the language this way deliberately. However, Unicode characters are explicitly allowed in comments, string and char literals, and identifiers. For example, the following code, which holds the opening line of Homer's Odyssey, should be legal Java:
/* Οδυσσεια */
String αρχη = "Άνδρα μοι " + "έννεπε, " + "Μουσα, " + " ος μάλα πολλα"; ...
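As a quick check of the claim that identifiers may contain non-ASCII letters, the sketch below asks the standard Character.isJavaIdentifierStart and Character.isJavaIdentifierPart methods about a few Greek code points. The class name IdentifierCheck and the helper isLegalIdentifier are hypothetical names chosen for this illustration. The helper also ignores the separate rule that an identifier may not be a keyword or a literal such as true or null. Code points are written as integers so that the file itself stays pure ASCII.

```java
public class IdentifierCheck {
    // Hypothetical helper: true if the code points form a legal Java identifier.
    // It does not reject keywords or the literals true, false, and null.
    static boolean isLegalIdentifier(int... codePoints) {
        if (codePoints.length == 0 || !Character.isJavaIdentifierStart(codePoints[0])) {
            return false;
        }
        for (int i = 1; i < codePoints.length; i++) {
            if (!Character.isJavaIdentifierPart(codePoints[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Greek alpha, rho, chi, eta: ordinary letters, so this prints true
        System.out.println(isLegalIdentifier(0x03B1, 0x03C1, 0x03C7, 0x03B7));
        // U+037E GREEK QUESTION MARK is punctuation, so this prints false
        System.out.println(isLegalIdentifier(0x03B1, 0x037E));
    }
}
```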
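To make such code possible in a source file that contains only ASCII, non-ASCII characters are embedded with Unicode escape sequences. A Unicode escape consists of a backslash (\) followed by a lowercase u and then the four-digit hexadecimal code of the character. The compiler translates every escape into the character it stands for before it does anything else with the source, which is why escapes work in identifiers and comments as well as in literals. For example: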
char tab = '\u0009';
char softHyphen = '\u00AD';
char sigma = '\u03C3';
char squareKeesu = '\u30B9';
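To see that each of these literals is just one char whose value is the hexadecimal code in the escape, the sketch below prints each one back in escaped form next to its numeric value. The class EscapeDemo and the helper toEscape are hypothetical names for this illustration; toEscape relies only on String.format with a %04X conversion to produce four uppercase hex digits.

```java
public class EscapeDemo {
    // Hypothetical helper: formats a char as backslash, 'u', and four hex digits.
    static String toEscape(char c) {
        return String.format("\\u%04X", (int) c);
    }

    public static void main(String[] args) {
        // The compiler turns each escape into a single char before lexing,
        // so each literal below holds exactly one UTF-16 code unit.
        char tab = '\u0009';
        char softHyphen = '\u00AD';
        char sigma = '\u03C3';
        char squareKeesu = '\u30B9';
        for (char c : new char[] {tab, softHyphen, sigma, squareKeesu}) {
            System.out.println(toEscape(c) + " = " + (int) c);
        }
    }
}
```

Printing the numeric value alongside the escape makes the point that the escape is only a spelling: at run time the program never sees the backslash or the u.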
Using Unicode escapes, the opening line from Homer’s Odyssey would be rendered as:
/* \u039F\u03B4\u03C5\u03C3\u03C3\u03B5\u03B9\u03B1 */
String \u03B1\u03C1\u03C7\u03B7 = "\u0386\u03BD\u03B4\u03C1\u03B1 \u03BC\u03BF\u03B9 "
    + "\u03AD\u03BD\u03BD\u03B5\u03C0\u03B5, "
    + "\u039C\u03BF\u03C5\u03C3\u03B1, "
    + " \u03BF\u03C2 \u03BC\u03AC\u03BB\u03B1 \u03C0\u03BF\u03BB\u03BB\u03B1";
...
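Writing escapes like these by hand is tedious; the native2ascii tool that shipped with older JDKs automated a similar conversion. The following is a minimal sketch of the same idea, not that tool's implementation. The class AsciiEscaper and its escape method are hypothetical names. It escapes each UTF-16 code unit separately, so a character outside the Basic Multilingual Plane comes out as two escapes (a surrogate pair), which is also how such a character has to be written in Java source.

```java
public class AsciiEscaper {
    // Hypothetical helper: leaves ASCII characters unchanged and replaces every
    // other UTF-16 code unit with a four-hex-digit Unicode escape.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Mousa" (Muse) in Greek, built from code points so this file stays pure ASCII
        String mousa = new String(new int[] {0x039C, 0x03BF, 0x03C5, 0x03C3, 0x03B1}, 0, 5);
        System.out.println(escape(mousa));
    }
}
```

Running main prints the same five escapes that spell Μουσα in the Odyssey example. One caveat of this simple design: a backslash already present in the input passes through untouched, so input that happens to contain a backslash followed by u would be read as an escape if the output were pasted into source.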