Unicode Escapes
Currently, there isn’t a large installed base of Unicode text editors. There’s an even smaller installed base of machines with full Unicode fonts installed. Therefore, it’s essential that all valid Java programs can be written using nothing more than ASCII characters.
All Java keywords and operators, as well as the names of all the classes, methods, and fields in the core API, may be written in pure ASCII. This is by deliberate design on the part of JavaSoft. However, Unicode characters are explicitly allowed in comments, string and char literals, and identifiers. The following, the opening line from Homer’s Odyssey, should be legal Java:
/* Οδυσσεια */
String αρχη = "Άνδρα μοι "
    + "έννεπε, "
    + "Μουσα, "
    + " ος μάλα πολλα";
...
To enable statements like that in Java source, non-ASCII characters are embedded through Unicode escape sequences. The escape sequence for a character is a backslash (\) followed by a lowercase u, followed by the four-digit hexadecimal code for the character. For example:
char tab = '\u0009';
char softHyphen = '\u00AD';
char sigma = '\u03C3';
char squareKeesu = '\u30B9';
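The escape format described above is mechanical enough to generate by program. The following is a minimal sketch, not part of the original text, of a helper that rewrites a string so that every non-ASCII character appears as a backslash, a lowercase u, and four hexadecimal digits. The class name `UnicodeEscapes` and the method name `toUnicodeEscapes` are hypothetical, chosen only for illustration, and the choice of uppercase hex digits simply follows the examples above.

```java
public class UnicodeEscapes {

    // Hypothetical helper: copies ASCII characters through unchanged and
    // replaces every other UTF-16 code unit with a backslash, a lowercase u,
    // and the four-digit hexadecimal value of that code unit.
    public static String toUnicodeEscapes(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                out.append(c);
            } else {
                out.append('\\').append('u');
                // %04X pads with leading zeros so there are always four digits.
                out.append(String.format("%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The argument is itself written with escapes, so this source file
        // stays pure ASCII; the program prints the escaped form back out.
        System.out.println(toUnicodeEscapes("\u039C\u03BF\u03C5\u03C3\u03B1"));
    }
}
```

Because the helper works one `char` at a time, a character outside the four-digit range would come out as two escapes, one per UTF-16 code unit. That is an assumption of this sketch rather than something the text above discusses.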
Using Unicode escapes, the opening line from Homer’s Odyssey would be rendered as:
/* \u039F\u03B4\u03C5\u03C3\u03C3\u03B5\u03B9\u03B1 */
String \u03B1\u03C1\u03C7\u03B7 = "\u0386\u03BD\u03B4\u03C1\u03B1 \u03BC\u03BF\u03B9 "
    + "\u03AD\u03BD\u03BD\u03B5\u03C0\u03B5, "
    + "\u039C\u03BF\u03C5\u03C3\u03B1, "
    + " \u03BF\u03C2 \u03BC\u03AC\u03BB\u03B1 \u03C0\u03BF\u03BB\u03BB\u03B1";
...
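To see that escapes are accepted in identifiers as well as in string literals, the fragment above can be wrapped in a small class. This is a sketch under stated assumptions: the class name `OdysseyDemo` and the method name `opening` are hypothetical, and only the first piece of the string from the listing is used. The program prints numeric values rather than the Greek text itself, since whether a console can display Greek depends on the platform.

```java
public class OdysseyDemo {

    // Returns the first piece of the string from the listing. The local
    // variable name is the four-letter Greek identifier used there,
    // spelled entirely with Unicode escapes.
    static String opening() {
        String \u03B1\u03C1\u03C7\u03B7 = "\u0386\u03BD\u03B4\u03C1\u03B1 \u03BC\u03BF\u03B9 ";
        return \u03B1\u03C1\u03C7\u03B7;
    }

    public static void main(String[] args) {
        String text = opening();
        // Each escape in the literal contributes exactly one char.
        System.out.println(text.length());
        // Hexadecimal value of the first character of the string.
        System.out.println(Integer.toHexString(text.charAt(0)));
    }
}
```

The source file contains nothing but ASCII, yet the compiler treats the identifier and the literal exactly as if the Greek letters had been typed directly.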