Unicode Escapes

Currently, there isn’t a large installed base of Unicode text editors. There’s an even smaller installed base of machines with full Unicode fonts installed. Therefore, it’s essential that all valid Java programs can be written using nothing more than ASCII characters.

All Java keywords and operators as well as the names of all the classes, methods, and fields in the core API may be written in pure ASCII. This is by deliberate design on the part of JavaSoft. However, Unicode characters are explicitly allowed in comments, string and char literals, and identifiers. The following, the opening line from Homer’s Odyssey, should be legal Java:

/* Οδυσσεια */
String αρχη = "Άνδρα μοι " + "έννεπε, " + "Μουσα, " + " ος μάλα πολλα";

To make statements like that possible in an ASCII-only source file, Java lets you embed non-ASCII characters through Unicode escape sequences. The escape sequence for a character is a backslash (\) followed by a lowercase u, followed by the character's four-digit hexadecimal code. For example:

char tab = '\u0009';
char softHyphen = '\u00AD';
char sigma = '\u03C3';
char squareKeesu = '\u30B9';
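
As a rough illustration of the declarations above, the following sketch checks that a Unicode escape and the character it names are the same value once the source is compiled. The class name EscapeEquivalence and the helper hexCode are hypothetical names introduced only for this example.

```java
// Hypothetical demo class; the names here are illustrative, not from the book.
class EscapeEquivalence {
    // Returns the numeric code of a char as four uppercase hexadecimal digits.
    static String hexCode(char c) {
        return String.format("%04X", (int) c);
    }

    public static void main(String[] args) {
        char tab = '\u0009';
        char sigma = '\u03C3';

        // The escaped tab and the conventional tab escape denote the same char.
        System.out.println(tab == '\t');

        // Printing the code recovers the four hex digits used in the escape.
        System.out.println(hexCode(sigma));

        // Escapes can sit next to ordinary ASCII characters in a string literal.
        System.out.println("\u0041BC".equals("ABC"));
    }
}
```

Because the compiler translates escapes before it does anything else with the source text, the comparisons above should hold no matter how the characters were originally typed.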

Using Unicode escapes, the opening line from Homer’s Odyssey would be rendered as:

/* \u039F\u03B4\u03C5\u03C3\u03C3\u03B5\u03B9\u03B1 */
String \u03B1\u03C1\u03C7\u03B7 = "\u0386\u03BD\u03B4\u03C1\u03B1 \u03BC\u03BF\u03B9 "
    + "\u03AD\u03BD\u03BD\u03B5\u03C0\u03B5, "
    + "\u039C\u03BF\u03C5\u03C3\u03B1, "
    + " \u03BF\u03C2 \u03BC\u03AC\u03BB\u03B1 \u03C0\u03BF\u03BB\u03BB\u03B1";
...
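
Writing escapes like these by hand is tedious, so a small helper can be handy. The following is a minimal sketch, not a tool from the book: it assumes that every char outside the 7-bit ASCII range should be replaced by its escape and everything else copied unchanged. The class name AsciiEscaper and the method escapeNonAscii are hypothetical.

```java
// Hypothetical helper; the class and method names are assumptions for illustration.
class AsciiEscaper {
    // Copies ASCII chars as-is and rewrites every other char as a
    // backslash, a lowercase u, and four uppercase hexadecimal digits.
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 128) {
                out.append(c);
            } else {
                // The doubled backslash yields a single literal backslash here.
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The identifier used in the Odyssey example, written with escapes.
        String greek = "\u03B1\u03C1\u03C7\u03B7";
        System.out.println(escapeNonAscii(greek)); // prints \u03B1\u03C1\u03C7\u03B7
    }
}
```

The loop works on individual char values, so a character outside the Basic Multilingual Plane would come out as two escapes, one for each surrogate.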
