Java I/O by Elliotte Rusty Harold

Unicode Escapes

Currently, there isn’t a large installed base of Unicode text editors. There’s an even smaller installed base of machines with full Unicode fonts installed. Therefore, it’s essential that all valid Java programs can be written using nothing more than ASCII characters.

All Java keywords and operators as well as the names of all the classes, methods, and fields in the core API may be written in pure ASCII. This is by deliberate design on the part of JavaSoft. However, Unicode characters are explicitly allowed in comments, string and char literals, and identifiers. The following, the opening line from Homer’s Odyssey, should be legal Java:

/* Οδυσσεια */
String αρχη = "Άνδρα μοι "
 + "έννεπε, "
 + "Μουσα, "
 + " ος μάλα πολλα";

To allow statements like that in ASCII-only Java source, non-ASCII characters can be embedded as Unicode escape sequences. The escape sequence for a character is a backslash (\) followed by a lowercase u and then the character's four-digit hexadecimal code. For example:

char tab = '\u0009';
char softHyphen = '\u00AD';
char sigma = '\u03C3';
char katakanaSu = '\u30B9'; // U+30B9 is KATAKANA LETTER SU
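
Going the other way, from a character to its escape, follows the same rule. The sketch below is one possible way to do it, not code from the book: the class name UnicodeEscaper and its escape method are hypothetical names chosen for illustration. It copies ASCII characters unchanged and writes every other char as a backslash, a lowercase u, and four hexadecimal digits.

```java
public class UnicodeEscaper {

    /**
     * Returns a copy of text in which every char above the ASCII range
     * is replaced by its six-character Unicode escape.
     * (Hypothetical helper, written for illustration only.)
     */
    public static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                // Plain ASCII needs no escaping.
                out.append(c);
            } else {
                // Backslash, lowercase u, then exactly four hex digits.
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The argument is the single character sigma, written as an escape.
        System.out.println(escape("" + '\u03C3')); // prints \u03C3
    }
}
```

Because the loop works on individual char values, a character outside the 16-bit range would come out as two escapes, one per surrogate, which is also how such a character has to be written in Java source.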

Using Unicode escapes, the opening line from Homer’s Odyssey would be rendered as:

/* \u039F\u03B4\u03C5\u03C3\u03C3\u03B5\u03B9\u03B1 */
String \u03B1\u03C1\u03C7\u03B7 = "\u0386\u03BD\u03B4\u03C1\u03B1 \u03BC\u03BF\u03B9 "
 + "\u03AD\u03BD\u03BD\u03B5\u03C0\u03B5, "
 + "\u039C\u03BF\u03C5\u03C3\u03B1, "
 + " \u03BF\u03C2 \u03BC\u03AC\u03BB\u03B1 \u03C0\u03BF\u03BB\u03BB\u03B1";
...
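
One way to check that the compiler treats escapes as the characters they stand for is to compile a small class whose identifier and string literal are written entirely in escapes, then inspect the string at run time. This is a minimal sketch: the class name OdysseyEscapes and the method opening are assumptions made for illustration, and only the first fragment of the line is used.

```java
public class OdysseyEscapes {

    // Both the field name and its value are spelled with Unicode escapes,
    // so this file contains nothing but ASCII.
    static String \u03B1\u03C1\u03C7\u03B7 = "\u0386\u03BD\u03B4\u03C1\u03B1 \u03BC\u03BF\u03B9 ";

    /** Returns the escaped field so other code can inspect it. */
    public static String opening() {
        return \u03B1\u03C1\u03C7\u03B7;
    }

    public static void main(String[] args) {
        String s = opening();
        // Each escape became one char: 5 letters, a space, 3 letters, a space.
        System.out.println(s.length());                       // prints 10
        // The first char is capital alpha with tonos, U+0386.
        System.out.println(Integer.toHexString(s.charAt(0))); // prints 386
    }
}
```

The printed values are numeric on purpose, so the check does not depend on the console being able to display Greek characters.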
