
Java I/O by Elliotte Rusty Harold


Other Encodings

Although Unicode is the most advanced and comprehensive character set yet designed on this planet, it has not taken the world by storm. Compared to the vast quantities of ASCII data, there are virtually no Unicode files on today’s computers. Even as Unicode support grows, legacy data in other encodings will doubtless need to be read for centuries to come. Much of it is in ASCII and ISO Latin-1, both subsets of Unicode, but much is also in less popular encodings such as EBCDIC and MacRoman. These encodings cover only English and a few Western European languages. Arabic, Turkish, Hebrew, Greek, Cyrillic, Chinese, Japanese, Korean, and many other languages and scripts each have multiple encodings in use.

The Reader and Writer classes (discussed in the next chapter) let you read and write data in these different character sets. The String class also has several methods that convert between encodings (though a String object itself is always represented in Unicode). Furthermore, the JDK includes a command-line tool based on these classes, native2ascii, that performs such conversions on existing files.
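As a minimal sketch of these conversions, assuming the standard `java.nio.charset` API (Java 7 or later for `StandardCharsets` and try-with-resources), the following decodes ISO Latin-1 bytes and re-encodes them as UTF-8. It does this twice: once with the `String` constructor and `getBytes`, and once through an `InputStreamReader`/`OutputStreamWriter` pair. The class name `TranscodeDemo` is hypothetical. Legacy encodings such as MacRoman or EBCDIC are usually available as named charsets (for instance `x-MacRoman` or `Cp037`), but which ones a given runtime supports varies, so the sketch sticks to charsets every Java platform is required to provide.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TranscodeDemo {

    // Decode the bytes with one charset, then encode the resulting String with another.
    // The String in the middle holds the text as Unicode regardless of either charset.
    // Note: malformed or unmappable input is silently replaced, not reported.
    static byte[] transcode(byte[] data, Charset from, Charset to) {
        String text = new String(data, from);
        return text.getBytes(to);
    }

    // The same conversion with a Reader/Writer pair, copying one buffer of chars at a time.
    static byte[] transcodeStream(byte[] data, Charset from, Charset to) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(data), from);
             Writer w = new OutputStreamWriter(out, to)) {
            char[] buf = new char[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                w.write(buf, 0, n);
            }
        } catch (IOException e) {
            // In-memory streams should not fail, but Reader and Writer declare IOException.
            throw new UncheckedIOException(e);
        }
        // The writer has been closed, and therefore flushed, by this point.
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // "cafe" with an accented final e in ISO Latin-1: U+00E9 is the single byte 0xE9.
        byte[] latin1 = {'c', 'a', 'f', (byte) 0xE9};
        byte[] viaString = transcode(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        byte[] viaStreams = transcodeStream(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        // In UTF-8 the accented e takes two bytes, 0xC3 0xA9.
        System.out.println(Arrays.toString(viaString));          // prints [99, 97, 102, -61, -87]
        System.out.println(Arrays.equals(viaString, viaStreams)); // prints true
    }
}
```

Both paths produce the same bytes here; the Reader/Writer version is the pattern that carries over to large files, since it handles the text one buffer at a time instead of building a single String.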

The name native2ascii is a misnomer. Rather than converting to ASCII, it converts to ISO Latin-1, with characters outside that set embedded as Unicode escape sequences like \u020F. It can also work in reverse, converting an ISO Latin-1 file with embedded Unicode escapes to a native character set. For example, to copy the contents of the file macdata.txt ...
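To make the escape format concrete, here is a hypothetical sketch of the conversion as described above. It is not native2ascii's actual source. `escape` leaves characters up to U+00FF (ISO Latin-1) alone and writes every other `char` as a backslash, `u`, and four uppercase hex digits; `unescape` reverses that. The class and method names are illustrative only. Because Java `char` values are UTF-16 code units, a character outside the Basic Multilingual Plane comes out as two escapes, one per surrogate.

```java
public class EscapeSketch {

    // Hypothetical escaper: keeps every char in ISO Latin-1 (0x00 to 0xFF) as-is and
    // replaces every other char with a backslash-u escape using four uppercase hex digits.
    // Works on UTF-16 code units, so a supplementary character becomes two escapes.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0xFF) {
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Reverse direction: turns each backslash-u escape with four hex digits back into a char.
    // Anything that is not a complete escape is copied through unchanged.
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            if (s.startsWith("\\u", i) && i + 6 <= s.length() && isHex(s, i + 2, i + 6)) {
                sb.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                sb.append(s.charAt(i));
                i++;
            }
        }
        return sb.toString();
    }

    // True if every char in s from start (inclusive) to end (exclusive) is an ASCII hex digit.
    private static boolean isHex(String s, int start, int end) {
        for (int i = start; i < end; i++) {
            if ("0123456789abcdefABCDEF".indexOf(s.charAt(i)) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // U+00E9 is in Latin-1 and is kept; U+020F is not and gets escaped.
        String text = "caf\u00E9 \u020F";
        String escaped = escape(text);
        System.out.println(escaped.endsWith("\\u020F"));      // prints true
        System.out.println(unescape(escaped).equals(text));   // prints true
    }
}
```

One limitation of this sketch: it does not escape a backslash that already appears in the input, so a literal backslash followed by `u` and four hex digits would not survive a round trip unchanged.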
