O'Reilly logo

Java Cookbook by Ian F. Darwin

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Reading/Writing a Different Character Set

Problem

You need to read or write a text file using a particular encoding.

Solution

Convert the text to or from internal Unicode by specifying a converter when you construct an InputStreamReader or PrintWriter.

Discussion

Classes InputStreamReader and OutputStreamWriter are the bridge from byte-oriented Streams to character-based Readers. These classes read or write bytes and translate them to or from characters according to a specified character encoding. The Unicode character set used inside Java (char and String types) is a 16-bit character set. But most character sets, such as ASCII, Swedish, Spanish, Greek, Turkish, and many others, use only a small subset of that. In fact, many European language character sets fit nicely into 8-bit characters. Even the larger character sets (script-based and pictographic languages) don’t all use the same bit values for each particular character. The encoding , then, is a mapping between Unicode characters and a particular external storage format for characters drawn from a particular national or linguistic character set.

To simplify matters, the InputStreamReader and OutputStreamWriter constructors are the only places where you can specify the name of an encoding to be used in this translation. If you do not, the platform’s (or user’s) default encoding will be used. PrintWriters, BufferedReaders, and the like all use whatever encoding the InputStreamReader or OutputStreamWriter class uses. Since ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required