Reading/Writing a Different Character Set
Problem
You need to read or write a text file using a particular encoding.
Solution
Convert the text to or from internal Unicode by specifying a converter when you construct an InputStreamReader or PrintWriter.
Discussion
Classes InputStreamReader and OutputStreamWriter are the bridge from byte-oriented Streams to character-based Readers and Writers. These classes read or write bytes and translate them to or from characters according to a specified character encoding. Java's char and String types hold Unicode text as 16-bit UTF-16 code units. But most character sets, such as ASCII and those used for Swedish, Spanish, Greek, Turkish, and many other languages, use only a small subset of Unicode. In fact, many European language character sets fit nicely into 8-bit characters. Even the larger character sets (script-based and pictographic languages) don’t all use the same bit values for each particular character. The encoding, then, is a mapping between Unicode characters and a particular external storage format for characters drawn from a particular national or linguistic character set.
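To make the idea of an encoding as a mapping concrete, the following minimal sketch encodes the same String in three encodings and reports how many bytes each one needs. The class name EncodingSizes and the helper encodedLength are hypothetical names chosen for this illustration; the sample word is written with Unicode escapes so that the result does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows that one String maps to different
// byte sequences depending on the encoding chosen.
class EncodingSizes {

    /** Returns the number of bytes needed to store text in the given encoding. */
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // "Smorgasbord" with o-umlaut (U+00F6) and a-ring (U+00E5): 11 characters
        String text = "Sm\u00f6rg\u00e5sbord";
        Charset[] charsets = {
            StandardCharsets.ISO_8859_1,  // one byte per character here
            StandardCharsets.UTF_8,       // two bytes for each non-ASCII letter
            StandardCharsets.UTF_16BE     // two bytes per character here
        };
        for (Charset cs : charsets) {
            System.out.println(cs.name() + ": " + encodedLength(text, cs) + " bytes");
        }
    }
}
```

The characters are identical in each case; only the external byte representation changes, which is why a reader must be told which encoding the writer used.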
To simplify matters, the InputStreamReader and OutputStreamWriter constructors are the main places where you specify the name of an encoding to be used in this translation. If you do not, the platform’s (or user’s) default encoding will be used. PrintWriters, BufferedReaders, and the like that wrap one of these classes use whatever encoding the InputStreamReader or OutputStreamWriter uses. Since ...
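The wrapping described above might look like the following sketch. It is an illustration under stated assumptions, not the book's own listing: the class name EncodedFileDemo, the helpers writeLine and readLine, and the temporary file are all hypothetical. The encoding name is given once, to the OutputStreamWriter or InputStreamReader constructor, and the PrintWriter or BufferedReader wrapped around it simply inherits that choice.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class for writing and reading text in a named encoding.
class EncodedFileDemo {

    /** Writes one line of text to fileName using the named encoding. */
    static void writeLine(String fileName, String encoding, String line) throws IOException {
        // The encoding is named on the OutputStreamWriter; PrintWriter just wraps it.
        try (PrintWriter out = new PrintWriter(
                new OutputStreamWriter(new FileOutputStream(fileName), encoding))) {
            out.println(line);
        }
    }

    /** Reads the first line of fileName, decoding bytes with the named encoding. */
    static String readLine(String fileName, String encoding) throws IOException {
        // The encoding is named on the InputStreamReader; BufferedReader just wraps it.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), encoding))) {
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("encoding-demo", ".txt");
        String text = "Sm\u00f6rg\u00e5sbord"; // escapes avoid source-encoding issues
        writeLine(tmp.toString(), "ISO-8859-1", text);

        // Decoding with the same encoding recovers the original text.
        System.out.println("same encoding matches: "
                + text.equals(readLine(tmp.toString(), "ISO-8859-1")));
        // Decoding with a different encoding does not.
        System.out.println("wrong encoding matches: "
                + text.equals(readLine(tmp.toString(), "UTF-8")));
        Files.delete(tmp);
    }
}
```

Both constructors throw UnsupportedEncodingException (a subclass of IOException) if the encoding name is not recognized, so a misspelled name fails when the stream is opened instead of silently corrupting the text.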