Character Encodings
Text representation has traditionally been one of the most difficult problems of internationalization. Java, however, solves this problem quite elegantly and hides the difficult issues. Java uses Unicode internally, so it can represent essentially any character in any commonly used written language. As I noted earlier, the remaining task is to convert Unicode to and from locale-specific encodings. Java includes quite a few internal byte-to-char and char-to-byte converters that handle converting locale-specific character encodings to Unicode and vice versa. Although the converters themselves are not public, they are accessible through the InputStreamReader and OutputStreamWriter classes, which are character streams included in the java.io package.
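The following is a minimal sketch of those converters at work, using only the two java.io classes just named plus in-memory byte streams. The class EncodingRoundTrip and its encode and decode helpers are hypothetical names chosen for illustration, and the code assumes Java 8 or later (try-with-resources, UncheckedIOException). Encoding names are passed as strings, the same way a command-line tool would receive them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical demo class, not part of any library.
class EncodingRoundTrip {
    // Encode a Unicode string into bytes in the named encoding.
    // OutputStreamWriter performs the char-to-byte conversion.
    public static byte[] encode(String text, String encoding) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, encoding)) {
            out.write(text);
        } catch (IOException e) { // includes UnsupportedEncodingException
            throw new UncheckedIOException(e);
        }
        // The writer has been closed (and therefore flushed) at this point.
        return bytes.toByteArray();
    }

    // Decode bytes in the named encoding back into a Unicode string.
    // InputStreamReader performs the byte-to-char conversion.
    public static String decode(byte[] data, String encoding) {
        StringBuilder sb = new StringBuilder();
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(data), encoding)) {
            char[] buf = new char[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9";                   // "café": 4 chars in Java
        byte[] latin1 = encode(text, "ISO-8859-1");  // é becomes 1 byte
        byte[] utf8 = encode(text, "UTF-8");         // é becomes 2 bytes
        System.out.println(latin1.length + " " + utf8.length); // prints "4 5"
        System.out.println(decode(utf8, "UTF-8"));              // prints "café"
        // Decoding with the wrong encoding yields garbled text, not an error.
        System.out.println(decode(utf8, "ISO-8859-1"));         // prints "cafÃ©"
    }
}
```

The same four Unicode characters occupy a different number of bytes depending on the encoding, which is exactly the detail the reader and writer classes hide from the rest of the program.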
Any program can automatically handle locale-specific encodings simply by using these character stream classes for its textual input and output. Note that the FileReader and FileWriter classes use these streams to automatically read and write text files that use the platform's default encoding.
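To see the platform default in action, the sketch below writes text with FileWriter and compares the bytes on disk with String.getBytes(), which also encodes with the default charset, then reads a file back with FileReader. The class DefaultEncodingDemo and its method names are hypothetical; the code assumes Java 7 or later for try-with-resources and java.nio.file.Files.

```java
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
class DefaultEncodingDemo {
    // Write text with FileWriter (platform default encoding) and return
    // the raw bytes that ended up in the file.
    public static byte[] writeWithFileWriter(File f, String text) throws IOException {
        try (Writer w = new FileWriter(f)) {
            w.write(text);
        }
        return Files.readAllBytes(f.toPath());
    }

    // Read the whole file with FileReader (platform default encoding).
    public static String readWithFileReader(File f) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new FileReader(f)) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("default-enc", ".txt");
        try {
            String text = "caf\u00e9";
            byte[] onDisk = writeWithFileWriter(f, text);
            // String.getBytes() with no argument also uses the platform
            // default, so both byte arrays should be identical.
            System.out.println("Same bytes: " + Arrays.equals(onDisk, text.getBytes()));
            System.out.println("Default encoding: " + Charset.defaultCharset());
            System.out.println("Read back: " + readWithFileReader(f));
        } finally {
            f.delete(); // clean up the temporary file
        }
    }
}
```

Because the result depends on the platform, the same program can write different bytes on different machines; starting with JDK 18 the default charset is UTF-8 unless overridden. When a file's encoding is known in advance, wrapping a FileInputStream in an InputStreamReader with an explicit encoding name removes that dependence.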
Example 8-2 shows a simple program that works with character encodings. It converts a file from one specified encoding to another by converting from the first encoding to Unicode and then from Unicode to the second encoding. Note that most of the program is taken up with the mechanics of parsing argument lists, handling exceptions, and so on. Only a few lines are required to create the InputStreamReader and OutputStreamWriter ...
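The listing of Example 8-2 is cut off in this excerpt. As a rough stand-in, and not the book's program, here is a minimal sketch of the same two-step conversion: bytes are decoded to Unicode by an InputStreamReader and re-encoded by an OutputStreamWriter. The class name ConvertSketch, the convert method, and the four positional command-line arguments are all hypothetical.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

// Hypothetical sketch, not the book's Example 8-2.
class ConvertSketch {
    // Decode bytes from `in` using fromEncoding, then re-encode the Unicode
    // characters to `out` using toEncoding. Closes both streams when done;
    // closing the writer also flushes any final bytes the encoder holds.
    public static void convert(InputStream in, String fromEncoding,
                               OutputStream out, String toEncoding) throws IOException {
        try (Reader reader = new InputStreamReader(in, fromEncoding);
             Writer writer = new OutputStreamWriter(out, toEncoding)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 4) {
            System.err.println("Usage: java ConvertSketch <infile> <from-enc> <outfile> <to-enc>");
            System.exit(1);
        }
        // convert() closes these streams too; closing them a second time is harmless.
        try (InputStream in = new FileInputStream(args[0]);
             OutputStream out = new FileOutputStream(args[2])) {
            convert(in, args[1], out, args[3]);
        }
    }
}
```

For instance, `java ConvertSketch in.txt ISO-8859-1 out.txt UTF-8` would rewrite a Latin-1 file as UTF-8. Be aware that these stream classes replace malformed or unmappable input with a substitute character instead of throwing, so a wrong encoding name produces garbled output rather than an error.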