Reading and Writing Text
Because of the difficulties caused by different character sets, reading and writing text is one of the trickiest things you can do with streams. Most of the time, text should be handled with readers and writers, a subject we’ll take up in Chapter 15. However, the DataInputStream and DataOutputStream classes do provide methods a Java program can use to read and write text that another Java program will understand. The text format used is a variable-length encoding of Unicode called UTF-8. It’s unlikely that other, non-Java programs will understand this format unless they’ve been specially coded to interoperate with text data written by Java. One reason is that Java’s UTF-8 differs slightly from the standard UTF-8 used in XML and elsewhere. Another is that each string is preceded by a two-byte length.
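As a minimal sketch, the following program writes a short string with writeUTF() to an in-memory ByteArrayOutputStream and reads it back with readUTF(). The class name UTFRoundTrip and the helpers encode(), decode(), and toHex() are hypothetical and used only for illustration. The hex dump shows the two-byte, big-endian length prefix that writeUTF() places before the encoded text. A program expecting plain UTF-8 would misread that prefix as text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration class; not part of any library.
public class UTFRoundTrip {

    // Encode a string with DataOutputStream.writeUTF() into an in-memory buffer.
    static byte[] encode(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Decode bytes produced by writeUTF() using DataInputStream.readUTF().
    static String decode(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Format bytes as space-separated, two-digit hexadecimal values.
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            if (hex.length() > 0) hex.append(' ');
            hex.append(String.format("%02X", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = encode("Hi");
        // The first two bytes hold the length of the encoded text (2), then 'H' and 'i'.
        System.out.println(toHex(bytes));  // prints "00 02 48 69"
        System.out.println(decode(bytes)); // prints "Hi"
    }
}
```

Because readUTF() reads the length first, it knows exactly how many bytes belong to the string. This lets several strings be written back to back on the same stream without any separator.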
The UTF-8 Format
Java strings and chars are Unicode, and each char occupies two bytes. Storing text that way isn’t particularly efficient, however. Most files of English text contain almost nothing but ASCII characters, so using two bytes for each of them is overkill. UTF-8 solves this problem by encoding the ASCII characters in a single byte. The cost is that characters from U+0080 through U+07FF take two bytes, and the rest of the 16-bit char range takes three. For the purposes of this chapter, UTF-8 provides a more efficient way to read and write strings. It is used by the readUTF() and writeUTF() methods implemented by the DataInputStream and DataOutputStream classes. For a full description of UTF-8, see Chapter 14.
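The following small sketch makes the size trade-off concrete. The EncodingSizes class and its helper methods are hypothetical. For each sample string, it counts three things:

* the bytes writeChars() produces, at two bytes per char;
* the bytes of text writeUTF() produces, excluding its two-byte length prefix;
* for comparison, the bytes produced by the standard encoder through String.getBytes(StandardCharsets.UTF_8).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of any library.
public class EncodingSizes {

    // Bytes written by writeChars(): two bytes per char, no length prefix.
    static int charsSize(String s) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeChars(s);
            return out.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Bytes of text written by writeUTF(), not counting its two-byte length prefix.
    static int modifiedUtf8Size(String s) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(s);
            return out.size() - 2;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Bytes produced by the JDK's standard UTF-8 encoder, for comparison.
    static int standardUtf8Size(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] labels  = {"hello", "U+00E9", "U+20AC", "U+0000", "U+1F600"};
        // The last sample is a supplementary character stored as a surrogate pair.
        String[] samples = {"hello", "\u00e9", "\u20ac", "\u0000", "\uD83D\uDE00"};
        for (int i = 0; i < samples.length; i++) {
            String s = samples[i];
            System.out.println(labels[i]
                    + "  writeChars=" + charsSize(s)
                    + "  writeUTF=" + modifiedUtf8Size(s)
                    + "  UTF-8=" + standardUtf8Size(s));
        }
    }
}
```

Running it should report the following byte counts for writeChars, writeUTF, and standard UTF-8:

* "hello": 10, 5, and 5.
* U+00E9: 2, 2, and 2.
* U+20AC: 2, 3, and 3.
* U+0000: 2, 2, and 1.
* U+1F600: 4, 6, and 4.

The last two rows show where Java’s variant departs from standard UTF-8. The null character takes two bytes instead of one. A character outside the 16-bit range is written as two three-byte surrogates rather than one four-byte sequence.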
The variant form of UTF-8 that these classes use is intended for string literals ...