Java I/O by Elliotte Rusty Harold

Reading and Writing Text

Because of the difficulties caused by different character sets, reading and writing text is one of the trickiest things you can do with streams. Most of the time, text should be handled with readers and writers, a subject we’ll take up in Chapter 15. However, the DataInputStream and DataOutputStream classes do provide methods a Java program can use to read and write text that another Java program will understand. The text format used is a variable-length encoding of Unicode called UTF-8. Other, non-Java programs are unlikely to understand this format unless they’ve been specially coded to interoperate with text data written by Java, especially since Java’s UTF-8 differs slightly from the standard UTF-8 used in XML and elsewhere.
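
As a minimal sketch of that round trip, the following program writes a string with writeUTF() into an in-memory ByteArrayOutputStream and reads it back with readUTF(). The class and method names (ModifiedUtf8Demo, encode, decode) are hypothetical and chosen only for illustration. The sample string contains an embedded NUL character because that is one place where Java's form visibly differs from standard UTF-8. writeUTF() encodes \u0000 as the two bytes 0xC0 0x80 rather than as a single 0x00 byte, and it puts a two-byte length in front of the encoded data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: a sketch, not code from the book.
public class ModifiedUtf8Demo {

    // Writes s with writeUTF() and returns the raw bytes, including the
    // two-byte unsigned length prefix that writeUTF() writes first.
    static byte[] encode(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            // In-memory streams do no real I/O; the realistic failure is a
            // UTFDataFormatException when the encoding exceeds 65,535 bytes.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads back a string that was previously written by writeUTF().
    static String decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "Hello\u0000World"; // 11 chars, one of them NUL
        byte[] modified = encode(text);
        byte[] standard = text.getBytes(StandardCharsets.UTF_8);

        // prints "Round trip OK: true"
        System.out.println("Round trip OK: " + text.equals(decode(modified)));
        // 2-byte length + 12 data bytes (NUL becomes 0xC0 0x80): prints 14
        System.out.println("writeUTF bytes: " + modified.length);
        // Standard UTF-8 writes NUL as a single 0x00 byte: prints 11
        System.out.println("Standard UTF-8 bytes: " + standard.length);
    }
}
```

writeUTF() stores the length in an unsigned 16-bit prefix. For that reason it throws UTFDataFormatException when a string's encoded form exceeds 65,535 bytes. The sketch rethrows that exception as an UncheckedIOException rather than declaring IOException on every helper.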

The UTF-8 Format

Java strings and chars are Unicode, stored in memory as 16-bit (two-byte) char values. However, this representation isn’t particularly efficient for most text. Most files of English text contain almost nothing but ASCII characters, so using two bytes for each of them is really overkill. UTF-8 solves this problem by encoding ASCII characters in a single byte. The cost is that many other characters take two bytes, and many of the less common ones take three. For the purposes of this chapter, UTF-8 provides a more efficient way to read and write strings. The readUTF() and writeUTF() methods implemented by the DataInputStream and DataOutputStream classes use it. For a full description of UTF-8, see Chapter 14.
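
To make the size trade-off concrete, here is a small sketch that counts the bytes a few sample strings need in standard UTF-8. It compares that with two bytes per char, measured using UTF-16BE, which writes no byte-order mark. The class name Utf8SizeDemo is hypothetical. The sketch measures standard UTF-8 with String.getBytes(). For these samples, which contain no NUL or supplementary characters, writeUTF() produces the same bytes after its two-byte length prefix.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: a sketch comparing encoded sizes.
public class Utf8SizeDemo {

    // Bytes used by the standard UTF-8 encoding of s.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes used at two bytes per char (UTF-16BE writes no byte-order mark).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // 'A' is ASCII, U+00E9 is e-acute, U+20AC is the euro sign.
        // Expected UTF-8 sizes: 1, 2, 3, 5. Expected UTF-16 sizes: 2, 2, 2, 10.
        String[] samples = {"A", "\u00e9", "\u20ac", "Hello"};
        for (String s : samples) {
            System.out.printf("%s -> UTF-8: %d bytes, UTF-16: %d bytes%n",
                    s, utf8Length(s), utf16Length(s));
        }
    }
}
```

ASCII-heavy text such as "Hello" takes half the space in UTF-8. A character like the euro sign costs one byte more than its 16-bit char.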

The variant form of UTF-8 that these classes use is intended for string literals ...
