Most programmers have a bad habit of writing code as if all text were ASCII or, at the least, in the native encoding of the platform. While some older, simpler network protocols, such as daytime, quote of the day, and chargen, do specify ASCII encoding for text, this is not true of HTTP and many other more modern protocols, which allow a wide variety of localized encodings, such as KOI8-R Cyrillic, Big-5 Chinese, and ISO 8859-2 for most Central European languages. When the encoding is no longer ASCII, the assumption that bytes and chars are essentially the same thing also breaks down. Java’s native character set is Unicode, and each Java char is a 2-byte UTF-16 code unit. Consequently, Java provides an almost complete mirror of the input and output stream class hierarchy that’s designed for working with characters instead of bytes.
In this mirror image hierarchy, two abstract superclasses define the basic API for reading and writing characters. The java.io.Reader class specifies the API by which characters are read. The java.io.Writer class specifies the API by which characters are written. Wherever input and output streams use bytes, readers and writers use Unicode characters. Concrete subclasses of Reader and Writer allow particular sources to be read and targets to be written. Filter readers and writers can be attached to other readers and writers to provide additional services or interfaces.
The most important concrete subclasses of Reader and Writer are the InputStreamReader and the OutputStreamWriter classes. An InputStreamReader contains an underlying input stream from which it reads raw bytes. It translates these bytes into Unicode characters according to a specified encoding. An OutputStreamWriter receives Unicode characters from a running program. It then translates those characters into bytes using a specified encoding and writes the bytes onto an underlying output stream.
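The byte/char translation is easier to see in a small example. This is a minimal sketch, assuming Java 7 or later. The class CharBridgeDemo and its encode( ) and decode( ) methods are hypothetical names used only for illustration. The program encodes a string through an OutputStreamWriter and decodes it through an InputStreamReader. Decoding the same bytes with the wrong encoding shows why bytes and chars can’t be treated as interchangeable.

```java
import java.io.*;

// Hypothetical demo class: round-trips text through OutputStreamWriter and InputStreamReader.
class CharBridgeDemo {

    // Characters -> bytes: the OutputStreamWriter applies the named encoding.
    static byte[] encode(String text, String encoding) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Writer w = new OutputStreamWriter(bytes, encoding);
        w.write(text);
        w.close(); // flushes any bytes the encoder is still holding
        return bytes.toByteArray();
    }

    // Bytes -> characters: the InputStreamReader applies the named encoding.
    static String decode(byte[] data, String encoding) throws IOException {
        Reader r = new InputStreamReader(new ByteArrayInputStream(data), encoding);
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) sb.append((char) c);
        r.close();
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "Gr\u00fc\u00dfe";              // 5 chars; u-umlaut and sharp s are non-ASCII
        byte[] utf8 = encode(text, "UTF-8");
        System.out.println(utf8.length);                          // 7: two chars need 2 bytes each
        System.out.println(decode(utf8, "UTF-8").equals(text));   // true
        System.out.println(decode(utf8, "ISO-8859-1").length());  // 7: one char per byte, garbled
    }
}
```

The same bytes decode to 5 characters or 7 depending only on which encoding the reader is told to use.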
In addition to these two classes, the java.io package also includes several raw reader and writer classes that read or write characters without directly requiring an underlying stream. These include:
FileReader
FileWriter
StringReader
StringWriter
CharArrayReader
CharArrayWriter
The first two work with files and the last four work internally to Java, so they won’t be of great use for network programming. However, aside from different constructors, they do have pretty much the same public interface as all the other reader and writer classes.
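Because all of these classes share the Reader and Writer interfaces, code written against the abstract types works with any of them. The following is a minimal sketch. The class ReaderWriterCopy and its copy( ) method are hypothetical. It copies everything from one Reader to one Writer. Here it runs on a StringReader and a StringWriter. A FileReader and FileWriter, or readers and writers wrapped around a socket’s streams, could be passed in without changing the method.

```java
import java.io.*;

// Hypothetical helper class: copies characters between any Reader and any Writer.
class ReaderWriterCopy {

    // Uses only methods declared in Reader and Writer, so any subclasses work.
    // Returns the number of characters copied.
    static int copy(Reader in, Writer out) throws IOException {
        char[] buffer = new char[1024];
        int total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        StringReader in = new StringReader("Network");
        StringWriter out = new StringWriter();
        int count = copy(in, out);
        System.out.println(count + " chars: " + out); // 7 chars: Network
    }
}
```

copy( ) deliberately leaves both arguments open. Closing them is the job of the caller that opened them.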
The Writer class mirrors the java.io.OutputStream class. It’s abstract and has two protected constructors. Like OutputStream, the Writer class is never used directly, only polymorphically through one of its subclasses. It has five write( ) methods as well as a flush( ) and a close( ) method:
protected Writer( )
protected Writer(Object lock)
public abstract void write(char[] text, int offset, int length) throws IOException
public void write(int c) throws IOException
public void write(char[] text) throws IOException
public void write(String s) throws IOException
public void write(String s, int offset, int length) throws IOException
public abstract void flush( ) throws IOException
public abstract void close( ) throws IOException
The write(char[] text, int offset, int length) method is the base method in terms of which the other four write( ) methods are implemented. A subclass must override at least this method as well as flush( ) and close( ), though most will override some of the other write( ) methods as well to provide more efficient implementations. For example, given a Writer object w, you can write the string “Network” like this:
char[] network = {'N', 'e', 't', 'w', 'o', 'r', 'k'};
w.write(network, 0, network.length);
The same task can be accomplished with these other methods as well:
w.write(network);
for (int i = 0; i < network.length; i++) w.write(network[i]);
w.write("Network");
w.write("Network", 0, 7);
Assuming that they use the same Writer object w, all of these are different ways of expressing the same thing. Which you use in any given situation is mostly a matter of convenience and taste. However, how many and which bytes are written by these lines depends on the encoding w uses. If it’s using big-endian Unicode (UTF-16BE), then it will write these 14 bytes (shown here in hexadecimal) in this order:
00 4E 00 65 00 74 00 77 00 6F 00 72 00 6B
On the other hand, if w uses little-endian Unicode (UTF-16LE), this sequence of 14 bytes is written:
4E 00 65 00 74 00 77 00 6F 00 72 00 6B 00
If w uses Latin-1, UTF-8, or MacRoman, this sequence of seven bytes is written:
4E 65 74 77 6F 72 6B
Other encodings may write still different sequences of bytes. The exact output depends on the encoding.
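The byte sequences listed above can be checked with a short program. This is a minimal sketch. The class EncodingBytes and its helpers are hypothetical. It writes “Network” through an OutputStreamWriter into a ByteArrayOutputStream and formats the resulting bytes in hexadecimal. It uses Java’s UTF-16BE and UTF-16LE encoding names for big- and little-endian Unicode. Plain UTF-16 is avoided because Java’s encoder for it writes a 2-byte byte-order mark first.

```java
import java.io.*;

// Hypothetical demo class: shows the bytes a Writer emits under different encodings.
class EncodingBytes {

    // Returns the bytes an OutputStreamWriter using 'encoding' produces for 'text'.
    static byte[] bytesFor(String text, String encoding) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Writer w = new OutputStreamWriter(sink, encoding);
        w.write(text);
        w.close(); // flush the encoded bytes into the sink
        return sink.toByteArray();
    }

    // Formats bytes as space-separated two-digit uppercase hex, like the listings above.
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String[] encodings = {"UTF-16BE", "UTF-16LE", "ISO-8859-1", "UTF-8"};
        for (String enc : encodings) {
            System.out.println(enc + ": " + hex(bytesFor("Network", enc)));
        }
    }
}
```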
Writers may be buffered, either directly by being chained to a BufferedWriter or indirectly because their underlying output stream is buffered. To force a write to be committed to the output medium, invoke the flush( ) method:
w.flush( );
The close( ) method behaves similarly to the close( ) method of OutputStream. It flushes the writer, then closes the underlying output stream and releases any resources associated with it:
public abstract void close( ) throws IOException
Once a writer has been closed, further writes will throw IOExceptions.
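The close( ) contract can be demonstrated with an OutputStreamWriter over a ByteArrayOutputStream. This is a minimal sketch that assumes an in-memory sink can stand in for a file or socket. CloseDemo and its methods are hypothetical names. Closing pushes the pending characters into the sink, and any later write fails with an IOException.

```java
import java.io.*;

// Hypothetical demo class: close() flushes, and writes after close() fail.
class CloseDemo {

    // Writes text, closes the writer, and returns what reached the underlying stream.
    static String writeThenClose(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Writer w = new OutputStreamWriter(sink, "US-ASCII");
        w.write(text);
        w.close(); // flushes pending characters, then closes the sink
        return sink.toString("US-ASCII");
    }

    // Returns true if writing to a closed writer throws IOException.
    static boolean writeAfterCloseFails() throws IOException {
        Writer w = new OutputStreamWriter(new ByteArrayOutputStream(), "US-ASCII");
        w.close();
        try {
            w.write("more");
            return false;
        } catch (IOException ex) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeThenClose("Network")); // Network
        System.out.println(writeAfterCloseFails());    // true
    }
}
```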
OutputStreamWriter is the most important concrete subclass of Writer. An OutputStreamWriter receives Unicode characters from a Java program. It converts these into bytes according to a specified encoding and writes them onto an underlying output stream. Its constructor specifies the output stream to write to and the encoding to use:
public OutputStreamWriter(OutputStream out, String encoding) throws UnsupportedEncodingException
public OutputStreamWriter(OutputStream out)
Valid encodings are listed in the documentation for Sun’s native2ascii tool included with the JDK and available from http://java.sun.com/products/jdk/1.2/docs/tooldocs/win32/native2ascii.html. If no encoding is specified, the default encoding for the platform is used. (In the United States, the default encoding is ISO Latin-1 on Solaris and Windows, MacRoman on the Mac.) For example, this code fragment writes Greek text in the Cp1253 Windows Greek encoding:
OutputStreamWriter w = new OutputStreamWriter(
  new FileOutputStream("OdysseyB.txt"), "Cp1253");
w.write("…"); // the Greek text to be written
Other than the constructors, OutputStreamWriter has only the usual Writer methods (which are used exactly as they are for any Writer class) and one method to return the encoding of the object:
public String getEncoding( )
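Here is a short sketch of OutputStreamWriter’s encoding-related behavior. EncodingNameDemo and its methods are hypothetical. getEncoding( ) may report a historical name, such as "UTF8", rather than the exact string passed to the constructor. It returns null once the writer is closed. An unrecognized encoding name makes the constructor throw UnsupportedEncodingException.

```java
import java.io.*;

// Hypothetical demo class for OutputStreamWriter's encoding handling.
class EncodingNameDemo {

    // Returns true if OutputStreamWriter accepts the encoding name.
    static boolean isSupported(String encoding) {
        try {
            new OutputStreamWriter(new ByteArrayOutputStream(), encoding);
            return true;
        } catch (UnsupportedEncodingException ex) {
            return false;
        }
    }

    // Returns {getEncoding() while open, getEncoding() after close()}.
    static String[] encodingBeforeAndAfterClose(String encoding) throws IOException {
        OutputStreamWriter w = new OutputStreamWriter(new ByteArrayOutputStream(), encoding);
        String before = w.getEncoding(); // may be a historical name rather than 'encoding'
        w.close();
        return new String[] {before, w.getEncoding()}; // second entry is null once closed
    }

    public static void main(String[] args) throws IOException {
        String[] names = encodingBeforeAndAfterClose("UTF-8");
        System.out.println(names[0] + " / " + names[1]);
        System.out.println(isSupported("UTF-8"));            // true
        System.out.println(isSupported("No-Such-Encoding")); // false
    }
}
```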
The Reader class mirrors the java.io.InputStream class. It’s abstract with two protected constructors. Like InputStream and Writer, the Reader class is never used directly, only polymorphically through one of its subclasses. It has three read( ) methods as well as skip( ), close( ), ready( ), mark( ), reset( ), and markSupported( ) methods:
protected Reader( )
protected Reader(Object lock)
public abstract int read(char[] text, int offset, int length) throws IOException
public int read( ) throws IOException
public int read(char[] text) throws IOException
public long skip(long n) throws IOException
public boolean ready( ) throws IOException
public boolean markSupported( )
public void mark(int readAheadLimit) throws IOException
public void reset( ) throws IOException
public abstract void close( ) throws IOException
The read(char[] text, int offset, int length) method is the fundamental method through which the other two read( ) methods are implemented. A subclass must override at least this method as well as close( ), though most will override some of the other read( ) methods as well in order to provide more efficient implementations.
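Here is a minimal sketch of such a subclass. ArraySourceReader is a hypothetical name. It serves characters from an array and overrides only the two abstract methods, read(char[], int, int) and close( ). The single-character read( ) and the read(char[]) methods it inherits from Reader both work because they are built on the overridden method.

```java
import java.io.*;

// Hypothetical Reader subclass that overrides only the two abstract methods.
class ArraySourceReader extends Reader {

    private final char[] source;
    private int pos = 0;
    private boolean closed = false;

    ArraySourceReader(char[] source) {
        this.source = source;
    }

    @Override
    public int read(char[] text, int offset, int length) throws IOException {
        if (closed) throw new IOException("Reader closed");
        if (length == 0) return 0;
        if (pos >= source.length) return -1;          // end of stream
        int n = Math.min(length, source.length - pos);
        System.arraycopy(source, pos, text, offset, n);
        pos += n;
        return n;                                     // number of chars actually read
    }

    @Override
    public void close() {
        closed = true;
    }

    public static void main(String[] args) throws IOException {
        Reader r = new ArraySourceReader("Network".toCharArray());
        System.out.println((char) r.read());          // N, via the inherited read()
        char[] rest = new char[10];
        int n = r.read(rest);                         // inherited read(char[])
        System.out.println(new String(rest, 0, n));   // etwork
        r.close();
    }
}
```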
Most of these methods are easily understood by analogy with their InputStream counterparts. The read( ) method returns a single Unicode character as an int with a value from 0 to 65,535 or -1 on end of stream. The read(char[] text) method tries to fill the array text with characters and returns the actual number of characters read or -1 on end of stream. The read(char[] text, int offset, int length) method attempts to read length characters into the subarray of text beginning at offset and continuing for length characters. It also returns the actual number of characters read or -1 on end of stream. The skip(long n) method attempts to skip n characters and returns the number actually skipped. The mark( ) and reset( ) methods allow some readers to reset back to a marked position in the character sequence. The markSupported( ) method tells you whether this reader supports marking and resetting. The close( ) method closes the reader and any underlying input stream so that further attempts to read from it will throw IOExceptions.
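The following sketch exercises skip( ), mark( ), reset( ), and markSupported( ) on a StringReader, which supports marking. MarkResetDemo and its methods are hypothetical names. An InputStreamReader doesn’t support marking, so the same calls would fail there. That is why the method checks markSupported( ) first.

```java
import java.io.*;

// Hypothetical demo class for skip(), mark(), reset(), and markSupported().
class MarkResetDemo {

    // Reads up to 'count' chars, stopping early at end of stream.
    static String readUpTo(Reader r, int count) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            int c = r.read();
            if (c == -1) break;
            sb.append((char) c);
        }
        return sb.toString();
    }

    // Reads 'count' chars, rewinds to the mark, and reads the same chars again.
    static String readTwice(Reader r, int count) throws IOException {
        if (!r.markSupported()) throw new IOException("mark/reset not supported");
        r.mark(count);                      // remember this position
        String first = readUpTo(r, count);
        r.reset();                          // go back to the marked position
        String second = readUpTo(r, count);
        return first + second;
    }

    public static void main(String[] args) throws IOException {
        Reader r = new StringReader("Network");
        r.skip(3);                                    // skips "Net"
        System.out.println(readTwice(r, 2));          // wowo
        Reader isr = new InputStreamReader(new ByteArrayInputStream(new byte[0]));
        System.out.println(isr.markSupported());      // false
    }
}
```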
The exception to the rule of similarity is ready( ), which has the same general purpose as available( ) but not quite the same semantics, even modulo the byte-to-char conversion. Whereas available( ) returns an int specifying a minimum number of bytes that may be read without blocking, ready( ) returns only a boolean indicating whether the reader may be read without blocking. The problem is that some character encodings such as UTF-8 use different numbers of bytes for different characters. Thus it’s hard to tell how many characters are waiting in the network or filesystem buffer without actually reading them out of the buffer.
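The mismatch between byte counts and char counts is easy to reproduce in memory. In this minimal sketch (BytesVersusChars is a hypothetical class name), a ByteArrayInputStream holds the UTF-8 encoding of the four-letter Russian word “Сеть”. Each of its Cyrillic letters takes 2 bytes, so available( ) reports 8 bytes, yet decoding them yields only 4 chars.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: in UTF-8, a byte count is not a char count.
class BytesVersusChars {

    // Decodes all bytes as UTF-8 and returns how many chars come out.
    static int countChars(byte[] data) throws IOException {
        Reader r = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        int chars = 0;
        while (r.read() != -1) chars++;
        r.close();
        return chars;
    }

    public static void main(String[] args) throws IOException {
        // Cyrillic "Set'" (network): four letters, each 2 bytes in UTF-8
        byte[] data = "\u0421\u0435\u0442\u044c".getBytes(StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream(data);
        System.out.println(in.available());   // 8 bytes waiting
        System.out.println(countChars(data)); // but only 4 chars
    }
}
```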
InputStreamReader is the most important concrete subclass of Reader. An InputStreamReader reads bytes from an underlying input stream such as a FileInputStream or TelnetInputStream. It converts these into characters according to a specified encoding and returns them. The constructor specifies the input stream to read from and the encoding to use:
public InputStreamReader(InputStream in)
public InputStreamReader(InputStream in, String encoding) throws UnsupportedEncodingException
If no encoding is specified, the default encoding for the platform is used. If an unknown encoding is specified, then an UnsupportedEncodingException is thrown.
For example, this method reads an input stream and converts it all to one Unicode string using the MacCyrillic encoding:
public static String getMacCyrillicString(InputStream in)
 throws IOException {

  InputStreamReader r = new InputStreamReader(in, "MacCyrillic");
  StringBuffer sb = new StringBuffer( );
  int c;
  while ((c = r.read( )) != -1) sb.append((char) c);
  r.close( );
  return sb.toString( );
}
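The same idea generalizes to any encoding. This is a minimal sketch that assumes Java 7 or later for try-with-resources. ReadAllDemo and readAll( ) are hypothetical names. The method takes the encoding name as a parameter and pulls characters in blocks with read(char[], int, int) instead of one read( ) call per character. It is exercised with US-ASCII bytes rather than MacCyrillic, since the extended charsets may not be installed in every runtime.

```java
import java.io.*;

// Hypothetical generalization of getMacCyrillicString(): encoding is a parameter,
// and characters are read a block at a time.
class ReadAllDemo {

    public static String readAll(InputStream in, String encoding) throws IOException {
        try (Reader r = new InputStreamReader(in, encoding)) {  // closed automatically
            StringBuilder sb = new StringBuilder();
            char[] block = new char[1024];
            int n;
            while ((n = r.read(block, 0, block.length)) != -1) {
                sb.append(block, 0, n);   // append only the chars actually read
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = {0x4E, 0x65, 0x74, 0x77, 0x6F, 0x72, 0x6B};
        System.out.println(readAll(new ByteArrayInputStream(bytes), "US-ASCII")); // Network
    }
}
```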
The InputStreamReader and OutputStreamWriter classes act as decorators on top of input and output streams that change the interface from a byte-oriented interface to a character-oriented interface. Once this is done, additional character-oriented filters can be layered on top of the reader or writer using the java.io.FilterReader and java.io.FilterWriter classes. As with filter streams, there are a variety of classes that perform specific filtering, including:
BufferedReader
BufferedWriter
LineNumberReader
PushbackReader
PrintWriter
The BufferedReader and BufferedWriter classes are the character-based equivalents of the byte-oriented BufferedInputStream and BufferedOutputStream classes. Where BufferedInputStream and BufferedOutputStream use an internal array of bytes as a buffer, BufferedReader and BufferedWriter use an internal array of chars.
When a program reads from a BufferedReader, text is taken from the buffer rather than directly from the underlying input stream or other text source. When the buffer empties, it is filled again with as much text as possible, even if not all of it is immediately needed. This will make future reads much faster.
When a program writes to a BufferedWriter, the text is placed in the buffer. The text is moved to the underlying output stream or other target only when the buffer fills up or when the writer is explicitly flushed. This can make writes much faster than would otherwise be the case.
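Here is a minimal sketch of that buffering. BufferedWriterDemo and sizesAroundFlush( ) are hypothetical names, and a ByteArrayOutputStream stands in for a real destination. Text written to the BufferedWriter sits in its char array until flush( ) pushes it down through the OutputStreamWriter to the underlying stream.

```java
import java.io.*;

// Hypothetical demo class: text stays in a BufferedWriter until flush().
class BufferedWriterDemo {

    // Returns {bytes in the sink before flush(), bytes in the sink after flush()}.
    static int[] sizesAroundFlush(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedWriter w = new BufferedWriter(new OutputStreamWriter(sink, "US-ASCII"));
        w.write(text);
        int before = sink.size();  // still held in the BufferedWriter's char buffer
        w.flush();                 // moves it through the OutputStreamWriter into sink
        int after = sink.size();
        w.close();
        return new int[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = sizesAroundFlush("Network");
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```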
Both BufferedReader and BufferedWriter have the usual methods associated with readers and writers, like read( ), ready( ), write( ), and close( ). They each have two constructors used to chain the BufferedReader or BufferedWriter to an underlying reader or writer and to set the size of the buffer. If the size is not set, then the default size of 8,192 characters is used:
public BufferedReader(Reader in, int bufferSize)
public BufferedReader(Reader in)
public BufferedWriter(Writer out)
public BufferedWriter(Writer out, int bufferSize)
For example, the earlier getMacCyrillicString( ) example was less than efficient because it read characters one at a time. Since MacCyrillic is a 1-byte character set, this also meant it read bytes one at a time. However, it’s straightforward to make it run faster by chaining a BufferedReader to the InputStreamReader like this:
public static String getMacCyrillicString(InputStream in)
 throws IOException {

  Reader r = new InputStreamReader(in, "MacCyrillic");
  r = new BufferedReader(r, 1024);
  StringBuffer sb = new StringBuffer( );
  int c;
  while ((c = r.read( )) != -1) sb.append((char) c);
  r.close( );
  return sb.toString( );
}
All that was needed to buffer this method was one additional line of code. None of the rest of the algorithm had to change, since the only InputStreamReader methods used were the read( ) and close( ) methods declared in the Reader superclass and shared by all Reader subclasses, including BufferedReader.
The BufferedReader class also has a readLine( ) method that reads a single line of text and returns it as a string:
public String readLine( ) throws IOException
This method is supposed to replace the deprecated readLine( ) method in DataInputStream, and it has mostly the same behavior as that method. The big difference is that by chaining a BufferedReader to an InputStreamReader, you can correctly read lines in character sets other than the default encoding for the platform. Unfortunately, this method shares the same bugs as the readLine( ) method in DataInputStream, discussed earlier. That is, it will tend to hang its thread when reading streams where lines end in carriage returns, as is common when the stream comes from a Macintosh or carries a Macintosh text file. Consequently, you should scrupulously avoid this method in network programs.
It’s not all that difficult, however, to write a safe version of this class that correctly implements the readLine( ) method. Example 4-1 is such a SafeBufferedReader class. It has exactly the same public interface as BufferedReader. It just has a slightly different private implementation. I’ll use this class in future chapters in situations where it’s extremely convenient to have a readLine( ) method.
Example 4-1. The SafeBufferedReader Class
package com.macfaq.io;

import java.io.*;

public class SafeBufferedReader extends BufferedReader {

  public SafeBufferedReader(Reader in) {
    this(in, 1024);
  }

  public SafeBufferedReader(Reader in, int bufferSize) {
    super(in, bufferSize);
  }

  private boolean lookingForLineFeed = false;

  public String readLine( ) throws IOException {
    StringBuffer sb = new StringBuffer("");
    while (true) {
      int c = this.read( );
      if (c == -1) { // end of stream
        // return a final unterminated line; null only if nothing was read
        if (sb.length( ) == 0) return null;
        return sb.toString( );
      }
      else if (c == '\n') {
        if (lookingForLineFeed) {
          lookingForLineFeed = false;
          continue;
        }
        else {
          return sb.toString( );
        }
      }
      else if (c == '\r') {
        lookingForLineFeed = true;
        return sb.toString( );
      }
      else {
        lookingForLineFeed = false;
        sb.append((char) c);
      }
    }
  }
}
The BufferedWriter class also adds one new method not included in its superclass, and this method is also geared toward writing lines. That method is newLine( ):
public void newLine( ) throws IOException
This method inserts a platform-dependent line-separator string into the output. The line.separator system property determines exactly what this string is. It will probably be a linefeed on Unix, a carriage return on the Macintosh, and a carriage return/linefeed pair on Windows. Since network protocols generally specify the required line terminator, you should not use this method for network programming. Instead, you should explicitly write the line terminator the protocol requires.
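As a minimal sketch of writing the terminator explicitly, the hypothetical class below builds a bare-bones HTTP/1.0 request, since HTTP requires a carriage return/linefeed pair at the end of each line. ExplicitLineEnds, request( ), and the host name www.example.com are illustrative assumptions, and a ByteArrayOutputStream stands in for a socket’s output stream. The output is the same on every platform because newLine( ) is never called.

```java
import java.io.*;

// Hypothetical demo class: protocol-mandated CR LF written explicitly instead of newLine().
class ExplicitLineEnds {

    // HTTP ends every header line with carriage return + linefeed.
    static final String CRLF = "\r\n";

    // Builds a minimal HTTP/1.0 request; 'host' is a placeholder value.
    static byte[] request(String host) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Writer w = new BufferedWriter(new OutputStreamWriter(sink, "US-ASCII"));
        w.write("GET / HTTP/1.0" + CRLF);  // not write(...) followed by newLine()
        w.write("Host: " + host + CRLF);
        w.write(CRLF);                     // an empty line ends the request header
        w.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = request("www.example.com");
        System.out.println(bytes.length + " bytes");
    }
}
```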
The LineNumberReader class replaces the deprecated LineNumberInputStream class from Java 1.0. It’s a subclass of BufferedReader that keeps track of the current line number being read. This can be retrieved at any time with the getLineNumber( ) method:
public int getLineNumber( )
By default, the first line number is 0. However, the number of the current line and all subsequent lines can be changed with the setLineNumber( ) method:
public void setLineNumber(int lineNumber)
This method adjusts only the line numbers that getLineNumber( ) reports. It does not change the point at which the stream is read.
The LineNumberReader’s readLine( ) method shares the same bug as BufferedReader’s and DataInputStream’s, and thus is not suitable for network programming. However, the line numbers are also tracked if you use only the regular read( ) methods, and these do not share that bug. Besides these methods and the usual Reader methods, LineNumberReader has only these two constructors:
public LineNumberReader(Reader in)
public LineNumberReader(Reader in, int bufferSize)
Since LineNumberReader is a subclass of BufferedReader, it does have an internal character buffer whose size can be set with the second constructor. The default size is 8,192 characters.
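Here is a minimal sketch of line counting with only the plain read( ) method. LineNumberDemo and lineNumberAfterReading( ) are hypothetical names. A carriage return/linefeed pair counts as a single line end. setLineNumber( ) shifts the numbering without moving the read position. The sample strings end with a line terminator, so every line is explicitly terminated.

```java
import java.io.*;

// Hypothetical demo class: LineNumberReader counting lines via read() alone.
class LineNumberDemo {

    // Reads all of 'text' one char at a time, with numbering starting at 'startAt'.
    static int lineNumberAfterReading(String text, int startAt) throws IOException {
        LineNumberReader r = new LineNumberReader(new StringReader(text));
        r.setLineNumber(startAt);       // changes the reported number, not the position
        while (r.read() != -1) {
            // just consume; the reader counts line terminators as it goes
        }
        int result = r.getLineNumber();
        r.close();
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(lineNumberAfterReading("one\ntwo\r\nthree\n", 0)); // 3
        System.out.println(lineNumberAfterReading("one\ntwo\n", 10));         // 12
    }
}
```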
The PushbackReader class is the mirror image of the PushbackInputStream class. As usual, the main difference is that it pushes back chars rather than bytes. It provides three unread( ) methods that push characters onto the reader’s input buffer:
public void unread(int c) throws IOException
public void unread(char[] cbuf) throws IOException
public void unread(char[] cbuf, int offset, int length) throws IOException
The first unread( ) method pushes a single character onto the reader. The second pushes an array of characters. The third pushes the specified subarray of characters starting with cbuf[offset] and continuing through cbuf[offset+length-1].
By default, the size of the pushback buffer is only one character. However, this can be adjusted in the second constructor:
public PushbackReader(Reader in)
public PushbackReader(Reader in, int bufferSize)
Trying to unread more characters than the buffer will hold throws an IOException.
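A common use of pushback is one character of lookahead in a simple tokenizer. In this minimal sketch (PushbackDemo and its methods are hypothetical), readNumber( ) consumes a run of ASCII digits and pushes back the first non-digit, so the next read( ) returns that character. It also shows that, with the default one-character buffer, a second unread( ) without an intervening read( ) overflows.

```java
import java.io.*;

// Hypothetical demo class: PushbackReader as one-character lookahead.
class PushbackDemo {

    // Reads a run of ASCII digits; the char that ends the run is pushed back.
    static int readNumber(PushbackReader in) throws IOException {
        int value = 0;
        int c;
        while ((c = in.read()) != -1 && c >= '0' && c <= '9') {
            value = value * 10 + (c - '0');
        }
        if (c != -1) in.unread(c);   // give back the non-digit for the next read()
        return value;
    }

    // With the default 1-char pushback buffer, a second unread() should fail.
    static boolean secondUnreadOverflows() throws IOException {
        PushbackReader in = new PushbackReader(new StringReader(""));
        in.unread('x');
        try {
            in.unread('y');
            return false;
        } catch (IOException ex) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        PushbackReader in = new PushbackReader(new StringReader("123abc"));
        System.out.println(readNumber(in));        // 123
        System.out.println((char) in.read());      // a
        System.out.println(secondUnreadOverflows()); // true
    }
}
```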
The PrintWriter class is a replacement for Java 1.0’s PrintStream class that properly handles multibyte character sets and international text. Sun originally planned to deprecate PrintStream in favor of PrintWriter but backed off when it realized this would invalidate too much existing code, especially code that depended on System.out. Nonetheless, new code should use PrintWriter instead of PrintStream.
Aside from the constructors, the PrintWriter class has an almost identical collection of methods to PrintStream. Its constructors and methods are:
public PrintWriter(Writer out)
public PrintWriter(Writer out, boolean autoFlush)
public PrintWriter(OutputStream out)
public PrintWriter(OutputStream out, boolean autoFlush)
public void flush( )
public void close( )
public boolean checkError( )
protected void setError( )
public void write(int c)
public void write(char[] text, int offset, int length)
public void write(char[] text)
public void write(String s, int offset, int length)
public void write(String s)
public void print(boolean b)
public void print(char c)
public void print(int i)
public void print(long l)
public void print(float f)
public void print(double d)
public void print(char[] text)
public void print(String s)
public void print(Object o)
public void println( )
public void println(boolean b)
public void println(char c)
public void println(int i)
public void println(long l)
public void println(float f)
public void println(double d)
public void println(char[] text)
public void println(String s)
public void println(Object o)
Most of these methods behave the same for PrintWriter as they do for PrintStream. The exceptions are that the five write( ) methods write characters rather than bytes and that if the underlying writer properly handles character set conversion, then so do all the methods of the PrintWriter. This is an improvement over the noninternationalizable PrintStream class, but it’s still not good enough for network programming. PrintWriter still has the problems that plague PrintStream: platform dependency, since println( ) writes whatever line separator the local platform uses, and minimal error reporting, since IOExceptions are swallowed and reported only through checkError( ).
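Both problems can be observed directly. This is a minimal sketch. PrintWriterErrors and its nested FailingWriter are hypothetical classes, and FailingWriter stands in for a broken network connection. The first method shows that a failing write raises no exception at the call site and surfaces only through checkError( ). The second shows that println( ) appends the platform’s line separator rather than any protocol-specified terminator.

```java
import java.io.*;

// Hypothetical demo class: PrintWriter hides IOExceptions and uses the platform separator.
class PrintWriterErrors {

    // Hypothetical Writer that fails on every write, like a dropped connection.
    static class FailingWriter extends Writer {
        @Override public void write(char[] cbuf, int off, int len) throws IOException {
            throw new IOException("connection reset");
        }
        @Override public void flush() {}
        @Override public void close() {}
    }

    // Returns what checkError() reports after a write that failed underneath.
    static boolean errorHidden() {
        PrintWriter pw = new PrintWriter(new FailingWriter());
        pw.print("hello");         // the IOException is caught inside PrintWriter...
        return pw.checkError();    // ...and only shows up here
    }

    // Returns exactly what println(s) produced.
    static String printlnOutput(String s) {
        StringWriter sw = new StringWriter();
        PrintWriter pw = new PrintWriter(sw);
        pw.println(s);             // appends the platform's line separator
        pw.flush();
        return sw.toString();
    }

    public static void main(String[] args) {
        System.out.println(errorHidden()); // true
        System.out.println(printlnOutput("hi").equals("hi" + System.lineSeparator())); // true
    }
}
```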
It isn’t hard to write a PrintWriter class that does work for network programming. You simply have to require the programmer to specify a line separator and let the IOExceptions fall where they may. Example 4-2 demonstrates. Notice that all the constructors require an explicit line separator to be provided.
Example 4-2. SafePrintWriter
/*
 * @(#)SafePrintWriter.java 1.0 99/07/10
 *
 * Written 1999 by Elliotte Rusty Harold,
 * Placed in the public domain
 * No rights reserved.
 */
package com.macfaq.io;

import java.io.*;

/**
 * @version 1.0, 99/07/10
 * @author Elliotte Rusty Harold
 * @since Java Network Programming, 2nd edition
 */
public class SafePrintWriter extends Writer {

  protected Writer out;

  private boolean autoFlush = false;
  private String lineSeparator;
  private boolean closed = false;

  public SafePrintWriter(Writer out, String lineSeparator) {
    this(out, false, lineSeparator);
  }

  public SafePrintWriter(Writer out, char lineSeparator) {
    this(out, false, String.valueOf(lineSeparator));
  }

  public SafePrintWriter(Writer out, boolean autoFlush, String lineSeparator) {
    super(out);
    this.out = out;
    this.autoFlush = autoFlush;
    this.lineSeparator = lineSeparator;
  }

  public SafePrintWriter(OutputStream out, boolean autoFlush,
   String encoding, String lineSeparator)
   throws UnsupportedEncodingException {
    this(new OutputStreamWriter(out, encoding), autoFlush, lineSeparator);
  }

  public void flush( ) throws IOException {
    synchronized (lock) {
      if (closed) throw new IOException("Stream closed");
      out.flush( );
    }
  }

  public void close( ) throws IOException {
    try {
      this.flush( );
    }
    catch (IOException e) {
    }
    synchronized (lock) {
      out.close( );
      this.closed = true;
    }
  }

  public void write(int c) throws IOException {
    synchronized (lock) {
      if (closed) throw new IOException("Stream closed");
      out.write(c);
    }
  }

  public void write(char[] text, int offset, int length) throws IOException {
    synchronized (lock) {
      if (closed) throw new IOException("Stream closed");
      out.write(text, offset, length);
    }
  }

  public void write(char[] text) throws IOException {
    synchronized (lock) {
      if (closed) throw new IOException("Stream closed");
      out.write(text, 0, text.length);
    }
  }

  public void write(String s, int offset, int length) throws IOException {
    synchronized (lock) {
      if (closed) throw new IOException("Stream closed");
      out.write(s, offset, length);
    }
  }

  public void print(boolean b) throws IOException {
    if (b) this.write("true");
    else this.write("false");
  }

  public void println(boolean b) throws IOException {
    if (b) this.write("true");
    else this.write("false");
    this.write(lineSeparator);
    if (autoFlush) out.flush( );
  }

  public void print(char c) throws IOException {
    this.write(String.valueOf(c));
  }

  public void println(char c) throws IOException {
    this.write(String.valueOf(c));
    this.write(lineSeparator);
    if (autoFlush) out.flush( );
  }

  public void print(int i) throws IOException {
    this.write(String.valueOf(i));
  }

  public void println(int i) throws IOException {
    this.write(String.valueOf(i));
    this.write(lineSeparator);
    if (autoFlush) out.flush( );
  }

  public void print(long l) throws IOException {
    this.write(String.valueOf(l));
  }

  public void println(long l) throws IOException {
    this.write(String.valueOf(l));
    this.write(lineSeparator);
    if (autoFlush) out.flush( );
  }

  public void print(float f) throws IOException {
    this.write(String.valueOf(f));
  }

  public void println(float f) throws IOException {
    this.write(String.valueOf(f));
    this.write(lineSeparator);
    if (autoFlush) out.flush( );
  }

  public void print(double d) throws IOException {
    this.write(String.valueOf(d));
  }

  public void println(double d) throws IOException {
    this.write(String.valueOf(d));
    this.write(lineSeparator);
    if (autoFlush) out.flush( );
  }

  public void print(char[] text) throws IOException {
    this.write(text);
  }

  public void println(char[] text) throws IOException {
    this.write(text);
    this.write(lineSeparator);
    if (autoFlush) out.flush( );
  }

  public void print(String s) throws IOException {
    if (s == null) this.write("null");
    else this.write(s);
  }

  public void println(String s) throws IOException {
    if (s == null) this.write("null");
    else this.write(s);
    this.write(lineSeparator);
    if (autoFlush) out.flush( );
  }

  public void print(Object o) throws IOException {
    if (o == null) this.write("null");
    else this.write(o.toString( ));
  }

  public void println(Object o) throws IOException {
    if (o == null) this.write("null");
    else this.write(o.toString( ));
    this.write(lineSeparator);
    if (autoFlush) out.flush( );
  }

  public void println( ) throws IOException {
    this.write(lineSeparator);
    if (autoFlush) out.flush( );
  }
}
Like PrintWriter, this class extends Writer rather than FilterWriter. It could extend FilterWriter instead. However, this would save only one field and one line of code, since this class needs to override every single method in FilterWriter (close( ), flush( ), and all three write( ) methods). The reason for this is twofold. First, a print writer like this one has to be much more careful about synchronization than the FilterWriter class is. Second, some of the classes that may be used as an underlying Writer for this class, notably CharArrayWriter, do not implement the proper semantics for close( ) and allow further writes to take place even after the writer is closed. Consequently, we have to handle the checks for whether the stream is closed in this class rather than relying on the underlying Writer out to do it for us.
Note
This chapter has been a whirlwind tour of the java.io package, covering the bare minimum you need to know to write network programs. For a more detailed and comprehensive look, with many more examples, you should check out my previous book, Java I/O (O’Reilly & Associates, Inc., 1999).