Readers and Writers

Most programmers have a bad habit of writing code as if all text were ASCII or, at the least, in the native encoding of the platform. While some older, simpler network protocols, such as daytime, quote of the day, and chargen, do specify ASCII encoding for text, this is not true of HTTP and many other more modern protocols, which allow a wide variety of localized encodings, such as KOI8-R Cyrillic, Big-5 Chinese, and ISO 8859-2 for most Central European languages. When the encoding is no longer ASCII, the assumption that bytes and chars are essentially the same things also breaks down. Java’s native character set is the 2-byte Unicode character set. Consequently, Java provides an almost complete mirror of the input and output stream class hierarchy that’s designed for working with characters instead of bytes.

In this mirror image hierarchy, two abstract superclasses define the basic API for reading and writing characters. The java.io.Reader class specifies the API by which characters are read. The java.io.Writer class specifies the API by which characters are written. Wherever input and output streams use bytes, readers and writers use Unicode characters. Concrete subclasses of Reader and Writer allow particular sources to be read and targets to be written. Filter readers and writers can be attached to other readers and writers to provide additional services or interfaces.

The most important concrete subclasses of Reader and Writer are the InputStreamReader and the OutputStreamWriter classes. An InputStreamReader contains an underlying input stream from which it reads raw bytes. It translates these bytes into Unicode characters according to a specified encoding. An OutputStreamWriter receives Unicode characters from a running program. It then translates those characters into bytes using a specified encoding and writes the bytes onto an underlying output stream.

In addition to these two classes, the java.io package also includes several raw reader and writer classes that read and write characters without directly requiring an underlying input or output stream. These include:

  • FileReader

  • FileWriter

  • StringReader

  • StringWriter

  • CharArrayReader

  • CharArrayWriter

The first two work with files and the last four work internally to Java, so they won’t be of great use for network programming. However, aside from different constructors, they do have pretty much the same public interface as all the other reader and writer classes.

Writers

The Writer class mirrors the java.io.OutputStream class. It’s abstract and has two protected constructors. Like OutputStream, the Writer class is never used directly, only polymorphically through one of its subclasses. It has five write( ) methods as well as a flush( ) and a close( ) method:

protected Writer(  )
protected Writer(Object lock)
public abstract void write(char[] text, int offset, int length) 
 throws IOException
public void write(int c) throws IOException
public void write(char[] text) throws IOException
public void write(String s) throws IOException
public void write(String s, int offset, int length) throws IOException
public abstract void flush(  ) throws IOException
public abstract void close(  ) throws IOException

The write(char[] text, int offset, int length) method is the base method in terms of which the other four write( ) methods are implemented. A subclass must override at least this method as well as flush( ) and close( ), though most will override some of the other write( ) methods as well to provide more efficient implementations. For example, given a Writer object w, you can write the string “Network” like this:

char[] network = {'N', 'e', 't', 'w', 'o', 'r', 'k'};
w.write(network, 0, network.length);

The same task can be accomplished with these other methods as well:

w.write(network);
for (int i = 0;  i < network.length;  i++) w.write(network[i]);
w.write("Network");
w.write("Network", 0, 7);

Assuming that they use the same Writer object w, all of these are different ways of expressing the same thing. Which you use in any given situation is mostly a matter of convenience and taste. However, how many and which bytes are written by these lines depends on the encoding w uses. If it’s using big-endian Unicode, then it will write these 14 bytes (shown here in hexadecimal) in this order:

00 4E 00 65 00 74 00 77 00 6F 00 72 00 6B

On the other hand, if w uses little-endian Unicode, this sequence of 14 bytes is written:

4E 00 65 00 74 00 77 00 6F 00 72 00 6B 00

If w uses Latin-1, UTF-8, or MacRoman, this sequence of seven bytes is written:

4E 65 74 77 6F 72 6B

Other encodings may write still different sequences of bytes.
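To make the byte sequences above concrete, here is a minimal sketch that writes the same string through OutputStreamWriter objects using different encodings and prints the resulting bytes in hexadecimal. The class name EncodingBytesDemo and the helper hexBytes( ) are illustrative names, not part of java.io. The encoding names UTF-16BE and UTF-16LE are assumed here to stand for the big-endian and little-endian Unicode encodings discussed above; both write no byte order mark.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class EncodingBytesDemo {

  // Writes text through an OutputStreamWriter that uses the named encoding
  // and returns the bytes that reached the underlying stream as hexadecimal.
  public static String hexBytes(String text, String encoding) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    Writer w = new OutputStreamWriter(bytes, encoding);
    w.write(text);
    w.close(); // flushes the encoder into the byte array
    StringBuilder hex = new StringBuilder();
    for (byte b : bytes.toByteArray()) {
      if (hex.length() > 0) hex.append(' ');
      hex.append(String.format("%02X", b & 0xFF));
    }
    return hex.toString();
  }

  public static void main(String[] args) throws IOException {
    String[] encodings = {"UTF-16BE", "UTF-16LE", "ISO-8859-1", "UTF-8"};
    for (String encoding : encodings) {
      System.out.println(encoding + ": " + hexBytes("Network", encoding));
    }
  }
}
```

The calling code is identical in every case; only the encoding name passed to the constructor changes what ends up on the wire.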

Writers may be buffered, either directly by being chained to a BufferedWriter or indirectly because their underlying output stream is buffered. To force a write to be committed to the output medium, invoke the flush( ) method:

w.flush(  );

The close( ) method behaves similarly to the close( ) method of OutputStream. This flushes the writer, then closes the underlying output stream and releases any resources associated with it:

public abstract void close(  ) throws IOException

Once a writer has been closed, further writes will throw IOExceptions.
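The following sketch illustrates both behaviors with a BufferedWriter chained to an in-memory StringWriter, so that the effect of flush( ) can be observed directly. The class and method names are made up for this illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

public class FlushCloseDemo {

  // Returns what has reached the target before and after flush() is called.
  public static String[] flushDemo() throws IOException {
    StringWriter target = new StringWriter();
    BufferedWriter w = new BufferedWriter(target);
    w.write("Network");
    String beforeFlush = target.toString(); // text is still in the buffer
    w.flush();
    String afterFlush = target.toString();  // text has been pushed through
    w.close();
    return new String[] {beforeFlush, afterFlush};
  }

  // Returns true if writing to a closed BufferedWriter throws an IOException.
  public static boolean writeAfterCloseFails() {
    BufferedWriter w = new BufferedWriter(new StringWriter());
    try {
      w.close();
      w.write("too late");
      return false;
    } catch (IOException ex) {
      return true;
    }
  }

  public static void main(String[] args) throws IOException {
    String[] result = flushDemo();
    System.out.println("before flush: [" + result[0] + "]");
    System.out.println("after flush:  [" + result[1] + "]");
    System.out.println("write after close fails: " + writeAfterCloseFails());
  }
}
```

In a network program the same effect means that a request can sit in the buffer indefinitely unless the writer is flushed before the program waits for a response.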

OutputStreamWriter

OutputStreamWriter is the most important concrete subclass of Writer. An OutputStreamWriter receives Unicode characters from a Java program. It converts these into bytes according to a specified encoding and writes them onto an underlying output stream. Its constructor specifies the output stream to write to and the encoding to use:

public OutputStreamWriter(OutputStream out, String encoding) 
 throws UnsupportedEncodingException
public OutputStreamWriter(OutputStream out)

Valid encodings are listed in the documentation for Sun’s native2ascii tool included with the JDK and available from http://java.sun.com/products/jdk/1.2/docs/tooldocs/win32/native2ascii.html. If no encoding is specified, the default encoding for the platform is used. (In the United States, the default encoding is ISO Latin-1 on Solaris and Windows, MacRoman on the Mac.) For example, this code fragment writes the string OutputStreamWriter in the Cp1253 Windows Greek encoding:

OutputStreamWriter w = new OutputStreamWriter(
 new FileOutputStream("OdysseyB.txt"), "Cp1253");
w.write("OutputStreamWriter");

Other than the constructors, OutputStreamWriter has only the usual Writer methods (which are used exactly as they are for any Writer class) and one method to return the encoding of the object:

public String getEncoding(  )
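As a rough sketch of how these pieces fit together, the class below checks whether an encoding name is accepted by the OutputStreamWriter constructor and asks a writer which encoding it is using. The class and method names, and the deliberately bogus encoding name, are assumptions for illustration only. The exact string returned by getEncoding( ) may be a historical name that differs from the name passed to the constructor, so the sketch prints it rather than relying on a particular value.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;

public class EncodingNameDemo {

  // Returns true if the constructor accepts the encoding name.
  public static boolean isSupported(String encoding) {
    try {
      new OutputStreamWriter(new ByteArrayOutputStream(), encoding);
      return true;
    } catch (UnsupportedEncodingException ex) {
      return false;
    }
  }

  // Returns the name the writer itself reports for its encoding.
  public static String encodingName(String encoding) throws IOException {
    OutputStreamWriter w = new OutputStreamWriter(new ByteArrayOutputStream(), encoding);
    String name = w.getEncoding();
    w.close();
    return name;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(isSupported("UTF-8"));              // a standard encoding
    System.out.println(isSupported("X-No-Such-Encoding")); // hypothetical name
    System.out.println(encodingName("UTF-8"));
  }
}
```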

Readers

The Reader class mirrors the java.io.InputStream class. It’s abstract with two protected constructors. Like InputStream and Writer, the Reader class is never used directly, only polymorphically through one of its subclasses. It has three read( ) methods as well as skip( ), close( ), ready( ), mark( ), reset( ), and markSupported( ) methods:

protected Reader(  )
protected Reader(Object lock)
public abstract int read(char[] text, int offset, int length) 
 throws IOException
public int read(  ) throws IOException
public int read(char[] text) throws IOException
public long skip(long n) throws IOException
public boolean ready(  ) throws IOException
public boolean markSupported(  )
public void mark(int readAheadLimit) throws IOException
public void reset(  ) throws IOException
public abstract void close(  ) throws IOException

The read(char[] text, int offset, int length) method is the fundamental method through which the other two read( ) methods are implemented. A subclass must override at least this method as well as close( ), though most will override some of the other read( ) methods as well in order to provide more efficient implementations.

Most of these methods are easily understood by analogy with their InputStream counterparts. The read( ) method returns a single Unicode character as an int with a value from 0 to 65,535 or -1 on end of stream. The read(char[] text) method tries to fill the array text with characters and returns the actual number of characters read or -1 on end of stream. The read(char[] text, int offset, int length) method attempts to read length characters into the subarray of text beginning at offset and continuing for length characters. It also returns the actual number of characters read or -1 on end of stream. The skip(long n) method skips n characters. The mark( ) and reset( ) methods allow some readers to reset back to a marked position in the character sequence. The markSupported( ) method tells you whether this reader supports marking and resetting. The close( ) method closes the reader and any underlying input stream so that further attempts to read from it will throw IOExceptions.

The exception to the rule of similarity is ready( ), which has the same general purpose as available( ) but not quite the same semantics, even modulo the byte-to-char conversion. Whereas available( ) returns an int specifying a minimum number of bytes that may be read without blocking, ready( ) returns only a boolean indicating whether the reader may be read without blocking. The problem is that some character encodings such as UTF-8 use different numbers of bytes for different characters. Thus it’s hard to tell how many characters are waiting in the network or filesystem buffer without actually reading them out of the buffer.
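The read( ), mark( ), and reset( ) methods described above can be exercised against any Reader. The sketch below uses a StringReader as a stand-in source; the class and method names are hypothetical. The buffer is deliberately tiny to show that read(char[], int, int) is called repeatedly and that only the number of characters it reports should be used.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReadMethodsDemo {

  // Drains a reader with read(char[], int, int), which may return fewer
  // characters than requested and returns -1 at end of stream.
  public static String readAll(Reader r) throws IOException {
    char[] buffer = new char[4]; // deliberately small for the demonstration
    StringBuilder sb = new StringBuilder();
    int n;
    while ((n = r.read(buffer, 0, buffer.length)) != -1) {
      sb.append(buffer, 0, n);
    }
    return sb.toString();
  }

  // Looks at up to count characters, then resets so they can be read again.
  public static String peek(Reader r, int count) throws IOException {
    if (!r.markSupported()) throw new IOException("mark not supported");
    r.mark(count);
    char[] head = new char[count];
    int n = r.read(head, 0, count);
    r.reset();
    return n == -1 ? "" : new String(head, 0, n);
  }

  public static void main(String[] args) throws IOException {
    Reader r = new StringReader("Network programming");
    System.out.println(peek(r, 7));
    System.out.println(readAll(r));
  }
}
```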

InputStreamReader is the most important concrete subclass of Reader. An InputStreamReader reads bytes from an underlying input stream such as a FileInputStream or TelnetInputStream. It converts these into characters according to a specified encoding and returns them. The constructor specifies the input stream to read from and the encoding to use:

public InputStreamReader(InputStream in)
public InputStreamReader(InputStream in, String encoding) 
 throws UnsupportedEncodingException

If no encoding is specified, the default encoding for the platform is used. If an unknown encoding is specified, then an UnsupportedEncodingException is thrown.

For example, this method reads an input stream and converts it all to one Unicode string using the MacCyrillic encoding:

public static String getMacCyrillicString(InputStream in) 
 throws IOException {
    
  InputStreamReader r = new InputStreamReader(in, "MacCyrillic");
  StringBuffer sb = new StringBuffer(  );
  int c;
  while ((c = r.read(  )) != -1) sb.append((char) c);
  r.close(  );
  return sb.toString(  );
    
}
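The method above hardcodes one encoding. As a sketch of the more general point, the hypothetical readFully( ) method below takes the encoding name as an argument and decodes the same two bytes in two different ways. It uses UTF-8 and ISO-8859-1 only because those are assumed to be available on every Java platform; it also reads into a char array rather than one character at a time.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

public class DecodeDemo {

  // Reads the whole stream and decodes it with the named encoding.
  public static String readFully(InputStream in, String encoding) throws IOException {
    Reader r = new InputStreamReader(in, encoding);
    try {
      StringBuilder sb = new StringBuilder();
      char[] buffer = new char[1024];
      int n;
      while ((n = r.read(buffer)) != -1) {
        sb.append(buffer, 0, n);
      }
      return sb.toString();
    } finally {
      r.close(); // also closes the underlying input stream
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] data = {(byte) 0xD0, (byte) 0x9F};
    // One Cyrillic letter when decoded as UTF-8
    String utf8 = readFully(new ByteArrayInputStream(data), "UTF-8");
    // Two unrelated characters when decoded as Latin-1
    String latin1 = readFully(new ByteArrayInputStream(data), "ISO-8859-1");
    System.out.println(utf8.length() + " char(s) as UTF-8");
    System.out.println(latin1.length() + " char(s) as ISO-8859-1");
  }
}
```

Picking the wrong encoding does not raise an error here; it silently produces different text, which is why the encoding should come from the protocol or the data source rather than from the platform default.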

Filter Readers and Writers

The InputStreamReader and OutputStreamWriter classes act as decorators on top of input and output streams that change the interface from a byte-oriented interface to a character-oriented interface. Once this is done, additional character-oriented filters can be layered on top of the reader or writer. The java.io.FilterReader and java.io.FilterWriter classes are the abstract bases for such filters, though several of the standard filters extend Reader or Writer directly instead. As with filter streams, a variety of classes perform specific filtering, including:

  • BufferedReader

  • BufferedWriter

  • LineNumberReader

  • PushbackReader

  • PrintWriter

Buffered readers and writers

The BufferedReader and BufferedWriter classes are the character-based equivalents of the byte-oriented BufferedInputStream and BufferedOutputStream classes. Where BufferedInputStream and BufferedOutputStream use an internal array of bytes as a buffer, BufferedReader and BufferedWriter use an internal array of chars.

When a program reads from a BufferedReader, text is taken from the buffer rather than directly from the underlying input stream or other text source. When the buffer empties, it is filled again with as much text as possible, even if not all of it is immediately needed. This will make future reads much faster.

When a program writes to a BufferedWriter, the text is placed in the buffer. The text is moved to the underlying output stream or other target only when the buffer fills up or when the writer is explicitly flushed. This can make writes much faster than would otherwise be the case.

Both BufferedReader and BufferedWriter have the usual methods associated with readers and writers, like read( ), ready( ), write( ), and close( ). They each have two constructors used to chain the BufferedReader or BufferedWriter to an underlying reader or writer and to set the size of the buffer. If the size is not set, then the default size of 8,192 characters is used:

public BufferedReader(Reader in, int bufferSize)
public BufferedReader(Reader in)
public BufferedWriter(Writer out)
public BufferedWriter(Writer out, int bufferSize)

For example, the earlier getMacCyrillicString( ) example was less than efficient because it read characters one at a time. Since MacCyrillic is a 1-byte character set, this also meant it read bytes one at a time. However, it’s straightforward to make it run faster by chaining a BufferedReader to the InputStreamReader like this:

public static String getMacCyrillicString(InputStream in) 
 throws IOException {
    
  Reader r = new InputStreamReader(in, "MacCyrillic");
  r = new BufferedReader(r, 1024);
  StringBuffer sb = new StringBuffer(  );
  int c;
  while ((c = r.read(  )) != -1) sb.append((char) c);
  r.close(  );
  return sb.toString(  );
    
}

All that was needed to buffer this method was one additional line of code. None of the rest of the algorithm had to change, since the only InputStreamReader methods used were the read( ) and close( ) methods declared in the Reader superclass and shared by all Reader subclasses, including BufferedReader.

The BufferedReader class also has a readLine( ) method that reads a single line of text and returns it as a string:

public String readLine(  ) throws IOException

This method is supposed to replace the deprecated readLine( ) method in DataInputStream, and it has mostly the same behavior as that method. The big difference is that by chaining a BufferedReader to an InputStreamReader, you can correctly read lines in character sets other than the default encoding for the platform. Unfortunately, this method shares the same bugs as the readLine( ) method in DataInputStream, discussed before. That is, it will tend to hang its thread when reading streams where lines end in carriage returns, such as is commonly the case when the streams derive from a Macintosh or a Macintosh text file. Consequently, you should scrupulously avoid this method in network programs.

It’s not all that difficult, however, to write a safe version of this class that correctly implements the readLine( ) method. Example 4-1 is such a SafeBufferedReader class. It has exactly the same public interface as BufferedReader. It just has a slightly different private implementation. I’ll use this class in future chapters in situations where it’s extremely convenient to have a readLine( ) method.

Example 4-1. The SafeBufferedReader Class

package com.macfaq.io;

import java.io.*;

public class SafeBufferedReader extends BufferedReader {

  public SafeBufferedReader(Reader in) {
    this(in, 1024);
  }

  public SafeBufferedReader(Reader in, int bufferSize) {
    super(in, bufferSize);
  }

  private boolean lookingForLineFeed = false;
  
  public String readLine(  ) throws IOException {
    StringBuffer sb = new StringBuffer("");
    while (true) {
      int c = this.read(  );
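      if (c == -1) { // end of stream
        // return a final line that has no terminator instead of dropping it
        if (sb.length(  ) == 0) return null;
        return sb.toString(  );
      }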
      else if (c == '\n') {
        if (lookingForLineFeed) {
          lookingForLineFeed = false;
          continue;
        }
        else {
          return sb.toString(  );
        }
      }
      else if (c == '\r') {
        lookingForLineFeed = true;
        return sb.toString(  );
      }
      else {
        lookingForLineFeed = false;
        sb.append((char) c);
      }
    }
  }

}
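To see how this approach treats the three common line terminators, here is a self-contained sketch. So that it compiles on its own, it embeds a condensed stand-in for the class (the nested SafeReader, a name invented for this illustration) that follows the same carriage return and linefeed logic and returns a final unterminated line before reporting end of stream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SafeReadLineDemo {

  // Condensed stand-in for the safe reader, included only for self-containment.
  static class SafeReader extends BufferedReader {
    private boolean lookingForLineFeed = false;

    SafeReader(Reader in) {
      super(in, 1024);
    }

    @Override
    public String readLine() throws IOException {
      StringBuilder sb = new StringBuilder();
      while (true) {
        int c = this.read();
        if (c == -1) { // end of stream
          return sb.length() == 0 ? null : sb.toString();
        } else if (c == '\n') {
          if (lookingForLineFeed) { // second half of a \r\n pair
            lookingForLineFeed = false;
            continue;
          }
          return sb.toString();
        } else if (c == '\r') {
          lookingForLineFeed = true; // return now; never wait for a \n
          return sb.toString();
        } else {
          lookingForLineFeed = false;
          sb.append((char) c);
        }
      }
    }
  }

  // Splits text into lines using the safe readLine() logic.
  public static List<String> lines(String text) throws IOException {
    SafeReader r = new SafeReader(new StringReader(text));
    List<String> result = new ArrayList<String>();
    String line;
    while ((line = r.readLine()) != null) {
      result.add(line);
    }
    r.close();
    return result;
  }

  public static void main(String[] args) throws IOException {
    // Mixes Macintosh, Windows, and Unix line endings
    System.out.println(lines("one\rtwo\r\nthree\nfour"));
  }
}
```

The key design choice is that a carriage return ends the line immediately; the reader merely remembers to discard a linefeed if one happens to arrive next, so it never blocks waiting for a character that may not come.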

The BufferedWriter class also adds one new method not included in its superclass, and this method is also geared toward writing lines. That method is newLine( ):

public void newLine(  ) throws IOException

This method inserts a platform-dependent line-separator string into the output. The line.separator system property determines exactly what this string is. It will probably be a linefeed on Unix, a carriage return on the Macintosh, and a carriage return/linefeed pair on Windows. Since network protocols generally specify the required line terminator, you should not use this method for network programming. Instead, you should explicitly write the line terminator the protocol requires.
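As a brief sketch of writing the terminator explicitly, the method below builds a simple HTTP request, a protocol whose lines end with a carriage return/linefeed pair. The class name, the method name, and the host are placeholders, and a StringWriter stands in for a socket’s output stream so the result can be inspected.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

public class ExplicitTerminatorDemo {

  // Builds a request whose lines always end in \r\n, whatever the platform.
  public static String request(String host) throws IOException {
    StringWriter target = new StringWriter();
    BufferedWriter w = new BufferedWriter(target);
    w.write("GET / HTTP/1.0\r\n");
    w.write("Host: " + host + "\r\n");
    w.write("\r\n"); // a blank line ends the header
    w.flush();
    return target.toString();
  }

  public static void main(String[] args) throws IOException {
    System.out.print(request("www.example.com")); // hypothetical host
  }
}
```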

LineNumberReader

The LineNumberReader class replaces the deprecated LineNumberInputStream class from Java 1.0. It’s a subclass of BufferedReader that keeps track of the current line number being read. This can be retrieved at any time with the getLineNumber( ) method:

public int getLineNumber(  )

By default, the first line number is 0. However, the number of the current line and all subsequent lines can be changed with the setLineNumber( ) method:

public void setLineNumber(int lineNumber)

This method adjusts only the line numbers that getLineNumber( ) reports. It does not change the point at which the stream is read.

The LineNumberReader’s readLine( ) method shares the same bug as BufferedReader’s and DataInputStream’s, and thus is not suitable for network programming. However, the line numbers are also tracked if you use only the regular read( ) methods, and these do not share that bug. Besides these methods and the usual Reader methods, LineNumberReader has only these two constructors:

public LineNumberReader(Reader in)
public LineNumberReader(Reader in, int bufferSize)

Since LineNumberReader is a subclass of BufferedReader, it does have an internal character buffer whose size can be set with the second constructor. The default size is 8,192 characters.
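A small sketch of line counting that relies only on read( ) and never calls readLine( ) follows. The class and method names are illustrative. The sample text ends with a line terminator, so the count does not depend on how a final unterminated line is treated.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;

public class LineCountDemo {

  // Reads all of text one character at a time and returns the line number
  // reported afterward, starting the count from firstLineNumber.
  public static int lineNumberAfter(String text, int firstLineNumber) throws IOException {
    LineNumberReader r = new LineNumberReader(new StringReader(text));
    r.setLineNumber(firstLineNumber);
    while (r.read() != -1) {
      // each \n, \r, or \r\n pair advances the line number by one
    }
    int lineNumber = r.getLineNumber();
    r.close();
    return lineNumber;
  }

  public static void main(String[] args) throws IOException {
    String text = "a\nb\r\nc\r"; // three lines with three different terminators
    System.out.println(lineNumberAfter(text, 0));
    System.out.println(lineNumberAfter(text, 10));
  }
}
```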

PushbackReader

The PushbackReader class is the mirror image of the PushbackInputStream class. As usual, the main difference is that it pushes back chars rather than bytes. It provides three unread( ) methods that push characters onto the reader’s input buffer:

public void unread(int c) throws IOException
public void unread(char[] cbuf) throws IOException
public void unread(char[] cbuf, int offset, int length) 
 throws IOException

The first unread( ) method pushes a single character onto the reader. The second pushes an array of characters. The third pushes the specified subarray of characters starting with cbuf[offset] and continuing through cbuf[offset+length-1].

By default, the size of the pushback buffer is only one character. However, this can be adjusted in the second constructor:

public PushbackReader(Reader in)
public PushbackReader(Reader in, int bufferSize)

Trying to unread more characters than the buffer will hold throws an IOException.
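The typical use of pushback is to look at the next character without consuming it. The sketch below shows that pattern along with the overflow case; the class and method names are invented for the example.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

public class PushbackDemo {

  // Returns the next character without consuming it, or -1 at end of stream.
  public static int peek(PushbackReader r) throws IOException {
    int c = r.read();
    if (c != -1) r.unread(c);
    return c;
  }

  // Returns true if unreading two characters into a one-character
  // pushback buffer throws an IOException.
  public static boolean overflowThrows() {
    PushbackReader r = new PushbackReader(new StringReader("abc")); // default size 1
    try {
      r.unread('x');
      r.unread('y'); // no room left for a second character
      return false;
    } catch (IOException ex) {
      return true;
    }
  }

  public static void main(String[] args) throws IOException {
    PushbackReader r = new PushbackReader(new StringReader("abc"));
    System.out.println((char) peek(r));   // looks at the first character
    System.out.println((char) r.read());  // reads the same character again
    System.out.println(overflowThrows());
  }
}
```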

PrintWriter

The PrintWriter class is a replacement for Java 1.0’s PrintStream class that properly handles multibyte character sets and international text. Sun originally planned to deprecate PrintStream in favor of PrintWriter but backed off when it realized this would invalidate too much existing code, especially code that depended on System.out. Nonetheless, new code should use PrintWriter instead of PrintStream.

Aside from the constructors, the PrintWriter class has an almost identical collection of methods to PrintStream. These include:

public PrintWriter(Writer out)
public PrintWriter(Writer out, boolean autoFlush)
public PrintWriter(OutputStream out)
public PrintWriter(OutputStream out, boolean autoFlush)
public void flush(  )
public void close(  )
public boolean checkError(  )
protected void setError(  )
public void write(int c)
public void write(char[] text, int offset, int length)
public void write(char[] text)
public void write(String s, int offset, int length)
public void write(String s)
public void print(boolean b)
public void print(char c)
public void print(int i)
public void print(long l)
public void print(float f)
public void print(double d)
public void print(char[] text)
public void print(String s)
public void print(Object o)
public void println(  )
public void println(boolean b)
public void println(char c)
public void println(int i)
public void println(long l)
public void println(float f)
public void println(double d)
public void println(char[] text)
public void println(String s)
public void println(Object o)

Most of these methods behave the same for PrintWriter as they do for PrintStream. The exceptions are that the four write( ) methods write characters rather than bytes and that if the underlying writer properly handles character set conversion, then so do all the methods of the PrintWriter. This is an improvement over the noninternationalizable PrintStream class, but it’s still not good enough for network programming. PrintWriter still has the problems of platform dependency and minimal error reporting that plague PrintStream.
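The minimal error reporting can be illustrated with a short sketch. BrokenWriter is a hypothetical writer whose write and flush operations always fail, standing in for a dropped network connection. PrintWriter swallows the resulting exception, and the only trace of the failure is the flag returned by checkError( ).

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;

public class CheckErrorDemo {

  // A writer that always fails, standing in for a broken connection.
  static class BrokenWriter extends Writer {
    @Override
    public void write(char[] text, int offset, int length) throws IOException {
      throw new IOException("connection lost");
    }

    @Override
    public void flush() throws IOException {
      throw new IOException("connection lost");
    }

    @Override
    public void close() throws IOException {
      // nothing to release
    }
  }

  // Returns true if the failure was recorded only in the error flag.
  public static boolean errorIsSilent() {
    PrintWriter pw = new PrintWriter(new BrokenWriter(), true);
    pw.println("HELO"); // does not throw, although nothing was written
    return pw.checkError();
  }

  public static void main(String[] args) {
    System.out.println("error flag set: " + errorIsSilent());
  }
}
```

A program that forgets to call checkError( ) has no way to learn that its output never left the machine.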

It isn’t hard to write a PrintWriter class that does work for network programming. You simply have to require the programmer to specify a line separator and let the IOExceptions fall where they may. Example 4-2 demonstrates. Notice that all the constructors require an explicit line-separator string to be provided.

Example 4-2. SafePrintWriter

/*
 * @(#)SafePrintWriter.java 1.0 99/07/10
 *
 * Written 1999 by Elliotte Rusty Harold,
 * Placed in the public domain
 * No rights reserved.
 */

package com.macfaq.io;

import java.io.*;

/**
 * @version   1.0, 99/07/10
 * @author  Elliotte Rusty Harold
 * @since Java Network Programming, 2nd edition
 */

public class SafePrintWriter extends Writer {

  protected Writer out;

  private boolean autoFlush = false;
  private String lineSeparator;
  private boolean closed = false;

  public SafePrintWriter(Writer out, String lineSeparator) {
    this(out, false, lineSeparator);
  }

  public SafePrintWriter(Writer out, char lineSeparator) {
    this(out, false, String.valueOf(lineSeparator));
  }

  public SafePrintWriter(Writer out, boolean autoFlush, String lineSeparator) {
    super(out);
    this.out = out;
    this.autoFlush = autoFlush;
    this.lineSeparator = lineSeparator;
  }

  public SafePrintWriter(OutputStream out, boolean autoFlush, 
   String encoding, String lineSeparator) 
   throws UnsupportedEncodingException {
    this(new OutputStreamWriter(out, encoding), autoFlush, lineSeparator);
  }

  public void flush(  ) throws IOException {
  
    synchronized (lock) {
      if (closed) throw new IOException("Stream closed");
      out.flush(  );
    }
    
  }

  public void close(  ) throws IOException {
  
    try {
      this.flush(  );
    }
    catch (IOException e) {
    }
    
    synchronized (lock) {
      out.close(  );
      this.closed = true;
    }
    
  }

  public void write(int c) throws IOException {
    synchronized (lock) {
      if (closed) throw new IOException("Stream closed");
      out.write(c);
    }    
  }

  public void write(char[] text, int offset, int length) throws IOException {
    synchronized (lock) {
      if (closed) throw new IOException("Stream closed");
      out.write(text, offset, length);
    }    
  }

  public void write(char[] text) throws IOException {
    synchronized (lock) {
      if (closed) throw new IOException("Stream closed");
      out.write(text, 0, text.length);
    }    
  }

  public void write(String s, int offset, int length) throws IOException {

    synchronized (lock) {
      if (closed) throw new IOException("Stream closed");
      out.write(s, offset, length);
    }

  }

  public void print(boolean b) throws IOException {
    if (b) this.write("true");
    else this.write("false");
  }

  public void println(boolean b) throws IOException {
    if (b) this.write("true");
    else this.write("false");
    this.write(lineSeparator);
    if (autoFlush) out.flush(  );
  }

  public void print(char c) throws IOException {
    this.write(String.valueOf(c));
  }

  public void println(char c) throws IOException {
    this.write(String.valueOf(c));
    this.write(lineSeparator);
    if (autoFlush) out.flush(  );
  }

  public void print(int i) throws IOException {
    this.write(String.valueOf(i));
  }

  public void println(int i) throws IOException {
    this.write(String.valueOf(i));
    this.write(lineSeparator);
    if (autoFlush) out.flush(  );
  }

  public void print(long l) throws IOException {
    this.write(String.valueOf(l));
  }

  public void println(long l) throws IOException {
    this.write(String.valueOf(l));
    this.write(lineSeparator);
    if (autoFlush) out.flush(  );
  }

  public void print(float f) throws IOException {
    this.write(String.valueOf(f));
  }

  public void println(float f) throws IOException {
    this.write(String.valueOf(f));
    this.write(lineSeparator);
    if (autoFlush) out.flush(  );
  }

  public void print(double d) throws IOException {
    this.write(String.valueOf(d));
  }

  public void println(double d) throws IOException {
    this.write(String.valueOf(d));
    this.write(lineSeparator);
    if (autoFlush) out.flush(  );
  }

  public void print(char[] text) throws IOException {
    this.write(text);
  }

  public void println(char[] text) throws IOException {
    this.write(text);
    this.write(lineSeparator);
    if (autoFlush) out.flush(  );
  }

  public void print(String s) throws IOException {
    if (s == null) this.write("null");
    else this.write(s);
  }

  public void println(String s) throws IOException {
    if (s == null) this.write("null");
    else this.write(s);
    this.write(lineSeparator);
    if (autoFlush) out.flush(  );
  }

  public void print(Object o) throws IOException {
    if (o == null) this.write("null");
    else this.write(o.toString(  ));
  }

  public void println(Object o) throws IOException {
    if (o == null) this.write("null");
    else this.write(o.toString(  ));
    this.write(lineSeparator);
    if (autoFlush) out.flush(  );
  }

  public void println(  ) throws IOException {
    this.write(lineSeparator);
    if (autoFlush) out.flush(  );
  }

}
                     

Like PrintWriter itself, this class extends Writer rather than FilterWriter. It could extend FilterWriter instead. However, this would save only one field and one line of code, since this class needs to override every single method in FilterWriter (close( ), flush( ), and all three write( ) methods). The reason for this is twofold. First, the PrintWriter class has to be much more careful about synchronization than the FilterWriter class is. Second, some of the classes that may be used as an underlying Writer for this class, notably CharArrayWriter, do not implement the proper semantics for close( ) and allow further writes to take place even after the writer is closed. Consequently, we have to handle the checks for whether the stream is closed in this class rather than relying on the underlying Writer out to do it for us.
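The CharArrayWriter behavior mentioned above is easy to observe. In this short sketch (the class and method names are illustrative), a write made after close( ) succeeds and its text appears in the result.

```java
import java.io.CharArrayWriter;

public class CloseSemanticsDemo {

  // Writes to a CharArrayWriter both before and after closing it.
  public static String writeAfterClose() {
    CharArrayWriter w = new CharArrayWriter();
    w.write("before", 0, 6);
    w.close();               // has no effect on a CharArrayWriter
    w.write("+after", 0, 6); // still accepted
    return w.toString();
  }

  public static void main(String[] args) {
    System.out.println(writeAfterClose());
  }
}
```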

Note

This chapter has been a whirlwind tour of the java.io package, covering the bare minimum you need to know to write network programs. For a more detailed and comprehensive look, with many more examples, you should check out my previous book, Java I/O (O’Reilly & Associates, Inc., 1999).
