Chapter 4. Java I/O

A large part of what network programs do is simple input and output, moving bytes from one system to another. Bytes are bytes; and to a large extent, reading data a server sends you is not all that different from reading a file. Sending text to a client is not all that different from writing a file. However, input and output (I/O) in Java is organized differently than it is in most other languages, such as C, Pascal, and C++. Consequently, I’d like to take one chapter to summarize Java’s unique approach to I/O.

I/O in Java is built on streams. Input streams read data. Output streams write data. Different fundamental stream classes such as java.io.FileInputStream and sun.net.TelnetOutputStream read and write particular sources of data. However, all fundamental output streams have the same basic methods to write data and all fundamental input streams use the same basic methods to read data. After a stream is created, you can often ignore the details of exactly what it is you’re reading or writing.

Filter streams can be chained to either an input stream or an output stream. Filters can modify the data as it’s read or written—for instance, by encrypting or compressing it—or they can simply provide additional methods for converting the data that’s read or written into other formats. For instance, the java.io.DataOutputStream class provides a method that converts an int to four bytes and writes those bytes onto its underlying output stream.
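To make the filter idea concrete, here is a minimal sketch that chains a DataOutputStream to an in-memory stream and inspects the four bytes that writeInt( ) produces. The class name DataOutputDemo and the helper intToBytes( ) are made up for this illustration; only the java.io classes come from the standard library.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class DataOutputDemo {

    // Chains a DataOutputStream to an in-memory stream and returns the
    // bytes that writeInt() placed on the underlying stream.
    static byte[] intToBytes(int value) throws IOException {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        DataOutputStream filter = new DataOutputStream(raw);
        filter.writeInt(value);   // four bytes, most significant byte first
        filter.flush();
        return raw.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // 0x01020304 is split into the bytes 1, 2, 3, and 4
        for (byte b : intToBytes(0x01020304)) {
            System.out.println(b);
        }
    }
}
```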

Finally, readers and writers can be chained to input and output streams to allow programs to read and write text (that is, characters) rather than bytes. Used properly, readers and writers can handle a wide variety of character encodings, including multibyte character sets such as SJIS and UTF-8.
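As a rough illustration of chaining a writer to an output stream, the following sketch encodes a string as UTF-8 and returns the resulting bytes. It assumes a modern Java runtime (try-with-resources and java.nio.charset.StandardCharsets, both Java 7 or later); the class name WriterDemo and the method encode( ) are invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class WriterDemo {

    // Chains a Writer to a byte stream so that characters, not bytes,
    // are written; the writer performs the UTF-8 encoding.
    static byte[] encode(String text) throws IOException {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(raw, StandardCharsets.UTF_8)) {
            writer.write(text);
        } // closing the writer flushes any characters it was still holding
        return raw.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // One ASCII character becomes one byte; e-acute becomes two bytes.
        System.out.println(encode("e").length);
        System.out.println(encode("\u00e9").length);
    }
}
```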

Output Streams

Java’s basic output class is java.io.OutputStream:

public abstract class OutputStream

This class provides the fundamental methods needed to write data. These are:

public abstract void write(int b) throws IOException
public void write(byte[] data) throws IOException
public void write(byte[] data, int offset, int length) 
 throws IOException
public void flush() throws IOException
public void close() throws IOException

Subclasses of OutputStream use these methods to write data onto particular media. For instance, a FileOutputStream uses these methods to write data into a file. A TelnetOutputStream uses these methods to write data onto a network connection. A ByteArrayOutputStream uses these methods to write data into an expandable byte array. But whichever medium you’re writing to, you mostly use only these same five methods. Sometimes you may not even know exactly what kind of stream you’re writing onto. For instance, you won’t find TelnetOutputStream in the Java class library documentation. It’s deliberately hidden inside the sun packages. It’s returned by various methods in various classes in java.net, like the getOutputStream( ) method of java.net.Socket. However, these methods are declared to return only OutputStream, not the more specific subclass TelnetOutputStream. That’s the power of polymorphism. If you know how to use the superclass, you know how to use all the subclasses too.
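The point about polymorphism can be sketched with a method that accepts any OutputStream and never asks which subclass it received. The names AnyStreamDemo and sendGreeting( ) are hypothetical, and the in-memory and console streams simply stand in for a file or network stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class AnyStreamDemo {

    // Hypothetical helper: it only knows it has an OutputStream,
    // not which subclass it was handed.
    static void sendGreeting(OutputStream out) throws IOException {
        byte[] greeting = "hello\r\n".getBytes(StandardCharsets.US_ASCII);
        out.write(greeting);
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream memory = new ByteArrayOutputStream();
        sendGreeting(memory);       // writes into an expandable byte array
        sendGreeting(System.out);   // the same call writes to the console
        System.out.println(memory.size() + " bytes captured in memory");
    }
}
```

The same sendGreeting( ) call would work unchanged with the stream returned by a socket's getOutputStream( ) method.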

OutputStream’s fundamental method is write(int b). This method takes as an argument an integer from 0 to 255 and writes the corresponding byte to the output stream. This method is declared abstract because subclasses will need to change it to handle their particular medium. For instance, a ByteArrayOutputStream can implement this method with pure Java code that copies the byte into its array. However, a FileOutputStream will need to use native code that understands how to write data in files on the host platform.

Take special care to note that although this method takes an int as an argument, it actually writes an unsigned byte. Java doesn’t have an unsigned byte data type, so an int has to be used here instead. The only real difference between an unsigned byte and a signed byte is the interpretation. They’re both made up of eight bits, and when you write an int onto a network connection using write(int b), only eight bits are placed on the wire. If an int outside the range 0-255 is passed to write(int b), the least significant byte of the number is written, and the remaining three bytes are ignored. (This is the effect of casting an int to a byte.) On rare occasion, however, you may find a buggy third-party class that does something different, such as throwing an IllegalArgumentException or always writing 255, so it’s best not to rely on this behavior if possible.
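A small experiment shows the truncation described above. This sketch writes a single int to a ByteArrayOutputStream and reads back the byte that was actually stored; the class name LowByteDemo and the helper writtenByte( ) are invented for the illustration.

```java
import java.io.ByteArrayOutputStream;

class LowByteDemo {

    // Writes one int with write(int b) and returns the byte that was
    // stored, interpreted as an unsigned value from 0 to 255.
    static int writtenByte(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(value);                     // only the low eight bits are kept
        return out.toByteArray()[0] & 0xFF;   // mask to undo Java's sign extension
    }

    public static void main(String[] args) {
        System.out.println(writtenByte(65));    // 65
        System.out.println(writtenByte(321));   // 65, because 321 = 256 + 65
        System.out.println(writtenByte(-1));    // 255, the low byte of 0xFFFFFFFF
    }
}
```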

For example, the character generator protocol defines a server that sends out ASCII text. The most popular variation of this protocol sends 72-character lines containing printable ASCII characters. (The printable ASCII characters are those from 33 to 126 that exclude the various whitespace and control characters.) The first line contains characters 33 through 104 in order. The second line contains characters 34 through 105. The third line contains characters 35 through 106. This continues through line 23, which contains characters 55 through 126. At that point, the characters wrap around so that line 24 contains characters 56 through 126 followed by character 33 again. Lines are terminated with a carriage return (ASCII 13) and a linefeed (ASCII 10). The output looks like this:

!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefgh
"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghi
#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghij
$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijk
%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijkl
&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklm
'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmn

Since ASCII is a 7-bit character set, each character is sent as a single byte. Consequently, this protocol is straightforward to implement using the basic write( ) methods as the next code fragment demonstrates:

public static void generateCharacters(OutputStream out) 
  throws IOException {
  
   int firstPrintableCharacter     = 33;
   int numberOfPrintableCharacters = 94;
   int numberOfCharactersPerLine   = 72;

   int start = firstPrintableCharacter;
   while (true) { /* infinite loop */
     for (int i = start; i < start+numberOfCharactersPerLine; i++) {
       out.write((
        (i-firstPrintableCharacter) % numberOfPrintableCharacters) 
         + firstPrintableCharacter);
     }
     out.write('\r'); // carriage return
     out.write('\n'); // linefeed
     start = ((start+1) - firstPrintableCharacter) 
       % numberOfPrintableCharacters + firstPrintableCharacter;
   }
}

The character generator server class (the exact details of which will have to wait until we discuss server sockets in Chapter 11) passes an OutputStream named out to the generateCharacters( ) method. Bytes are written onto out one at a time. These bytes are given as integers in a rotating sequence from 33 to 126. Most of the arithmetic here is to make the loop rotate in that range. After each 72 characters are written, a carriage return and a linefeed are written onto the output stream. The next start character is calculated and the loop repeats. The entire method is declared to throw IOException. That’s important because the character generator server will terminate only when the client closes the connection. The Java code will see this as an IOException.
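Because the method above loops until the connection fails, it is awkward to experiment with outside a server. The following is a hypothetical bounded variant, not part of the character generator server itself, that uses the same rotation arithmetic but stops after a given number of lines so the result can be captured in memory and inspected. The names BoundedCharGen, generateLines( ), and firstLines( ) are assumptions made for this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class BoundedCharGen {

    // Same arithmetic as generateCharacters(), but writes only
    // lineCount lines and then returns.
    static void generateLines(OutputStream out, int lineCount) throws IOException {
        int firstPrintableCharacter = 33;
        int numberOfPrintableCharacters = 94;
        int numberOfCharactersPerLine = 72;

        int start = firstPrintableCharacter;
        for (int n = 0; n < lineCount; n++) {
            for (int i = start; i < start + numberOfCharactersPerLine; i++) {
                // Map i back into the range 33 to 126
                out.write((i - firstPrintableCharacter) % numberOfPrintableCharacters
                    + firstPrintableCharacter);
            }
            out.write('\r'); // carriage return
            out.write('\n'); // linefeed
            // Advance the starting character, wrapping from 126 back to 33
            start = (start + 1 - firstPrintableCharacter) % numberOfPrintableCharacters
                + firstPrintableCharacter;
        }
    }

    // Captures the generated lines in memory and decodes them as ASCII text.
    static String firstLines(int lineCount) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        generateLines(buffer, lineCount);
        return new String(buffer.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        System.out.print(firstLines(3));
    }
}
```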

Writing a single byte at a time is often inefficient. For example, every TCP segment that goes out your Ethernet card contains at least 40 bytes of overhead for routing and error correction. If each byte is sent by itself, then you may be filling the wire with 41 times more data than you think you are! Consequently, most TCP/IP implementations buffer data to some extent. That is, they accumulate bytes in memory and send them to their eventual destination only when a certain number have accumulated or a certain amount of time has passed. However, if you have more than one byte ready to go, it’s not a bad idea to send them all at once. Using write(byte[] data) or write(byte[] data, int offset, int length) is normally much faster than writing all the components of the data array one at a time. For instance, here’s an implementation of the generateCharacters( ) method that sends a line at a time by stuffing a complete line into a byte array:

public static void generateCharacters(OutputStream out) 
   throws IOException {
  
    int firstPrintableCharacter = 33;
    int numberOfPrintableCharacters = 94;
    int numberOfCharactersPerLine = 72;
    int start = firstPrintableCharacter;
    byte[] line = new byte[numberOfCharactersPerLine+2];
    // the +2 is for the carriage return and linefeed
    
    while (true) { /* infinite loop */      
      for (int i = start; i < start+numberOfCharactersPerLine; i++) {
        line[i-start] = (byte) ((i-firstPrintableCharacter) 
         % numberOfPrintableCharacters + firstPrintableCharacter);
      }
      line[numberOfCharactersPerLine] = (byte) '\r'; // carriage return
      line[numberOfCharactersPerLine+1] = (byte) '\n'; // line feed
      out.write(line);
      start = ((start+1)-firstPrintableCharacter) 
       % numberOfPrintableCharacters + firstPrintableCharacter;
    }
  
  }

The algorithm for calculating which bytes to write when is the same as for the previous implementation. The crucial difference is that the bytes are all stuffed into a byte array before being written onto the network. Also notice that the int result of the calculation must be cast to a byte before being stored in the array. This wasn’t necessary in the previous implementation because the single byte write( ) method is declared to take an int as an argument.
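The three-argument write( ) method mentioned earlier works the same way but sends only part of an array. As a minimal sketch, the following writes a five-byte slice of a larger array in a single call; the class name SliceWriteDemo and the helper writeSlice( ) are invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class SliceWriteDemo {

    // Writes length bytes of data, starting at offset, with one call
    // instead of one write(int b) call per byte.
    static void writeSlice(OutputStream out, byte[] data, int offset, int length)
        throws IOException {
        out.write(data, offset, length);
    }

    // Captures the slice in memory so the result can be examined as text.
    static String slice(String text, int offset, int length) throws IOException {
        byte[] data = text.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeSlice(out, data, offset, length);
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(slice("Hello, world", 7, 5)); // world
    }
}
```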

Streams can also be buffered in software, directly in the Java code, as well as in the network hardware. Typically, this is accomplished by chaining a BufferedOutputStream or a BufferedWriter to the underlying stream, a technique we’ll explore shortly. Consequently, if you are done writing data, it’s important to flush the output stream. For example, suppose you’ve written a 300-byte request to an HTTP 1.1 server that uses HTTP Keep-Alive. You generally want to wait for a response before sending any more data. However, if the output stream has a 1,024-byte buffer, then the stream may be waiting for more data to arrive before it sends the data out of its buffer. No more data will be written onto the stream until after the server response arrives, but that’s never going to arrive because the request hasn’t yet been sent! The buffered stream won’t send the data to the server until the program writes more data to it, but the program won’t write more data until it gets a response from the server, which won’t respond until it gets the data that’s stuck in the buffer! Figure 4-1 illustrates this Catch-22. The flush( ) method rescues you from this deadlock by forcing the buffered stream to send its data even if the buffer isn’t yet full.


Figure 4-1. Data can get lost if you don’t flush your streams

It’s important to flush whether you think you need to or not. Depending on how you got hold of a reference to the stream, you may or may not know whether it’s buffered. (For instance, System.out is buffered whether you want it to be or not.) If flushing isn’t necessary for a particular stream, it’s a very low cost operation. However, if it is necessary, it’s very necessary. Failing to flush when you need to can lead to unpredictable, unrepeatable program hangs that are extremely hard to diagnose if you don’t have a good idea of what the problem is in the first place. As a corollary to all this, you should flush all streams immediately before you close them. Otherwise, data left in the buffer when the stream is closed may get lost.
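The effect of flushing can be observed without a network. In this sketch, a 300-byte write to a BufferedOutputStream with a 1,024-byte buffer does not reach the underlying stream until flush( ) is called. A ByteArrayOutputStream stands in for the network connection, and the names FlushDemo and sizesAroundFlush( ) are made up for the illustration.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

class FlushDemo {

    // Returns how many bytes had reached the underlying stream
    // before and after the call to flush().
    static int[] sizesAroundFlush() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for a socket
        BufferedOutputStream buffered = new BufferedOutputStream(sink, 1024);

        buffered.write(new byte[300]);  // smaller than the buffer, so it is held back
        int before = sink.size();

        buffered.flush();               // forces the buffered bytes out
        int after = sink.size();

        return new int[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = sizesAroundFlush();
        System.out.println("before flush: " + sizes[0]); // before flush: 0
        System.out.println("after flush: " + sizes[1]);  // after flush: 300
    }
}
```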

Finally, when you’re done with a stream, you should close it by invoking its close( ) method. This releases any resources associated with the stream, such as file handles or ports. Once an output stream has been closed, further writes to it will generally throw IOExceptions, although a few in-memory streams, such as ByteArrayOutputStream, ignore close( ) altogether. Some kinds of streams may also still allow you to do things with the object. For instance, a closed ByteArrayOutputStream can still be converted to an actual byte array and a closed DigestOutputStream can still return its digest.
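Both behaviors can be checked with a short sketch. It assumes a standard JDK, where writing to a closed FileOutputStream throws an IOException, and it uses a temporary file so nothing permanent is created. The names CloseDemo, writeAfterCloseFails( ), and readAfterClose( ) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

class CloseDemo {

    // Returns true if writing to a closed FileOutputStream throws IOException.
    static boolean writeAfterCloseFails() throws IOException {
        File temp = File.createTempFile("close-demo", ".bin");
        temp.deleteOnExit();
        FileOutputStream out = new FileOutputStream(temp);
        out.write(1);
        out.flush();   // flush before closing, as recommended above
        out.close();   // releases the file handle
        try {
            out.write(2);
            return false;
        } catch (IOException expected) {
            return true;
        }
    }

    // A closed ByteArrayOutputStream can still hand over its contents.
    static byte[] readAfterClose() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(42);
        out.close();
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeAfterCloseFails());
        System.out.println(readAfterClose().length);
    }
}
```

In newer code, a try-with-resources statement (Java 7 or later) closes the stream automatically, even when an exception is thrown.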
