Chapter 1. Streams

This chapter discusses Java’s stream classes, which are defined in the java.io package. While streams are not really part of RMI, a working knowledge of the stream classes is an important part of an RMI programmer’s skillset. In particular, this chapter provides essential background information for understanding two related areas: sockets and object serialization.

The Core Classes

A stream is an ordered sequence of bytes. However, it’s helpful to also think of a stream as a data structure that allows client code to either store or retrieve information. Storage and retrieval are done sequentially—typically, you write data to a stream one byte at a time or read information from the stream one byte at a time. However, in most stream classes, you cannot “go back”—once you’ve read a piece of data, you must move on. Likewise, once you’ve written a piece of data, it’s written.

You may think that a stream sounds like an impoverished data structure. Certainly, for most programming tasks, a HashMap or an ArrayList storing objects is preferable to a read-once sequence of bytes. However, streams have one nice feature: they are a simple and correct model for almost any external device connected to a computer. Why correct? Well, when you think about it, the code-level mechanics of writing data to a printer are not all that different from sending data over a modem; the information is sent sequentially, and, once it’s sent, it cannot be retrieved or “un-sent.”[3] Hence, streams are an abstraction that allows client code to access an external resource without worrying too much about the specific resource.

Using the streams library is a two-step process. First, device-specific code that creates the stream objects is executed; this is often called “opening” the stream. Then, information is either read from or written to the stream. This second step is device-independent; it relies only on the stream interfaces. Let’s start by looking at the stream classes offered with Java: InputStream and OutputStream.

InputStream

InputStream is an abstract class that represents a data source. Once opened, it provides information to the client that created it. The InputStream class consists of the following methods:

public int available(  ) throws IOException
public void close(  ) throws IOException
public void mark(int numberOfBytes)
public boolean markSupported(  )
public abstract int read(  ) throws IOException
public int read(byte[] buffer) throws IOException 
public int read(byte[] buffer, int startingOffset, int numberOfBytes) throws
    IOException
public void reset(  ) throws IOException
public long skip(long numberOfBytes) throws IOException

These methods serve three different roles: reading data, stream navigation, and resource management.

Reading data

The most important methods are those that actually retrieve data from the stream. InputStream defines three basic methods for reading data:

public int read(  ) throws IOException
public int read(byte[] buffer) throws IOException
public int read(byte[] buffer, int startingOffset, int numberOfBytes) throws
    IOException

The first of these methods, read( ), simply returns the next available byte in the stream. This byte is returned as an integer in order to allow the InputStream to return nondata values. For example, read( ) returns -1 if there is no data available, and no more data will be available to this stream. This can happen, for example, if you reach the end of a file. On the other hand, if there is currently no data, but some may become available in the future, the read( ) method blocks. Your code then waits until a byte becomes available before continuing.

Tip

A piece of code is said to block if it must wait for a resource to finish its job. For example, using the read( ) method to retrieve data from a file can force the method to halt execution until the target hard drive becomes available. Blocking can sometimes lead to undesirable results. If your code is waiting for a byte that will never come, the program has effectively crashed.
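The end-of-stream convention described above can be tried without any external device. The following is a minimal sketch, assuming a ByteArrayInputStream as a stand-in data source; the class and method names (SingleByteReadDemo, sumOfBytes) are hypothetical and chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class; a ByteArrayInputStream stands in for a real device.
class SingleByteReadDemo {

    // Reads the stream one byte at a time and returns the sum of the values read.
    static int sumOfBytes(InputStream in) throws IOException {
        int sum = 0;
        int next;
        // read() yields 0..255 for data and -1 only when the stream is exhausted.
        while ((next = in.read()) != -1) {
            sum += next;
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        // The byte 0xFF is -1 as a Java byte, but read() reports it as 255,
        // so it cannot be confused with the end-of-stream marker.
        byte[] data = {1, 2, (byte) 0xFF};
        System.out.println(sumOfBytes(new ByteArrayInputStream(data)));
    }
}
```

Because read( ) returns an int, the data value 255 and the end-of-stream marker -1 stay distinct; with the three bytes above, the sum is 1 + 2 + 255 = 258.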

The other two methods for retrieving data are more advanced versions of read( ), added to the InputStream class for efficiency. For example, consider what would happen if you created a tight loop to fetch 65,000 bytes one at a time from an external device. This would be extraordinarily inefficient. If you know you’ll be fetching large amounts of data, it’s better to make a single request:

byte[] buffer = new byte[1000];
int numberOfBytesRead = stream.read(buffer);

The read(byte[] buffer) method is a request to read enough bytes to fill the buffer (in this case, buffer.length number of bytes). The stream may deliver fewer bytes than requested. The integer return value is the number of bytes that were actually read, or -1 if no bytes were read because the end of the stream has been reached.

Finally, read(byte[] buffer, int startingOffset, int numberOfBytes) is a request to read up to numberOfBytes bytes from the stream and place them in the buffer starting at position startingOffset. For example:

read(buffer, 2, 7);

This is a request to read up to 7 bytes and place them in the locations buffer[2], buffer[3], and so on up to buffer[8]. Like the previous read( ), this method returns an integer indicating the number of bytes that it was able to read, or -1 if no bytes were read because the end of the stream has been reached.
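Since an array-based read( ) may return fewer bytes than requested, code that needs a specific number of bytes usually calls it in a loop. The following is a minimal sketch of that idea; the class and helper names (BulkReadDemo, readUpTo) are hypothetical, and a ByteArrayInputStream is assumed as the data source.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class, not part of java.io.
class BulkReadDemo {

    // Keeps calling read() until `length` bytes have arrived or the stream ends.
    // Returns the number of bytes actually placed in the buffer.
    static int readUpTo(InputStream in, byte[] buffer, int offset, int length)
            throws IOException {
        int total = 0;
        while (total < length) {
            int count = in.read(buffer, offset + total, length - total);
            if (count == -1) {
                break; // end of stream: no more data will ever arrive
            }
            total += count;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] buffer = new byte[10];
        InputStream in = new ByteArrayInputStream(
                new byte[] {10, 20, 30, 40, 50, 60, 70, 80, 90});
        // Same request as in the text: 7 bytes, stored in buffer[2]..buffer[8].
        int count = readUpTo(in, buffer, 2, 7);
        System.out.println(count + " bytes read, buffer[2]=" + buffer[2]
                + ", buffer[8]=" + buffer[8]);
    }
}
```

The loop stops early only at the end of the stream, so the caller can compare the return value with the requested length to detect a short source.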

Stream navigation

Stream navigation methods are methods that enable you to move around in the stream without necessarily reading in data. There are five stream navigation methods:

public int available(  ) throws IOException
public long skip(long numberOfBytes) throws IOException 
public void mark(int numberOfBytes)
public boolean markSupported(  )
public void reset(  ) throws IOException

available( ) is used to discover how many bytes are guaranteed to be immediately available. To avoid blocking, you can call available( ) before each read( ), as in the following code fragment:

while (stream.available(  ) > 0) {
	processNextByte(stream.read(  ));
}

Warning

There are two caveats when using available( ) in this way. First, you should make sure that the stream from which you are reading actually implements available( ) in a meaningful way. For example, the default implementation, defined in InputStream, simply returns 0. This behavior, while technically correct, is really misleading. (The preceding code fragment will not work if the stream always returns 0.) The second caveat is that you should make sure to use buffering. See Section 1.3 later in this chapter for more details on how to buffer streams.
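The first caveat is easy to reproduce. The sketch below, under the assumption that in-memory streams are an acceptable stand-in for devices, runs the same available( )-guarded loop against a ByteArrayInputStream (which reports its remaining byte count) and against a hypothetical stream that inherits InputStream's default available( ). The names AvailableDemo, drainWithoutBlocking, and withDefaultAvailable are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class, not part of java.io.
class AvailableDemo {

    // Reads bytes only while available() promises that read() will not block.
    // Returns the number of bytes consumed.
    static int drainWithoutBlocking(InputStream in) throws IOException {
        int count = 0;
        while (in.available() > 0) {
            in.read();
            count++;
        }
        return count;
    }

    // A stream that holds data but inherits the default available(),
    // which always returns 0.
    static InputStream withDefaultAvailable(final byte[] data) {
        return new InputStream() {
            private int position = 0;

            @Override
            public int read() {
                return position < data.length ? (data[position++] & 0xFF) : -1;
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4, 5};
        System.out.println(drainWithoutBlocking(new ByteArrayInputStream(data)));
        System.out.println(drainWithoutBlocking(withDefaultAvailable(data)));
    }
}
```

The first call consumes all five bytes; the second consumes none, even though the data is there, because the default available( ) never reports anything as ready.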

The skip( ) method simply moves you forward numberOfBytes in the stream. For many streams, skipping is equivalent to reading in the data and then discarding it.

Warning

In fact, most implementations of skip( ) do exactly that: repeatedly read and discard the data. Hence, if numberOfBytes worth of data aren’t available yet, these implementations of skip( ) will block.

Many input streams are unidirectional: they only allow you to move forward. Input streams that support repeated access to their data do so by implementing marking. The intuition behind marking is that code that reads data from the stream can mark a point to which it might want to return later. Input streams that support marking return true when markSupported( ) is called. You can use the mark( ) method to mark the current location in the stream. The method’s sole parameter, numberOfBytes, is used for expiration—the stream will retire the mark if the reader reads more than numberOfBytes past it. Calling reset( ) returns the stream to the point where the mark was made.

Tip

InputStream methods support only a single mark. Consequently, only one point in an InputStream can be marked at any given time.
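Marking is typically used to peek at the start of a stream and then rewind. The following is a minimal sketch, assuming a BufferedInputStream wrapped around an in-memory source; the class and method names (MarkResetDemo, readAscii, peekTwice) are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of java.io.
class MarkResetDemo {

    // Reads `length` bytes as ASCII text (fewer if the stream ends first).
    static String readAscii(InputStream in, int length) throws IOException {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < length; i++) {
            int next = in.read();
            if (next == -1) {
                break;
            }
            text.append((char) next);
        }
        return text.toString();
    }

    // Peeks at the first `length` bytes, rewinds to the mark, and reads them again.
    static String peekTwice(InputStream in, int length) throws IOException {
        if (!in.markSupported()) {
            throw new IOException("stream does not support marking");
        }
        in.mark(length);          // the mark stays valid for `length` bytes
        String first = readAscii(in, length);
        in.reset();               // back to the marked position
        String second = readAscii(in, length);
        return first + "/" + second;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "GIF89a".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        System.out.println(peekTwice(in, 3));
    }
}
```

Checking markSupported( ) first matters: on a stream without marking support, reset( ) throws an IOException.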

Resource management

Because streams are often associated with external devices such as files or network connections, using a stream often requires the operating system to allocate resources beyond memory. For example, most operating systems limit the number of files or network connections that a program can have open at the same time. The resource management methods of the InputStream class involve communication with native code to manage operating system-level resources.

The only resource management method defined for InputStream is close( ). When you’re done with a stream, you should always explicitly call close( ). This will free the associated system resources (e.g., the associated file descriptor for files).

At first glance, this seems a little strange. After all, one of the big advantages of Java is that it has garbage collection built into the language specification. Why not just have the object free the operating-system resources when the object is garbage collected?

The reason is that garbage collection is unreliable. The Java language specification does not explicitly guarantee that an object that is no longer referenced will be garbage collected (or even that the garbage collector will ever run). In practice, you can safely assume that, if your program runs short on memory, some objects will be garbage collected, and some memory will be reclaimed. But this assumption isn’t enough for effective management of scarce operating-system resources such as file descriptors. In particular, there are three main problems:

  • You have no control over how much time will elapse between when an object is eligible to be garbage collected and when it is actually garbage collected.

  • You have very little control over which objects get garbage collected.[4]

  • There isn’t necessarily a relationship between the number of file handles still available and the amount of memory available. You may run out of file handles long before you run out of memory, in which case the garbage collector may never become active.

Put succinctly, the garbage collector is an unreliable way to manage anything other than memory allocation. Whenever your program is using scarce operating-system resources, you should explicitly release them. This is especially true for streams; a program should always close streams when it’s finished using them.
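In Java 7 and later, the try-with-resources statement is one way to guarantee that close( ) is called, even when an exception is thrown. The sketch below assumes such a Java version; CloseDemo, TrackingStream, and firstByte are hypothetical names, and the tracking subclass exists only to make the call to close( ) observable.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class, not part of java.io.
class CloseDemo {

    // An in-memory stream that records whether close() was called.
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Reads one byte and closes the stream, whether or not read() throws.
    static int firstByte(InputStream source) throws IOException {
        try (InputStream in = source) {
            return in.read();
        }
    }

    public static void main(String[] args) throws IOException {
        TrackingStream stream = new TrackingStream(new byte[] {42});
        System.out.println(firstByte(stream) + ", closed=" + stream.closed);
    }
}
```

On older JVMs, the same effect is usually obtained by calling close( ) in a finally block.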

IOException

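Almost all of the methods defined for InputStream can throw an IOException; mark( ) and markSupported( ) are the only exceptions. IOException is a checked exception. This means that stream manipulation code must either occur inside a try/catch block or sit in a method that itself declares IOException. The following code fragment shows the try/catch form: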

int nextByte;
try{
	while( -1 != (nextByte = bufferedStream.read(  ))) {
		char nextChar = (char) nextByte; 
		...
	}
}
catch (IOException e) {
	...
}

The idea behind IOException is this: streams are mostly used to exchange data with devices that are outside the JVM. If something goes wrong with the device, the device needs a universal way to indicate an error to the client code.

Consider, for example, a printer that refuses to print a document because it is out of paper. The printer needs to signal an exception, and the exception should be relayed to the user; the program making the print request has no way of refilling the paper tray without human intervention. Moreover, this exception should be relayed to the user immediately.

Most stream exceptions are similar to this example. That is, they often require some sort of user action (or at least user notification), and are often best handled immediately. Therefore, the designers of the streams library decided to make IOException a checked exception, thereby forcing programs to explicitly handle the possibility of failure.

Tip

Some foreshadowing: RMI follows a similar design philosophy. Remote methods must be declared to throw RemoteException (and client code must catch RemoteException). RemoteException means “something has gone wrong, somewhere outside the JVM.”

OutputStream

OutputStream is an abstract class that represents a data sink. Once it is created, client code can write information to it. OutputStream consists of the following methods:

public void close(  ) throws IOException 
public void flush(  ) throws IOException
public void write(byte[] buffer) throws IOException
public void write(byte[] buffer, int startingOffset, int numberOfBytes) throws
    IOException
public void write(int value) throws IOException

The OutputStream class is a little simpler than InputStream; it doesn’t support navigation. After all, you probably don’t want to go back and write information a second time. OutputStream methods serve two purposes: writing data and resource management.

Writing data

OutputStream defines three basic methods for writing data:

public void write(byte[] buffer) throws IOException
public void write(byte[] buffer, int startingOffset, int numberOfBytes) throws
    IOException
public void write(int value) throws IOException

These methods are analogous to the read( ) methods defined for InputStream. Just as there was one basic method for reading a single byte of data, there is one basic method, write(int value), for writing a single byte of data. The argument to this write( ) method should be an integer between 0 and 255. If it is not, only its low-order eight bits are written (that is, the value is reduced modulo 256).
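The truncation to eight bits can be observed directly with an in-memory sink. This is a small sketch, assuming ByteArrayOutputStream as the destination; WriteIntDemo and writeAndReadBack are hypothetical names.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical demo class, not part of java.io.
class WriteIntDemo {

    // Writes a single int with write(int) and returns the byte that was stored,
    // converted back to the 0..255 range.
    static int writeAndReadBack(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(value); // only the low-order eight bits are kept
        return out.toByteArray()[0] & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(writeAndReadBack(65));   // within 0..255, unchanged
        System.out.println(writeAndReadBack(300));  // 300 - 256 = 44
        System.out.println(writeAndReadBack(-1));   // low eight bits are all ones: 255
    }
}
```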

Just as there were two array-based variants of read( ), there are two methods for writing arrays of bytes. write(byte[] buffer) causes all the bytes in the array to be written out to the stream. write(byte[] buffer, int startingOffset, int numberOfBytes) causes numberOfBytes bytes to be written, starting with the value at buffer[startingOffset].

Tip

The fact that the argument to the basic write( ) method is an integer is somewhat peculiar. Recall that read( ) returned an integer, rather than a byte, in order to allow instances of InputStream to signal exceptional conditions. write( ) takes an integer, rather than a byte, so that the read and write method declarations are parallel. In other words, if you’ve read a value in from a stream, and it’s not -1, you should be able to write it out to another stream without casting it.

Resource management

OutputStream defines two resource management methods:

public void close(  ) throws IOException
public void flush(  ) throws IOException

close( ) serves exactly the same role for OutputStream as it did for InputStream: it should be called when the client code is done using the stream and wishes to free up all the associated operating-system resources.

The flush( ) method is necessary because output streams frequently use a buffer to store data that is being written. This is especially true when data is being written to either a file or a socket. Passing data to the operating system a single byte at a time can be expensive. A much more practical strategy is to buffer the data at the JVM level and occasionally call flush( ) to send the data en masse.
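The effect of flush( ) is visible when a buffered stream is layered over an in-memory sink. The following is a minimal sketch, assuming BufferedOutputStream with its default buffer size wrapped around a ByteArrayOutputStream; FlushDemo and sizesBeforeAndAfterFlush are hypothetical names.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical demo class, not part of java.io.
class FlushDemo {

    // Returns the number of bytes that reached the sink before and after flush().
    static int[] sizesBeforeAndAfterFlush() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferedOutputStream buffered = new BufferedOutputStream(sink);

        buffered.write(new byte[] {1, 2, 3});
        int before = sink.size();   // data is still held in the JVM-level buffer

        buffered.flush();
        int after = sink.size();    // flush() pushed the buffered bytes to the sink

        return new int[] {before, after};
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = sizesBeforeAndAfterFlush();
        System.out.println("before flush: " + sizes[0] + ", after flush: " + sizes[1]);
    }
}
```

Three bytes are far below the buffer's capacity, so nothing reaches the sink until flush( ) is called.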

Viewing a File

To make this discussion more concrete, we will now discuss a simple application that allows the user to display the contents of a file in a JTextArea. The application is called ViewFile and is shown in Example 1-1. Note that the application’s main( ) method is defined in the com.ora.rmibook.chapter1.ViewFile class.[5] The resulting screenshot is shown in Figure 1-1.

Figure 1-1. The ViewFile application

Example 1-1. ViewFile.java

public class ViewFileFrame extends ExitingFrame{
//  lots of code to set up the user interface.
//  The View button's action listener is an inner class

	private void copyStreamToViewingArea(InputStream fileInputStream)
         throws IOException {
		BufferedInputStream bufferedStream = new BufferedInputStream(fileInputStream);
		int nextByte;
		_fileViewingArea.setText("");
		StringBuffer localBuffer = new StringBuffer(  );
		while( -1 != (nextByte = bufferedStream.read(  )))   {
			char nextChar = (char) nextByte; 	
			localBuffer.append(nextChar);
		}
		_fileViewingArea.append(localBuffer.toString(  ));
	}

	private class ViewFileAction extends AbstractAction {
		public ViewFileAction(  ) {
			putValue(Action.NAME, "View");
			putValue(Action.SHORT_DESCRIPTION, "View file contents in main text area.");
		}

		public void actionPerformed(ActionEvent event) {
			FileInputStream fileInputStream = _fileTextField.getFileInputStream(  );
			if (null==fileInputStream) {
				_fileViewingArea.setText("Invalid file name");
			}
			else {
				try {
					copyStreamToViewingArea(fileInputStream);
					fileInputStream.close(  );
				}
				catch (java.io.IOException ioException)  {
					_fileViewingArea.setText("\n Error occurred while reading file");
				}
			}
		}
	}
}

The important part of the code is the View button’s action listener and the copyStreamToViewingArea( ) method. copyStreamToViewingArea( ) takes an instance of InputStream and copies the contents of the stream to the central JTextArea. What happens when a user clicks on the View button? Assuming all goes well, and that no exceptions are thrown, the following three lines of code from the button’s action listener are executed:

FileInputStream fileInputStream = _fileTextField.getFileInputStream(  );
copyStreamToViewingArea(fileInputStream);
fileInputStream.close(  );

The first line is a call to the getFileInputStream( ) method on _fileTextField. That is, the program reads the name of the file from a text field and tries to open a FileInputStream. FileInputStream is defined in the java.io package. It is a subclass of InputStream used to read the contents of a file.

Once this stream is opened, copyStreamToViewingArea( ) is called. copyStreamToViewingArea( ) takes the input stream, wraps it in a buffer, and then reads it one byte at a time. There are two things to note here:

  • We explicitly check that nextByte is not equal to -1 (i.e., that we’re not at the end of the file). If we don’t do this, the loop will never terminate, and we will continue to append (char) -1 to the end of our text until the program crashes or throws an exception.

  • We use BufferedInputStream instead of using FileInputStream directly. Internally, a BufferedInputStream maintains a buffer so it can read and store many values at one time. Maintaining this buffer allows instances of BufferedInputStream to optimize expensive read operations. In particular, rather than reading each byte individually from the file, bufferedStream satisfies most calls to its read( ) method from the buffer and refills that buffer with a single array-based read on the underlying FileInputStream. Buffering also provides another benefit: BufferedInputStream supports stream navigation through the use of marking.

Tip

Of course, the operating system is probably already buffering file reads and writes. But, as we noted above, even the act of passing data to the operating system (which uses native methods) is expensive and ought to be buffered.

Layering Streams

The use of BufferedInputStream illustrates a central idea in the design of the streams library: streams can be wrapped in other streams to provide incremental functionality. That is, there are really two types of streams:

Primitive streams

These are the streams that have native methods and talk to external devices. All they do is transmit data exactly as it is presented. FileInputStream and FileOutputStream are examples of primitive streams.

Intermediate streams

These streams are not direct representatives of a device. Instead, they function as a wrapper around an already existing stream, which we will call the underlying stream. The underlying stream is usually passed as an argument to the intermediate stream’s constructor. The intermediate stream has logic in its read( ) or write( ) methods that either buffers the data or transforms it before forwarding it to the underlying stream. Intermediate streams are also responsible for propagating flush( ) and close( ) calls to the underlying stream. BufferedInputStream and BufferedOutputStream are examples of intermediate streams.

Warning

close( ) and flush( ) propagate to sockets as well. That is, if you close a stream that is associated with a socket, you will close the socket. This behavior, while logical and consistent, can come as a surprise.
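Writing an intermediate stream of your own usually means extending FilterInputStream or FilterOutputStream, which already hold the underlying stream and propagate flush( ) and close( ) to it. The sketch below is one hypothetical example, a CountingOutputStream that counts bytes on their way to the underlying stream; the class name and its getCount( ) method are assumptions made for illustration and are not part of java.io.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical intermediate stream: forwards every byte and counts it.
class CountingOutputStream extends FilterOutputStream {
    private long count = 0;

    CountingOutputStream(OutputStream underlying) {
        super(underlying); // FilterOutputStream stores it in the protected field `out`
    }

    // FilterOutputStream routes its array-based write() methods through write(int),
    // so overriding this one method is enough to see every byte.
    @Override
    public void write(int value) throws IOException {
        out.write(value);
        count++;
    }

    long getCount() {
        return count;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        CountingOutputStream counting = new CountingOutputStream(sink);
        counting.write(new byte[] {1, 2, 3});
        counting.write(4);
        counting.close(); // flushes, then closes the underlying stream as well
        System.out.println(counting.getCount() + " bytes forwarded, sink holds " + sink.size());
    }
}
```

Routing array writes through write(int) is simple but slow; a production-quality filter would normally override write(byte[], int, int) as well.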

Compressing a File

To further illustrate the idea of layering, I will demonstrate the use of GZIPOutputStream, defined in the package java.util.zip, with the CompressFile application. This application is shown in Example 1-2.

CompressFile is an application that lets the user choose a file and then makes a compressed copy of it. The application works by layering three output streams together. Specifically, it opens an instance of FileOutputStream, which it then uses as an argument to the constructor of a BufferedOutputStream, which in turn is used as an argument to GZIPOutputStream’s constructor. All data is then written using GZIPOutputStream. Again, the main( ) method for this application is defined in the com.ora.rmibook.chapter1.CompressFile class.

The important part of the source code is the copy( ) method, which copies an InputStream to an OutputStream, and ActionListener, which is added to the Compress button. A screenshot of the application is shown in Figure 1-2.

Figure 1-2. The CompressFile application

Example 1-2. CompressFile.java

private int copy(InputStream source, OutputStream destination) throws IOException {
	int nextByte;
	int numberOfBytesCopied = 0;
	while (-1 != (nextByte = source.read(  ))) {
		destination.write(nextByte);
		numberOfBytesCopied++;
	}
	destination.flush(  );
	return numberOfBytesCopied;
}

private class CompressFileAction extends AbstractAction {
	//  setup code omitted

	public void actionPerformed(ActionEvent event) {
		InputStream source = _startingFileTextField.getFileInputStream(  );
		OutputStream destination = _destinationFileTextField.getFileOutputStream(  );
		if ((null!=source) && (null!=destination)) {
			try {
				BufferedInputStream bufferedSource = new BufferedInputStream(source);
				BufferedOutputStream bufferedDestination = new
					BufferedOutputStream(destination);
				GZIPOutputStream zippedDestination = new
					GZIPOutputStream(bufferedDestination);
				copy(bufferedSource, zippedDestination);
				bufferedSource.close(  );
				zippedDestination.close(  );
			}
			catch (IOException e) {
				// I/O errors are silently ignored in this example.
			}
		}
	}
}

How this works

When the user clicks on the Compress button, two input streams and three output streams are created. The input streams are similar to those used in the ViewFile application—they allow us to use buffering as we read in the file. The output streams, however, are new. First, we create an instance of FileOutputStream. We then wrap an instance of BufferedOutputStream around the instance of FileOutputStream. And finally, we wrap GZIPOutputStream around BufferedOutputStream. To see what this accomplishes, consider what happens when we start feeding data to GZIPOutputStream (the outermost OutputStream).

  1. write(nextByte) is repeatedly called on zippedDestination.

  2. zippedDestination does not immediately forward the data to bufferedDestination. Instead, it compresses the data and sends the compressed version to bufferedDestination in blocks, using an array-based write( ) call.

  3. bufferedDestination does not immediately forward the data it received to destination. Instead, it puts the data in a buffer and waits until it has a large amount of data before passing it to destination with an array-based write( ) call.

Eventually, when all the data has been read in, zippedDestination’s close( ) method is called. This flushes bufferedDestination, which flushes destination, causing all the data to be written out to the physical file. After that, zippedDestination is closed, which causes bufferedDestination to be closed, which then causes destination to be closed, thus freeing up scarce system resources.
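The same layering can be exercised without a user interface or a file. The following is a minimal sketch, assuming in-memory byte arrays in place of files and a Java version with try-with-resources; GzipRoundTripDemo, compress, and decompress are hypothetical names. The decompress( ) method mirrors the output layering on the input side with GZIPInputStream.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; byte arrays stand in for the source and destination files.
class GzipRoundTripDemo {

    // Sink <- BufferedOutputStream <- GZIPOutputStream, as in CompressFile.
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream zipped =
                new GZIPOutputStream(new BufferedOutputStream(sink))) {
            zipped.write(data);
        } // close() finishes the GZIP data, then flushes and closes the inner streams
        return sink.toByteArray();
    }

    // The matching input-side layering.
    static byte[] decompress(byte[] compressed) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(
                new BufferedInputStream(new ByteArrayInputStream(compressed)))) {
            byte[] chunk = new byte[1024];
            int count;
            while ((count = in.read(chunk)) != -1) {
                result.write(chunk, 0, count);
            }
        }
        return result.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = new byte[10000]; // highly compressible: all zeros
        byte[] compressed = compress(original);
        byte[] restored = decompress(compressed);
        System.out.println(original.length + " -> " + compressed.length
                + " -> " + restored.length);
    }
}
```

Note that the compressed bytes are only complete after the outermost stream has been closed; reading the sink before then would yield a truncated GZIP file.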

Some Useful Intermediate Streams

I will close our discussion of streams by briefly mentioning a few of the most useful intermediate streams in the Javasoft libraries. In addition to buffering and compressing, the two most commonly used intermediate stream types are DataInputStream/DataOutputStream and ObjectInputStream/ObjectOutputStream. We will discuss ObjectInputStream and ObjectOutputStream extensively in Chapter 10.

DataInputStream and DataOutputStream don’t actually transform data that is given to them in the form of bytes. However, DataInputStream implements the DataInput interface, and DataOutputStream implements the DataOutput interface. This allows other datatypes to be read from, and written to, streams. For example, DataOutput defines the writeFloat(float value) method, which can be used to write an IEEE 754 floating-point value out to a stream. This method takes the floating point argument, converts it to a sequence of four bytes, and then writes the bytes to the underlying stream.

If DataOutputStream is used to convert data for storage into an underlying stream, the data should always be read in with a DataInputStream object. This brings up an important principle: intermediate input and output streams that transform data must be used in pairs. That is, if you zip, you must unzip. If you encrypt, you must decrypt. And, if you use DataOutputStream, you must use DataInputStream.
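The pairing of DataOutputStream and DataInputStream can be sketched with an in-memory round trip. The class and method names below (DataStreamDemo, encode, decodeFloat) are hypothetical, and a byte array is assumed as the storage medium.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical demo class, not part of java.io.
class DataStreamDemo {

    // Writes a float followed by an int; each occupies four bytes.
    static byte[] encode(float floatValue, int intValue) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(sink);
        out.writeFloat(floatValue);
        out.writeInt(intValue);
        out.flush();
        return sink.toByteArray();
    }

    // Reads the leading float back with the matching DataInputStream method.
    static float decodeFloat(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        return in.readFloat();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = encode(1.5f, 7);
        System.out.println(bytes.length + " bytes, float read back: " + decodeFloat(bytes));
    }
}
```

The values must be read back in the order they were written; the byte sequence itself carries no type information.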

Tip

We’ve only covered the basics of using streams. That’s all we need in order to understand RMI. To find out more about streams, and how to use them, either play around with the JDK—always the recommended approach—or see Java I/O by Elliotte Rusty Harold (O’Reilly).

Readers and Writers

The last topics I will touch on in this chapter are the Reader and Writer abstract classes. Readers and writers are like input streams and output streams. The primary difference lies in the fundamental datatype that is read or written; streams are byte-oriented, whereas readers and writers use characters and strings.

The reason for this is internationalization. Readers and writers were designed to allow programs to use a localized character set and still have a stream-like model for communicating with external devices. As you might expect, the method definitions are quite similar to those for InputStream and OutputStream. Here are the basic methods defined in Reader:

public void close(  )
public void mark(int readAheadLimit) 
public boolean markSupported(  ) 
public int read(  ) 
public int read(char[] cbuf) 
public int read(char[] cbuf, int off, int len) 
public boolean ready(  ) 
public void reset(  ) 
public long skip(long n)

These are analogous to the methods defined for InputStream. For example, read( ) still returns an integer. The difference is that, instead of data values being in the range 0-255 (i.e., single bytes), the return value is in the range 0-65535 (appropriate for Java characters, which are 2 bytes wide). However, a return value of -1 is still used to signal that there is no more data.

The only other major change is that InputStream’s available( ) method has been replaced with a boolean method, ready( ), which returns true if the next call to read( ) is guaranteed not to block. Calling ready( ) on a class that extends Reader is analogous to checking (available( ) > 0) on InputStream.

There aren’t nearly so many subclasses of Reader or Writer as there are types of streams. Instead, readers and writers can be used as a layer on top of streams—most readers have a constructor that takes an InputStream as an argument, and most writers have a constructor that takes an OutputStream as an argument. Thus, in order to use both localization and compression when writing to a file, open the file and implement compression by layering streams, and then wrap your final stream in a writer to add localization support, as in the following snippet of code:

FileOutputStream destination = new FileOutputStream(fileName);
BufferedOutputStream bufferedDestination = new BufferedOutputStream(destination);
GZIPOutputStream zippedDestination = new GZIPOutputStream(bufferedDestination);
OutputStreamWriter destinationWriter = new OutputStreamWriter(zippedDestination);
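The one-argument OutputStreamWriter constructor used above relies on the platform's default character encoding. The following is a minimal sketch of the same wrapping with an explicitly chosen encoding, so that the bytes can be decoded reliably on another machine; UTF-8, the in-memory sink, and the names WriterLayeringDemo, encode, and decode are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of java.io.
class WriterLayeringDemo {

    // Characters go into the writer; encoded bytes come out of the stream.
    static byte[] encode(String text) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
        return sink.toByteArray();
    }

    // The matching reader turns the bytes back into characters.
    static String decode(byte[] bytes) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            int next;
            while ((next = reader.read()) != -1) {
                text.append((char) next);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "h\u00e9llo"; // contains one non-ASCII character
        byte[] bytes = encode(original);
        System.out.println(original.length() + " chars, " + bytes.length
                + " bytes, round trip ok: " + original.equals(decode(bytes)));
    }
}
```

As with the transforming streams, the reader must use the same encoding as the writer, or the characters will not survive the round trip.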

Revisiting the ViewFile Application

There is one very common Reader/Writer pair: BufferedReader and BufferedWriter. Unlike the stream buffering classes, which don’t add any new methods, BufferedReader and BufferedWriter add methods for handling strings. In particular, BufferedReader adds the readLine( ) method (which reads a line of text), and BufferedWriter adds the newLine( ) method, which appends a line separator to the output.

These classes are very handy when reading or writing complex data. For example, a newline character is often a useful way to signal “end of current record.” To illustrate their use, here is the action listener from ViewFileFrame, rewritten to use BufferedReader:

private class ViewFileAction extends AbstractAction {
	public void actionPerformed(ActionEvent event) {
		FileReader fileReader = _fileTextField.getFileReader(  );
		if (null==fileReader) {
			_fileViewingArea.setText("Invalid file name");
		}
		else {
			try {
				copyReaderToViewingArea(fileReader);
				fileReader.close(  );
			}
			catch (java.io.IOException ioException) {
				_fileViewingArea.setText("\n Error occurred while reading file");
			}
		}
	}

	private void copyReaderToViewingArea(Reader reader) throws IOException {
		BufferedReader bufferedReader = new BufferedReader(reader);
		String nextLine;
		_fileViewingArea.setText("");
		// readLine(  ) strips the line terminator, so it is added back here
		while( null != (nextLine = bufferedReader.readLine(  ))) {
			_fileViewingArea.append(nextLine + "\n");
		}
	}
}
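The line-oriented methods can also be tried without a user interface. The following is a small sketch of the "one record per line" idea, assuming StringReader and StringWriter as in-memory stand-ins for files; LineRecordDemo, readRecords, and writeRecords are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of java.io.
class LineRecordDemo {

    // Treats every line of the source as one record.
    static List<String> readRecords(Reader source) throws IOException {
        List<String> records = new ArrayList<String>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        // readLine() returns null at the end of the input and strips the terminator.
        while ((line = reader.readLine()) != null) {
            records.add(line);
        }
        return records;
    }

    // Writes each record followed by the platform's line separator.
    static String writeRecords(List<String> records) throws IOException {
        StringWriter sink = new StringWriter();
        BufferedWriter writer = new BufferedWriter(sink);
        for (String record : records) {
            writer.write(record);
            writer.newLine();
        }
        writer.flush(); // push the buffered characters into the StringWriter
        return sink.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = writeRecords(Arrays.asList("alpha", "beta", "gamma"));
        System.out.println(readRecords(new StringReader(text)));
    }
}
```

readLine( ) accepts "\n", "\r", and "\r\n" as terminators, so records written with newLine( ) on one platform can still be read on another.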


[3] Print orders can be cancelled by sending another message: a cancellation message. But the original message was still sent.

[4] You can use SoftReference (defined in java.lang.ref) to get a minimal level of control over the order in which objects are garbage collected.

[5] This example uses classes from the Java Swing libraries. If you would like more information on Swing, see Java Swing (O’Reilly) or Java Foundation Classes in a Nutshell (O’Reilly).
