Input Streams

Java’s basic input class is java.io.InputStream:

public abstract class InputStream

This class provides the fundamental methods needed to read data as raw bytes. These are:

public abstract int read(  ) throws IOException
public int read(byte[] input) throws IOException
public int read(byte[] input, int offset, int length) throws IOException
public long skip(long n) throws IOException
public int available(  ) throws IOException
public void close(  ) throws IOException

Concrete subclasses of InputStream use these methods to read data from particular media. For instance, a FileInputStream reads data from a file. A TelnetInputStream reads data from a network connection. A ByteArrayInputStream reads data from an array of bytes. But whichever source you’re reading, you mostly use only these same six methods. Sometimes you may not even know exactly what kind of stream you’re reading from. For instance, TelnetInputStream is an undocumented class hidden inside the sun.net package. Instances of it are returned by various methods in the java.net package; for example, the openStream( ) method of java.net.URL. However, these methods are declared to return only InputStream, not the more specific subclass TelnetInputStream. That’s polymorphism at work once again. The instance of the subclass can be used transparently as an instance of its superclass. No specific knowledge of the subclass is required.
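The polymorphism point can be made concrete with a small sketch. The method below is written only against InputStream, so it never needs to know which subclass it receives. The class name CountBytes is hypothetical. The demo passes a ByteArrayInputStream, but the same method would accept the stream returned by URL's openStream( ) or a FileInputStream unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class CountBytes {
    // Works with any InputStream subclass: the caller never needs to know
    // whether the bytes come from a file, a socket, or a byte array.
    static long countBytes(InputStream in) throws IOException {
        long count = 0;
        while (in.read() != -1) {   // -1 signals end of stream
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {10, 20, 30});
        System.out.println(countBytes(in)); // prints 3
    }
}
```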

The basic method of InputStream is the no-args read( ) method. This method reads a single byte of data from the input stream’s source and returns it as a number from 0 to 255. End of stream is signified by returning -1. Since Java doesn’t have an unsigned byte data type, this number is returned as an int. The read( ) method waits and blocks execution of any code that follows it until a byte of data is available and ready to be read. Input and output can be slow, so if your program is doing anything else of importance, you should try to put I/O in its own thread.
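Because the result is an int, a data byte whose bits are all ones comes back as 255 and can never be confused with the -1 end-of-stream marker. A minimal sketch, using a ByteArrayInputStream as the source (the class name UnsignedRead is hypothetical):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class UnsignedRead {
    // Collects every value returned by the no-args read() until end of stream.
    static List<Integer> readAll(InputStream in) throws IOException {
        List<Integer> values = new ArrayList<>();
        int b;
        while ((b = in.read()) != -1) {   // -1 is reserved for end of stream
            values.add(b);                // data bytes always arrive as 0..255
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0, 127, (byte) 0x80, (byte) 0xFF};
        System.out.println(readAll(new ByteArrayInputStream(data))); // prints [0, 127, 128, 255]
    }
}
```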

The read( ) method is declared abstract because each subclass must implement it for its particular medium. For instance, a ByteArrayInputStream can implement this method with pure Java code that copies the byte from its array. However, a TelnetInputStream needs a native library that understands how to read data from the network interface on the host platform.
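To see how little a subclass has to supply, here is a sketch of a toy stream that implements only read( ). The class RangeInputStream is hypothetical; it yields the byte values start through end - 1 and then reports end of stream. The inherited multibyte read( ), skip( ), and the other methods are all built on top of this one method by InputStream's default implementations.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical toy stream: yields start, start+1, ..., end-1, then end of stream.
class RangeInputStream extends InputStream {
    private int next;
    private final int end;

    RangeInputStream(int start, int end) {
        this.next = start;
        this.end = end;
    }

    @Override
    public int read() {
        if (next >= end) {
            return -1;              // end of stream
        }
        return (next++) & 0xFF;     // keep each value in the range 0..255
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new RangeInputStream(65, 70);  // 'A' through 'E'
        byte[] buf = new byte[10];
        int n = in.read(buf);       // inherited from InputStream, built on read()
        System.out.println(new String(buf, 0, n, StandardCharsets.US_ASCII)); // prints ABCDE
    }
}
```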

The following code fragment reads 10 bytes from the InputStream in and stores them in the byte array input. However, if end of stream is detected, the loop is terminated early:

byte[] input = new byte[10];
for (int i = 0; i < input.length; i++) {
  int b = in.read(  );
  if (b  == -1) break;
  input[i] = (byte) b;
}

Although read( ) reads only a byte, it returns an int. Thus, a cast is necessary before storing the result in the byte array. Of course, this produces a signed byte from -128 to 127 instead of the unsigned byte from 0 to 255 returned by the read( ) method. However, as long as you keep clear which one you’re working with, this is not a major problem. You can convert a signed byte to an unsigned byte like this:

int i = b >= 0 ? b : 256 + b;
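The same conversion is often written as a bit mask, b & 0xFF, and Java 8 added Byte.toUnsignedInt( ) for the purpose. The sketch below (class name UnsignedBytes is hypothetical) checks that all three forms agree for every possible byte value.

```java
class UnsignedBytes {
    // The conversion from the text: add 256 to negative values.
    static int viaConditional(byte b) {
        return b >= 0 ? b : 256 + b;
    }

    // Equivalent bit mask: b is sign-extended to int, then only the low 8 bits are kept.
    static int viaMask(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        for (int v = -128; v <= 127; v++) {
            byte b = (byte) v;
            if (viaConditional(b) != viaMask(b) || viaMask(b) != Byte.toUnsignedInt(b)) {
                throw new AssertionError("mismatch at " + v);
            }
        }
        System.out.println(viaMask((byte) -1)); // prints 255
    }
}
```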

Reading a byte at a time is as inefficient as writing data one byte at a time. Consequently, there are also two overloaded read( ) methods that fill a specified array with multiple bytes of data read from the stream, read(byte[] input) and read(byte[] input, int offset, int length). The first attempts to fill the specified array input. The second attempts to fill the specified subarray of input starting at offset and continuing for length bytes.

Notice that I said these methods attempt to fill the array, not necessarily that they succeed. An attempt may fail in several ways. For instance, while your program is reading data from a remote web server over a PPP dialup link, a bug in a switch in a phone company central office could disconnect you and several thousand of your neighbors from the rest of the world. The read would then throw an IOException. More commonly, however, a read attempt won’t completely fail but won’t completely succeed either: some of the requested bytes may be read but not all of them. For example, you may try to read 1,024 bytes from a network connection when only 512 have actually arrived from the server. The rest are still in transit. They’ll arrive eventually, but they aren’t available now. To account for this, the multibyte read methods return the number of bytes actually read. For example, consider this code fragment:

byte[] input  = new byte[1024];
int bytesRead = in.read(input);

It attempts to read 1,024 bytes from the InputStream in into the array input. However, if only 512 bytes are available, then that’s all that will be read, and bytesRead will be set to 512. To guarantee that all the bytes you want are actually read, you must place the read in a loop that reads repeatedly until the array is filled. For example:

int bytesRead   = 0;
int bytesToRead = 1024;
byte[] input    = new byte[bytesToRead];
while (bytesRead < bytesToRead) {
  bytesRead += in.read(input, bytesRead, bytesToRead - bytesRead);
}

This technique is especially crucial for network streams. Chances are that if a file is available at all, then all the bytes of a file are also available. However, since networks move much more slowly than CPUs, it is very easy for a program to empty a network buffer before all the data has arrived. If one of these two methods tries to read from a temporarily empty but open network buffer, it blocks until at least one byte arrives, the stream ends, or an exception is thrown; according to the InputStream contract, it returns 0 only when length is 0. What it does not do is wait for the whole array to fill. As soon as some data is available, it returns with whatever it has, which is why a single call can come back with fewer bytes than you asked for.

All three read( ) methods return -1 to signal the end of the stream. If the stream ends while there’s still data that hasn’t been read, then the multibyte read methods will return that data until the buffer has been emptied. The next call to any of the read methods will return -1. The -1 is never placed in the array. The array contains only actual data. The previous code fragment had a bug because it didn’t consider the possibility that all 1,024 bytes might never arrive (as opposed to not being immediately available). Fixing that bug requires testing the return value of read( ) before adding it to bytesRead. For example:

int bytesRead=0;
int bytesToRead=1024;
byte[] input = new byte[bytesToRead];
while (bytesRead < bytesToRead) {
  int result = in.read(input, bytesRead, bytesToRead - bytesRead);
  if (result == -1) break;
  bytesRead += result;
}
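The corrected loop can be packaged as a reusable method and exercised against a simulated slow connection. This is a sketch under stated assumptions: ReadFully and its nested TrickleInputStream are hypothetical names, and TrickleInputStream merely caps each multibyte read at 512 bytes to imitate data arriving in pieces. For comparison, java.io.DataInputStream already provides a readFully( ) method that throws an EOFException if the stream ends before the array is full.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadFully {
    // Hypothetical test helper: hands out at most maxChunk bytes per multibyte
    // read, imitating a network connection where only part of the data has arrived.
    static class TrickleInputStream extends FilterInputStream {
        private final int maxChunk;

        TrickleInputStream(InputStream in, int maxChunk) {
            super(in);
            this.maxChunk = maxChunk;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, maxChunk));
        }
    }

    // The corrected loop from the text, packaged as a method. The result is
    // less than input.length only if the stream ended early.
    static int readFully(InputStream in, byte[] input) throws IOException {
        int bytesRead = 0;
        while (bytesRead < input.length) {
            int result = in.read(input, bytesRead, input.length - bytesRead);
            if (result == -1) break;   // end of stream: stop instead of adding -1
            bytesRead += result;
        }
        return bytesRead;
    }

    public static void main(String[] args) throws IOException {
        byte[] input = new byte[1024];

        InputStream slow = new TrickleInputStream(new ByteArrayInputStream(new byte[2048]), 512);
        System.out.println(slow.read(input));        // prints 512: a single call comes up short
        System.out.println(readFully(slow, input));  // prints 1024: the loop keeps reading

        InputStream shortStream = new ByteArrayInputStream(new byte[600]);
        System.out.println(readFully(shortStream, input)); // prints 600: the stream ended early

        // DataInputStream.readFully reports a short stream with EOFException instead.
        DataInputStream din = new DataInputStream(new ByteArrayInputStream(new byte[600]));
        try {
            din.readFully(input);
        } catch (EOFException ex) {
            System.out.println("stream ended early");
        }
    }
}
```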

If you don’t want to wait until all the bytes you want are available, you can use the available( ) method to determine how many bytes can be read without blocking. Treat this as the minimum number of bytes you can read: you may in fact be able to read more, but you should be able to read at least as many bytes as available( ) suggests. For example:

int bytesAvailable = in.available(  );
byte[] input = new byte[bytesAvailable];
int bytesRead = in.read(input, 0, bytesAvailable);
// continue with rest of program immediately...

In this case, you can assert that bytesRead is exactly equal to bytesAvailable. You cannot, however, assert that bytesRead is greater than zero. It is possible that no bytes were available. On end of stream, available( ) returns 0. Generally, read(byte[] input, int offset, int length) returns -1 on end of stream; but if length is 0, then it will not notice the end of stream and will return 0 instead.
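One way to wrap the available( ) pattern is sketched below. The name readAvailable is hypothetical. It sidesteps the length-0 case by not calling read( ) at all when nothing is reported, still checks for -1 in case the stream ended in the meantime, and trims the array to the number of bytes actually read.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class ReadAvailable {
    // Reads only what available() reports can be read without blocking.
    static byte[] readAvailable(InputStream in) throws IOException {
        int bytesAvailable = in.available();       // 0 at end of stream
        if (bytesAvailable <= 0) {
            return new byte[0];                    // nothing ready now (or end of stream)
        }
        byte[] input = new byte[bytesAvailable];
        int bytesRead = in.read(input, 0, bytesAvailable);
        if (bytesRead == -1) {
            return new byte[0];                    // the stream ended in the meantime
        }
        return Arrays.copyOf(input, bytesRead);    // trim in case fewer bytes came back
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5});
        System.out.println(readAvailable(in).length); // prints 5
        System.out.println(readAvailable(in).length); // prints 0
    }
}
```

Note that an empty result cannot distinguish "nothing has arrived yet" from "the stream is over"; detecting the end still requires a read that returns -1.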

On rare occasions, you may want to skip over data without reading it. The skip( ) method accomplishes this. It’s less useful on network connections than when reading from files. Network connections are sequential and overall quite slow, so it’s not significantly more time-consuming to read data than to skip over it. Files are random access, so skipping can be implemented simply by repositioning a file pointer rather than processing each byte to be skipped.
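Like the multibyte read( ) methods, skip( ) returns the number of bytes it actually skipped, which may be fewer than requested. The sketch below (skipFully is a hypothetical name) loops until the requested count is reached, falling back to a single read( ) when skip( ) makes no progress, so that end of stream can be told apart from a temporary stall. Java 12 and later also provide InputStream.skipNBytes( ), which throws an EOFException if the stream ends first.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class SkipFully {
    // Skips up to n bytes and returns how many were actually skipped.
    // The result is less than n only if the stream ended first.
    static long skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() == -1) {
                break;          // skip() made no progress and read() confirms end of stream
            } else {
                remaining--;    // read() consumed one byte, which counts as skipped
            }
        }
        return n - remaining;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9});
        System.out.println(skipFully(in, 4));   // prints 4
        System.out.println(in.read());          // prints 4, the fifth byte
        System.out.println(skipFully(in, 100)); // prints 5: only five bytes were left
    }
}
```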

As with output streams, once your program has finished with an input stream, it should close it by invoking its close( ) method. This releases any resources associated with the stream, such as file handles or ports. Once an input stream has been closed, further reads from it will throw IOExceptions. However, some kinds of streams may still allow you to do things with the object. For instance, you generally won’t want to get the message digest from a java.security.DigestInputStream until all the data has been read and the stream closed.
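The DigestInputStream case can be sketched with the try-with-resources statement, which arrived in Java 7 (later than this edition) and calls close( ) automatically even if a read throws. The method name sha256Hex is hypothetical, and SHA-256 is chosen only as an example algorithm. The digest is computed after the stream has been closed, because the MessageDigest object outlives the stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class DigestAfterClose {
    static String sha256Hex(InputStream source) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        // try-with-resources closes the stream when the block exits
        try (DigestInputStream in = new DigestInputStream(source, md)) {
            byte[] buffer = new byte[1024];
            while (in.read(buffer) != -1) {
                // nothing to do here: each read() also feeds the bytes to md
            }
        }
        // The stream is closed now, but the digest is still available.
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        InputStream data = new ByteArrayInputStream("abc".getBytes(StandardCharsets.US_ASCII));
        // prints ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
        System.out.println(sha256Hex(data));
    }
}
```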

Marking and Resetting

The InputStream class also has three less commonly used methods that allow programs to back up and reread data they’ve already read. These are:

public void mark(int readAheadLimit)
public void reset(  ) throws IOException
public boolean markSupported(  )

To do this, you mark the current position in the stream with the mark( ) method. At a later point, you can reset the stream back to the marked position using the reset( ) method. Subsequent reads then return data starting from the marked position. However, you may not be able to reset back as far as you like. The number of bytes you can read from the mark and still reset is determined by the readAheadLimit argument to mark( ). If you read past that limit, the mark may be invalidated, and reset( ) will then throw an IOException. Furthermore, there can be only one mark in a stream at any given time. Marking a second location erases the first mark.
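A common use of mark( ) and reset( ) is peeking at the start of a stream before deciding how to process it. The sketch below uses a BufferedInputStream, which supports marking; the peek method name and the sample data are hypothetical. The readAheadLimit passed to mark( ) equals the number of bytes read before reset( ), so the mark stays valid.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class MarkResetDemo {
    // Reads up to count bytes, then rewinds so the same bytes can be read again.
    static byte[] peek(InputStream in, int count) throws IOException {
        in.mark(count);                 // allow count bytes to be read before resetting
        byte[] head = new byte[count];
        int n = 0;
        while (n < count) {
            int b = in.read();
            if (b == -1) break;         // stream shorter than count
            head[n++] = (byte) b;
        }
        in.reset();                     // back to the marked position
        return Arrays.copyOf(head, n);
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new BufferedInputStream(
                new ByteArrayInputStream("GET /index.html".getBytes(StandardCharsets.US_ASCII)));
        byte[] head = peek(in, 3);
        System.out.println(new String(head, StandardCharsets.US_ASCII)); // prints GET
        System.out.println((char) in.read()); // prints G: reading resumes at the mark
    }
}
```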

Marking and resetting are usually implemented by storing every byte read from the marked position in an internal buffer. However, not all input streams support this. Thus, before trying to use marking and resetting, you should check whether the markSupported( ) method returns true. If it does, the stream supports marking and resetting. Otherwise, mark( ) will do nothing and reset( ) will throw an IOException.

Note

In my opinion, this demonstrates very poor design. In practice, more streams don’t support marking and resetting than do. Attaching functionality to an abstract superclass that is not available to many, probably most, subclasses is a very poor idea. It would be better to place these three methods in a separate interface that could be implemented by those classes that provided this functionality. The disadvantage of this approach is that you couldn’t then invoke these methods on an arbitrary input stream of unknown type, but in practice you can’t do that anyway because not all streams support marking and resetting. Providing a method such as markSupported( ) to check for functionality at runtime is a more traditional, non-object-oriented solution to the problem. An object-oriented approach would embed this in the type system through interfaces and classes so that it could all be checked at compile time.
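The interface-based design proposed in the note can be sketched as follows. Everything here is hypothetical and not part of java.io: Markable is an invented interface, and MarkableBufferedInputStream simply declares that BufferedInputStream's existing public mark( ) and reset( ) methods implement it. With an intersection-typed generic parameter, passing a stream that lacks the capability becomes a compile-time error rather than a runtime IOException.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical interface (not in java.io) that expresses the capability as a type.
interface Markable {
    void mark(int readAheadLimit);
    void reset() throws IOException;
}

// BufferedInputStream's public mark() and reset() already satisfy Markable.
class MarkableBufferedInputStream extends BufferedInputStream implements Markable {
    MarkableBufferedInputStream(InputStream in) {
        super(in);
    }
}

class MarkableDemo {
    // Only types that are both InputStream and Markable are accepted,
    // so no runtime markSupported() check is needed.
    static <T extends InputStream & Markable> int peekByte(T in) throws IOException {
        in.mark(1);
        int b = in.read();
        in.reset();
        return b;
    }

    public static void main(String[] args) throws IOException {
        MarkableBufferedInputStream in =
                new MarkableBufferedInputStream(new ByteArrayInputStream(new byte[] {42, 7}));
        System.out.println(peekByte(in)); // prints 42
        System.out.println(in.read());    // prints 42 again, because peekByte reset the stream
        // peekByte(new ByteArrayInputStream(new byte[1])) would not compile:
        // ByteArrayInputStream supports marking but does not implement Markable.
    }
}
```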

The only two input stream classes in java.io that always support marking are BufferedInputStream and ByteArrayInputStream. However, other input streams such as TelnetInputStream may support marking if they’re chained to a buffered input stream first.
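Checking markSupported( ) and chaining to a buffered stream when needed can be combined in a small helper. The name ensureMarkSupported is hypothetical. The demo stream is an anonymous InputStream subclass that implements only read( ), so it inherits the default mark( ) that does nothing and the default reset( ) that throws an IOException.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

class EnsureMarkable {
    // Returns a stream that supports mark/reset: the original one if it
    // already does, otherwise the original chained to a BufferedInputStream.
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    public static void main(String[] args) throws IOException {
        InputStream raw = new InputStream() {   // supports nothing beyond read()
            private int next = 0;

            @Override
            public int read() {
                return next < 3 ? next++ : -1;
            }
        };
        try {
            raw.reset();                        // inherited default always throws
        } catch (IOException ex) {
            System.out.println("reset not supported on the raw stream");
        }
        InputStream in = ensureMarkSupported(raw);
        in.mark(10);
        int first = in.read();
        in.reset();
        System.out.println(first == in.read()); // prints true
    }
}
```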

Get Java Network Programming, Second Edition now with the O’Reilly learning platform.
