Java’s basic input class is java.io.InputStream:
public abstract class InputStream
This class provides the fundamental methods needed to read data as raw bytes. These are:
public abstract int read( ) throws IOException
public int read(byte[] input) throws IOException
public int read(byte[] input, int offset, int length) throws IOException
public long skip(long n) throws IOException
public int available( ) throws IOException
public void close( ) throws IOException
Concrete subclasses of InputStream implement these methods to read data from particular media. For instance, a FileInputStream reads data from a file. A TelnetInputStream reads data from a network connection. A ByteArrayInputStream reads data from an array of bytes. But whichever source you’re reading, you mostly use only these same six methods. Sometimes you may not even know exactly what kind of stream you’re reading from. For instance, TelnetInputStream is an undocumented class hidden inside the sun.net package. Instances of it are returned by various methods in the java.net package; for example, the openStream( ) method of java.net.URL. However, these methods are declared to return only InputStream, not the more specific subclass TelnetInputStream. That’s polymorphism at work once again. The instance of the subclass can be used transparently as an instance of its superclass. No specific knowledge of the subclass is required.
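To make the point concrete, here is a minimal sketch of a method that accepts any InputStream and counts its bytes using only the superclass’s read(byte[] input) method. StreamCounter and countBytes are hypothetical names; the same code works whether the caller passes a FileInputStream, a stream returned by URL.openStream( ), or a ByteArrayInputStream.

```java
import java.io.IOException;
import java.io.InputStream;

final class StreamCounter {
    // Accepts any InputStream subclass; only the superclass type is visible here.
    static long countBytes(InputStream in) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int n;
        // read(byte[]) returns -1 at end of stream, otherwise the number of bytes read
        while ((n = in.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }
}
```

For example, passing new ByteArrayInputStream(new byte[10000]) returns 10000.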
The basic method of InputStream is the no-args read( ) method. This method reads a single byte of data from the input stream’s source and returns it as a number from 0 to 255. End of stream is signified by returning -1. Since Java doesn’t have an unsigned byte data type, this number is returned as an int. The read( ) method waits and blocks execution of any code that follows it until a byte of data is available and ready to be read. Input and output can be slow, so if your program is doing anything else of importance, you should try to put I/O in its own thread.
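One way to follow that advice in current Java is to hand the blocking reads to a worker thread. The sketch below assumes the java.util.concurrent executor framework (Java 5 and later, so newer than the APIs discussed here); BackgroundReader and countInBackground are hypothetical names. The calling thread gets a Future back immediately and can do other work before calling get( ).

```java
import java.io.InputStream;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

final class BackgroundReader {
    // Submits a task that drains the stream one byte at a time on a worker thread.
    // This method returns at once; the blocking read() calls happen on the worker.
    static Future<Long> countInBackground(ExecutorService executor, InputStream in) {
        Callable<Long> task = () -> {
            long total = 0;
            while (in.read() != -1) {   // may block, but only the worker thread waits
                total++;
            }
            return total;               // autoboxed to Long
        };
        return executor.submit(task);
    }
}
```

Any IOException thrown by the worker surfaces as an ExecutionException from get( ). Shut the executor down when you no longer need it.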
The read( ) method is declared abstract because subclasses need to implement it to handle their particular medium. For instance, a ByteArrayInputStream can implement this method with pure Java code that copies the byte from its array. However, a TelnetInputStream will need to use a native library that understands how to read data from the network interface on the host platform.
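As an illustration, here is a minimal sketch of a concrete subclass that, like ByteArrayInputStream, serves bytes from an array. SimpleArrayInputStream is a hypothetical class, not part of java.io. It implements only the abstract read( ) method; the & 0xFF mask turns Java’s signed byte into the 0 to 255 range the contract requires, and -1 marks the end of the stream.

```java
import java.io.InputStream;

// Hypothetical class for illustration: reads bytes from an array by
// implementing only the abstract read() method of InputStream.
final class SimpleArrayInputStream extends InputStream {
    private final byte[] data;
    private int pos = 0;

    SimpleArrayInputStream(byte[] data) {
        this.data = data;
    }

    @Override
    public int read() {
        if (pos >= data.length) {
            return -1;               // end of stream
        }
        return data[pos++] & 0xFF;   // widen to an unsigned value in 0..255
    }
}
```

InputStream’s own multibyte read methods are written in terms of read( ), so this class supports them too, although a production class would normally override them to copy whole blocks at once.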
The following code fragment reads 10 bytes from the InputStream in and stores them in the byte array input. However, if end of stream is detected, the loop is terminated early:
byte[] input = new byte[10];
for (int i = 0; i < input.length; i++) {
  int b = in.read( );
  if (b == -1) break;  // end of stream: stop early
  input[i] = (byte) b;
}
Although read( ) reads only a byte, it returns an int. Thus, a cast is necessary before storing the result in the byte array. Of course, this produces a signed byte from -128 to 127 instead of the unsigned byte from 0 to 255 returned by the read( ) method. However, as long as you keep clear which one you’re working with, this is not a major problem. You can convert a signed byte to an unsigned byte like this:
int i = b >= 0 ? b : 256 + b;
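That conditional expression is correct. A common equivalent masks off the sign-extended high bits with b & 0xFF, and Java 8 added Byte.toUnsignedInt( ) for the same purpose. This small sketch (UnsignedBytes is a hypothetical name) puts the three side by side so you can check that they agree for every byte value.

```java
final class UnsignedBytes {
    // The conditional form from the text: add 256 to negative values.
    static int viaConditional(byte b) {
        return b >= 0 ? b : 256 + b;
    }

    // Mask form: b is sign-extended to int, then only the low 8 bits are kept.
    static int viaMask(byte b) {
        return b & 0xFF;
    }

    // Library form, available since Java 8.
    static int viaLibrary(byte b) {
        return Byte.toUnsignedInt(b);
    }
}
```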
Reading a byte at a time is as inefficient as writing data one byte at a time. Consequently, there are also two overloaded read( ) methods that fill a specified array with multiple bytes of data read from the stream, read(byte[] input) and read(byte[] input, int offset, int length). The first attempts to fill the specified array input. The second attempts to fill the specified subarray of input starting at offset and continuing for length bytes.
Notice that I said these methods attempt to fill the array, not necessarily that they succeed. An attempt may fail in several ways. For instance, it’s not unheard of for a bug in a switch in a phone company central office to disconnect you and several thousand of your neighbors from the rest of the world while your program is reading data from a remote web server over a PPP dialup link. This would throw an IOException.
More commonly, however, a read attempt won’t completely fail but won’t completely succeed either. Some of the requested bytes may be read but not all of them. For example, you may try to read 1,024 bytes from a network connection, when only 512 have actually arrived from the server. The rest are still in transit. They’ll arrive eventually, but they aren’t available now.
To account for this, the multibyte read methods return the number of bytes actually read. For example, consider this code fragment:
byte[] input = new byte[1024];
int bytesRead = in.read(input);
It attempts to read 1,024 bytes from the InputStream in into the array input. However, if only 512 bytes are available, then that’s all that will be read, and bytesRead will be set to 512. To guarantee that all the bytes you want are actually read, you must place the read in a loop that reads repeatedly until the array is filled. For example:
int bytesRead = 0;
int bytesToRead = 1024;
byte[] input = new byte[bytesToRead];
while (bytesRead < bytesToRead) {
  bytesRead += in.read(input, bytesRead, bytesToRead - bytesRead);
}
This technique is especially crucial for network streams. Chances are that if a file is available at all, then all the bytes of a file are also available. However, since networks move much more slowly than CPUs, it is very easy for a program to empty a network buffer before all the data has arrived. If one of these two methods tries to read from a temporarily empty but open network buffer, it blocks, just as the single-byte read( ) method does, until at least one byte arrives, the end of stream is reached, or an exception is thrown. It then returns however many bytes are ready rather than waiting for the full amount, which is why the return value must be checked.
All three read( ) methods return -1 to signal the end of the stream. If the stream ends while there’s still data that hasn’t been read, then the multibyte read methods will return that data until the buffer has been emptied. The next call to any of the read methods will return -1. The -1 is never placed in the array; the array contains only actual data. The previous code fragment had a bug because it didn’t consider the possibility that all 1,024 bytes might never arrive (as opposed to not being immediately available). Fixing that bug requires testing the return value of read( ) before adding it to bytesRead. For example:
int bytesRead = 0;
int bytesToRead = 1024;
byte[] input = new byte[bytesToRead];
while (bytesRead < bytesToRead) {
  int result = in.read(input, bytesRead, bytesToRead - bytesRead);
  if (result == -1) break;  // stream ended before the array was filled
  bytesRead += result;
}
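The corrected loop can be packaged as a reusable method. In this sketch, readAsMuchAsPossible and the nested TrickleInputStream are hypothetical names: TrickleInputStream is a test helper that hands back at most a few bytes per call, imitating a network stream whose data arrives in pieces. Library equivalents exist as well: DataInputStream.readFully( ) throws an EOFException if the stream ends early, and Java 9 added InputStream.readNBytes(byte[], int, int).

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ReadLoop {
    // Keeps calling read() until the array is full or the stream ends.
    // Returns the number of bytes stored; less than input.length only at end of stream.
    static int readAsMuchAsPossible(InputStream in, byte[] input) throws IOException {
        int bytesRead = 0;
        int bytesToRead = input.length;
        while (bytesRead < bytesToRead) {
            int result = in.read(input, bytesRead, bytesToRead - bytesRead);
            if (result == -1) break;   // stream ended before the array was filled
            bytesRead += result;
        }
        return bytesRead;
    }

    // Hypothetical test helper: returns at most `chunk` bytes per multibyte read.
    static final class TrickleInputStream extends FilterInputStream {
        private final int chunk;

        TrickleInputStream(InputStream in, int chunk) {
            super(in);
            this.chunk = chunk;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, chunk));
        }
    }
}
```

With a TrickleInputStream limited to 3 bytes per call, a single read( ) into an 8-byte array returns 3, while readAsMuchAsPossible( ) keeps looping and returns 8.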
If for some reason you do not want to wait until all the bytes you want are available, you can use the available( ) method to determine how many bytes can be read without blocking. This is the minimum number of bytes you can read. You may in fact be able to read more, but you will be able to read at least as many bytes as available( ) suggests. For example:
int bytesAvailable = in.available( );
byte[] input = new byte[bytesAvailable];
int bytesRead = in.read(input, 0, bytesAvailable);
// continue with rest of program immediately...
In this case, you can assert that bytesRead is exactly equal to bytesAvailable. You cannot, however, assert that bytesRead is greater than zero. It is possible that no bytes were available. On end of stream, available( ) returns 0. Generally, read(byte[] input, int offset, int length) returns -1 on end of stream; but if length is 0, then it may not notice the end of stream and may return 0 instead.
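Putting those caveats together, here is a minimal sketch of a helper that reads only what available( ) reports. AvailableReader and readAvailable are hypothetical names. It returns an empty array instead of calling read( ) with a length of 0, because a zero-length read may not report the end of stream, and implementations differ on this point.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

final class AvailableReader {
    // Reads no more than available() reports, so it should not block as long as
    // that estimate is accurate. Returns an empty array if nothing is available.
    static byte[] readAvailable(InputStream in) throws IOException {
        int bytesAvailable = in.available();
        if (bytesAvailable <= 0) {
            // Avoid read(buf, 0, 0): some streams return 0 there, others -1.
            return new byte[0];
        }
        byte[] input = new byte[bytesAvailable];
        int bytesRead = in.read(input, 0, bytesAvailable);
        if (bytesRead <= 0) {
            return new byte[0];   // end of stream (or, unexpectedly, nothing read)
        }
        // Trim in case fewer bytes arrived than available() promised.
        return Arrays.copyOf(input, bytesRead);
    }
}
```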
On rare occasions, you may want to skip over data without reading it. The skip( ) method accomplishes this. It returns the number of bytes actually skipped, which may be fewer than the n you asked for. It’s less useful on network connections than when reading from files. Network connections are sequential and overall quite slow, so it’s not significantly more time-consuming to read data than to skip over it. Files are random access, so skipping can be implemented simply by repositioning a file pointer rather than processing each byte to be skipped.
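Because skip( ) may skip fewer bytes than requested, and may return 0 without saying whether the stream has ended, code that needs to skip an exact amount usually loops. The following is a minimal sketch with a hypothetical skipUpTo( ) method; when skip( ) makes no progress, it falls back on read( ) to tell a slow stream from an exhausted one. Java 12 added InputStream.skipNBytes( ), which does something similar but throws an EOFException if the stream ends first.

```java
import java.io.IOException;
import java.io.InputStream;

final class Skipper {
    // Tries to skip exactly n bytes; returns how many were skipped,
    // which is less than n only if the stream ended first.
    static long skipUpTo(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() == -1) {
                break;          // skip() made no progress and the stream has ended
            } else {
                remaining--;    // read() consumed one byte, which counts as skipped
            }
        }
        return n - remaining;
    }
}
```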
As with output streams, once your program has finished with an input stream, it should close it by invoking its close( ) method. This releases any resources associated with the stream, such as file handles or ports. Once an input stream has been closed, further reads from it will usually throw IOExceptions. However, some kinds of streams may still allow you to do things with the object. For instance, you generally won’t want to get the message digest from a java.security.DigestInputStream until all the data has been read and the stream closed.
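Both habits, closing the stream and waiting until the end to ask for the digest, fit naturally with the try-with-resources statement that Java 7 introduced, which calls close( ) automatically even if a read throws. The sketch below is illustrative: DigestExample and sha256 are hypothetical names, and SHA-256 is simply one of the algorithms every Java platform implementation must support.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class DigestExample {
    // Reads the entire stream through a DigestInputStream and returns its SHA-256 digest.
    static byte[] sha256(InputStream source) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        // try-with-resources closes the stream when the block exits, normally or not.
        try (DigestInputStream in = new DigestInputStream(source, md)) {
            byte[] buffer = new byte[4096];
            while (in.read(buffer) != -1) {
                // Nothing to do: each read updates md with the bytes that passed through.
            }
        }
        // All data has been read and the stream is closed; now finish the digest.
        return md.digest();
    }
}
```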
The InputStream class also has three less commonly used methods that allow programs to back up and reread data they’ve already read. These are:
public void mark(int readAheadLimit)
public void reset( ) throws IOException
public boolean markSupported( )
To do this, you mark the current position in the stream with the mark( ) method. At a later point, you can reset the stream back to the marked position using the reset( ) method. Subsequent reads then return data starting from the marked position. However, you may not be able to reset back as far as you like. The number of bytes you can read from the mark and still reset is determined by the readAheadLimit argument to mark( ). If you try to reset back too far, an IOException will be thrown. Furthermore, there can be only one mark in a stream at any given time. Marking a second location erases the first mark.
Marking and resetting are usually implemented by storing every byte read from the marked position in an internal buffer. However, not all input streams support this. Thus, before trying to use marking and resetting, you should check to see whether the markSupported( ) method returns true. If it does, the stream supports marking and resetting. Otherwise, mark( ) will do nothing and reset( ) will throw an IOException.
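A common use of mark( ) and reset( ) is peeking at the next byte, for example to check a file-format signature, without consuming it. The sketch below (Peeker.peek is a hypothetical name) checks markSupported( ) first rather than waiting for reset( ) to fail, and passes a readAheadLimit of 1 because it reads only one byte before resetting.

```java
import java.io.IOException;
import java.io.InputStream;

final class Peeker {
    // Returns the next byte (or -1 at end of stream) without consuming it.
    static int peek(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IOException(in.getClass().getName() + " does not support mark/reset");
        }
        in.mark(1);             // at most one byte will be read before reset()
        int next = in.read();
        in.reset();             // rewind to the marked position
        return next;
    }
}
```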
Note
In my opinion, this demonstrates very poor design. In practice, more streams don’t support marking and resetting than do. Attaching functionality to an abstract superclass that is not available to many, probably most, subclasses is a very poor idea. It would be better to place these three methods in a separate interface that could be implemented by those classes that provided this functionality. The disadvantage of this approach is that you couldn’t then invoke these methods on an arbitrary input stream of unknown type, but in practice you can’t do that anyway because not all streams support marking and resetting. Providing a method such as markSupported( ) to check for functionality at runtime is a more traditional, non-object-oriented solution to the problem. An object-oriented approach would embed this in the type system through interfaces and classes so that it could all be checked at compile time.
The only two input stream classes in java.io that always support marking are BufferedInputStream and ByteArrayInputStream. However, other input streams such as TelnetInputStream may support marking if they’re chained to a buffered input stream first.
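That chaining can be wrapped in a small helper. This is a sketch with a hypothetical name: it returns the stream unchanged if it already supports marking and otherwise chains it to a BufferedInputStream. After wrapping, read only through the returned stream, since the buffer may already have pulled bytes ahead from the underlying one.

```java
import java.io.BufferedInputStream;
import java.io.InputStream;

final class Markable {
    // Returns a stream that supports mark() and reset(): the argument itself if it
    // already does, otherwise the argument chained to a BufferedInputStream.
    static InputStream markable(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }
}
```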