InputStream and OutputStream are fairly raw classes. They
allow you to read and write bytes, either singly or in groups, but
that’s all. Deciding what those bytes mean—whether
they’re integers or IEEE 754 floating point numbers or Unicode
text—is completely up to the programmer and the code. However,
there are certain data formats that are extremely common and can
benefit from a solid implementation in the class library. For
example, many integers passed as parts of network protocols are
32-bit big-endian integers. Much text sent over the Web is either
7-bit ASCII or 8-bit Latin-1. Many files transferred by
ftp are stored in the zip format. Java provides
a number of filter classes you can attach to raw streams to translate
the raw bytes to and from these and other formats.
The filters come in two versions: the filter streams and the readers
and writers. The filter streams still work primarily with raw data as
bytes, for instance, by compressing the data or interpreting it as
binary numbers. The readers and writers handle the special case of
text in a variety of encodings such as UTF-8 and ISO 8859-1. Filter
streams are placed on top of raw streams such as a
TelnetInputStream
or a
FileOutputStream
or other filter streams. Readers
and writers can be layered on top of raw streams, filter streams, or
other readers and writers. However, filter streams cannot be placed
on top of a reader or a writer, so we’ll start here with filter
streams and address readers and writers in the next section.
Filters are organized in a chain as shown in Figure 4.2. Each link in the chain receives data from the
previous filter or stream and passes the data along to the next link
in the chain. In this example, a compressed, encrypted text file
arrives from the local network interface, where native code presents
it to the undocumented TelnetInputStream. A BufferedInputStream
buffers the data to speed up the entire process. A CipherInputStream
decrypts the data. A GZIPInputStream decompresses the deciphered
data. An InputStreamReader converts the
decompressed data to Unicode text. Finally, the text is read into the
application and processed.
Every filter output stream has the same write( ), close( ), and
flush( ) methods as java.io.OutputStream. Every filter input stream
has the same read( ), close( ), and available( ) methods as
java.io.InputStream. In some cases, such as BufferedInputStream and
BufferedOutputStream, these may be the only methods they have. The
filtering is purely internal and does not expose any new public
interface. However, in most cases, the filter stream adds public
methods with additional purposes. Sometimes these are intended to be
used in addition to the usual read( ) and write( ) methods, as with
the unread( ) method of PushbackInputStream. At other times, they
almost completely replace the original interface. For example,
it’s relatively rare to use the write( ) method of PrintStream
instead of one of its print( ) and println( ) methods.
Filters are connected to streams by their constructor. For example,
the following code fragment buffers input from the file
data.txt
. First a
FileInputStream
object fin
is
created by passing the name of the file as an argument to the
FileInputStream
constructor. Then a
BufferedInputStream
object bin
is created by passing fin
as an argument to the
BufferedInputStream
constructor:
FileInputStream fin = new FileInputStream("data.txt");
BufferedInputStream bin = new BufferedInputStream(fin);
From this point forward, it’s possible to use the
read( )
methods of both fin
and
bin
to read data from the file
data.txt
. However, intermixing calls to
different streams connected to the same source may violate several
implicit contracts of the filter streams. Consequently, most of the
time you should use only the last filter in the chain to do the
actual reading or writing. One way to write your code so that
it’s at least harder to introduce this sort of bug is to
deliberately lose the reference to the underlying input stream. For
example:
InputStream in = new FileInputStream("data.txt");
in = new BufferedInputStream(in);
After these two lines execute, there’s no longer any way to
access the underlying file input stream, so you can’t
accidentally read from it and corrupt the buffer. This example works
because it’s not necessary to distinguish between the methods
of InputStream
and those of
BufferedInputStream
.
BufferedInputStream
is simply used polymorphically
as an instance of InputStream
in the first place.
In those cases where it is necessary to use the additional methods of
the filter stream not declared in the superclass, you may be able to
construct one stream directly inside another. For example:
DataOutputStream dout = new DataOutputStream(new BufferedOutputStream( new FileOutputStream("data.txt")));
Although these statements can get a little long, it’s easy to split the statement across several lines like this:
DataOutputStream dout = new DataOutputStream(
                         new BufferedOutputStream(
                          new FileOutputStream("data.txt")
                         )
                        );
There are times when you may need to use the methods of multiple
filters in a chain. For instance, if you’re reading a Unicode
text file, you may want to read the byte order mark in the first
three bytes to determine whether the file is encoded as big-endian
UCS-2, little-endian UCS-2, or UTF-8 and then select the matching
Reader
filter for the encoding. Or if you’re
connecting to a web server, you may want to read the MIME header the
server sends to find the charset parameter of the Content-type field and then
use that character encoding to pick the right Reader
filter to read the body of the response. Or perhaps you want to send
floating point numbers across a network connection using a
DataOutputStream
and then retrieve a
MessageDigest
from the
DigestOutputStream
that the
DataOutputStream
is chained to. In all these
cases, you do need to save and use references to each of the
underlying streams. However, under no circumstances should you ever
read from or write to anything other than the last filter in the
chain.
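As a rough sketch of the last case, the following class chains a DataOutputStream to a DigestOutputStream. It writes only through the outermost filter and keeps the reference to the digest stream solely to retrieve the digest afterward. The class name DigestChainSketch, the in-memory ByteArrayOutputStream standing in for a network connection, and the choice of the "SHA-1" algorithm name are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical example class; not part of the original text.
class DigestChainSketch {

  // Writes the values through the last filter in the chain and
  // returns the digest of the bytes that passed through.
  static byte[] writeAndDigest(double[] values)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest sha = MessageDigest.getInstance("SHA-1");
    ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for a socket
    DigestOutputStream digestOut = new DigestOutputStream(sink, sha);
    DataOutputStream dout = new DataOutputStream(digestOut);

    for (double value : values) {
      dout.writeDouble(value); // always write to the last filter
    }
    dout.flush();
    dout.close();

    // The saved reference is used only to ask for the digest, never to write.
    return digestOut.getMessageDigest().digest();
  }

  public static void main(String[] args) throws Exception {
    byte[] digest = writeAndDigest(new double[] {1.5, 2.5});
    System.out.println("Digest length in bytes: " + digest.length);
  }
}
```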
The
BufferedOutputStream
class stores written data in
a buffer (a protected byte array field named buf
)
until the buffer is full or the stream is flushed. Then it writes the
data onto the underlying output stream all at once. A single write of
many bytes is almost always much faster than many small writes that
add up to the same thing. This is especially true of network
connections because each TCP segment or UDP packet carries a finite
amount of overhead, generally about 40 bytes’ worth. This means
that sending 1 kilobyte of data 1 byte at a time actually requires
sending about 41 kilobytes over the wire, whereas sending it all at once
requires sending only a little more than 1K of data. Most network
cards and TCP implementations provide some level of buffering
themselves, so the real numbers aren’t quite this dramatic.
Nonetheless, buffering network output is generally a huge performance
win.
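The effect is easy to observe by counting how often the underlying stream is actually written to. The sketch below does this with a hypothetical CountingSink class that discards its data. The class names and the 512-byte buffer size are assumptions chosen for illustration, and the sink counts calls rather than measuring real network overhead.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical example class; not part of the original text.
class WriteCountSketch {

  // Discards all data but counts how many write calls reach it.
  static class CountingSink extends OutputStream {
    int calls = 0;
    @Override public void write(int b) { calls++; }
    @Override public void write(byte[] data, int offset, int length) { calls++; }
  }

  // Writes n bytes one at a time directly to the sink.
  static int unbufferedCalls(int n) throws IOException {
    CountingSink sink = new CountingSink();
    for (int i = 0; i < n; i++) sink.write(i);
    return sink.calls;
  }

  // Writes n bytes one at a time through a buffer of the given size.
  static int bufferedCalls(int n, int bufferSize) throws IOException {
    CountingSink sink = new CountingSink();
    OutputStream out = new BufferedOutputStream(sink, bufferSize);
    for (int i = 0; i < n; i++) out.write(i);
    out.flush(); // push whatever is still sitting in the buffer
    return sink.calls;
  }

  public static void main(String[] args) throws IOException {
    System.out.println("Unbuffered writes: " + unbufferedCalls(1024));
    System.out.println("Buffered writes:   " + bufferedCalls(1024, 512));
  }
}
```

The buffered version reaches the sink only a handful of times, because each call hands over a whole buffer rather than a single byte.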
The BufferedInputStream class also has a protected
byte array named buf that serves as a buffer.
When one of the stream’s read( )
methods is
called, it first tries to get the requested data from the buffer.
Only when the buffer runs out of data does the stream read from the
underlying source. At this point, it reads as much data as it can
from the source into the buffer whether it needs all the data
immediately or not. Data that isn’t used immediately will be
available for later invocations of read( )
. When
reading files from a local disk, it’s almost as fast to read
several hundred bytes of data from the underlying stream as it is to
read one byte of data. Therefore, buffering can substantially improve
performance. The gain is less obvious on network connections where
the bottleneck is often the speed at which the network can deliver
data rather than either the speed at which the network interface
delivers data to the program or the speed at which the program runs.
Nonetheless, buffering input rarely hurts and will become more
important over time as network speeds increase.
BufferedInputStream
has two constructors, as does
BufferedOutputStream
:
public BufferedInputStream(InputStream in)
public BufferedInputStream(InputStream in, int bufferSize)
public BufferedOutputStream(OutputStream out)
public BufferedOutputStream(OutputStream out, int bufferSize)
The first argument is the underlying stream from which unbuffered data will be read or to which buffered data will be written. The second argument, if present, specifies the number of bytes in the buffer. Otherwise, the buffer size is set to 2,048 bytes for an input stream and 512 bytes for an output stream. The ideal size for a buffer depends on what sort of stream you’re buffering. For network connections, you want something a little larger than the typical packet size. However, this can be hard to predict and varies depending on local network connections and protocols. Faster, higher bandwidth networks tend to use larger packets, though eight kilobytes is an effective maximum packet size for UDP on most networks today, and TCP segments are often no larger than a kilobyte.
BufferedInputStream does not declare any new
methods of its own. It only overrides methods from InputStream,
including the ones needed to support marking and resetting:
public synchronized int read( ) throws IOException
public synchronized int read(byte[] input, int offset, int length) throws IOException
public synchronized long skip(long n) throws IOException
public synchronized int available( ) throws IOException
public synchronized void mark(int readLimit)
public synchronized void reset( ) throws IOException
public boolean markSupported( )
Starting in Java 1.2, the two multibyte read( )
methods attempt to completely fill the specified array or subarray of
data by reading from the underlying input stream as many times as
necessary. They return only when the array or subarray has been
completely filled, the end of stream is reached, or the underlying
stream would block on further reads. Most input streams (including
buffered input streams in Java 1.1.x and earlier) do not behave like
this. They read from the underlying stream or data source only once
before returning.
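Marking and resetting can be sketched as follows. The example wraps a raw stream that does not support marking on its own, so the ability to rewind comes entirely from the buffer. The class name MarkResetSketch, the rawStream( ) helper, and the read limit of 16 bytes are assumptions for illustration only.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical example class; not part of the original text.
class MarkResetSketch {

  // A raw stream that serves bytes from an array and, like most raw
  // streams, inherits markSupported() == false from InputStream.
  static InputStream rawStream(final byte[] data) {
    return new InputStream() {
      private int position = 0;
      @Override public int read() {
        return position < data.length ? (data[position++] & 0xFF) : -1;
      }
    };
  }

  // Reads two bytes, rewinds to the mark, and returns the next byte read.
  static int rereadFirstByte(byte[] data) throws IOException {
    InputStream in = new BufferedInputStream(rawStream(data));
    in.mark(16);  // remember this position for up to 16 bytes of reading
    in.read();
    in.read();
    in.reset();   // go back to the marked position
    return in.read();
  }

  public static void main(String[] args) throws IOException {
    System.out.println(rereadFirstByte(new byte[] {10, 20, 30}));
  }
}
```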
BufferedOutputStream
also does not declare any new
methods of its own. It overrides three methods from
OutputStream
:
public synchronized void write(int b) throws IOException
public synchronized void write(byte[] data, int offset, int length) throws IOException
public synchronized void flush( ) throws IOException
You call these methods exactly as you would for any output stream. The difference is that each write places data in the buffer rather than directly on the underlying output stream. Consequently, it is essential to flush the stream when you reach a point at which the data needs to be sent.
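A minimal sketch of why the flush matters: nothing reaches the underlying stream until the buffer fills or flush( ) is called. The class name FlushSketch, the 64-byte buffer, and the sample command text are assumptions. A ByteArrayOutputStream stands in for a network connection so that the bytes delivered can be counted.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; not part of the original text.
class FlushSketch {

  // Returns how many bytes the underlying stream has received
  // before and after the buffered stream is flushed.
  static int[] sizesBeforeAndAfterFlush() throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    BufferedOutputStream out = new BufferedOutputStream(sink, 64);

    out.write("HELO".getBytes(StandardCharsets.US_ASCII)); // stays in the buffer
    int before = sink.size();

    out.flush(); // now the data is handed to the underlying stream
    int after = sink.size();

    return new int[] {before, after};
  }

  public static void main(String[] args) throws IOException {
    int[] sizes = sizesBeforeAndAfterFlush();
    System.out.println("Before flush: " + sizes[0] + ", after flush: " + sizes[1]);
  }
}
```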
The PrintStream
class is the first filter output stream
most programmers encounter because System.out
is a
PrintStream
. However, other output streams can
also be chained to print streams, using these two constructors:
public PrintStream(OutputStream out)
public PrintStream(OutputStream out, boolean autoFlush)
By default, print streams should be explicitly flushed. However, if
the autoFlush
argument is true, then the stream
will be flushed every time a byte array or linefeed is written or a
println( )
method is invoked.
As well as the usual write( )
, flush( )
, and close( )
methods,
PrintStream
has 9 overloaded print( )
methods and 10 overloaded println( )
methods:
public void print(boolean b)
public void print(char c)
public void print(int i)
public void print(long l)
public void print(float f)
public void print(double d)
public void print(char[] text)
public void print(String s)
public void print(Object o)
public void println( )
public void println(boolean b)
public void println(char c)
public void println(int i)
public void println(long l)
public void println(float f)
public void println(double d)
public void println(char[] text)
public void println(String s)
public void println(Object o)
Each print( )
method converts its argument to a
string in a semipredictable fashion and writes the string onto the
underlying output stream using the default encoding. The
println( )
methods do the same thing, but they
also append a platform-dependent line separator character to the end
of the line they write. This is a linefeed (\n
) on
Unix, a carriage return (\r
) on the Mac, and a
carriage return/linefeed pair (\r\n
) on Windows.
The first problem with PrintStream is that the output from println( )
is platform-dependent. Depending on what system runs your
code, your lines may sometimes be broken with a linefeed, a carriage
return, or a carriage return/linefeed pair. This doesn’t cause
problems when writing to the console, but it’s a disaster for
writing network clients and servers that must follow a precise
protocol. Most network protocols such as HTTP specify that lines
should be terminated with a carriage return/linefeed pair. Using
println( )
makes it easy to write a program that
works on Windows but fails on Unix and the Mac. While many servers
and clients are liberal in what they accept and can handle incorrect
line terminators, there are occasional exceptions. In particular, in
conjunction with the bug in readLine( )
discussed
shortly, a client running on a Mac that uses println( )
may hang both the server and the client. To some extent,
this could be fixed by using only print( )
and
ignoring println( )
. However,
PrintStream
has other problems.
The second problem with PrintStream
is that it
assumes the default encoding of the platform on which it’s
running. However, this encoding may not be what the server or client
expects. For example, a web browser receiving XML files will expect
them to be encoded in UTF-8 or raw Unicode unless the server tells it
otherwise. However, a web server that uses
PrintStream
may well send them encoded in CP1252
from a U.S.-localized Windows system or SJIS from a
Japanese-localized system, whether the client expects or understands
those encodings or not. PrintStream
doesn’t
provide any mechanism to change the default encoding. This problem
can be patched over by using the related
PrintWriter
class instead. But the problems
continue.
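One way to sidestep both of these problems when writing directly to an output stream is to name the encoding and the line terminator explicitly instead of relying on platform defaults. The sketch below assumes a protocol that wants ASCII text and carriage return/linefeed pairs. The class name CrlfSketch and the writeLine( ) helper are hypothetical, and StandardCharsets requires a newer Java version than the one this text discusses.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; not part of the original text.
class CrlfSketch {

  // Writes one protocol line with an explicit encoding and an explicit
  // carriage return/linefeed pair, whatever platform the code runs on.
  static void writeLine(OutputStream out, String line) throws IOException {
    out.write(line.getBytes(StandardCharsets.US_ASCII));
    out.write('\r');
    out.write('\n');
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for a socket
    writeLine(sink, "GET / HTTP/1.0");
    System.out.println("Bytes written: " + sink.size());
  }
}
```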
The third problem is that PrintStream
eats all
exceptions. This makes PrintStream
suitable for
simple textbook programs such as HelloWorld, since simple console
output can be taught without burdening students with first learning
about exception handling and all that implies. However, network
connections are much less reliable than the console. Connections
routinely fail because of network congestion, phone company
misfeasance, remote systems crashing, and many more reasons. Network
programs must be prepared to deal with unexpected interruptions in
the flow of data. The way to do this is by handling exceptions.
However, PrintStream
catches any exceptions thrown
by the underlying output stream.
Notice that the declarations of the standard OutputStream methods that
PrintStream overrides do not have the usual throws IOException clause:
public void write(int b)
public void write(byte[] data, int offset, int length)
public void flush( )
public void close( )
Instead, PrintStream
relies on an outdated and
inadequate error flag. If the underlying stream throws an exception,
this internal error flag is set. The programmer is relied upon to
check the value of the flag using the checkError( )
method:
public boolean checkError( )
If programmers are to do any error checking at all on a
PrintStream
, they must explicitly check every
call. Furthermore, once an error has occurred, there is no way to
unset the flag so further errors can be detected. Nor is any
additional information available about what the error was. In short,
the error notification provided by PrintStream
is
wholly inadequate for unreliable network connections. At the end of
this chapter, we’ll introduce a class that fixes all these
shortcomings.
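The swallowed exception is easy to demonstrate. In this sketch, a hypothetical BrokenStream class simulates a failed connection by throwing an IOException on every write. The print( ) call returns normally, and only checkError( ) reveals that something went wrong. The class names and the error message are assumptions for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;

// Hypothetical example class; not part of the original text.
class CheckErrorSketch {

  // Simulates a dead connection: every write fails.
  static class BrokenStream extends OutputStream {
    @Override public void write(int b) throws IOException {
      throw new IOException("simulated connection failure");
    }
  }

  // Prints to a broken stream and reports the state of the error flag.
  static boolean errorAfterPrint() {
    PrintStream out = new PrintStream(new BrokenStream());
    out.print("hello");       // no exception reaches the caller
    return out.checkError();  // flushes the stream and returns the error flag
  }

  public static void main(String[] args) {
    System.out.println("Error flag set: " + errorAfterPrint());
  }
}
```

Note that the caller learns only that some error occurred, not which one or when.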
PushbackInputStream
is a subclass of
FilterInputStream
that provides a pushback stack
so that a program can “unread” bytes onto the input
stream. The HTTP protocol handler in Java 1.2 uses
PushbackInputStream
. You might also use it when
you need to check something a little way into the stream, then back
up. For instance, if you were reading an XML document, you might want
to read just far enough into the header to locate the encoding
declaration that tells you what character set the document uses, then
push all the read data back onto the input stream and start over with
a reader configured for that character set.
The read( )
and available( )
methods of PushbackInputStream
are invoked exactly
as with normal input streams. However, they first attempt to read
from the pushback buffer before reading from the underlying input
stream. What this class adds is unread( )
methods that push bytes into the
buffer:
public void unread(int b) throws IOException
This method pushes an unsigned byte given as an int between 0
and 255 onto the stream. Integers outside this range are truncated to
this range as by a cast to byte
. Assuming nothing
else is pushed back onto this stream, the next read from the stream
will return that byte. As multiple bytes are pushed onto the stream
by repeated invocations of unread( )
, they are
stored in a stack and returned in a last-in, first-out order. In
essence, the buffer is a stack sitting on top of an input stream.
Only when the stack is empty will the underlying stream be read.
There are two more unread( )
methods that push a
specified array or subarray onto the stream:
public void unread(byte[] input) throws IOException
public void unread(byte[] input, int offset, int length) throws IOException
The arrays are stacked in last-in, first-out order. However, bytes pushed from the same array will be returned in the order they appeared in the array. That is, the zeroth component of the array will be read before the first component of the array.
By default, the buffer is only one byte long, and trying to unread
more than one byte throws an IOException
. However,
the buffer size can be changed with the second constructor as
follows:
public PushbackInputStream(InputStream in)
public PushbackInputStream(InputStream in, int size)
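The ordering rules for unread( ) can be checked with a short sketch. It uses the two-argument constructor to make room for more than one byte, then pushes back single bytes and an array. The class name PushbackSketch, the buffer sizes, and the empty underlying stream are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

// Hypothetical example class; not part of the original text.
class PushbackSketch {

  // Pushes back two single bytes and returns them in the order they are read.
  static int[] singleByteOrder() throws IOException {
    PushbackInputStream in =
        new PushbackInputStream(new ByteArrayInputStream(new byte[0]), 2);
    in.unread(1);
    in.unread(2);
    return new int[] {in.read(), in.read()}; // last in, first out
  }

  // Pushes back a whole array and returns its bytes in the order they are read.
  static int[] arrayOrder() throws IOException {
    PushbackInputStream in =
        new PushbackInputStream(new ByteArrayInputStream(new byte[0]), 3);
    in.unread(new byte[] {1, 2, 3});
    return new int[] {in.read(), in.read(), in.read()}; // array order is kept
  }

  public static void main(String[] args) throws IOException {
    int[] singles = singleByteOrder();
    int[] array = arrayOrder();
    System.out.println("Single bytes: " + singles[0] + ", " + singles[1]);
    System.out.println("Array: " + array[0] + ", " + array[1] + ", " + array[2]);
  }
}
```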
Although PushbackInputStream
and
BufferedInputStream
both use buffers,
BufferedInputStream
uses them for data read from
the underlying input stream, while
PushbackInputStream
uses them for arbitrary data,
which may or may not have been read from the stream originally.
Furthermore, PushbackInputStream
does not allow
marking and resetting. The markSupported( )
method
of PushbackInputStream
returns false.
The
DataInputStream
and
DataOutputStream
classes provide methods for
reading and writing Java’s primitive data types and strings in
a binary format. The binary formats used are primarily intended for
exchanging data between two different Java programs whether through a
network connection, a data file, a pipe, or some other intermediary.
What a data output stream writes, a data input stream can read.
However, it happens that the formats used are the same ones used for
most Internet protocols that exchange binary numbers. For instance,
the time protocol uses 32-bit big-endian integers, just like
Java’s int
data type. The controlled-load
network element service uses 32-bit IEEE 754 floating point numbers,
just like Java’s float
data type. (This is
probably correlation rather than causation. Both Java and most
network protocols were designed by Unix developers, and consequently
both tend to use the formats common to most Unix systems.) However,
this isn’t true for all network protocols, so you should check
details for any protocol you use. For instance, the Network Time
Protocol (NTP) represents times as 64-bit unsigned fixed point
numbers with the integer part in the first 32 bits and the fraction
part in the last 32 bits. This doesn’t match any primitive data
type in any common programming language, though it is fairly
straightforward to work with, at least as far as is necessary for
NTP.
The DataOutputStream
class offers these 11 methods
for writing particular Java data types:
public final void writeBoolean(boolean b) throws IOException
public final void writeByte(int b) throws IOException
public final void writeShort(int s) throws IOException
public final void writeChar(int c) throws IOException
public final void writeInt(int i) throws IOException
public final void writeLong(long l) throws IOException
public final void writeFloat(float f) throws IOException
public final void writeDouble(double d) throws IOException
public final void writeChars(String s) throws IOException
public final void writeBytes(String s) throws IOException
public final void writeUTF(String s) throws IOException
All data is written in big-endian format. Integers are written in
two’s complement in the minimum number of bytes possible. Thus
a byte
is written as one two’s-complement
byte, a short
as two two’s-complement bytes,
an int
as four two’s-complement bytes, and a
long
as eight two’s-complement bytes. Floats
and doubles are written in IEEE 754 form in 4 and 8 bytes,
respectively. Booleans are written as a single byte with the value 0
for false and 1 for true. Chars are written as two unsigned bytes.
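The byte layout described above can be inspected by writing a few values into a ByteArrayOutputStream and looking at the result. This is only a sketch. The class name DataLayoutSketch and the particular sample values are assumptions chosen to make the byte order visible.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical example class; not part of the original text.
class DataLayoutSketch {

  // Encodes an int, a short, a boolean, and a char, and returns the raw bytes.
  static byte[] encode() throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream dout = new DataOutputStream(bytes);
    dout.writeInt(258);      // four bytes, big-endian: 00 00 01 02
    dout.writeShort(-1);     // two bytes, two's complement: FF FF
    dout.writeBoolean(true); // one byte: 01
    dout.writeChar('A');     // two bytes: 00 41
    dout.flush();
    return bytes.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    StringBuilder hex = new StringBuilder();
    for (byte b : encode()) {
      hex.append(String.format("%02X ", b & 0xFF));
    }
    System.out.println(hex.toString().trim());
  }
}
```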
The last three methods are a little trickier. The
writeChars( )
method simply iterates
through the String
argument, writing each
character in turn as a 2-byte, big-endian Unicode character. The
writeBytes( )
method iterates through the
String
argument but writes only the least
significant byte of each character. Thus information will be lost for
any string with characters from outside the Latin-1 character set.
This method may be useful on some network protocols that specify the
ASCII encoding, but it should be avoided most of the time.
Neither writeChars( )
nor writeBytes( )
encodes the length of the string in the output stream.
Consequently, you can’t really distinguish between raw
characters and characters that make up part of a string. The
writeUTF( )
method does include the
length of the string. It encodes the string itself in a
variant of UTF-8 rather than raw Unicode. Since
writeUTF( )
uses a variant of UTF-8 that’s
subtly incompatible with most non-Java software, it should be used
only for exchanging data with other Java programs that use a
DataInputStream
to read strings. For exchanging
UTF-8 text with all other software, you should use an
InputStreamReader
with the appropriate encoding.
(There wouldn’t be any confusion if Sun had just called this
method and its partner writeString( )
and
readString( )
rather than writeUTF( )
and readUTF( )
.)
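The differences among the three string methods show up in the number of bytes each one produces. The sketch below measures this for a two-character string containing one non-ASCII Latin-1 character. The class name StringEncodingSketch and the sample string are assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical example class; not part of the original text.
class StringEncodingSketch {

  // Returns the number of bytes produced by writeChars(), writeBytes(),
  // and writeUTF(), in that order, for the given string.
  static int[] encodedSizes(String s) throws IOException {
    ByteArrayOutputStream chars = new ByteArrayOutputStream();
    new DataOutputStream(chars).writeChars(s); // two bytes per character

    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    new DataOutputStream(bytes).writeBytes(s); // low-order byte of each character

    ByteArrayOutputStream utf = new ByteArrayOutputStream();
    new DataOutputStream(utf).writeUTF(s);     // two-byte length, then encoded characters

    return new int[] {chars.size(), bytes.size(), utf.size()};
  }

  public static void main(String[] args) throws IOException {
    int[] sizes = encodedSizes("h\u00e9"); // 'h' followed by e with an acute accent
    System.out.println("writeChars: " + sizes[0]
        + ", writeBytes: " + sizes[1] + ", writeUTF: " + sizes[2]);
  }
}
```

Only the writeUTF( ) output carries enough information for a reader to know where the string ends.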
As well as these methods to write binary numbers,
DataOutputStream
also overrides three of the
customary OutputStream
methods:
public void write(int b) throws IOException
public void write(byte[] data, int offset, int length) throws IOException
public void flush( ) throws IOException
These are invoked in the usual fashion with the usual semantics.
DataInputStream
is the complementary class to
DataOutputStream
. Every format that
DataOutputStream
writes,
DataInputStream
can read. In addition,
DataInputStream
has the usual read( )
, available( )
, skip( )
, and close( )
methods as well as
methods for reading complete arrays of bytes and lines of text.
There are 9 methods to read binary data that match the 11 methods in
DataOutputStream
(there’s no exact
complement for writeBytes( )
and
writeChars( )
; these are handled by reading the
bytes and chars one at a time):
public final boolean readBoolean( ) throws IOException
public final byte readByte( ) throws IOException
public final char readChar( ) throws IOException
public final short readShort( ) throws IOException
public final int readInt( ) throws IOException
public final long readLong( ) throws IOException
public final float readFloat( ) throws IOException
public final double readDouble( ) throws IOException
public final String readUTF( ) throws IOException
In addition, DataInputStream
provides two methods
to read unsigned bytes and unsigned shorts and return the equivalent
int
. Java doesn’t have either of these data
types, but you may encounter them when reading binary data written by
a C program:
public final int readUnsignedByte( ) throws IOException
public final int readUnsignedShort( ) throws IOException
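The difference between the signed and unsigned readers is clearest when the same bit patterns are read both ways. In this sketch, the class name UnsignedReadSketch and the sample bytes are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical example class; not part of the original text.
class UnsignedReadSketch {

  // Reads the byte 0xC8 and the short 0xFFFE, first signed and then unsigned.
  static int[] readBothWays() throws IOException {
    byte[] data = {
      (byte) 0xC8,              // read with readByte()
      (byte) 0xC8,              // read with readUnsignedByte()
      (byte) 0xFF, (byte) 0xFE, // read with readShort()
      (byte) 0xFF, (byte) 0xFE  // read with readUnsignedShort()
    };
    DataInputStream din = new DataInputStream(new ByteArrayInputStream(data));
    int signedByte = din.readByte();
    int unsignedByte = din.readUnsignedByte();
    int signedShort = din.readShort();
    int unsignedShort = din.readUnsignedShort();
    return new int[] {signedByte, unsignedByte, signedShort, unsignedShort};
  }

  public static void main(String[] args) throws IOException {
    int[] values = readBothWays();
    System.out.println("byte: " + values[0] + " vs. " + values[1]);
    System.out.println("short: " + values[2] + " vs. " + values[3]);
  }
}
```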
DataInputStream
has the usual two multibyte
read( )
methods that read data into an array or
subarray and return the number of bytes read. It also has two
readFully( )
methods that repeatedly read
data from the underlying input stream into an array until the
requested number of bytes have been read. If enough data cannot be
read, then an IOException
is thrown. These methods
are especially useful when you know in advance exactly how many bytes
you have to read. This might be the case when you’ve read the
Content-length
field out of an HTTP MIME header
and thus know how many bytes of data there are:
public final int read(byte[] input) throws IOException
public final int read(byte[] input, int offset, int length) throws IOException
public final void readFully(byte[] input) throws IOException
public final void readFully(byte[] input, int offset, int length) throws IOException
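A sketch of the Content-length case might look like the following. The readBody( ) helper and the class name ReadFullySketch are hypothetical, and the length is assumed to have been parsed from the header already. If the stream ends early, readFully( ) throws an EOFException, which is a subclass of IOException.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical example class; not part of the original text.
class ReadFullySketch {

  // Reads exactly contentLength bytes from the stream, blocking as needed.
  // Throws an IOException (specifically EOFException) if the stream ends first.
  static byte[] readBody(InputStream in, int contentLength) throws IOException {
    DataInputStream din = new DataInputStream(in);
    byte[] body = new byte[contentLength];
    din.readFully(body);
    return body;
  }

  public static void main(String[] args) throws IOException {
    InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4});
    byte[] body = readBody(in, 3);
    System.out.println("Read " + body.length + " bytes");
  }
}
```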
Finally, DataInputStream
provides the popular
readLine( )
method that reads a line of text as
delimited by a line terminator and returns a string:
public final String readLine( ) throws IOException
However, this method should not be used under any circumstances, both
because it is deprecated and because it is buggy. It’s
deprecated because it doesn’t properly convert bytes to non-ASCII
characters in most circumstances. That task is now handled
by the readLine( )
method of the
BufferedReader
class. However, both that method
and this one share the same insidious bug: they do not always
recognize a single carriage return as ending a line. Rather,
readLine( )
recognizes only a linefeed or a
carriage return/linefeed pair. When a carriage return is detected in
the stream, readLine( )
waits to see whether the
next character is a linefeed before continuing. If it is a linefeed,
then both the carriage return and the linefeed are thrown away, and
the line is returned as a String
. If it
isn’t a linefeed, then the carriage return is thrown away, the
line is returned as a String
, and the extra
character that was read becomes part of the next line. However, if
the carriage return is the last character in the stream (a very
likely occurrence if the stream originates from a Macintosh or a file
created on a Macintosh), then readLine( )
hangs,
waiting for the last character that isn’t forthcoming.
This problem isn’t so obvious when reading files because there
will almost certainly be a next character, -1 for end of stream if
nothing else. However, on persistent network connections such as
those used for FTP and late-model HTTP, a server or client may simply
stop sending data after the last character and wait for a response
without actually closing the connection. If you’re lucky, the
connection may eventually time out on one end or the other and
you’ll get an IOException
, though this will
probably take at least a couple of minutes. If you’re not
lucky, the program will hang indefinitely.
Note that it is not enough for your program to merely be running on
Windows or Unix to avoid this bug. It must also ensure that it does
not send or receive text files created on a Macintosh and that it
never talks to Macintosh clients or servers. These are very strong
conditions in the heterogeneous world of the Internet. It is
obviously much simpler to avoid readLine( )
completely.
The
java.util.zip
package contains filter streams that
compress and decompress streams in zip, gzip, and deflate formats.
Besides its better-known uses with respect to files, this allows your
Java applications to easily exchange compressed data across the
network. HTTP 1.1 explicitly includes support for compressed file
transfer in which the server compresses and the browser decompresses
files, in effect trading increasingly cheap CPU power for
still-expensive network bandwidth. This is done completely
transparently to the user. Of course, it’s not at all
transparent to the programmer who has to write the compression and
decompression code. However, the java.util.zip
filter streams make it a lot more transparent than it otherwise would
be.
There are six stream classes that perform compression and decompression. The input streams decompress data and the output streams compress it. These are:
public class DeflaterOutputStream extends FilterOutputStream
public class InflaterInputStream extends FilterInputStream
public class GZIPOutputStream extends DeflaterOutputStream
public class GZIPInputStream extends InflaterInputStream
public class ZipOutputStream extends DeflaterOutputStream
public class ZipInputStream extends InflaterInputStream
All of these use essentially the same compression algorithm. They differ only in various constants and meta-information included with the compressed data. In addition, a zip stream may contain more than one compressed file.
Compressing and decompressing data with these classes is almost
trivially easy. You simply chain the filter to the underlying stream
and read or write it like normal. For example, suppose you want to
read the compressed file allnames.gz
. You simply
open a FileInputStream
to the file and chain a
GZIPInputStream
to that like this:
FileInputStream fin = new FileInputStream("allnames.gz");
GZIPInputStream gzin = new GZIPInputStream(fin);
From that point forward, you can read uncompressed data from
gzin
using merely the usual read( )
, skip( )
, and available( )
methods. For instance, this code fragment reads and
decompresses a file named allnames.gz
in the
current working directory:
FileInputStream fin = new FileInputStream("allnames.gz");
GZIPInputStream gzin = new GZIPInputStream(fin);
FileOutputStream fout = new FileOutputStream("allnames");
int b = 0;
// Copy the decompressed bytes to the output file one at a time
while ((b = gzin.read( )) != -1) fout.write(b);
gzin.close( );
fout.flush( );
fout.close( );
In fact, it isn’t even necessary to know that
gzin
is a GZIPInputStream
for
this to work. A simple InputStream
type would work
equally well. For example:
InputStream in = new GZIPInputStream(new FileInputStream("allnames.gz"));
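Compression in the other direction works the same way. The following sketch compresses a byte array with a GZIPOutputStream and decompresses it again, all in memory so that it does not depend on any particular file. The class name GzipRoundTripSketch and its helper methods are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical example class; not part of the original text.
class GzipRoundTripSketch {

  // Compresses the data by writing it through a GZIPOutputStream.
  static byte[] compress(byte[] data) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    GZIPOutputStream gzout = new GZIPOutputStream(bytes);
    gzout.write(data);
    gzout.close(); // closing finishes the compressed stream
    return bytes.toByteArray();
  }

  // Decompresses the data by reading it through a GZIPInputStream,
  // used here simply as an InputStream.
  static byte[] decompress(byte[] compressed) throws IOException {
    InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int b;
    while ((b = in.read()) != -1) out.write(b);
    in.close();
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] original = new byte[1000]; // a thousand zero bytes compress well
    byte[] compressed = compress(original);
    byte[] restored = decompress(compressed);
    System.out.println(original.length + " -> " + compressed.length
        + " -> " + restored.length);
  }
}
```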
DeflaterOutputStream
and
InflaterInputStream
are equally straightforward.
ZipInputStream
and
ZipOutputStream
are a little more complicated
because a zip file is actually an archive that may contain multiple
entries, each of which must be read separately. Each file in a zip
archive is represented as a ZipEntry
object whose
getName( )
method returns the original name of the
file. For example, this code fragment decompresses the archive
shareware.zip
in the current working directory:
FileInputStream fin = new FileInputStream("shareware.zip");
ZipInputStream zin = new ZipInputStream(fin);
ZipEntry ze = null;
int b = 0;
while ((ze = zin.getNextEntry( )) != null) {
  FileOutputStream fout = new FileOutputStream(ze.getName( ));
  while ((b = zin.read( )) != -1) fout.write(b);
  zin.closeEntry( );
  fout.flush( );
  fout.close( );
}
zin.close( );
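Writing an archive follows the same entry-by-entry pattern in reverse. This sketch builds a small two-entry archive in memory with ZipOutputStream and then lists its entries with ZipInputStream. The class name ZipEntriesSketch, the entry names, and the contents are assumptions chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical example class; not part of the original text.
class ZipEntriesSketch {

  // Builds a zip archive containing two small text entries.
  static byte[] makeArchive() throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    ZipOutputStream zout = new ZipOutputStream(bytes);
    zout.putNextEntry(new ZipEntry("a.txt"));
    zout.write("first".getBytes(StandardCharsets.US_ASCII));
    zout.closeEntry();
    zout.putNextEntry(new ZipEntry("b.txt"));
    zout.write("second".getBytes(StandardCharsets.US_ASCII));
    zout.closeEntry();
    zout.close();
    return bytes.toByteArray();
  }

  // Returns "name:size" for each entry, reading each entry to its end.
  static List<String> listEntries(byte[] archive) throws IOException {
    ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(archive));
    List<String> entries = new ArrayList<>();
    ZipEntry ze;
    while ((ze = zin.getNextEntry()) != null) {
      int size = 0;
      while (zin.read() != -1) size++; // -1 marks the end of this entry only
      entries.add(ze.getName() + ":" + size);
      zin.closeEntry();
    }
    zin.close();
    return entries;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(listEntries(makeArchive()));
  }
}
```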
The
java.security
package contains two filter streams
that can calculate a message digest for a stream. They are
DigestInputStream
and
DigestOutputStream
. A message digest, represented
in Java by the java.security.MessageDigest
class, is a strong hash code for the stream; that is, it is a large
integer (typically 20 bytes long in binary format) that can easily be
calculated from a stream of any length in such a fashion that no
information about the stream is available from the message digest.
Message digests can be used for digital signatures and for detecting
data that has been corrupted in transit across the network.
In practice, the use of message digests in digital signatures is more
important. Mere data corruption can be detected with much simpler,
less computationally expensive algorithms. However, the digest filter
streams are so easy to use that at times it may be worth paying the
computational price for the corresponding increase in programmer
productivity. To calculate a digest for an output stream, you first
construct a MessageDigest
object that uses a
particular algorithm, such as
the Secure Hash Algorithm (SHA).
You pass both the MessageDigest
object and the
stream you want to digest to the
DigestOutputStream
constructor. This chains the
digest stream to the underlying output stream. Then you write data
onto the stream as normal, flush it, close it, and invoke the
getMessageDigest( )
method to retrieve the
MessageDigest
object. Finally you invoke the
digest( )
method on the
MessageDigest
object to finish calculating the
actual digest. For example:
MessageDigest sha = MessageDigest.getInstance("SHA");
DigestOutputStream dout = new DigestOutputStream(out, sha);
byte[] buffer = new byte[128];
while (true) {
  int bytesRead = in.read(buffer);
  if (bytesRead < 0) break;
  dout.write(buffer, 0, bytesRead);
}
dout.flush( );
dout.close( );
byte[] result = dout.getMessageDigest().digest( );
Calculating the digest of an input stream you read is equally simple.
It still isn’t quite as transparent as some of the other filter
streams because you do need to be at least marginally conversant with
the methods of the MessageDigest
class.
Nonetheless, it’s still far easier than writing your own secure
hash function and manually feeding it each byte you write.
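For the input side, a sketch along the following lines should work. The class name DigestInputDemo and the toHex( ) helper are assumptions made for illustration, and the algorithm is requested as "SHA-1", the standard name for the SHA algorithm used in the output example. The digest is updated as a side effect of reading, so the loop needs no digest-specific code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class; not part of the original text.
class DigestInputDemo {

  // Read the stream to its end and return the digest of everything read.
  static byte[] digestWhileReading(InputStream source)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest sha = MessageDigest.getInstance("SHA-1");
    try (DigestInputStream din = new DigestInputStream(source, sha)) {
      byte[] buffer = new byte[128];
      while (din.read(buffer) != -1) {
        // a real program would process the bytes in buffer here
      }
      return din.getMessageDigest().digest();
    }
  }

  // Helper (an assumption for display purposes): format bytes as lowercase hex.
  static String toHex(byte[] bytes) {
    StringBuilder hex = new StringBuilder();
    for (byte b : bytes) {
      hex.append(String.format("%02x", b & 0xff));
    }
    return hex.toString();
  }

  public static void main(String[] args) throws Exception {
    InputStream in =
        new ByteArrayInputStream("abc".getBytes(StandardCharsets.US_ASCII));
    System.out.println(toHex(digestWhileReading(in)));
  }
}
```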
Of course, you also need a way of associating a particular message
digest with a particular stream. In some circumstances, the digest
may be sent over the same channel used to send the digested data. The
sender can calculate the digest as it sends data, while the receiver
calculates the digest as it receives the data. When the sender is
done, it sends some signal that the receiver recognizes as indicating
end of stream and then sends the digest. The receiver receives the
digest, checks that the digest received is the same as the one
calculated locally, and closes the connection. If the digests
don’t match, the receiver may instead ask the sender to send
the message again. Alternatively, both the digest and the files it
digests may be stored in the same zip archive. And there are many
other possibilities. Situations like this generally call for the
design of a relatively formal custom protocol. However, while the
protocol may be complicated, the calculation of the digest is
straightforward, thanks to the DigestInputStream
and DigestOutputStream
filter classes.
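One possible shape for the first scheme, with the digest following the data on the same channel, is sketched below. Everything about the framing is an assumption made for illustration: the text leaves the end-of-stream signal unspecified, so this sketch substitutes a four-byte length prefix, and the names DigestChannel, send( ), and receive( ) are hypothetical. A real protocol would also have to decide what happens after a mismatch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical demo class; the wire format here is an assumption.
class DigestChannel {

  // Sender: length prefix, then the data (digested as written), then the digest.
  static void send(byte[] message, OutputStream channel)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest sha = MessageDigest.getInstance("SHA-1");
    new DataOutputStream(channel).writeInt(message.length);
    DigestOutputStream dout = new DigestOutputStream(channel, sha);
    dout.write(message);
    dout.flush();
    channel.write(sha.digest());  // the digest itself bypasses the filter
    channel.flush();
  }

  // Receiver: digest the data as it arrives, then compare with the sent digest.
  static byte[] receive(InputStream channel)
      throws IOException, NoSuchAlgorithmException {
    MessageDigest sha = MessageDigest.getInstance("SHA-1");
    DataInputStream raw = new DataInputStream(channel);
    int length = raw.readInt();
    if (length < 0) throw new IOException("bad length " + length);
    byte[] message = new byte[length];
    new DataInputStream(new DigestInputStream(channel, sha)).readFully(message);
    byte[] received = new byte[sha.getDigestLength()];
    raw.readFully(received);      // read the trailing digest undigested
    if (!MessageDigest.isEqual(received, sha.digest())) {
      throw new IOException("digest mismatch");
    }
    return message;
  }

  public static void main(String[] args) throws Exception {
    ByteArrayOutputStream wire = new ByteArrayOutputStream();
    send("hello, world".getBytes(StandardCharsets.US_ASCII), wire);
    byte[] message = receive(new ByteArrayInputStream(wire.toByteArray()));
    System.out.println(new String(message, StandardCharsets.US_ASCII));
  }
}
```

Here a mismatch simply raises an IOException; asking the sender to retransmit, as described above, would be a matter for the surrounding protocol.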
Not all filter streams are part of the
core Java API. For legal reasons, the filters for encrypting and
decrypting data, CipherInputStream
and
CipherOutputStream, are part of a standard
extension to Java called the Java Cryptography Extension, JCE for
short. This is in the javax.crypto
package. Sun
provides an implementation of this API in the U.S. and Canada
available from
http://java.sun.com/products/jce/, and various
third parties have written independent implementations that are
available worldwide. Of particular note is the more or less Open
Source Cryptix package, which can be retrieved from
http://www.cryptix.org/.
The
CipherInputStream
and
CipherOutputStream
classes are both powered by a
Cipher
engine object that encapsulates the
algorithm used to perform encryption and decryption. By changing the
Cipher
engine object, you change the algorithm
that the streams use to encrypt and decrypt. Most ciphers also
require a key that’s used to encrypt and decrypt the data.
Symmetric or secret key ciphers use the same key for both encryption
and decryption. Asymmetric or public key ciphers use different
keys for encryption and decryption. The encryption key can be
distributed as long as the decryption key is kept secret. Keys are
specific to the algorithm in use, and are represented in Java by
instances of the java.security.Key
interface. The
Cipher
object is set in the constructor. Like all
filter stream constructors, these constructors also take the
underlying stream as an argument:
public CipherInputStream(InputStream in, Cipher c)
public CipherOutputStream(OutputStream out, Cipher c)
To get a properly initialized Cipher
object, you
use the static Cipher.getInstance( )
factory method. This
Cipher
object must be initialized for either
encryption or decryption with init( )
before being
passed into one of the previous constructors. For example, this code
fragment prepares a CipherInputStream
for
decryption using the password “two and not a fnord” and
the Data Encryption Standard (DES) algorithm:
// DESKeySpec uses only the first 8 bytes of the array as the key
byte[] desKeyData = "two and not a fnord".getBytes( );
DESKeySpec desKeySpec = new DESKeySpec(desKeyData);
SecretKeyFactory keyFactory = SecretKeyFactory.getInstance("DES");
SecretKey desKey = keyFactory.generateSecret(desKeySpec);
// create the cipher and initialize it for decryption with the key
Cipher des = Cipher.getInstance("DES");
des.init(Cipher.DECRYPT_MODE, desKey);
// fin is the underlying stream of encrypted data
CipherInputStream cin = new CipherInputStream(fin, des);
This fragment uses classes from the java.security,
java.security.spec, javax.crypto, and
javax.crypto.spec packages. Different
implementations of the JCE support different groups of encryption
algorithms. Common algorithms include DES, RSA, and Blowfish. The
construction of a key is generally algorithm specific. Consult the
documentation for your JCE implementation for more details.
CipherInputStream overrides most of the normal
InputStream methods like read( ) and available( ).
CipherOutputStream overrides most of the usual
OutputStream methods like write( ) and flush( ). These methods are all
invoked much as they would be for any other stream. However, as the
data is read or written, the stream’s Cipher
object either decrypts or encrypts the data. (Assuming your program
wants to work with unencrypted data as is most commonly the case, a
cipher input stream will decrypt the data, and a cipher output stream
will encrypt the data.) For example, this code fragment encrypts the
file secrets.txt
using the password “Mary
had a little spider”:
String infile = "secrets.txt";
String outfile = "secrets.des";
String password = "Mary had a little spider";
try {
  FileInputStream fin = new FileInputStream(infile);
  FileOutputStream fout = new FileOutputStream(outfile);

  // register the provider that implements the algorithm
  Provider sunJce = new com.sun.crypto.provider.SunJCE( );
  Security.addProvider(sunJce);

  // create a key
  char[] pbeKeyData = password.toCharArray( );
  PBEKeySpec pbeKeySpec = new PBEKeySpec(pbeKeyData);
  SecretKeyFactory keyFactory = SecretKeyFactory.getInstance("PBEWithMD5AndDES");
  SecretKey pbeKey = keyFactory.generateSecret(pbeKeySpec);

  // use Data Encryption Standard
  Cipher pbe = Cipher.getInstance("PBEWithMD5AndDES");
  pbe.init(Cipher.ENCRYPT_MODE, pbeKey);
  CipherOutputStream cout = new CipherOutputStream(fout, pbe);

  byte[] input = new byte[64];
  while (true) {
    int bytesRead = fin.read(input);
    if (bytesRead == -1) break;
    cout.write(input, 0, bytesRead);
  }
  cout.flush( );
  cout.close( );
  fin.close( );
}
catch (Exception e) {
  System.err.println(e);
  e.printStackTrace( );
}
I admit that this is more complicated than it needs to be.
There’s a lot of setup work involved in creating the
Cipher
object that actually performs the
encryption. Partly that’s a result of key generation involving
quite a bit more than a simple password. However, a large part of it
is also due to inane U.S. export laws that prevent Sun from fully
integrating the JCE with the JDK and JRE. To a large extent, the
complex architecture used here is driven by a need to separate the
actual encrypting and decrypting code from the cipher stream classes.
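To see the two cipher streams working as a pair, here is a minimal round-trip sketch. It reuses the DES key setup from the earlier decryption fragment and assumes a runtime whose installed provider still supplies DES. The class name CipherStreamRoundTrip and the in-memory streams are assumptions for illustration, and DES appears only because the text uses it; it is not a recommendation for new code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.DESKeySpec;

// Hypothetical demo class; not part of the original text.
class CipherStreamRoundTrip {

  // Build a DES cipher for the given mode from a password of at least 8 bytes.
  static Cipher desCipher(int mode, String password)
      throws GeneralSecurityException {
    // DESKeySpec uses only the first 8 bytes of the key data
    DESKeySpec keySpec =
        new DESKeySpec(password.getBytes(StandardCharsets.UTF_8));
    SecretKey key = SecretKeyFactory.getInstance("DES").generateSecret(keySpec);
    Cipher des = Cipher.getInstance("DES");
    des.init(mode, key);
    return des;
  }

  // Encrypt by writing plain bytes through a CipherOutputStream.
  static byte[] encrypt(byte[] plain, String password)
      throws IOException, GeneralSecurityException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (CipherOutputStream cout = new CipherOutputStream(
        sink, desCipher(Cipher.ENCRYPT_MODE, password))) {
      cout.write(plain);
    } // close( ) writes the final padded block
    return sink.toByteArray();
  }

  // Decrypt by reading encrypted bytes through a CipherInputStream.
  static byte[] decrypt(byte[] encrypted, String password)
      throws IOException, GeneralSecurityException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (CipherInputStream cin = new CipherInputStream(
        new ByteArrayInputStream(encrypted),
        desCipher(Cipher.DECRYPT_MODE, password))) {
      byte[] buffer = new byte[64];
      int bytesRead;
      while ((bytesRead = cin.read(buffer)) != -1) {
        sink.write(buffer, 0, bytesRead);
      }
    }
    return sink.toByteArray();
  }

  public static void main(String[] args) throws Exception {
    String password = "two and not a fnord";
    byte[] secret = "attack at dawn!".getBytes(StandardCharsets.US_ASCII);
    byte[] restored = decrypt(encrypt(secret, password), password);
    System.out.println(new String(restored, StandardCharsets.US_ASCII));
  }
}
```

Closing the CipherOutputStream matters here: a block cipher's last block is only written when the stream is closed, so data that is merely flushed may not decrypt completely.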