A large part of what network programs do is simple input and output, moving bytes from one system to another. Bytes are bytes; and to a large extent, reading data a server sends you is not all that different from reading a file. Sending text to a client is not all that different from writing a file. However, input and output (I/O) in Java is organized differently than it is in most other languages, such as C, Pascal, and C++. Consequently, I’d like to take one chapter to summarize Java’s unique approach to I/O.
I/O in Java is built on streams. Input streams read data. Output streams write data. Different fundamental stream classes such as java.io.FileInputStream and sun.net.TelnetOutputStream read and write particular sources of data. However, all fundamental output streams have the same basic methods to write data and all fundamental input streams use the same basic methods to read data. After a stream is created, you can often ignore the details of exactly what it is you’re reading or writing.
Filter streams can be chained to either an input stream or an output stream. Filters can modify the data as it’s read or written (for instance, by encrypting or compressing it), or they can simply provide additional methods for converting the data that’s read or written into other formats. For instance, the java.io.DataOutputStream class provides a method that converts an int to four bytes and writes those bytes onto its underlying output stream.
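To make the filter idea concrete, here is a minimal sketch that chains a DataOutputStream onto an in-memory ByteArrayOutputStream and inspects what arrives underneath. The class name IntToBytesDemo and the helper encode() are hypothetical names chosen for illustration; writeInt() is the standard DataOutputStream method that writes an int as four bytes, high byte first.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class IntToBytesDemo {

    // Hypothetical helper: returns the bytes that writeInt() pushes onto
    // the underlying stream for the given value.
    static byte[] encode(int value) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(sink);
        try {
            out.writeInt(value); // four bytes, most significant byte first
            out.flush();
        } catch (IOException e) {
            // Not expected for an in-memory sink.
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        for (byte b : encode(0x01020304)) {
            System.out.print(b + " ");
        }
        System.out.println();
    }
}
```

The program never touches the byte array directly; it talks only to the filter, and the filter talks to whatever stream it was chained to.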
Finally, readers and writers can be chained to input and output streams to allow programs to read and write text (that is, characters) rather than bytes. Used properly, readers and writers can handle a wide variety of character encodings, including multibyte character sets such as SJIS and UTF-8.
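As a rough sketch of that chaining, the following wraps an OutputStreamWriter around a byte stream and names the encoding explicitly. The class WriterChainDemo and its encodeUtf8() helper are made up for this example, and the try-with-resources syntax assumes Java 7 or later; on older VMs you would close the writer in a finally block instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class WriterChainDemo {

    // Hypothetical helper: writes characters through a Writer and returns
    // the bytes that land on the underlying output stream.
    static byte[] encodeUtf8(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            // Not expected for an in-memory sink.
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray(); // closing the writer flushed it
    }

    public static void main(String[] args) {
        // One character each, but a different number of bytes in UTF-8.
        System.out.println(encodeUtf8("A").length);
        System.out.println(encodeUtf8("\u00e9").length); // e with acute accent
    }
}
```

The point of the sketch is that the program deals in characters while the writer decides how many bytes each one becomes.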
Java’s basic output class is java.io.OutputStream:
public abstract class OutputStream
This class provides the fundamental methods needed to write data. These are:
public abstract void write(int b) throws IOException
public void write(byte[] data) throws IOException
public void write(byte[] data, int offset, int length) throws IOException
public void flush() throws IOException
public void close() throws IOException
Subclasses of OutputStream use these methods to write data onto particular media. For instance, a FileOutputStream uses these methods to write data into a file. A TelnetOutputStream uses these methods to write data onto a network connection. A ByteArrayOutputStream uses these methods to write data into an expandable byte array. But whichever medium you’re writing to, you mostly use only these same five methods. Sometimes you may not even know exactly what kind of stream you’re writing onto. For instance, you won’t find TelnetOutputStream in the Java class library documentation. It’s deliberately hidden inside the sun packages. It’s returned by various methods in various classes in java.net, like the getOutputStream() method of java.net.Socket. However, these methods are declared to return only OutputStream, not the more specific subclass TelnetOutputStream. That’s the power of polymorphism. If you know how to use the superclass, you know how to use all the subclasses too.
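Here is a small sketch of that polymorphism at work. The method writeGreeting() is declared against OutputStream only, yet it works unchanged with an in-memory stream and with a file. The class AnyStreamDemo, its helper methods, and the temporary file name are all invented for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.Arrays;

final class AnyStreamDemo {

    // Knows only the superclass; any OutputStream subclass will do.
    static void writeGreeting(OutputStream out) throws IOException {
        out.write(new byte[] {'h', 'i', '\r', '\n'});
        out.flush();
    }

    // Sends the greeting into an expandable byte array.
    static byte[] viaMemory() {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            writeGreeting(out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sends the same greeting into a temporary file and reads it back.
    static byte[] viaFile() {
        try {
            File file = File.createTempFile("greeting", ".txt");
            try {
                try (OutputStream out = new FileOutputStream(file)) {
                    writeGreeting(out);
                }
                return Files.readAllBytes(file.toPath());
            } finally {
                file.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.equals(viaMemory(), viaFile()));
    }
}
```

The same reasoning applies to the stream a Socket hands you: code written against OutputStream does not need to know which hidden subclass it received.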
OutputStream’s fundamental method is write(int b). This method takes as an argument an integer from 0 to 255 and writes the corresponding byte to the output stream. This method is declared abstract because each subclass needs to implement it for its particular medium. For instance, a ByteArrayOutputStream can implement this method with pure Java code that copies the byte into its array. However, a FileOutputStream will need to use native code that understands how to write data in files on the host platform.
Take special care to note that although this method takes an int as an argument, it actually writes an unsigned byte. Java doesn’t have an unsigned byte data type, so an int has to be used here instead. The only real difference between an unsigned byte and a signed byte is the interpretation. They’re both made up of eight bits, and when you write an int onto a network connection using write(int b), only eight bits are placed on the wire. If an int outside the range 0-255 is passed to write(int b), the least significant byte of the number is written, and the remaining three bytes are ignored. (This is the effect of casting an int to a byte.) On rare occasions, however, you may find a buggy third-party class that does something different, such as throwing an IllegalArgumentException or always writing 255, so it’s best not to rely on this behavior if possible.
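A quick way to see the truncation is to write a few out-of-range values into a ByteArrayOutputStream and read back what was stored. This is only a sketch; the class LowByteDemo and the helper writtenByte() are hypothetical names, and the behavior shown is that of the standard ByteArrayOutputStream, not of every possible subclass.

```java
import java.io.ByteArrayOutputStream;

final class LowByteDemo {

    // Hypothetical helper: writes one int with write(int) and returns the
    // stored byte, reinterpreted as an unsigned value from 0 to 255.
    static int writtenByte(int value) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        sink.write(value); // only the low-order eight bits are kept
        return sink.toByteArray()[0] & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(writtenByte(65));  // in range
        System.out.println(writtenByte(321)); // 256 + 65
        System.out.println(writtenByte(-1));  // all bits set
    }
}
```

Masking with 0xFF on the way back out is the usual idiom for recovering the unsigned interpretation from Java’s signed byte.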
For example, the character generator protocol defines a server that sends out ASCII text. The most popular variation of this protocol sends 72-character lines containing printable ASCII characters. (The printable ASCII characters are those from 33 to 126, a range that excludes the various whitespace and control characters.) The first line contains characters 33 through 104 in order. The second line contains characters 34 through 105. The third line contains characters 35 through 106. This continues through line 23, which contains characters 55 through 126. At that point, the characters wrap around so that line 24 contains characters 56 through 126 followed by character 33 again. Lines are terminated with a carriage return (ASCII 13) and a linefeed (ASCII 10). The output looks like this:
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefgh
"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghi
#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghij
$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijk
%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijkl
&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklm
'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmn
Since ASCII is a 7-bit character set, each character is sent as a single byte. Consequently, this protocol is straightforward to implement using the basic write() methods, as the next code fragment demonstrates:
// Needs java.io.IOException and java.io.OutputStream imported by the enclosing class.
public static void generateCharacters(OutputStream out) throws IOException {

    int firstPrintableCharacter = 33;
    int numberOfPrintableCharacters = 94;
    int numberOfCharactersPerLine = 72;

    int start = firstPrintableCharacter;
    while (true) { /* infinite loop */
        for (int i = start; i < start+numberOfCharactersPerLine; i++) {
            // wrap i back into the range 33 to 126
            out.write((
             (i-firstPrintableCharacter) % numberOfPrintableCharacters)
             + firstPrintableCharacter);
        }
        out.write('\r'); // carriage return
        out.write('\n'); // linefeed
        start = ((start+1) - firstPrintableCharacter)
         % numberOfPrintableCharacters + firstPrintableCharacter;
    }
}
The character generator server class (the exact details of which will have to wait until we discuss server sockets in Chapter 11) passes an OutputStream named out to the generateCharacters() method. Bytes are written onto out one at a time. These bytes are given as integers in a rotating sequence from 33 to 126. Most of the arithmetic here is to make the loop rotate in that range. After every 72 characters, a carriage return and a linefeed are written onto the output stream. The next start character is calculated and the loop repeats. The entire method is declared to throw IOException. That’s important because the character generator server will terminate only when the client closes the connection. The Java code will see this as an IOException.
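Since the real server has to wait for a later chapter, here is a sketch that exercises the same loop without a network. DisconnectingStream is a hypothetical stand-in for a socket’s output stream: it accepts a fixed number of bytes and then throws IOException, which is an assumption about how a closed connection would look to this code. The class name ChargenDisconnectDemo and the helper firstLines() are likewise invented.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class ChargenDisconnectDemo {

    // Hypothetical stand-in for a client connection: records up to
    // `limit` bytes, then fails every further write.
    static final class DisconnectingStream extends OutputStream {
        private final ByteArrayOutputStream received = new ByteArrayOutputStream();
        private final int limit;

        DisconnectingStream(int limit) {
            this.limit = limit;
        }

        @Override
        public void write(int b) throws IOException {
            if (received.size() >= limit) {
                throw new IOException("client closed the connection");
            }
            received.write(b);
        }

        String text() {
            return new String(received.toByteArray(), StandardCharsets.US_ASCII);
        }
    }

    // Same algorithm as the fragment above, one byte per write() call.
    static void generateCharacters(OutputStream out) throws IOException {
        int firstPrintableCharacter = 33;
        int numberOfPrintableCharacters = 94;
        int numberOfCharactersPerLine = 72;
        int start = firstPrintableCharacter;
        while (true) {
            for (int i = start; i < start + numberOfCharactersPerLine; i++) {
                out.write((i - firstPrintableCharacter) % numberOfPrintableCharacters
                        + firstPrintableCharacter);
            }
            out.write('\r');
            out.write('\n');
            start = ((start + 1) - firstPrintableCharacter)
                    % numberOfPrintableCharacters + firstPrintableCharacter;
        }
    }

    // Lets the "client" accept exactly `count` lines (72 characters plus
    // CR LF each) and returns them without their line terminators.
    static String[] firstLines(int count) {
        DisconnectingStream client = new DisconnectingStream(count * 74);
        try {
            generateCharacters(client);
        } catch (IOException expected) {
            // The exception is the only way out of the infinite loop.
        }
        return client.text().split("\r\n");
    }

    public static void main(String[] args) {
        String[] lines = firstLines(24);
        System.out.println(lines[0]);
        System.out.println(lines[22]); // line 23 ends with character 126
        System.out.println(lines[23]); // line 24 wraps around to character 33
    }
}
```

Catching the exception outside generateCharacters() keeps the generator itself free of termination logic, which matches how the method is declared in the text.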
Writing a single byte at a time is often inefficient. For example, every TCP segment that goes out your Ethernet card contains at least 40 bytes of overhead for routing and error correction. If each byte is sent by itself, then you may be filling the wire with 41 times as much data as you think you are! Consequently, most TCP/IP implementations buffer data to some extent. That is, they accumulate bytes in memory and send them to their eventual destination only when a certain number have accumulated or a certain amount of time has passed. However, if you have more than one byte ready to go, it’s not a bad idea to send them all at once. Using write(byte[] data) or write(byte[] data, int offset, int length) is normally much faster than writing all the components of the data array one at a time. For instance, here’s an implementation of the generateCharacters() method that sends a line at a time by stuffing a complete line into a byte array:
public static void generateCharacters(OutputStream out) throws IOException {

    int firstPrintableCharacter = 33;
    int numberOfPrintableCharacters = 94;
    int numberOfCharactersPerLine = 72;
    int start = firstPrintableCharacter;
    byte[] line = new byte[numberOfCharactersPerLine+2];
    // the +2 is for the carriage return and linefeed

    while (true) { /* infinite loop */
        for (int i = start; i < start+numberOfCharactersPerLine; i++) {
            line[i-start] = (byte) ((i-firstPrintableCharacter)
             % numberOfPrintableCharacters + firstPrintableCharacter);
        }
        line[72] = (byte) '\r'; // carriage return
        line[73] = (byte) '\n'; // line feed
        out.write(line);
        start = ((start+1)-firstPrintableCharacter)
         % numberOfPrintableCharacters + firstPrintableCharacter;
    }
}
The algorithm for calculating which bytes to write when is the same as for the previous implementation. The crucial difference is that the bytes are all stuffed into a byte array before being written onto the network. Also notice that the int result of the calculation must be cast to a byte before being stored in the array. This wasn’t necessary in the previous implementation because the single-byte write() method is declared to take an int as an argument.
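The three-argument form, write(byte[] data, int offset, int length), is the one to reach for when only part of an array holds real data, such as a reusable buffer that isn’t full. The following is a small sketch of that case; the class PartialBufferDemo, the helper sendUsedPortion(), and the sample request text are all assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class PartialBufferDemo {

    // Hypothetical helper: sends only the first `used` bytes of a buffer
    // in a single call instead of one write(int) per byte.
    static void sendUsedPortion(OutputStream out, byte[] buffer, int used)
            throws IOException {
        out.write(buffer, 0, used);
    }

    // Fills part of a 16-byte buffer and returns what actually reached
    // the stream, decoded as ASCII.
    static String demo() {
        byte[] buffer = new byte[16];
        byte[] request = "GET /".getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(request, 0, buffer, 0, request.length);

        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            sendUsedPortion(sink, buffer, request.length);
        } catch (IOException e) {
            // Not expected for an in-memory sink.
            throw new UncheckedIOException(e);
        }
        return new String(sink.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Calling write(buffer) here instead would also send the eleven unused zero bytes at the end of the array.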
Streams can also be buffered in software, directly in the Java code, as well as in the network hardware. Typically, this is accomplished by chaining a BufferedOutputStream or a BufferedWriter to the underlying stream, a technique we’ll explore shortly. Consequently, if you are done writing data, it’s important to flush the output stream. For example, suppose you’ve written a 300-byte request to an HTTP 1.1 server that uses HTTP Keep-Alive. You generally want to wait for a response before sending any more data. However, if the output stream has a 1,024-byte buffer, then the stream may be waiting for more data to arrive before it sends the data out of its buffer. No more data will be written onto the stream until after the server response arrives, but that response is never going to arrive because the request hasn’t yet been sent! The buffered stream won’t send the data to the server until the program writes more data to it, but the program won’t write more data until it gets a response from the server, which won’t send a response until it gets the data that’s stuck in the buffer! Figure 4.1 illustrates this Catch-22. The flush() method rescues you from this deadlock by forcing the buffered stream to send its data even if the buffer isn’t yet full.
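The stuck-in-the-buffer state is easy to observe without a network. In this sketch, a ByteArrayOutputStream plays the part of the wire, and a BufferedOutputStream with a 1,024-byte buffer sits in front of it, mirroring the numbers in the scenario above. The class FlushDemo and the helper sizesBeforeAndAfterFlush() are hypothetical names.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class FlushDemo {

    // Writes a 300-byte "request" through a 1,024-byte buffer and reports
    // how many bytes reached the underlying stream before and after flush().
    static int[] sizesBeforeAndAfterFlush() {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        BufferedOutputStream out = new BufferedOutputStream(wire, 1024);
        try {
            out.write(new byte[300]);
            int before = wire.size(); // request is still sitting in the buffer
            out.flush();
            int after = wire.size();  // flush() pushed it through
            return new int[] {before, after};
        } catch (IOException e) {
            // Not expected for an in-memory sink.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] sizes = sizesBeforeAndAfterFlush();
        System.out.println("before flush: " + sizes[0]);
        System.out.println("after flush:  " + sizes[1]);
    }
}
```

A server on the other end of the wire would see nothing at all until the flush() call, which is exactly the deadlock condition described above.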
It’s important to flush whether you think you need to or not. Depending on how you got hold of a reference to the stream, you may or may not know whether it’s buffered. (For instance, System.out is buffered whether you want it to be or not.) If flushing isn’t necessary for a particular stream, it’s a very low cost operation. However, if it is necessary, it’s very necessary. Failing to flush when you need to can lead to unpredictable, unrepeatable program hangs that are extremely hard to diagnose if you don’t have a good idea of what the problem is in the first place. As a corollary to all this, you should flush all streams immediately before you close them. Otherwise, data left in the buffer when the stream is closed may get lost.
Finally, when you’re done with a stream, you should close it by invoking its close() method. This releases any resources associated with the stream, such as file handles or ports. Once an output stream has been closed, further writes to it will throw IOExceptions. However, some kinds of streams may still allow you to do things with the object. For instance, a closed ByteArrayOutputStream can still be converted to an actual byte array and a closed DigestOutputStream can still return its digest.
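Putting the last two points together, the sketch below flushes, closes in a finally block, and then still asks the closed streams for their results. The class CloseDemo and the helper lengthsAfterClose() are invented for this example, and the choice of SHA-256 as the digest algorithm is an assumption; any algorithm your Java installation supports would do.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class CloseDemo {

    // Writes the payload, flushes and closes the stream chain, and then
    // reads results from the closed objects. Returns the number of bytes
    // stored and the length of the digest.
    static int[] lengthsAfterClose(byte[] payload) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            DigestOutputStream out =
                    new DigestOutputStream(sink, MessageDigest.getInstance("SHA-256"));
            try {
                out.write(payload);
                out.flush(); // flush before closing
            } finally {
                out.close(); // also closes the underlying sink
            }
            byte[] stored = sink.toByteArray();              // still works after close
            byte[] digest = out.getMessageDigest().digest(); // so does this
            return new int[] {stored.length, digest.length};
        } catch (IOException e) {
            // Not expected for an in-memory sink.
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] lengths = lengthsAfterClose(new byte[10]);
        System.out.println(lengths[0] + " bytes stored, "
                + lengths[1] + "-byte digest");
    }
}
```

Closing in finally means the stream is released even if the write fails partway through.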