We are now going to complete our introduction to core Java I/O facilities by returning to the java.nio package. The name NIO stands for “New I/O” and, as we saw earlier in this chapter in our discussion of java.nio.file, one aspect of NIO is simply to update and enhance features of the legacy java.io package. Much of the general NIO functionality does indeed overlap with existing APIs. However, NIO was first introduced to address specific issues of scalability for large systems, especially in networked applications. The following section outlines the basic elements of NIO, which center on working with buffers and channels.
Most of the need for the NIO package was driven by the desire to add nonblocking and selectable I/O to Java. Prior to NIO, most read and write operations in Java were bound to threads and were forced to block for unpredictable amounts of time. Although certain APIs such as Sockets (which we’ll see in Chapter 13) provided specific means to limit how long an I/O call could take, this was a workaround to compensate for the lack of a more general mechanism. In many languages, even those without threading, I/O could still be done efficiently by setting I/O streams to a nonblocking mode and testing them for their readiness to send or receive data. In a nonblocking mode, a read or write does only as much work as can be done immediately: it fills or empties a buffer and then returns. Combined with the ability to test for readiness, this allows a single-threaded application to continuously service many channels efficiently. The main thread “selects” a stream that is ready and works with it until it would block and then moves on to another. On a single-processor system, this is fundamentally equivalent to using multiple threads. It turns out that this style of processing has scalability advantages even when using a pool of threads (rather than just one). We’ll discuss this in detail in Chapter 13 when we discuss networking and building servers that can handle many clients simultaneously.
In addition to nonblocking and selectable I/O, the NIO package enables closing and interrupting I/O operations asynchronously. As discussed in Chapter 9, prior to NIO there was no reliable way to stop or wake up a thread blocked in an I/O operation. With NIO, threads blocked in I/O operations always wake up when interrupted or when the channel is closed by anyone. Additionally, if you interrupt a thread while it is blocked in an NIO operation, its channel is automatically closed. (Closing the channel because the thread is interrupted might seem too strong, but usually it’s the right thing to do.)
Channel I/O is designed around the concept of buffers, which are a sophisticated form of array tailored to working with communications. The NIO package supports the concept of direct buffers: buffers that maintain their memory outside the Java VM in the host operating system. Because all real I/O operations ultimately have to work with the host OS, maintaining the buffer space there can make some operations much more efficient. Data moving between two external endpoints can be transferred without first copying it into Java and back out.
NIO provides two general-purpose file-related features not found in java.io: memory-mapped files and file locking. We’ll discuss memory-mapped files later, but suffice it to say that they allow you to work with file data as if it were all magically resident in memory. File locking supports the concept of shared and exclusive locks on regions of files, which is useful for concurrent access by multiple applications.
While java.io deals with streams, java.nio works with channels. A channel is an endpoint for communication. Although in practice channels are similar to streams, the underlying notion of a channel is more abstract and primitive. Whereas streams in java.io are defined in terms of input or output with methods to read and write bytes, the basic channel interface says nothing about how communications happen. It simply has the notion of being open or closed, supported via the methods isOpen() and close(). Implementations of channels for files, network sockets, or arbitrary devices then add their own methods for operations, such as reading, writing, or transferring data. The following channels are provided by NIO:
FileChannel
Pipe.SinkChannel, Pipe.SourceChannel
SocketChannel, ServerSocketChannel, DatagramChannel
We’ll cover FileChannel in this chapter. The Pipe channels are simply the channel equivalents of the java.io Pipe facilities. We’ll talk about Socket and Datagram channels in Chapter 13. Additionally, as of Java 7 there are asynchronous versions of the file and socket channels: AsynchronousFileChannel, AsynchronousSocketChannel, and AsynchronousServerSocketChannel. These asynchronous versions essentially buffer all of their operations through a thread pool and report results back through an asynchronous API. We’ll talk about the asynchronous file channel later in this chapter.
All these basic channels implement the ByteChannel interface, designed for channels that have read and write methods like I/O streams. ByteChannels read and write ByteBuffers, however, as opposed to plain byte arrays.
In addition to these channel implementations, you can bridge channels with java.io I/O streams and readers and writers for interoperability. However, if you mix these features, you may not get the full benefits and performance offered by the NIO package.
Most of the utilities of the java.io and java.net packages operate on byte arrays. The corresponding tools of the NIO package are built around ByteBuffers (with the character-based buffer CharBuffer for text). Byte arrays are simple, so why are buffers necessary? They serve several purposes:
They formalize the usage patterns for buffered data, provide for things like read-only buffers, and keep track of read/write positions and limits within a large buffer space. They also provide a mark/reset facility like that of java.io.BufferedInputStream.
They provide additional APIs for working with raw data representing primitive types. You can create buffers that “view” your byte data as a series of larger primitives, such as shorts, ints, or floats. The most general type of data buffer, ByteBuffer, includes methods that let you read and write all primitive types just like DataOutputStream does for streams.
They abstract the underlying storage of the data, allowing for special optimizations by Java. Specifically, buffers may be allocated as direct buffers that use native buffers of the host operating system instead of arrays in Java’s memory. The NIO Channel facilities that work with buffers can recognize direct buffers automatically and try to optimize I/O to use them. For example, a read from a file channel into a Java byte array normally requires Java to copy the data for the read from the host operating system into Java’s memory. With a direct buffer, the data can remain in the host operating system, outside Java’s normal memory space until and unless it is needed.
A buffer is an instance of a subclass of java.nio.Buffer. The base Buffer class is something like an array with state. It does not specify what type of elements it holds (that is for subtypes to decide), but it does define functionality that is common to all data buffers. A Buffer has a fixed size called its capacity. Although all the standard Buffers provide “random access” to their contents, a Buffer generally expects to be read and written sequentially, so Buffers maintain the notion of a position where the next element is read or written. In addition to position, a Buffer can maintain two other pieces of state information: a limit, which is a position that is a “soft” limit to the extent of a read or write, and a mark, which can be used to remember an earlier position for future recall.
Implementations of Buffer add specific, typed get and put methods that read and write the buffer contents. For example, ByteBuffer is a buffer of bytes and it has get() and put() methods that read and write bytes and arrays of bytes (along with many other useful methods we’ll discuss later). Getting from and putting to the Buffer changes the position marker, so the Buffer keeps track of its contents somewhat like a stream. Attempting to read or write past the limit marker generates a BufferUnderflowException or BufferOverflowException, respectively.
The mark, position, limit, and capacity values always obey the following formula:

    mark <= position <= limit <= capacity

The position for reading and writing the Buffer is always between the mark, which serves as a lower bound, and the limit, which serves as an upper bound. The capacity represents the physical extent of the buffer space.
You can set the position and limit markers explicitly with the position() and limit() methods. Several convenience methods are provided for common usage patterns. The reset() method sets the position back to the mark. If no mark has been set, an InvalidMarkException is thrown. The clear() method resets the position to 0 and makes the limit the capacity, readying the buffer for new data (the mark is discarded). Note that the clear() method does not actually do anything to the data in the buffer; it simply changes the position markers.
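The marker methods just described can be tried out in isolation. The following is a minimal sketch, not code from the NIO library itself: the class name BufferStateDemo and its helper methods are made up for illustration, and only standard java.nio calls are used.

```java
import java.nio.ByteBuffer;
import java.nio.InvalidMarkException;

class BufferStateDemo {
    // Hypothetical helper: formats a buffer's markers as "position/limit/capacity".
    static String state(ByteBuffer b) {
        return b.position() + "/" + b.limit() + "/" + b.capacity();
    }

    // Returns true if reset() still works after clear(); clear() discards
    // the mark, so this is expected to return false.
    static boolean markSurvivesClear() {
        ByteBuffer b = ByteBuffer.allocate(8);
        b.put((byte) 1);
        b.mark();
        b.clear();
        try {
            b.reset();
            return true;
        } catch (InvalidMarkException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ByteBuffer buff = ByteBuffer.allocate(8);
        buff.put((byte) 1).put((byte) 2);
        buff.mark();                       // remember position 2
        buff.put((byte) 3);                // position is now 3
        buff.reset();                      // back to the mark
        System.out.println(state(buff));   // 2/8/8
        buff.clear();                      // markers reset, data untouched
        System.out.println(state(buff));   // 0/8/8
        System.out.println(buff.get(0));   // 1 (the old data is still there)
        System.out.println(markSurvivesClear());
    }
}
```

The absolute get(0) at the end shows that clear() only moves the markers and leaves the stored bytes in place.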
The flip() method is used for the common pattern of writing data into the buffer and then reading it back out. flip() makes the current position the limit and then resets the current position to 0 (any mark is thrown away), which saves having to keep track of how much data was read. Another method, rewind(), simply resets the position to 0, leaving the limit alone. You might use it to write the same size data again. Here is a snippet of code that uses these methods to read data from a channel and write it to two channels:

    ByteBuffer buff = ...
    while ( inChannel.read( buff ) > 0 ) { // position = ?
        buff.flip();    // limit = position; position = 0;
        outChannel.write( buff );
        buff.rewind();  // position = 0
        outChannel2.write( buff );
        buff.clear();   // position = 0; limit = capacity
    }

This might be confusing the first time you look at it because here, the read from the Channel is actually a write to the Buffer and vice versa. Because this example writes all the available data up to the limit, flip() and rewind() have the same effect in this case.
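To experiment with this read, flip, write, rewind, write, clear cycle without real files or sockets, the channels can be backed by in-memory streams. The sketch below is one possible arrangement under that assumption; the class TeeCopy and its methods are hypothetical names, and the inner hasRemaining() loops are an added precaution because a channel write is not required to drain the buffer in one call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

class TeeCopy {
    // Copies everything from in to both outputs, reusing one small buffer.
    static void tee(ReadableByteChannel in, WritableByteChannel out1,
                    WritableByteChannel out2) throws IOException {
        ByteBuffer buff = ByteBuffer.allocate(16);
        while (in.read(buff) > 0) {
            buff.flip();                              // limit = position; position = 0
            while (buff.hasRemaining()) out1.write(buff);
            buff.rewind();                            // position = 0, limit unchanged
            while (buff.hasRemaining()) out2.write(buff);
            buff.clear();                             // ready for the next read
        }
    }

    // Convenience wrapper for trying the copy on an in-memory byte array.
    static byte[][] teeBytes(byte[] data) {
        ByteArrayOutputStream a = new ByteArrayOutputStream();
        ByteArrayOutputStream b = new ByteArrayOutputStream();
        try {
            tee(Channels.newChannel(new ByteArrayInputStream(data)),
                Channels.newChannel(a), Channels.newChannel(b));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new byte[][] { a.toByteArray(), b.toByteArray() };
    }
}
```

Calling TeeCopy.teeBytes(someBytes) should return two arrays that are both equal to the input.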
As stated earlier, various buffer types add get and put methods for reading and writing specific data types. Each of the Java primitive types has an associated buffer type: ByteBuffer, CharBuffer, ShortBuffer, IntBuffer, LongBuffer, FloatBuffer, and DoubleBuffer. Each provides get and put methods for reading and writing its type and arrays of its type. Of these, ByteBuffer is the most flexible. Because it has the “finest grain” of all the buffers, it has been given a full complement of get and put methods for reading and writing all the other data types as well as byte.
Here are some ByteBuffer methods (the put methods return the buffer itself so that calls can be chained):

    byte get()
    char getChar()
    short getShort()
    int getInt()
    long getLong()
    float getFloat()
    double getDouble()

    ByteBuffer put( byte b )
    ByteBuffer put( ByteBuffer src )
    ByteBuffer put( byte[] src, int offset, int length )
    ByteBuffer put( byte[] src )
    ByteBuffer putChar( char value )
    ByteBuffer putShort( short value )
    ByteBuffer putInt( int value )
    ByteBuffer putLong( long value )
    ByteBuffer putFloat( float value )
    ByteBuffer putDouble( double value )

As we said, all the standard buffers also support random access. For each of the aforementioned methods of ByteBuffer, an additional form takes an index; for example:

    getLong( int index )
    putLong( int index, long value )

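The difference between the relative and indexed forms is easiest to see side by side. This is a small sketch with invented names (TypedAccessDemo, roundTrip); it assumes the buffer's default big-endian layout and uses only standard ByteBuffer methods.

```java
import java.nio.ByteBuffer;

class TypedAccessDemo {
    // Writes an int and a double with relative puts, then touches the
    // same data through the absolute (indexed) forms.
    static String roundTrip(int id, double value) {
        ByteBuffer buff = ByteBuffer.allocate(Integer.BYTES + Double.BYTES);
        buff.putInt(id);          // relative: position 0 -> 4
        buff.putDouble(value);    // relative: position 4 -> 12
        buff.putInt(0, id + 1);   // absolute: overwrites bytes 0..3, position unchanged
        return buff.getInt(0) + " " + buff.getDouble(4) + " pos=" + buff.position();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(41, 2.5)); // 42 2.5 pos=12
    }
}
```

The indexed calls never move the position, which is why it still reads 12 after the absolute put and gets.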
But that’s not all. ByteBuffer can also provide “views” of itself as any of the coarse-grained types. For example, you can fetch a ShortBuffer view of a ByteBuffer with the asShortBuffer() method. The ShortBuffer view is backed by the ByteBuffer, which means that they work on the same data, and changes to either one affect the other. The view buffer’s extent starts at the ByteBuffer’s current position, and its capacity is a function of the remaining number of bytes, divided by the new type’s size. (For example, shorts consume two bytes each, floats four, and longs and doubles take eight.) View buffers are convenient for reading and writing large blocks of a contiguous type within a ByteBuffer.
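A view's capacity and its write-through behavior can be checked with a few lines. The following sketch uses an IntBuffer view rather than a ShortBuffer purely for illustration; ViewDemo and viewInfo are hypothetical names, and the byte values shown assume the default big-endian order.

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

class ViewDemo {
    // Returns { view capacity, byte 4, byte 7, byte-buffer position }.
    static int[] viewInfo() {
        ByteBuffer bytes = ByteBuffer.allocate(16);
        bytes.position(4);                     // the view starts at the current position
        IntBuffer ints = bytes.asIntBuffer();  // capacity = (16 - 4) / 4 = 3
        ints.put(0, 0x01020304);               // lands in bytes 4..7 of the backing buffer
        return new int[] {
            ints.capacity(), bytes.get(4), bytes.get(7), bytes.position()
        };
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(viewInfo())); // [3, 1, 4, 4]
    }
}
```

The view keeps its own position and limit, so writing through it leaves the ByteBuffer's position at 4.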
CharBuffers are interesting as well, primarily because of their integration with Strings. Both CharBuffers and Strings implement the java.lang.CharSequence interface. This is the interface that provides the standard charAt() and length() methods. Because of this, newer APIs (such as the java.util.regex package) allow you to use a CharBuffer or a String interchangeably. In this case, the CharBuffer acts like a modifiable String with user-configurable, logical start and end positions.
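As a rough illustration of the “logical start and end positions” idea, the sketch below hands a CharBuffer to the regular expression API and then moves the buffer's position. The class and method names are invented for this example; the regex calls are the standard java.util.regex ones.

```java
import java.nio.CharBuffer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharBufferRegex {
    // Counts regex matches in any CharSequence, including a CharBuffer.
    static int countMatches(CharSequence text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    public static void main(String[] args) {
        CharBuffer cb = CharBuffer.wrap("one two three two");
        System.out.println(countMatches(cb, "two")); // 2
        cb.position(8);  // the sequence now starts at "three two"
        System.out.println(countMatches(cb, "two")); // 1
    }
}
```

Because charAt() and length() on a CharBuffer are relative to its position and limit, moving the position narrows what the matcher sees without copying any characters.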
Because we’re talking about reading and writing types larger than a byte, the question arises: in what order do the bytes of multibyte values (e.g., shorts and ints) get written? There are two camps in this world: “big endian” and “little endian.”[36] Big endian means that the most significant bytes come first; little endian is the reverse. If you’re writing binary data for consumption by some native application, this is important. Intel-compatible computers use little endian, and many workstations that run Unix use big endian. The ByteOrder class encapsulates the choice. You can specify the byte order to use with the ByteBuffer order() method, using the identifiers ByteOrder.BIG_ENDIAN and ByteOrder.LITTLE_ENDIAN like so:

    byteArray.order( ByteOrder.BIG_ENDIAN );

You can retrieve the native ordering for your platform using the static ByteOrder.nativeOrder() method. (I know you’re curious.)
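The effect of the byte order setting shows up as soon as a multibyte value is written. Here is a minimal sketch, with the hypothetical class ByteOrderDemo, that encodes the same int under both orders using the order() method described above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

class ByteOrderDemo {
    // Encodes a single int using the requested byte order.
    static byte[] encode(int value, ByteOrder order) {
        ByteBuffer buff = ByteBuffer.allocate(Integer.BYTES).order(order);
        buff.putInt(value);
        return buff.array();   // heap buffers expose their backing array
    }

    public static void main(String[] args) {
        int value = 0x0A0B0C0D;
        // Big endian: most significant byte first
        System.out.println(Arrays.toString(encode(value, ByteOrder.BIG_ENDIAN)));    // [10, 11, 12, 13]
        // Little endian: least significant byte first
        System.out.println(Arrays.toString(encode(value, ByteOrder.LITTLE_ENDIAN))); // [13, 12, 11, 10]
        System.out.println(ByteOrder.nativeOrder());
    }
}
```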
You can create a buffer either by allocating it explicitly using allocate() or by wrapping an existing plain Java array type. Each buffer type has a static allocate() method that takes a capacity (size) and also a wrap() method that takes an existing array:

    CharBuffer cbuf = CharBuffer.allocate( 64*1024 );
    ByteBuffer bbuf2 = ByteBuffer.wrap( someExistingArray );

A direct buffer is allocated in the same way, with the allocateDirect() method:

    ByteBuffer bbuf = ByteBuffer.allocateDirect( 64*1024 );

As we described earlier, direct buffers can use operating system memory structures that are optimized for use with some kinds of I/O operations. The tradeoff is that allocating a direct buffer is a little slower and more heavyweight than allocating a plain buffer, so you should try to use them for longer-term buffers.
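The three creation styles can be told apart at runtime. The following sketch, with the invented class AllocationDemo, checks that a wrapped buffer shares its array with the caller and that isDirect() reports which kind of storage a buffer uses.

```java
import java.nio.ByteBuffer;

class AllocationDemo {
    // A wrapped buffer writes through to the array it was created from.
    static boolean wrapSharesArray() {
        byte[] backing = new byte[4];
        ByteBuffer wrapped = ByteBuffer.wrap(backing);
        wrapped.put(0, (byte) 7);
        return backing[0] == 7 && wrapped.hasArray();
    }

    // Returns { isDirect for allocate(), isDirect for allocateDirect() }.
    static boolean[] directFlags() {
        return new boolean[] {
            ByteBuffer.allocate(16).isDirect(),
            ByteBuffer.allocateDirect(16).isDirect()
        };
    }

    public static void main(String[] args) {
        System.out.println(wrapSharesArray());   // true
        boolean[] flags = directFlags();
        System.out.println(flags[0] + " " + flags[1]); // false true
    }
}
```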
Character encoders and decoders turn characters into raw bytes and vice versa, mapping from the Unicode standard to particular encoding schemes. Encoders and decoders have long existed in Java for use by Reader and Writer streams and in the methods of the String class that work with byte arrays. However, early on there was no API for working with encoding explicitly; you simply referred to encoders and decoders wherever necessary by name as a String. The java.nio.charset package formalized the idea of a Unicode character set encoding with the Charset class.
The Charset class is a factory for Charset instances, which know how to encode character buffers to byte buffers and decode byte buffers to character buffers. You can look up a character set by name with the static Charset.forName() method and use it in conversions:

    Charset charset = Charset.forName("US-ASCII");
    CharBuffer charBuff = charset.decode( byteBuff );  // ASCII bytes to chars
    ByteBuffer encoded = charset.encode( charBuff );   // and back to bytes

You can also test to see if an encoding is available with the static Charset.isSupported() method. The following character sets are guaranteed to be supplied: US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, and UTF-16.
You can list all the encoders available on your platform using the static availableCharsets() method:

    Map<String, Charset> map = Charset.availableCharsets();
    for ( String name : map.keySet() )
        System.out.println( name );

The result of availableCharsets() is a map because character sets may have “aliases” and appear under more than one name.
In addition to the buffer-oriented classes of the java.nio package, the InputStreamReader and OutputStreamWriter bridge classes of the java.io package have been updated to work with Charset as well. You can specify the encoding as a Charset object or by name.
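The lookup and conversion calls from this section can be combined into a short round trip. This is only a sketch: CharsetRoundTrip and its methods are hypothetical, and the non-ASCII character is written as a Unicode escape so the example does not depend on the source file's encoding.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

class CharsetRoundTrip {
    // Encodes text to bytes with the named charset and decodes it again.
    static String roundTrip(String text, String charsetName) {
        if (!Charset.isSupported(charsetName))
            throw new IllegalArgumentException("unsupported: " + charsetName);
        Charset charset = Charset.forName(charsetName);
        ByteBuffer bytes = charset.encode(CharBuffer.wrap(text));
        return charset.decode(bytes).toString();
    }

    // Number of bytes the text occupies in the named charset.
    static int encodedLength(String text, String charsetName) {
        return Charset.forName(charsetName).encode(text).remaining();
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo";                          // "hello" with an accented e
        System.out.println(encodedLength(text, "UTF-8"));    // 6: the accented e takes two bytes
        System.out.println(roundTrip(text, "UTF-8").equals(text)); // true
        System.out.println(roundTrip(text, "US-ASCII"));     // h?llo
    }
}
```

The last line illustrates a design point worth knowing: the convenience encode() and decode() methods on Charset substitute a replacement for characters they cannot map instead of throwing, which is one reason to reach for an explicit encoder or decoder when lossy conversion is unacceptable.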
You can get more control over the encoding and decoding process by creating an instance of CharsetEncoder or CharsetDecoder (a codec) with the Charset newEncoder() and newDecoder() methods. In the previous snippet, we assumed that all the data was available in a single buffer. More often, however, we might have to process data as it arrives in chunks. The encoder/decoder API allows for this by providing more general encode() and decode() methods that take a flag specifying whether more data is expected. The codec needs to know this because it might have been left hanging in the middle of a multibyte character conversion when the data ran out. If it knows that more data is coming, it does not throw an error on this incomplete conversion. In the following snippet, we use a decoder to read from a ByteBuffer bbuff and accumulate character data into a CharBuffer cbuff:

    CharsetDecoder decoder = Charset.forName("US-ASCII").newDecoder();

    boolean done = false;
    while ( !done ) {
        bbuff.clear();
        done = ( in.read( bbuff ) == -1 );
        bbuff.flip();
        decoder.decode( bbuff, cbuff, done );
    }
    cbuff.flip();
    // use cbuff. . .

Here, we look for the end of input condition on the in channel to set the flag done. Note that we take advantage of the flip() method on ByteBuffer to set the limit to the amount of data read and reset the position, setting us up for the decode operation in one step. The encode() and decode() methods also return a result object, CoderResult, that can determine the progress of encoding (we do not use it in the previous snippet). The methods isError(), isUnderflow(), and isOverflow() on the CoderResult specify why encoding stopped: for an error, a lack of bytes on the input buffer, or a full output buffer, respectively.
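For a multibyte encoding such as UTF-8, a chunk boundary can fall in the middle of a character, and the undecoded tail has to be carried into the next round. The following sketch is one way to handle that, not the book's own code: it swaps clear() for compact() so leftover bytes survive, and it checks the CoderResult. The class ChunkedDecode, the in-memory channel, and the requirement that the chunk size be at least 4 (the longest UTF-8 sequence) are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

class ChunkedDecode {
    // Decodes UTF-8 data read in chunkSize pieces from an in-memory channel.
    static String decodeUtf8(byte[] data, int chunkSize) {
        if (chunkSize < 4)
            throw new IllegalArgumentException("chunkSize must be at least 4");
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer bbuff = ByteBuffer.allocate(chunkSize);
        // UTF-8 never produces more chars than input bytes, so this cannot overflow.
        CharBuffer cbuff = CharBuffer.allocate(data.length + 1);
        try (ReadableByteChannel in =
                 Channels.newChannel(new ByteArrayInputStream(data))) {
            boolean done = false;
            while (!done) {
                done = (in.read(bbuff) == -1);
                bbuff.flip();
                CoderResult result = decoder.decode(bbuff, cbuff, done);
                if (result.isError())
                    result.throwException();   // malformed or unmappable input
                bbuff.compact();               // keep the tail of a split character
            }
            decoder.flush(cbuff);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        cbuff.flip();
        return cbuff.toString();
    }
}
```

With a chunk size of 5, a string such as "na\u00efve caf\u00e9 \u20ac5" has its euro sign split across two reads, and the decoder still reassembles it because compact() preserves the partial bytes.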
Now that we’ve covered the basics of channels and buffers, it’s time to look at a real channel type. The FileChannel is the NIO equivalent of the java.io.RandomAccessFile, but it provides several core new features in addition to some performance optimizations. In particular, use a FileChannel in place of a plain java.io file stream if you wish to use file locking, memory-mapped file access, or highly optimized data transfer between files or between file and network channels.
A FileChannel can be created for a Path using the static FileChannel open() method.

    import static java.nio.file.StandardOpenOption.*;

    FileSystem fs = FileSystems.getDefault();
    Path p = fs.getPath( "/tmp/foo.txt" );

    // Open default for reading
    try ( FileChannel channel = FileChannel.open( p ) ) {
        ...
    }

    // Open with options for writing
    try ( FileChannel channel = FileChannel.open( p, WRITE, APPEND, ... ) ) {
        ...
    }

By default, open() creates a read-only channel for the file. We can open a channel for writing or appending and control other more advanced features such as atomic create and data syncing by passing additional options as shown in the second part of the previous example. Table 12-4 summarizes these options.
Table 12-4. java.nio.file.StandardOpenOption
Option | Description |
---|---|
READ, WRITE | Open the file for read-only or write-only access (default is read-only). Use both for read-write. |
APPEND | Open the file for writing; all writes are positioned at the end of the file. |
CREATE | Use with WRITE to open the file and create it if needed. |
CREATE_NEW | Use with WRITE to create a file atomically, failing if the file already exists. |
DELETE_ON_CLOSE | Attempt to delete the file when it is closed or, if open, when the VM exits. |
SYNC, DSYNC | Wherever possible, guarantee that write operations block until all data is written to storage. SYNC does this for all file changes including data and metadata (attributes), whereas DSYNC only adds this requirement for the data content of the file. |
SPARSE | Use when creating a new file to request that the file be sparse. On filesystems where this is supported, a sparse file handles very large, mostly empty files without allocating as much real storage for empty portions. |
TRUNCATE_EXISTING | Use with WRITE on an existing file to set the file length to zero upon opening it. |
A FileChannel can also be constructed from a classic FileInputStream, FileOutputStream, or RandomAccessFile:

    FileChannel readOnlyFc = new FileInputStream("file.txt").getChannel();
    FileChannel readWriteFc = new RandomAccessFile("file.txt", "rw").getChannel();

FileChannels created from these file input and output streams are read-only or write-only, respectively. To get a read/write FileChannel, you must construct a RandomAccessFile with read/write permissions, as in the previous example.
Using a FileChannel is just like a RandomAccessFile, but it works with ByteBuffer instead of byte arrays:

    ByteBuffer bbuf = ByteBuffer.allocate( ... );
    bbuf.clear();
    readOnlyFc.position( index );
    readOnlyFc.read( bbuf );
    bbuf.flip();
    readWriteFc.write( bbuf );

You can control how much data is read and written either by setting buffer position and limit markers or using another form of read/write that takes a buffer starting position and length. You can also read and write to a random position by supplying indexes with the read and write methods:

    readWriteFc.read( bbuf, index );
    readWriteFc.write( bbuf, index2 );

In each case, the actual number of bytes read or written depends on several factors. The operation tries to read or write to the limit of the buffer, and the vast majority of the time that is what happens with local file access. The operation is guaranteed to block only until at least one byte has been processed. Whatever happens, the number of bytes processed is returned, and the buffer position is updated accordingly, preparing you to repeat the operation until it is complete if needed. This is one of the conveniences of working with buffers; they can manage the count for you. Like standard streams, the channel read() method returns -1 upon reaching the end of input.
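Because a single read() may stop short, code that needs a full buffer typically loops until the buffer has no room left or the channel reports end of input. The sketch below shows that pattern against a temporary file; ReadFully, readFully(), and writeThenRead() are hypothetical names, and the temporary file exists only to make the example self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import static java.nio.file.StandardOpenOption.*;

class ReadFully {
    // Reads until buff is full or end of input; returns the total bytes read.
    static int readFully(FileChannel fc, ByteBuffer buff) throws IOException {
        int total = 0;
        while (buff.hasRemaining()) {
            int n = fc.read(buff);   // the buffer position tracks progress for us
            if (n == -1) break;      // end of input
            total += n;
        }
        return total;
    }

    // Writes text to a temp file, then reads it back starting at offset skip.
    static String writeThenRead(String text, int skip) {
        try {
            Path tmp = Files.createTempFile("readfully", ".txt");
            try (FileChannel fc =
                     FileChannel.open(tmp, READ, WRITE, DELETE_ON_CLOSE)) {
                ByteBuffer out =
                    ByteBuffer.wrap(text.getBytes(StandardCharsets.US_ASCII));
                while (out.hasRemaining()) fc.write(out);
                fc.position(skip);
                ByteBuffer in = ByteBuffer.allocate(64);
                int n = readFully(fc, in);
                in.flip();
                return n + ":" + StandardCharsets.US_ASCII.decode(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeThenRead("hello channel", 6)); // 7:channel
    }
}
```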
The size of the file is always available with the size() method. It can change if you write past the end of the file. Conversely, you can truncate the file to a specified length with the truncate() method.
FileChannels are safe for use by multiple threads and guarantee that data “viewed” by them is consistent across channels in the same VM. Unless you specify the SYNC or DSYNC options, no guarantees are made about how quickly writes are propagated to the storage mechanism. If you only intermittently need to be sure that data is safe before moving on, you can use the force() method to flush changes to disk. The force() method takes a boolean argument indicating whether or not file metadata, including timestamp and permissions, must be written (sync or dsync). Some systems keep track of reads on files as well as writes, so you can save a lot of updates if you set the flag to false, which indicates that you don’t care about syncing that data immediately.
As with all Channels, a FileChannel may be closed by any thread. Once closed, all its read/write and position-related methods throw a ClosedChannelException.
FileChannels support exclusive and shared locks on regions of files through the lock() method:

    FileLock fileLock = fileChannel.lock();
    long start = 0, len = fileChannel2.size();
    FileLock readLock = fileChannel2.lock( start, len, true );

Locks may be either shared or exclusive. An exclusive lock prevents others from acquiring a lock of any kind on the specified file or file region. A shared lock allows others to acquire overlapping shared locks but not exclusive locks. These are useful as write and read locks, respectively. When you are writing, you don’t want others to be able to write until you’re done, but when reading, you need only to block others from writing, not reading concurrently.
The no-args lock() method in the previous example attempts to acquire an exclusive lock for the whole file. The second form accepts a starting and length parameter as well as a flag indicating whether the lock should be shared (or exclusive). The FileLock object returned by the lock() method can be used to release the lock:

    fileLock.release();

Note that file locks are only guaranteed to be a cooperative API; they do not necessarily prevent anyone from reading or writing to the locked file contents. In general, the only way to guarantee that locks are obeyed is for both parties to attempt to acquire the lock and use it. Also, shared locks are not implemented on some systems, in which case all requested locks are exclusive. You can test whether a lock is shared with the isShared() method.
FileChannel locks are held until the channel is closed or interrupted, so performing locks within a try-with-resources statement will help ensure that locks are released more robustly.

    try ( FileChannel channel = FileChannel.open( p, WRITE ) ) {
        channel.lock();
        ...
    }

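The locking calls can be exercised on a scratch file. The sketch below is illustrative only and makes a few assumptions: the class LockDemo and method lockInfo() are invented, the temporary directory is on a filesystem that supports locking, and the lock is placed in the try-with-resources header because FileLock is itself closeable as of Java 7.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import static java.nio.file.StandardOpenOption.*;

class LockDemo {
    // Returns { lock is valid, lock is shared, overlapping request rejected }.
    static boolean[] lockInfo() {
        try {
            Path tmp = Files.createTempFile("lockdemo", ".dat");
            try (FileChannel channel =
                     FileChannel.open(tmp, READ, WRITE, DELETE_ON_CLOSE);
                 // exclusive lock on bytes 0..9, released when the block exits
                 FileLock lock = channel.lock(0, 10, false)) {
                boolean rejected;
                try {
                    // An overlapping request from the same VM is refused outright.
                    channel.tryLock(5, 10, false);
                    rejected = false;
                } catch (OverlappingFileLockException e) {
                    rejected = true;
                }
                return new boolean[] { lock.isValid(), lock.isShared(), rejected };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] info = lockInfo();
        System.out.println(info[0] + " " + info[1] + " " + info[2]); // true false true
    }
}
```

Note that file locks are held on behalf of the whole Java VM, so they are a tool for coordinating with other processes rather than between threads of the same program.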
One of the most interesting features offered through FileChannel is the ability to map a file into memory. When a file is memory-mapped, like magic it becomes accessible through a single ByteBuffer, as if the entire file were read into memory at once. The implementation of this is extremely efficient, generally among the fastest ways to access the data. For working with large files, memory mapping can save a lot of resources and time.
This may seem counterintuitive; we’re getting a conceptually easier way to access our data and it’s also faster and more efficient? What’s the catch? There really is no catch. The reason for this is that all modern operating systems are based on the idea of virtual memory. In a nutshell, that means that the operating system makes disk space act like memory by continually paging (swapping 4KB blocks called “pages”) between memory and disk, transparent to the applications. Operating systems are very good at this; they efficiently cache the data that the application is using and let go of what is not in use. Memory-mapping a file is really just taking advantage of what the OS is doing internally.
A good example of where a memory-mapped file would be useful is in a database. Imagine a 10 GB file containing records indexed at various positions. By mapping the file, we can work with a standard ByteBuffer, reading and writing data at arbitrary positions and letting the native operating system read and write the underlying data in fine-grained pages as necessary. We could emulate this behavior with RandomAccessFile or FileChannel, but we would have to explicitly read and write data into buffers first, and the implementation would almost certainly not be as efficient.
A mapping is created with the FileChannel map() method. For example:

    FileChannel fc = FileChannel.open( fs.getPath("index.db"), CREATE, READ, WRITE );
    MappedByteBuffer mappedBuff = fc.map( FileChannel.MapMode.READ_WRITE, 0, fc.size() );

The map() method returns a MappedByteBuffer, which is simply the standard ByteBuffer with a few additional methods relating to the mapping. The most important is force(), which ensures that any data written to the buffer is flushed out to permanent storage on the disk. The READ_ONLY and READ_WRITE constant identifiers of the FileChannel.MapMode static nested class specify the type of access. Read/write access is available only when mapping a read/write file channel. Data read through the buffer is always consistent within the same Java VM. It may also be consistent across applications on the same host machine, but this is not guaranteed.
Again, a MappedByteBuffer acts just like a ByteBuffer. Continuing with the previous example, we could decode the buffer with a character decoder and search for a pattern like so:

    CharBuffer cbuff = Charset.forName("US-ASCII").decode( mappedBuff );
    Matcher matcher = Pattern.compile("abc*").matcher( cbuff );
    while ( matcher.find() )
        System.out.println( matcher.start() + ": " + matcher.group(0) );

Here, we have implemented something like the Unix grep command by relying on the Regular Expression API working with our CharBuffer as a CharSequence. We’ve cheated a bit in this example since the CharBuffer allocated by the decode() method is as large as the mapped file and must be held in memory. To do this efficiently, we could use the CharsetDecoder discussed earlier in this chapter to iterate through the large mapped space without pulling everything into memory.
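To see a mapping write through to the underlying file, it helps to update the mapped buffer and then read the same bytes back through the channel. The following is a minimal sketch under a few assumptions: the class MappedDemo, the 1,024-byte scratch file, and the offset 512 are arbitrary choices for illustration, and the file is pre-sized before mapping so that the example does not rely on how a particular platform handles mapping past the end of a file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import static java.nio.file.StandardOpenOption.*;

class MappedDemo {
    // Writes a long through a mapping and reads it back through the channel.
    static long mapAndUpdate(long value) {
        try {
            Path tmp = Files.createTempFile("mapped", ".db");
            tmp.toFile().deleteOnExit();   // best-effort cleanup of the scratch file
            try (FileChannel fc = FileChannel.open(tmp, READ, WRITE)) {
                // Give the file a size to map: 1,024 zero bytes.
                ByteBuffer zeros = ByteBuffer.allocate(1024);
                while (zeros.hasRemaining()) fc.write(zeros);

                MappedByteBuffer mapped =
                    fc.map(FileChannel.MapMode.READ_WRITE, 0, fc.size());
                mapped.putLong(512, value);   // random-access write into the "file"
                mapped.force();               // flush the change to storage

                // Read the same eight bytes back with a positional channel read.
                ByteBuffer check = ByteBuffer.allocate(Long.BYTES);
                while (check.hasRemaining()
                        && fc.read(check, 512 + check.position()) != -1) { }
                check.flip();
                return check.getLong();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(mapAndUpdate(51966L)); // 51966
    }
}
```

The scratch file is removed with deleteOnExit() rather than an immediate delete because some platforms refuse to delete a file while a mapping of it is still alive.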
The final feature of FileChannel that we’ll examine is performance optimization. FileChannel supports two highly optimized data transfer methods: transferFrom() and transferTo(), which move data between the file channel and another channel. These methods can take advantage of direct buffers internally to move data between the channels as fast as possible, often without copying the bytes into Java’s memory space at all. The following example should be the fastest way to implement a file copy in Java short of using the built-in Files copy() method:

    import java.nio.channels.*;
    import java.nio.file.*;
    import static java.nio.file.StandardOpenOption.*;

    public class CopyFile {
        public static void main( String[] args ) throws Exception
        {
            FileSystem fs = FileSystems.getDefault();
            Path fromFile = fs.getPath( args[0] );
            Path toFile = fs.getPath( args[1] );

            try (
                FileChannel in = FileChannel.open( fromFile );
                FileChannel out = FileChannel.open( toFile, CREATE, WRITE ); )
            {
                // size() returns a long; no cast is needed and large files stay intact
                in.transferTo( 0, in.size(), out );
            }
        }
    }

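A single transferTo() call returns the number of bytes it actually moved, which may be less than the count requested. A more defensive variant of the copy therefore loops until the whole file has been transferred. The sketch below shows that variant; TransferCopy, copy(), and copyMatches() are hypothetical names, the TRUNCATE_EXISTING option is an added choice so that an existing target is overwritten cleanly, and the temporary files exist only to make the example self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import static java.nio.file.StandardOpenOption.*;

class TransferCopy {
    // Copies from to to, repeating transferTo() until every byte has moved.
    static void copy(Path from, Path to) throws IOException {
        try (FileChannel in = FileChannel.open(from, READ);
             FileChannel out =
                 FileChannel.open(to, CREATE, WRITE, TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
        }
    }

    // Round-trips data through two temp files and compares the result.
    static boolean copyMatches(byte[] data) {
        try {
            Path from = Files.createTempFile("copy-src", ".bin");
            Path to = Files.createTempFile("copy-dst", ".bin");
            try {
                Files.write(from, data);
                copy(from, to);
                return Arrays.equals(data, Files.readAllBytes(to));
            } finally {
                Files.deleteIfExists(from);
                Files.deleteIfExists(to);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```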
When we return to NIO in the next chapter, we will see that network channels are types of SelectableChannel, which means that they can be managed with a selector to poll for when the channels are ready to be read or written and manage them efficiently without blocking threads. File channels are not selectable channels and most regular file operations simply block until they are completed. This is not to say that file operations always block until all the bytes we want are read from or written to disk. In general, read operations may return fewer bytes than requested, and write operations may both write fewer bytes and buffer data in memory unless we use the SYNC or DSYNC open options. But in a world where disk access can be many, many orders of magnitude slower than in-memory operations, even these partial reads and writes may be slow enough that we do not wish to block waiting for them.
The obvious solution is to use multithreading and coordinate our reads and writes in a separate thread from our main logic. Java 7 has made this easier by introducing the AsynchronousFileChannel, which is a file channel that delegates all of its operations to a thread pool and can report results using a Future object or asynchronous callback. All read and write operations on asynchronous file channels must specify the byte offset for the operation (as there is no well-defined “current” offset into the file at any given time). The simplest example is to write a file update in the background without gathering results:

    AsynchronousFileChannel channel = AsynchronousFileChannel.open( path, WRITE );

    // Write logBuffer to the end of the file in the background, returning
    // immediately
    channel.write( logBuffer, channel.size() );
    ...

Here, we have constructed an AsynchronousFileChannel analogous to the way we’d open a regular file channel. Our write happens in the background and the write() method returns immediately. By default, the channel will use a system default thread pool to perform our write in the background. Alternatively, we could have supplied our own ExecutorService for the thread pool as an argument to the open() call. If at some point we need to sync up and guarantee that all data is written, we can use the channel’s force() method to block until all writes are complete.
A more interesting case is a read operation where we need the bytes returned from the operation. In this case we can supply a callback CompletionHandler object that will push the results to us when they are ready.

    AsynchronousFileChannel channel = AsynchronousFileChannel.open( path );
    ByteBuffer bbuff = ByteBuffer.allocate( 1024 );
    Object attachment = ...;

    channel.read( bbuff, offset, attachment,
        new CompletionHandler<Integer,Object>() {
            @Override
            public void completed( Integer result, Object attachment ) {
                System.out.println( "read bytes = " + result );
            }
            @Override
            public void failed( Throwable exc, Object attachment ) {
                ...
            }
        });

The additional argument attachment in the read call can be any object we like, and it is simply returned to us in the callback as a way for us to maintain any context needed to service the result. Here, we print the number of bytes read, which as usual may be fewer than we requested, but at least didn’t require us to wait for them. The other possibility illustrated here is that the read may fail, in which case our failed() method is invoked with the associated exception.
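The other way to collect a result, mentioned earlier, is the Future returned by the two-argument read(). The sketch below shows that style on a small scratch file; the class AsyncReadDemo and its method are invented for illustration, and the example assumes the text is non-empty ASCII that fits in the 1,024-byte buffer.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

class AsyncReadDemo {
    // Writes text to a temp file, reads it back asynchronously, and
    // returns "<bytes read>:<text>".
    static String readAsync(String text) {
        try {
            Path tmp = Files.createTempFile("async", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.US_ASCII));
                try (AsynchronousFileChannel channel =
                         AsynchronousFileChannel.open(tmp, StandardOpenOption.READ)) {
                    ByteBuffer bbuff = ByteBuffer.allocate(1024);
                    // Starts the read at offset 0 and returns immediately.
                    Future<Integer> pending = channel.read(bbuff, 0);
                    // ... other work could happen here ...
                    int n = pending.get();   // block only when the result is needed
                    bbuff.flip();
                    return n + ":" + StandardCharsets.US_ASCII.decode(bbuff);
                }
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException | InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAsync("hello")); // 5:hello
    }
}
```

The Future form suits code that has something else to do before it needs the data; the CompletionHandler form suits code that wants to react to the result without ever blocking.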
We’ve laid the groundwork for using the NIO package in this chapter, but left out some of the important pieces. In the next chapter, we’ll see more of the real motivation for java.nio when we talk about nonblocking and selectable I/O. In addition to the performance optimizations that can be made through direct buffers, these capabilities make possible designs for network servers that use fewer threads and can scale well to large systems. In that chapter, we’ll look at the other significant Channel types: SocketChannel, ServerSocketChannel, and DatagramChannel.
[36] The terms big endian and little endian come from Jonathan Swift’s novel Gulliver’s Travels, where it denoted two camps of Lilliputians: those who eat their eggs from the big end and those who eat them from the little end.