By Robert Love
int) called the file descriptor, abbreviated fd. File descriptors are shared with user space, and are used directly by user programs to access files. A large part of Linux system programming consists of opening, manipulating, closing, and otherwise using file descriptors.
File descriptors are represented by the C int type. Not using a special type (an fd_t, say) is often considered odd, but is, historically, the Unix way. Each Linux process has a maximum number of files that it may open. File descriptors start at 0, and go up to one less than this maximum value. By default, the maximum is 1,024, but it can be configured as high as 1,048,576. Because negative values are not legal file descriptors, -1 is often used to indicate an error from a function that would otherwise return a valid file descriptor.
The most basic way to access a file is via the read( ) and write( ) system calls. Before a file can be accessed, however, it must be opened via an open( ) or creat( ) system call. Once done using the file, it should be closed using the system call close( ).
A file is opened, and a file descriptor is obtained, with the open( ) system call:
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int open (const char *name, int flags);
int open (const char *name, int flags, mode_t mode);
The open( ) system call maps the file given by the pathname name to a file descriptor, which it returns on success. The file position is set to zero, and the file is opened for access according to the flags given by flags.
The flags argument must be one of O_RDONLY, O_WRONLY, or O_RDWR. Respectively, these arguments request that the file be opened only for reading, only for writing, or for both reading and writing. For example, the following code opens /home/kidd/madagascar for reading:
int fd;
fd = open ("/home/kidd/madagascar", O_RDONLY);
if (fd == -1)
        /* error */
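The error branch above can be made concrete. The following is a minimal sketch, not a definitive implementation: open_readonly( ) is a hypothetical helper name introduced here for illustration, and it simply reports the reason for a failed open( ) while preserving errno for the caller.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of any library): open 'name' read-only.
 * Returns the new file descriptor, or -1 with errno as set by open().
 */
int open_readonly (const char *name)
{
        int fd;

        assert (name != NULL);

        fd = open (name, O_RDONLY);
        if (fd == -1) {
                int saved = errno;      /* fprintf() may change errno */

                fprintf (stderr, "open %s: %s\n", name, strerror (saved));
                errno = saved;
        }

        return fd;
}
```

A caller can then test the return value against -1 and inspect errno (for example, ENOENT when the path does not exist).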
The process issuing the open( ) system call must have sufficient permissions to obtain the access requested.
The flags argument can be bitwise-ORed with one or more of the following values, modifying the behavior of the open request:
O_APPEND: The file is opened in append mode; before each write, the file position is moved to the end of the file.
O_ASYNC: A signal (SIGIO by default) will be generated when the specified file becomes readable or writable. This flag is available only for terminals and sockets, not for regular files.
O_CREAT: If the file denoted by name does not exist, the kernel will create it. If the file already exists, this flag has no effect unless O_EXCL is also given.
O_DIRECT: The file is opened for direct I/O.
Reading is done with the read( ) system call, defined in POSIX.1:
#include <unistd.h>
ssize_t read (int fd, void *buf, size_t len);
Each call reads up to len bytes into buf from the current file offset of the file referenced by fd. On success, the number of bytes written into buf is returned. On error, the call returns -1, and errno is set. The file position is advanced by the number of bytes read from fd. If the object represented by fd is not capable of seeking (for example, a character device file), the read always occurs from the "current" position.
Basic usage is simple. This example reads from the file descriptor fd into word. The number of bytes read is equal to the size of the unsigned long type, which is four bytes on 32-bit Linux systems, and eight bytes on 64-bit systems. On return, nr contains the number of bytes read, or -1 on error:
unsigned long word;
ssize_t nr;
/* read a couple bytes into 'word' from 'fd' */
nr = read (fd, &word, sizeof (unsigned long));
if (nr == -1)
        /* error */
This implementation is naive in two ways: the call might return without reading all len bytes, and it could produce certain errors that this code does not check for and handle. Code such as this, unfortunately, is very common. Let's see how to improve it.
It is legal for read( ) to return a positive nonzero value less than len. This can happen for a number of reasons: fewer than len bytes may have been available, the system call may have been interrupted by a signal, the pipe may have broken (if fd is a pipe), and so on.
A return value of 0 is another consideration when using read( ). The read( ) system call returns 0 to indicate end-of-file (EOF); in this case, of course, no bytes were read. EOF is not considered an error (and hence is not accompanied by a -1 return value); it simply indicates that the file position has advanced past the last valid offset in the file, and thus there is nothing else to read. If, however, a call is made for len bytes but no data is available for reading, the call blocks until data becomes available (unless the file descriptor was opened in nonblocking mode).
Writing is done with write( ). write( ) is the counterpart of read( ) and is also defined in POSIX.1:
#include <unistd.h>
ssize_t write (int fd, const void *buf, size_t count);
A call to write( ) writes up to count bytes starting at buf to the current file position of the file referenced by the file descriptor fd. Files backed by objects that do not support seeking (for example, character devices) always write starting at the "head."
On success, the number of bytes written is returned. On error, -1 is returned, and errno is set appropriately. A call to write( ) can return 0, but this return value does not have any special meaning; it simply implies that zero bytes were written.
As with read( ), the most basic usage is simple:
const char *buf = "My ship is solid!";
ssize_t nr;
/* write the string in 'buf' to 'fd' */
nr = write (fd, buf, strlen (buf));
if (nr == -1)
        /* error */
As with read( ), this usage is not quite right. Callers also need to check for the possible occurrence of a partial write:
unsigned long word = 1720;
size_t count;
ssize_t nr;
count = sizeof (word);
nr = write (fd, &word, count);
if (nr == -1)
        /* error, check errno */
else if (nr != count)
        /* possible error, but 'errno' not set */
The write( ) system call is less likely to return a partial write than a read( ) system call is to return a partial read. Also, there is no EOF condition for a write( ) system call. For regular files, write( ) is guaranteed to perform the entire requested write, unless an error occurs.
When a loop is used to finish a partial write, a second call to write( ) may return an error revealing what caused the first call to perform only a partial write (although, again, this situation is not very common). Here's an example:
ssize_t ret;
while (len != 0 && (ret = write (fd, buf, len)) != 0) {
        if (ret == -1) {
                if (errno == EINTR)
                        continue;
                perror ("write");
                break;
        }
        len -= ret;
        buf += ret;
}
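The loop above can be packaged as a reusable function. This is a sketch under stated assumptions: write_all( ) is a hypothetical name, and treating a zero-byte write of a nonzero request as an EIO failure is a choice made here to avoid looping forever, not something the text prescribes.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: call write() repeatedly until all 'len' bytes
 * starting at 'buf' have been written to 'fd'.
 * Returns 0 on success, or -1 with errno set on failure.
 */
int write_all (int fd, const void *buf, size_t len)
{
        const char *p = buf;

        assert (buf != NULL || len == 0);

        while (len != 0) {
                ssize_t ret = write (fd, p, len);

                if (ret == -1) {
                        if (errno == EINTR)
                                continue;       /* interrupted: retry */
                        return -1;              /* real error */
                }
                if (ret == 0) {
                        errno = EIO;            /* assumption: no progress is a failure */
                        return -1;
                }
                len -= (size_t) ret;            /* partial write: advance and go again */
                p += ret;
        }

        return 0;
}
```

Using a char pointer internally keeps the pointer arithmetic portable, since arithmetic on void pointers is a compiler extension.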
One way to ensure that data has reached the disk is via the fsync( ) system call, defined by POSIX.1b:
#include <unistd.h>
int fsync (int fd);
A call to fsync( ) ensures that all dirty data associated with the file mapped by the file descriptor fd is written back to disk. The file descriptor fd must be open for writing. The call writes back both data and metadata, such as timestamps and other attributes contained in the inode. It will not return until the hard drive says that the data and metadata are on the disk.
When the drive has a write cache, it may not be possible for fsync( ) to know whether the data is physically on the disk. The hard drive can report that the data was written, but the data may in fact reside in the drive's write cache. Fortunately, data in a hard disk's cache should be committed to the disk in short order.
Linux also provides the system call fdatasync( ):
#include <unistd.h>
int fdatasync (int fd);
This system call does the same thing as fsync( ), except that it only flushes data. The call does not guarantee that metadata is synchronized to disk, and is therefore potentially faster. Often this is sufficient.
Both functions are used the same way:
int ret;
ret = fsync (fd);
if (ret == -1)
        /* error */
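To show where the call fits in practice, here is a small sketch that writes a buffer to a file and does not report success until fsync( ) has returned. The helper name save_and_sync( ), the 0644 mode, and the use of O_TRUNC are illustrative assumptions; a short write is treated as a failure rather than retried, to keep the example brief.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace the contents of 'path' with 'len' bytes
 * from 'buf', and flush them to disk before returning.
 * Returns 0 on success, or -1 with errno set on failure.
 */
int save_and_sync (const char *path, const void *buf, size_t len)
{
        ssize_t nr;
        int fd, saved;

        assert (path != NULL);

        fd = open (path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd == -1)
                return -1;

        nr = write (fd, buf, len);
        if (nr == -1)
                goto fail;
        if ((size_t) nr != len) {
                errno = EIO;            /* assumption: short write is a failure */
                goto fail;
        }

        if (fsync (fd) == -1)           /* data and metadata written back */
                goto fail;

        return close (fd);

fail:
        saved = errno;                  /* close() may change errno */
        close (fd);
        errno = saved;
        return -1;
}
```

Swapping fsync( ) for fdatasync( ) in this sketch would skip the flush of metadata that is not needed to read the data back, which may be faster.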
The O_DIRECT flag to open( ) instructs the kernel to minimize the presence of I/O management. When this flag is provided, I/O will initiate directly from user-space buffers to the device, bypassing the page cache. All I/O will be synchronous; operations will not return until completed.
After a program has finished working with a file descriptor, it can unmap the file descriptor from the associated file via the close( ) system call:
#include <unistd.h>
int close (int fd);
A call to close( ) unmaps the open file descriptor fd, and disassociates the process from the file. The given file descriptor is then no longer valid, and the kernel is free to reuse it as the return value to a subsequent open( ) or creat( ) call. A call to close( ) returns 0 on success. On error, it returns -1, and sets errno appropriately. Usage is simple:
if (close (fd) == -1)
        perror ("close");
A call to close( ) may also result in an unlinked file finally being physically removed from the disk.
It is a common mistake not to check the return value of close( ). This can result in missing a crucial error condition because errors associated with deferred operations may not manifest until later, and close( ) can report them.
There are a handful of possible errno values on failure. Other than EBADF (the given file descriptor was invalid), the most important error value is EIO, indicating a low-level I/O error probably unrelated to the actual close. Regardless of any reported error, the file descriptor, if valid, is always closed, and the associated data structures are freed.
The lseek( ) system call is provided to set the file position of a file descriptor to a given value. Other than updating the file position, it performs no other action, and initiates no I/O whatsoever:
#include <sys/types.h>
#include <unistd.h>
off_t lseek (int fd, off_t pos, int origin);
The behavior of lseek( ) depends on the origin argument, which can be one of the following:
SEEK_CUR: The current file position of fd is set to its current value plus pos, which can be negative, zero, or positive. A pos of zero returns the current file position value.
SEEK_END: The current file position of fd is set to the current length of the file plus pos, which can be negative, zero, or positive. A pos of zero sets the offset to the end of the file.
SEEK_SET: The current file position of fd is set to pos. A pos of zero sets the offset to the beginning of the file.
The call returns the new file position on success. On error, it returns -1 and errno is set as appropriate.
For example, to set the file position of fd to 1825:
off_t ret;
ret = lseek (fd, (off_t) 1825, SEEK_SET);
if (ret == (off_t) -1)
        /* error */
Alternatively, to set the file position of fd to the end of the file:
off_t ret;
ret = lseek (fd, 0, SEEK_END);
if (ret == (off_t) -1)
        /* error */
Because lseek( ) returns the updated file position, it can be used to find the current file position via a SEEK_CUR with a pos of zero:
off_t pos;      /* off_t, not int: file offsets may not fit in an int */
pos = lseek (fd, 0, SEEK_CUR);
if (pos == (off_t) -1)
        /* error */
else
        /* 'pos' is the current position of fd */
By far, the most common uses of lseek( ) are seeking to the beginning, seeking to the end, or determining the current file position of a file descriptor.
It is possible to instruct lseek( ) to advance the file pointer past the end of a file. For example, this code seeks to 1,688 bytes beyond the end of the file mapped by fd:
off_t ret;      /* lseek() returns off_t, not int */
ret = lseek (fd, (off_t) 1688, SEEK_END);
if (ret == (off_t) -1)
        /* error */
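On its own, such a seek changes nothing on disk. On Linux, a write issued after it extends the file, and the skipped range reads back as zeros. The sketch below illustrates this; write_past_end( ) is a hypothetical helper, and the marker bytes 'A' and 'Z', the 0644 mode, and the O_TRUNC flag are assumptions made for the example.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create 'path' containing one byte ('A'), seek
 * 'gap' bytes past the end of the file, and write one more byte ('Z').
 * Returns the resulting file position (which equals the new file
 * length), or -1 on failure.
 */
off_t write_past_end (const char *path, off_t gap)
{
        off_t end = -1;
        int fd;

        assert (path != NULL && gap >= 0);

        fd = open (path, O_RDWR | O_CREAT | O_TRUNC, 0644);
        if (fd == -1)
                return -1;

        if (write (fd, "A", 1) == 1 &&
            lseek (fd, gap, SEEK_END) != (off_t) -1 &&
            write (fd, "Z", 1) == 1)
                end = lseek (fd, 0, SEEK_CUR);

        close (fd);
        return end;
}
```

With a gap of 1,688 bytes, the file ends up 1,690 bytes long: the first byte, the 1,688 skipped bytes, and the final byte.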
In lieu of using lseek( ), Linux provides two variants of the read( ) and write( ) system calls that each take as a parameter the file position from which to read or write. Upon completion, they do not update the file position.
The read form is called pread( ):
#define _XOPEN_SOURCE 500
#include <unistd.h>
ssize_t pread (int fd, void *buf, size_t count, off_t pos);
This call reads up to count bytes into buf from the file descriptor fd at file position pos.
The write form is called pwrite( ):
#define _XOPEN_SOURCE 500
#include <unistd.h>
ssize_t pwrite (int fd, const void *buf, size_t count, off_t pos);
This call writes up to count bytes from buf to the file descriptor fd at file position pos.
These calls are almost identical in behavior to their non-p brethren, except that they completely ignore the current file position; instead of using the current position, they use the value provided by pos. Also, when done, they do not update the file position. In other words, any intermixed read( ) and write( ) calls could potentially corrupt the work done by the positional calls.
Using the positional calls is similar to preceding a read( ) or write( ) call with a call to lseek( ), with three differences. First, these calls are easier to use, especially when doing a tricky operation such as moving through a file backward or randomly. Second, they do not update the file pointer upon completion. Finally, and most importantly, they avoid any potential races that might occur when using lseek( ). As threads share file descriptors, it would be possible for a different thread in the same program to update the file position after the first thread's call to lseek( ), but before its read or write operation executed. Such race conditions can be avoided by using the pread( ) and pwrite( ) system calls.
On success, both calls return the number of bytes read or written. A return value of 0 from pread( ) indicates EOF; from pwrite( ), a return value of 0 indicates that the call did not write anything. On error, both calls return -1 and set errno.
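A short sketch may help show that the positional calls leave the file position alone. The helper patch_byte( ) is a hypothetical name introduced here: it overwrites a single byte at a given offset with pwrite( ) and reads it back with pread( ), never calling lseek( ).

```c
#define _XOPEN_SOURCE 500
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: store 'value' at offset 'pos' of 'fd' and verify
 * it by reading it back. The file position of 'fd' is never consulted
 * or modified.
 * Returns 0 on success, -1 on failure.
 */
int patch_byte (int fd, off_t pos, char value)
{
        char check;

        assert (pos >= 0);

        if (pwrite (fd, &value, 1, pos) != 1)
                return -1;

        if (pread (fd, &check, 1, pos) != 1)
                return -1;

        return check == value ? 0 : -1;
}
```

A caller that was positioned at offset 3 before calling patch_byte( ) is still at offset 3 afterward, so a following read( ) continues where it left off.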
Linux provides two system calls for truncating the length of a file:
#include <unistd.h>
#include <sys/types.h>
int ftruncate (int fd, off_t len);
and:
#include <unistd.h>
#include <sys/types.h>
int truncate (const char *path, off_t len);
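Both system calls truncate the given file to the length given by len. The ftruncate( ) system call operates on the file descriptor given by fd, which must be open for writing. The truncate( ) system call operates on the filename given by path, which must be writable. Both return 0 on success. On error, they return -1, and set errno as appropriate.
Truncating a file to a size smaller than its current length leaves it with a length of len. The data previously existing between len and the old length is discarded, and no longer accessible via a read request.
For example, consider the file pirate.txt with the following contents:
Edward Teach was a notorious English pirate. He was nicknamed Blackbeard.
From the same directory, the following program truncates the file to 45 bytes: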
#include <unistd.h>
#include <stdio.h>
int main (void)
{
        int ret;
        ret = truncate ("./pirate.txt", 45);
        if (ret == -1) {
                perror ("truncate");
                return -1;
        }
        return 0;
}
The result is a file 45 bytes long, with the following contents:
Edward Teach was a notorious English pirate.
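The same operation can be done through a file descriptor with ftruncate( ). The sketch below is illustrative only: truncate_by_fd( ) is a hypothetical helper, and it uses fstat( ), a call not covered above, to report the resulting length.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open 'path' for writing, set its length to 'len'
 * with ftruncate(), and report the resulting size.
 * Returns the new file size, or -1 with errno set on failure.
 */
off_t truncate_by_fd (const char *path, off_t len)
{
        struct stat sb;
        off_t size = -1;
        int fd, saved;

        assert (path != NULL && len >= 0);

        /* ftruncate() requires a descriptor that is open for writing */
        fd = open (path, O_WRONLY);
        if (fd == -1)
                return -1;

        if (ftruncate (fd, len) == 0 && fstat (fd, &sb) == 0)
                size = sb.st_size;

        saved = errno;                  /* close() may change errno */
        close (fd);
        errno = saved;

        return size;
}
```

Called as truncate_by_fd ("./pirate.txt", 45) on the file shown earlier, this sketch would report a size of 45.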
If a read( ) system call is issued and there is not yet any data, the process will block, no longer able to service the other file descriptors. It might block for just a few seconds, making the application inefficient and annoying the user. However, if no data becomes available on the file descriptor, it could block forever. Because file descriptors' I/O is often interrelated (think pipes), it is quite possible for one file descriptor not to become ready until another is serviced. Particularly with network applications, which may have many sockets open simultaneously, this is potentially quite a problem.
When a user-space application issues the read( ) system call, it takes an interesting journey. The C library provides definitions of the system call that are converted to the appropriate trap statements at compile-time. Once a user-space process is trapped into the kernel, passed through the system call handler, and handed to the read( ) system call, the kernel figures out what object backs the given file descriptor.
Consider the following command:
dd bs=1 count=2097152 if=/dev/zero of=pirate
Thanks to the bs=1 argument, this command will copy two megabytes from the device /dev/zero (a virtual device providing an endless stream of zeros) to the file pirate in 2,097,152 one-byte chunks. That is, it will copy the data via about two million read and write operations, one byte at a time.
Now consider the same two-megabyte copy using 1,024-byte blocks:
dd bs=1024 count=2048 if=/dev/zero of=pirate
| Block size | Real time | User time | System time |
|---|---|---|---|
| 1 byte | 18.707 seconds | 1.118 seconds | 17.549 seconds |
| 1,024 bytes | 0.025 seconds | 0.002 seconds | 0.023 seconds |
| 1,130 bytes | 0.035 seconds | 0.002 seconds | 0.027 seconds |
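Much of the difference in the table comes down to how many system calls each block size requires. The following sketch counts them; copy_counting_reads( ) is a hypothetical helper, it assumes the destination already exists, and it treats errors, early EOF, and short writes as failures rather than retrying (so EINTR is not handled).

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy 'total' bytes from 'src' to 'dst' using a
 * buffer of 'bs' bytes, much as dd does with its bs argument.
 * Returns the number of successful read() calls issued, or -1 on failure.
 */
long copy_counting_reads (const char *src, const char *dst,
                          size_t bs, size_t total)
{
        long calls = 0;
        char *buf;
        int in, out;

        assert (src != NULL && dst != NULL && bs > 0);

        buf = malloc (bs);
        in = open (src, O_RDONLY);
        out = open (dst, O_WRONLY);     /* assumption: 'dst' already exists */
        if (!buf || in == -1 || out == -1)
                calls = -1;

        while (calls != -1 && total > 0) {
                size_t want = total < bs ? total : bs;
                ssize_t nr = read (in, buf, want);

                /* treat errors, early EOF, and short writes as failure */
                if (nr <= 0 || write (out, buf, (size_t) nr) != nr) {
                        calls = -1;
                        break;
                }
                calls++;
                total -= (size_t) nr;
        }

        if (in != -1)
                close (in);
        if (out != -1)
                close (out);
        free (buf);

        return calls;
}
```

When every read returns a full block, as reads from /dev/zero do, copying 2,097,152 bytes takes 2,048 reads with 1,024-byte blocks, against 2,097,152 reads with one-byte blocks.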
The file pointer is represented by a pointer to the FILE typedef, which is defined in <stdio.h>.
Files are opened for reading or writing via fopen( ):
#include <stdio.h>
FILE * fopen (const char *path, const char *mode);
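This function opens the file path according to the given modes, and associates a new stream with it.
The mode argument describes how to open the given file. It is one of the following strings:
r: Open the file for reading.
r+: Open the file for both reading and writing.
w: Open the file for writing. If the file exists, it is truncated to zero length; otherwise, it is created.
w+: Open the file for both reading and writing. If the file exists, it is truncated to zero length; otherwise, it is created.
a: Open the file for writing in append mode. The file is created if it does not exist.
a+: Open the file for both reading and writing in append mode. The file is created if it does not exist.
The given mode may also contain the character b, although this value is always ignored on Linux. Some operating systems treat text and binary files differently, and the b mode instructs the file to be opened in binary mode. Linux, as with all POSIX-conforming systems, treats text and binary files identically.
Upon success, fopen( ) returns a valid FILE pointer. On failure, it returns NULL, and sets errno appropriately.
For example, the following code opens /etc/manifest for reading, and associates it with stream: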
FILE *stream;
stream = fopen ("/etc/manifest", "r");
if (!stream)
/* error */
The function fdopen( ) converts an already open file descriptor (fd) to a stream:
#include <stdio.h>
FILE * fdopen (int fd, const char *mode);
The possible modes are the same as for fopen( ), and must be compatible with the modes originally used to open the file descriptor. The modes w and w+ may be specified, but they will not cause truncation. The stream is positioned at the file position associated with the file descriptor.
On success, fdopen( ) returns a valid file pointer; on failure, it returns NULL.
For example, the following code opens /home/kidd/map.txt via the open( ) system call, and then uses the backing file descriptor to create an associated stream:
FILE *stream;
int fd;
fd = open ("/home/kidd/map.txt", O_RDONLY);
if (fd == -1)
        /* error */
stream = fdopen (fd, "r");
if (!stream)
/* error */
The fclose( ) function closes a given stream:
#include <stdio.h>
int fclose (FILE *stream);
Upon success, fclose( ) returns 0. On failure, it returns EOF and sets errno appropriately.
The fcloseall( ) function closes all streams associated with the current process, including standard in, standard out, and standard error:
#define _GNU_SOURCE
#include <stdio.h>
int fcloseall (void);
This function always returns 0; it is Linux-specific.
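The stream calls above can be combined into a small, complete example. This is a sketch rather than a canonical pattern: save_line( ) is a hypothetical helper, and it uses fputs( ), a standard I/O function not covered above, to write through the stream.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write 'line' to 'path' through a stream,
 * replacing any previous contents.
 * Returns 0 on success, -1 on failure.
 */
int save_line (const char *path, const char *line)
{
        FILE *stream;
        int ok;

        assert (path != NULL && line != NULL);

        stream = fopen (path, "w");
        if (!stream)
                return -1;

        ok = (fputs (line, stream) != EOF);

        /*
         * Buffered data may only be handed to the kernel here, so a
         * failed fclose() counts as a failed write.
         */
        if (fclose (stream) == EOF)
                ok = 0;

        return ok ? 0 : -1;
}
```

Checking the return value of fclose( ) matters for the same reason as checking close( ): it is the last point at which a deferred write error can be reported.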