Network Security with OpenSSL by Pravir Chandra, Matt Messier, John Viega

Abstract Input/Output

The BIO package provides a powerful abstraction for handling input and output. Many different types of BIO objects are available for use, but they all fall into one of two basic categories: source/sink and filter, both of which will be described in detail in upcoming sections. BIOs can be attached together in chains, allowing the data to flow through multiple BIO objects for processing as it is read or written. For example, a BIO chain can be created that causes data to be base64-encoded as it is written out to a file and decoded as it is read from a file. This feature of BIOs makes them very flexible and powerful. A single function with a BIO parameter can be written to read or write some data, and just by setting up a BIO chain, it is possible for that one function to deal with all kinds of different types of data encoding.

The OpenSSL library provides a variety of functions for creating and destroying BIOs, chaining them together, and reading or writing data. It's important to note that the BIO package is a low-level package, and as such, you must exercise care in using it. Many of the functions will allow you to perform operations that could later lead to unpredictable behavior and even crashes.

The BIO_new function is used to create a new BIO. It requires a BIO_METHOD object to be specified, which defines the type of BIO the new object will be. We'll discuss the available BIO_METHOD objects in the next two sections. If the BIO is created successfully, it will be returned; if an error occurs, NULL will be returned.

BIO *BIO_new(BIO_METHOD *type);

Once a BIO is created, its BIO_METHOD can be changed to some other type using the BIO_set function, which returns 0 if an error occurs and nonzero on success. Use BIO_set with care, particularly if the BIO is part of a chain, because the call will improperly break the chain.

int BIO_set(BIO *bio, BIO_METHOD *type);

When a BIO is no longer needed, it should be destroyed. The function BIO_free will destroy a single BIO and return nonzero if it was successfully destroyed; otherwise, it will return 0.

int BIO_free(BIO *bio);

The BIO_vfree function is identical to BIO_free except that it does not return a value.

void BIO_vfree(BIO *bio);

The BIO_free_all function can be used to free an entire chain of BIOs. When using BIO_free_all, you must ensure that you specify the BIO that is the head of the chain, which is usually a filter BIO. If the BIO that you wish to destroy is part of a chain, you must first remove it from the chain before calling BIO_free or BIO_vfree; otherwise, the chain will end up with a dangling pointer to the BIO that you've destroyed.

void BIO_free_all(BIO *bio);

The BIO_push and BIO_pop functions are poorly named because they imply that a stack is being operated on, but in fact, there is no stack.

The BIO_push function appends a BIO to another BIO, either creating a new BIO chain or lengthening an existing one. The return value is always the BIO that was specified as the head of the chain; in other words, it is the same as the first argument, bio.

BIO *BIO_push(BIO *bio, BIO *append);
bio

The BIO that should have another BIO, typically a filter BIO, appended to its chain.

append

The BIO that should be appended to the chain.

The BIO_pop function will remove the specified BIO from the chain that it is part of and return the next BIO in the chain or NULL if there is no next BIO.

BIO *BIO_pop(BIO *bio);
bio

The BIO that should be removed from the chain of which it is a part.

BIO_read behaves almost identically to the C runtime function read. The primary difference between the two is in how the return value is interpreted. For both functions, a return value greater than zero is the number of bytes that were successfully read. For the C read function, a return value of 0 indicates end of file, and -1 indicates that an error occurred. BIO_read often uses 0 and -1 in the same way, but neither value necessarily means that the end of the data has been reached or that an error has occurred; with some types of BIOs, it may simply mean that no data is currently available to be read. We'll talk more about this in a moment.

int BIO_read(BIO *bio, void *buf, int len);
bio

The first BIO in a chain that will be used for reading data. If there is no chain, this is a source BIO; otherwise, it should be a filter BIO.

buf

The buffer that will receive the data that is read.

len

The number of bytes to read. It may be less than the actual buffer size, but it should never be larger.

Another function that is provided for reading data from a source BIO is BIO_gets, which usually behaves almost identically to its C runtime counterpart, fgets. In general, you should probably avoid using this function if you can, because it is not supported by all types of BIOs, and some types of BIOs may behave differently than you might expect. Normally, though, it will read data until it finds an end-of-line character or the maximum number of bytes has been read, whichever happens first. If an end-of-line character is read, it will be included in the buffer. The return value from this function is the same as for BIO_read.

int BIO_gets(BIO *bio, char *buf, int len);
bio

The first BIO in a chain that will be used for reading data. If there is no chain, this is a source BIO; otherwise, it should be a filter BIO.

buf

The buffer that will receive the data that is read.

len

The maximum number of bytes to read. This length should include space for the NULL terminating character, and of course, should never exceed the size of the buffer that will receive the data.

Corresponding to BIO_read for reading from a source BIO is BIO_write, which writes data to a sink BIO. It behaves almost identically to the C runtime function write, and as with BIO_read, the primary difference between the two is in how the return value is interpreted. The return value is interpreted in much the same way as it is for BIO_read and BIO_gets, except that a positive value indicates the number of bytes that were successfully written.

int BIO_write(BIO *bio, const void *buf, int len);
bio

The first BIO in a chain that will be used for writing data. If there is no chain, this is a sink BIO; otherwise, it should be a filter BIO.

buf

The buffer that contains the data to be written.

len

The number of bytes from the buffer that should be written. It may be less than the actual buffer size, but it should never be larger.

BIO_puts interprets the specified buffer as a C-style string and attempts to write it out in its entirety. The buffer must contain a NULL terminating character, but it will not be written out with the rest of the data. The return value from this function is interpreted the same as it is for BIO_write.

int BIO_puts(BIO *bio, const char *buf);
bio

The first BIO in a chain that will be used for writing data. If there is no chain, this is a sink BIO; otherwise, it should be a filter BIO.

buf

The buffer that contains the data to be written.

We mentioned that for each of the four reading and writing functions, a return value of 0 or -1 does not necessarily indicate that an error has occurred. A suite of functions is provided that allows us to determine whether an error really did occur and whether we should retry the operation.

If BIO_should_retry returns a nonzero value, the call that caused the condition should be retried later. If it returns 0, the actual error condition is determined by the type of BIO. For example, if BIO_read and BIO_should_retry both return 0 and the type of BIO is a socket, the socket has been closed.

int BIO_should_retry(BIO *bio);

If BIO_should_read returns nonzero, the BIO needs to read data. As an example, this condition could occur when a filter BIO is decrypting data encrypted with a block cipher and a complete block has not yet been read from the source. In such a case, the rest of the block would need to be read in order for the data to be successfully decrypted.

int BIO_should_read(BIO *bio);

If BIO_should_write returns nonzero, the BIO needs to write data. This condition could possibly occur when more data is required to satisfy a block cipher's need to fill a buffer before it can be encrypted.

int BIO_should_write(BIO *bio);

If BIO_should_io_special returns nonzero, an exceptional condition has occurred, and the meaning is entirely dependent on the type of BIO that caused the condition. For example, with a socket BIO, this could mean that out-of-band data has been received.

int BIO_should_io_special(BIO *bio);

The function BIO_retry_type returns a bit mask that describes the condition. Possible bit fields include BIO_FLAGS_READ, BIO_FLAGS_WRITE, and BIO_FLAGS_IO_SPECIAL. It is conceivable that more than one bit could be set, but with the types of BIOs that are currently included as part of OpenSSL, only one will ever be set. The functions BIO_should_read, BIO_should_write, and BIO_should_io_special are implemented as macros that test the three bits corresponding to their names.

int BIO_retry_type(BIO *bio);

The function BIO_get_retry_BIO will return a pointer to the BIO in the BIO chain that caused the retry condition. If its second argument, reason, is not NULL, it will be loaded with the reason code for the retry condition. The retry condition doesn't necessarily have to be caused by a source/sink BIO, but can be caused by a filter BIO as well.

BIO *BIO_get_retry_BIO(BIO *bio, int *reason);

The function BIO_get_retry_reason returns the reason code for the retry operation. The retry condition must be a special condition, and the BIO passed must be the BIO that caused the condition. In most cases, the BIO passed to BIO_get_retry_reason should be the BIO that is returned by BIO_get_retry_BIO.

int BIO_get_retry_reason(BIO *bio);

In many cases, BIO_flush will do nothing, but in cases in which buffered I/O is involved, it will force any buffered data to be written. For example, with a buffered file sink, it's effectively the same as calling fflush on the FILE object attached to the BIO.

int BIO_flush(BIO *bio);

Source/Sink BIOs

A BIO that is used for reading is known as a source BIO, and a sink BIO is one that is used for writing. A source/sink BIO is attached to a concrete input/output medium such as a file, a socket, or memory. Only a single source/sink BIO may exist in a chain. It is possible to conceive of situations in which it might be useful to have more than one, particularly for writing, but the source/sink types of BIOs provided by OpenSSL do not currently allow for more than one source/sink BIO to exist in a chain.

OpenSSL provides nine source/sink types of BIOs that can be used with BIO_new and BIO_set. A function is provided for each that simply returns a BIO_METHOD object suitable for passing to BIO_new or BIO_set. Most of the source/sink types of BIOs require additional setup work beyond just creating a BIO with the appropriate BIO_METHOD. We'll cover only the four most commonly used types in any detail here due to space limitations and the huge number of individual functions that are available to operate on them in various ways.

Memory sources/sinks

A memory BIO treats a memory segment the same as a file or socket, and can be created by using BIO_s_mem to obtain a BIO_METHOD object suitable for use with BIO_new and BIO_set. As an alternative, the function BIO_new_mem_buf can be used to create a read-only memory BIO, which requires a pointer to an existing memory segment for reading as well as the size of the buffer. If the size of the buffer is specified as -1, the buffer is assumed to be a C-style string, and the size of the buffer is computed to be the length of the string, not including the NULL terminating character.

When a memory BIO is created using BIO_new and BIO_s_mem, a new memory segment is created and resized as necessary. The memory segment is owned by the BIO in this case and is destroyed when the BIO is destroyed, unless BIO_set_close prevents it. BIO_get_mem_data or BIO_get_mem_ptr can be used to obtain a pointer to the memory segment: BIO_get_mem_data yields a pointer to the data itself and returns its length, while BIO_get_mem_ptr yields a pointer to the BUF_MEM structure that manages it. A memory BIO created with BIO_new_mem_buf will never destroy the memory segment attached to the BIO, regardless of whether BIO_set_close is used to enable it. Example 4-4 demonstrates how to create a memory BIO.

Example 4-4. Creating a memory BIO

#include <stdlib.h>
#include <openssl/bio.h>
#include <openssl/buffer.h>

BIO *bio;
char *buffer;
BUF_MEM *mem;

/* Create a read/write BIO */
bio = BIO_new(BIO_s_mem());

/* Create a read-only BIO using an allocated buffer */
buffer = malloc(4096);
bio = BIO_new_mem_buf(buffer, 4096);

/* Create a read-only BIO using a C-style string */
bio = BIO_new_mem_buf("This is a read-only buffer.", -1);

/* Get a pointer to a memory BIO's memory segment.  BIO_get_mem_ptr stores a
   pointer to the BIO's BUF_MEM structure; the bytes are at mem->data. */
BIO_get_mem_ptr(bio, &mem);

/* Prevent a memory BIO from destroying its memory segment when it is
   destroyed */
BIO_set_close(bio, BIO_NOCLOSE);

File sources/sinks

Two types of file BIOs are available: buffered and unbuffered. A buffered file BIO is a wrapper around the standard C runtime FILE object and its related functions. An unbuffered file BIO is a wrapper around a file descriptor and its related functions. With the exception of how the two different types of file BIOs are created, the interface for using them is essentially the same.

A buffered file BIO can be created by using BIO_s_file to obtain a BIO_METHOD object suitable for use with BIO_new and BIO_set. Alternatively, BIO_new_file can be used in the same way as the standard C runtime function fopen, taking a filename and a mode string, or BIO_new_fp can be used to create a BIO around an already existing FILE object. Using BIO_new_fp, you must specify the FILE object to use and a flag indicating whether the FILE object should be closed when the BIO is destroyed.

An unbuffered file BIO can be created by using BIO_s_fd to obtain a BIO_METHOD object suitable for use with BIO_new and BIO_set. Alternatively, BIO_new_fd can be used in the same way that BIO_new_fp is used for buffered BIOs. The difference is that a file descriptor rather than a FILE object must be specified.

For either a buffered or an unbuffered file BIO created with BIO_new or BIO_set, additional work must be done to make the BIO usable. Initially, no underlying file object is attached to the BIO, and any read or write operations performed on the BIO always fail. Unbuffered file types of BIOs require that BIO_set_fd be used to attach a file descriptor to the BIO. Buffered file types of BIOs require that BIO_set_file be used to attach a FILE object to the BIO, or one of BIO_read_filename, BIO_write_filename, BIO_append_filename, or BIO_rw_filename be used to create an underlying FILE object with the appropriate mode for the BIO. Example 4-5 shows how to create a file BIO.

Example 4-5. Creating a file BIO

#include <fcntl.h>
#include <stdio.h>
#include <openssl/bio.h>

BIO *bio;
FILE *file;
int fd;

/* Create a buffered file BIO with an existing FILE object that will be closed
   when the BIO is destroyed. */
file = fopen("filename.ext", "r+");
bio = BIO_new(BIO_s_file());
BIO_set_file(bio, file, BIO_CLOSE);

/* Create an unbuffered file BIO with an existing file descriptor that will not
   be closed when the BIO is destroyed. */
fd = open("filename.ext", O_RDWR);
bio = BIO_new(BIO_s_fd());
BIO_set_fd(bio, fd, BIO_NOCLOSE);

/* Create a buffered file BIO with a new FILE object owned by the BIO */
bio = BIO_new_file("filename.ext", "w");

/* Create an unbuffered file BIO with an existing file descriptor that will be
   closed when the BIO is destroyed. */
fd = open("filename.ext", O_RDONLY);
bio = BIO_new_fd(fd, BIO_CLOSE);

Socket sources/sinks

There are three types of socket BIOs. The simplest is a socket BIO that must have an already existing socket descriptor attached to it. Such a BIO can be created using BIO_s_socket to obtain a BIO_METHOD object suitable for use with BIO_new and BIO_set. The socket descriptor can then be attached to the BIO using BIO_set_fd. This type of BIO works almost like an unbuffered file BIO. Alternatively, BIO_new_socket can be used in the same way that BIO_new_fd works for unbuffered file BIOs.

The second type of socket BIO is a connection socket. This type of BIO creates a new socket that is initially unconnected. The IP address and port to connect to must be set, and the connection established, before data can be read from or written to the BIO. BIO_s_connect is used to obtain a BIO_METHOD object suitable for use with BIO_new and BIO_set. To set the address, either BIO_set_conn_hostname can be used to set the hostname as a C-style string (the hostname may also be an IP address in dotted decimal form), or BIO_set_conn_ip can be used to set the IP address in binary form, as four bytes in network byte order. The port to connect to is set using BIO_set_conn_port or BIO_set_conn_int_port. The difference between the two is that BIO_set_conn_port takes the port as a string, which can be either a port number or a service name such as "http" or "https", while BIO_set_conn_int_port takes a pointer to an integer port number. Once the address and port are set, an attempt to establish a connection can be made via BIO_do_connect. Once a connection is successfully established, the BIO can be used just as if it were a plain socket BIO.

The third type of BIO socket is an accept socket. This type of BIO creates a new socket that will listen for incoming connections and accept them. When a connection is established, a new BIO object is created that is bound to the accepted socket. The new BIO object is chained to the original BIO and should be disconnected from the chain before use. Data can be read or written with the new BIO object. The original BIO object can then be used to accept more connections.

To create an accept socket BIO, use BIO_s_accept to obtain a BIO_METHOD object suitable for use with BIO_new and BIO_set. The port used to listen for connections must be set before the BIO can be placed into listening mode. This can be done using BIO_set_accept_port, which accepts the port as a string. The port can be either a number or the name of a service, just as with BIO_set_conn_port. Once the port is set, the first call to BIO_do_accept will place the BIO's socket into listening mode. Each successive call to BIO_do_accept will then block (for a blocking BIO) until a new connection is established. Example 4-6 demonstrates.

Example 4-6. Creating a socket BIO

#include <openssl/bio.h>

BIO *bio, *new_bio;
int sd;   /* an already existing socket descriptor */

/* Create a socket BIO attached to an already existing socket descriptor.  The
   socket descriptor will not be closed when the BIO is destroyed. */
bio = BIO_new(BIO_s_socket());
BIO_set_fd(bio, sd, BIO_NOCLOSE);

/* Create a socket BIO attached to an already existing socket descriptor.  The
   socket descriptor will be closed when the BIO is destroyed. */
bio = BIO_new_socket(sd, BIO_CLOSE);

/* Create a socket BIO to establish a connection to a remote host. */
bio = BIO_new(BIO_s_connect());
BIO_set_conn_hostname(bio, "www.ora.com");
BIO_set_conn_port(bio, "http");
BIO_do_connect(bio); /* returns 1 once the connection is established */

/* Create a socket BIO to listen for an incoming connection. */
bio = BIO_new(BIO_s_accept());
BIO_set_accept_port(bio, "https");
BIO_do_accept(bio); /* place the underlying socket into listening mode */
for (;;)
{
    BIO_do_accept(bio); /* wait for a new connection */
    new_bio = BIO_pop(bio);
    /* new_bio now behaves like a BIO_s_socket() BIO; call
       BIO_free_all(new_bio) when finished with the connection */
}

BIO pairs

The final type of source/sink BIO that we'll discuss is a BIO pair. A BIO pair is similar to an anonymous pipe,[1] but has one important difference. In a BIO pair, two source/sink BIOs are bound together as peers so that anything written to one can be read from the other. An anonymous pipe also creates two endpoints, but only one can be written to, and only the other can be read from. Both endpoints of a BIO pair can be read from and written to.

A BIO pair can be formed by joining two already existing BIO objects, or two new BIO objects can be created in a joined state. The function BIO_make_bio_pair will join two existing BIO objects created using the BIO_METHOD object returned from the BIO_s_bio function. It accepts two parameters, each one a BIO that will be an endpoint in the resultant pair. When a BIO is created using BIO_s_bio to obtain a BIO_METHOD suitable for use with BIO_new, it must be assigned a buffer with a call to BIO_set_write_buf_size, which accepts two parameters. The first is the BIO to assign the buffer to, and the second is the size in bytes of the buffer to be assigned.

New BIO objects can be created already joined with the convenience function BIO_new_bio_pair, which accepts four parameters. The first and third parameters are pointers to BIO objects that will receive a pointer to each newly created BIO object. The second and fourth parameters are the sizes of the buffers to be assigned to each half of the BIO pair. If an error occurs, such as an out-of-memory condition, the function will return zero; otherwise, it will return nonzero.

The function BIO_destroy_bio_pair will sever the pairing of the two endpoints in a BIO pair. This function is useful when you want to break up a pair and reassign one or both of the endpoints to other potential endpoints. The function accepts one parameter, which is one of the endpoints in a pair. It should only be called on one half of a pair, not both. Calling BIO_free will also cleanly sever a pair, but will only free the one endpoint of the pair that is passed to it.

One of the useful features of BIO pairs is their ability to use the SSL engine (which requires the use of BIO objects) while maintaining control over the low-level I/O primitives. For example, you could provide one endpoint of a BIO pair to the SSL engine for reading and writing, and then use the other endpoint to read and write the data however you wish. In other words, when the SSL engine writes to its BIO, you can read that data from the other endpoint and do what you wish with it. Likewise, when the SSL engine needs to read data, you write to the other endpoint, and the SSL engine will read it. Included in the OpenSSL distribution is a test application (the source file is ssl/ssltest.c) that is a good example of how to use BIO pairs. It implements a client and a server in the same application, and the two talk to each other without requiring sockets or some other low-level communication mechanism. Example 4-7 demonstrates how BIO pairs can be created, detached, and reattached.

Example 4-7. Creating BIO pairs

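#include <openssl/bio.h>

BIO *a, *b, *c;

/* Create two BIOs, give each a buffer, and join them as a pair */
a = BIO_new(BIO_s_bio());
BIO_set_write_buf_size(a, 4096);
b = BIO_new(BIO_s_bio());
BIO_set_write_buf_size(b, 4096);
BIO_make_bio_pair(a, b);

/* Free that pair, then create a new, already joined pair in one call */
BIO_free(a);
BIO_free(b);
BIO_new_bio_pair(&a, 8192, &b, 8192);

/* Detach a from b and join it to a new BIO, c, instead */
c = BIO_new(BIO_s_bio());
BIO_set_write_buf_size(c, 1024);
BIO_destroy_bio_pair(a); /* disconnect a from b */
BIO_make_bio_pair(a, c);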

Filter BIOs

A filter BIO by itself provides no utility. It must be chained with a source/sink BIO and possibly other filter BIOs to be useful. The ability to chain filters with other BIOs is perhaps the most powerful feature of OpenSSL's BIO package, and it provides a great deal of flexibility. A filter BIO often performs some kind of translation of data before writing to or after reading from a concrete medium, such as a file or socket.

Creating BIO chains is reasonably simple and straightforward; however, care must be taken to keep track of the BIO that is at the head of the chain so that the chain can be manipulated and destroyed safely. If you destroy a BIO that is in the middle of a chain without first removing it from the chain, it's a safe bet that your program will crash shortly thereafter. As we mentioned earlier, the BIO package is one of OpenSSL's lower-level packages, and as such, little error checking is done. This places the burden on the programmer to be sure that any operations performed on a BIO chain are both legal and error-free.

When creating a chain, you must also ensure that you create the chain in the proper order. For example, if you use filters that perform base64 conversion and encryption, you would probably want to perform base64 encoding after encryption, not before. It's also important to ensure that your source/sink BIO is at the end of the chain. If it's not, any filters placed after it in the chain will never be used.

The interface for creating a filter BIO is similar to that for creating a source/sink BIO. BIO_new is used to create a new BIO with the appropriate BIO_METHOD object. Filter BIOs are provided by OpenSSL for performing encryption and decryption, base64 encoding and decoding, computing message digests, and buffering. There are a handful of others as well, but they are of limited use, since they are either platform-specific or meant for testing the BIO package.

The function shown in Example 4-8 can be used to write data to a file using the BIO package. What's interesting about the function is that it creates a chain of four BIOs. The result is that the data written to the file is encrypted and base64 encoded with the base64 encoding performed after the data is encrypted. The data is first encrypted using outer triple CBC DES and the specified key. The encrypted data is then base64-encoded before it is written to the file through an in-memory buffer. The in-memory buffer is used because triple CBC DES is a block cipher, and the two filters cooperate to ensure that the cipher's blocks are filled and padded properly. Chapter 6 discusses symmetric ciphers in detail.

Example 4-8. Assembling and using a BIO chain

#include <openssl/bio.h>
#include <openssl/evp.h>

int write_data(const char *filename, char *out, int len, unsigned char *key)
{
    int total, written, flushed;
    BIO *cipher, *b64, *buffer, *file;

    /* Create a buffered file BIO for writing */
    file = BIO_new_file(filename, "w");
    if (!file)
        return 0;

    /* Create a buffering filter BIO to buffer writes to the file */
    buffer = BIO_new(BIO_f_buffer());

    /* Create a base64 encoding filter BIO */
    b64 = BIO_new(BIO_f_base64());

    /* Create the cipher filter BIO */
    cipher = BIO_new(BIO_f_cipher());

    /* Give up, freeing whatever was created, if any BIO_new call failed;
       BIO_free does nothing when passed NULL */
    if (!buffer || !b64 || !cipher)
    {
        BIO_free(cipher);
        BIO_free(b64);
        BIO_free(buffer);
        BIO_free(file);
        return 0;
    }

    /* Set the key.  The last parameter of BIO_set_cipher is 1 for encryption
       and 0 for decryption.  The IV argument is NULL, so no explicit IV is
       set */
    BIO_set_cipher(cipher, EVP_des_ede3_cbc(), key, NULL, 1);

    /* Assemble the BIO chain to be in the order cipher-b64-buffer-file */
    BIO_push(cipher, b64);
    BIO_push(b64, buffer);
    BIO_push(buffer, file);

    /* This loop writes the data to the file.  It checks for errors as if the
       underlying file were non-blocking */
    for (total = 0;  total < len;  total += written)
    {
        if ((written = BIO_write(cipher, out + total, len - total)) <= 0)
        {
            if (BIO_should_retry(cipher))
            {
                written = 0;
                continue;
            }
            break;
        }
    }

    /* Ensure all of our data is pushed all the way to the file */
    flushed = BIO_flush(cipher);

    /* We now need to free the BIO chain. A call to BIO_free_all(cipher) would
       accomplish this, but we'll first remove b64 from the chain for
       demonstration purposes. */
    BIO_pop(b64);

    /* At this point the b64 BIO is isolated and the chain is cipher-buffer-file.
       The following frees all of that memory */
    BIO_free(b64);
    BIO_free_all(cipher);

    /* Report success only if every byte was written and the flush succeeded */
    return (total == len && flushed > 0);
}


[1] An anonymous pipe is a common operating system construct in which two file descriptors are created, but no file is created or socket opened. The two descriptors are connected to each other where one can be written to and the other read from. The data written to one half of the pipe can be read from the other half of the pipe.
