Direct Memory Access and Bus Mastering

Direct memory access, or DMA, is the advanced topic that completes our overview of memory issues. DMA is the hardware mechanism that allows peripheral components to transfer their I/O data directly to and from main memory without the need for the system processor to be involved in the transfer. Use of this mechanism can greatly increase throughput to and from a device, because a great deal of computational overhead is eliminated.

To exploit the DMA capabilities of its hardware, the device driver needs to be able to correctly set up the DMA transfer and synchronize with the hardware. Unfortunately, because of its hardware nature, DMA is very system dependent. Each architecture has its own techniques to manage DMA transfers, and the programming interface is different for each. The kernel can’t offer a unified interface, either, because a driver can’t abstract too much from the underlying hardware mechanisms. Some steps have been made in that direction, however, in recent kernels.

This chapter concentrates mainly on the PCI bus, since it is currently the most popular peripheral bus available. Many of the concepts are more widely applicable, though. We also touch on how some other buses, such as ISA and SBus, handle DMA.

Overview of a DMA Data Transfer

Before introducing the programming details, let’s review how a DMA transfer takes place, considering only input transfers to simplify the discussion.

Data transfer can be triggered in two ways: either the software asks for data (via a function such as read) or the hardware asynchronously pushes data to the system.

In the first case, the steps involved can be summarized as follows:

  1. When a process calls read, the driver method allocates a DMA buffer and instructs the hardware to transfer its data. The process is put to sleep.

  2. The hardware writes data to the DMA buffer and raises an interrupt when it’s done.

  3. The interrupt handler gets the input data, acknowledges the interrupt, and awakens the process, which is now able to read data.

The second case comes about when DMA is used asynchronously. This happens, for example, with data acquisition devices that go on pushing data even if nobody is reading them. In this case, the driver should maintain a buffer so that a subsequent read call will return all the accumulated data to user space. The steps involved in this kind of transfer are slightly different:

  1. The hardware raises an interrupt to announce that new data has arrived.

  2. The interrupt handler allocates a buffer and tells the hardware where to transfer its data.

  3. The peripheral device writes the data to the buffer and raises another interrupt when it’s done.

  4. The handler dispatches the new data, wakes any relevant process, and takes care of housekeeping.

A variant of the asynchronous approach is often seen with network cards. These cards often expect to see a circular buffer (often called a DMA ring buffer) established in memory shared with the processor; each incoming packet is placed in the next available buffer in the ring, and an interrupt is signaled. The driver then passes the network packets to the rest of the kernel, and places a new DMA buffer in the ring.

The processing steps in all of these cases emphasize that efficient DMA handling relies on interrupt reporting. While it is possible to implement DMA with a polling driver, it wouldn’t make sense, because a polling driver would waste the performance benefits that DMA offers over the easier processor-driven I/O.

Another relevant item introduced here is the DMA buffer. To exploit direct memory access, the device driver must be able to allocate one or more special buffers, suited to DMA. Note that many drivers allocate their buffers at initialization time and use them until shutdown—the word allocate in the previous lists therefore means “get hold of a previously allocated buffer.”

Allocating the DMA Buffer

This section covers the allocation of DMA buffers at a low level; we will introduce a higher-level interface shortly, but it is still a good idea to understand the material presented here.

The main problem with the DMA buffer is that when it is bigger than one page, it must occupy contiguous pages in physical memory because the device transfers data using the ISA or PCI system bus, both of which carry physical addresses. It’s interesting to note that this constraint doesn’t apply to the SBus (see Section 15.5 in Chapter 15), which uses virtual addresses on the peripheral bus. Some architectures can also use virtual addresses on the PCI bus, but a portable driver cannot count on that capability.

Although DMA buffers can be allocated either at system boot or at runtime, modules can only allocate their buffers at runtime. Chapter 7 introduced these techniques: Section 7.5 talked about allocation at system boot, while Section 7.1 and Section 7.3 described allocation at runtime. Driver writers must take care to allocate the right kind of memory when it will be used for DMA operations—not all memory zones are suitable. In particular, high memory will not work for DMA on most systems—the peripherals simply cannot work with addresses that high.

Most devices on modern buses can handle 32-bit addresses, meaning that normal memory allocations will work just fine for them. Some PCI devices, however, fail to implement the full PCI standard and cannot work with 32-bit addresses. And ISA devices, of course, are limited to 24-bit addresses only.

For devices with this kind of limitation, memory should be allocated from the DMA zone by adding the GFP_DMA flag to the kmalloc or get_free_pages call. When this flag is present, only memory that can be addressed with 24 bits will be allocated.
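For example, a driver for such a device might obtain its buffer with a call like the following sketch; the buffer size and the variable name are arbitrary:

void *dma_buf;

/* 8 KB taken from the DMA zone, so a 24-bit device can reach it */
dma_buf = kmalloc(8192, GFP_KERNEL | GFP_DMA);
if (!dma_buf)
    return -ENOMEM;   /* or fall back to programmed I/O */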

Do-it-yourself allocation

We have seen how get_free_pages (and therefore kmalloc) can’t return more than 128 KB (or, more generally, 32 pages) of consecutive memory space. But the request is prone to fail even when the allocated buffer is less than 128 KB, because system memory becomes fragmented over time.[52]

When the kernel cannot return the requested amount of memory, or when you need more than 128 KB (a common requirement for PCI frame grabbers, for example), an alternative to returning -ENOMEM is to allocate memory at boot time or reserve the top of physical RAM for your buffer. We described allocation at boot time in Section 7.5 in Chapter 7, but it is not available to modules. Reserving the top of RAM is accomplished by passing a mem= argument to the kernel at boot time. For example, if you have 32 MB, the argument mem=31M keeps the kernel from using the top megabyte. Your module could later use the following code to gain access to such memory:

dmabuf = ioremap( 0x1F00000 /* 31M */, 0x100000 /* 1M */);

Actually, there is another way to allocate DMA space: perform aggressive allocation until you are able to get enough consecutive pages to make a buffer. We strongly discourage this allocation technique if there’s any other way to achieve your goal. Aggressive allocation results in high machine load, and possibly in a system lockup if your aggressiveness isn’t correctly tuned. On the other hand, sometimes there is no other way available.

In practice, the code invokes kmalloc(GFP_ATOMIC) until the call fails; it then waits until the kernel frees some pages, and then allocates everything once again. If you keep an eye on the pool of allocated pages, sooner or later you’ll find that your DMA buffer of consecutive pages has appeared; at this point you can release every page but the selected buffer. This kind of behavior is rather risky, though, because it may lead to a deadlock. We suggest using a kernel timer to release every page in case allocation doesn’t succeed before a timeout expires.

We’re not going to show the code here, but you’ll find it in misc-modules/allocator.c; the code is thoroughly commented and designed to be called by other modules. Unlike every other source accompanying this book, the allocator is covered by the GPL. The reason we decided to put the source under the GPL is that it is neither particularly beautiful nor particularly clever, and if someone is going to use it, we want to be sure that the source is released with the module.

Bus Addresses

A device driver using DMA has to talk to hardware connected to the interface bus, which uses physical addresses, whereas program code uses virtual addresses.

As a matter of fact, the situation is slightly more complicated than that. DMA-based hardware uses bus, rather than physical, addresses. Although ISA and PCI addresses are simply physical addresses on the PC, this is not true for every platform. Sometimes the interface bus is connected through bridge circuitry that maps I/O addresses to different physical addresses. Some systems even have a page-mapping scheme that can make arbitrary pages appear contiguous to the peripheral bus.

At the lowest level (again, we’ll look at a higher-level solution shortly), the Linux kernel provides a portable solution by exporting the following functions, defined in <asm/io.h>:

unsigned long virt_to_bus(volatile void * address);
void * bus_to_virt(unsigned long address);

The virt_to_bus conversion must be used when the driver needs to send address information to an I/O device (such as an expansion board or the DMA controller), while bus_to_virt must be used when address information is received from hardware connected to the bus.
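As a small illustration, a driver that has obtained a buffer with kmalloc could pass its bus address to the board as follows; MYDEV_ADDR_REG and dev->io_base are invented names, standing in for however your hardware exposes its DMA address register:

unsigned long bus_addr;

bus_addr = virt_to_bus(dma_buf);   /* dma_buf obtained with kmalloc */
writel(bus_addr, dev->io_base + MYDEV_ADDR_REG);   /* hypothetical register */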

DMA on the PCI Bus

The 2.4 kernel includes a flexible mechanism that supports PCI DMA (also known as bus mastering). It handles the details of buffer allocation and can deal with setting up the bus hardware for multipage transfers on hardware that supports them. This code also takes care of situations in which a buffer lives in a non-DMA-capable zone of memory, though only on some platforms and at a computational cost (as we will see later).

The functions in this section require a struct pci_dev structure for your device. The details of setting up a PCI device are covered in Chapter 15. Note, however, that the routines described here can also be used with ISA devices; in that case, the struct pci_dev pointer should simply be passed in as NULL.

Drivers that use the following functions should include <linux/pci.h>.

Dealing with difficult hardware

The first question that must be answered before performing DMA is whether the given device is capable of such operation on the current host. Many PCI devices fail to implement the full 32-bit bus address space, often because they are modified versions of old ISA hardware. The Linux kernel will attempt to work with such devices, but it is not always possible.

The function pci_dma_supported should be called for any device that has addressing limitations:

int pci_dma_supported(struct pci_dev *pdev, dma_addr_t mask);

Here, mask is a simple bit mask describing which address bits the device can successfully use. If the return value is nonzero, DMA is possible, and your driver should set the dma_mask field in the PCI device structure to the mask value. For a device that can only handle 16-bit addresses, you might use a call like this:

if (pci_dma_supported (pdev, 0xffff))
    pdev->dma_mask = 0xffff;
else {
    card->use_dma = 0;   /* We'll have to live without DMA */
    printk(KERN_WARNING "mydev: DMA not supported\n");
}

As of kernel 2.4.3, a new function, pci_set_dma_mask, has been provided. This function has the following prototype:

int pci_set_dma_mask(struct pci_dev *pdev, dma_addr_t mask);

If DMA can be supported with the given mask, this function returns 0 and sets the dma_mask field; otherwise, -EIO is returned.
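On such kernels, the earlier example can be written more concisely. This is only a sketch; card is the same hypothetical driver-private structure used above:

if (pci_set_dma_mask(pdev, 0xffff)) {
    card->use_dma = 0;   /* We'll have to live without DMA */
    printk(KERN_WARNING "mydev: DMA not supported\n");
}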

For devices that can handle 32-bit addresses, there is no need to call pci_dma_supported.

DMA mappings

A DMA mapping is a combination of allocating a DMA buffer and generating an address for that buffer that is accessible by the device. In many cases, getting that address involves a simple call to virt_to_bus; some hardware, however, requires that mapping registers be set up in the bus hardware as well. Mapping registers are an equivalent of virtual memory for peripherals. On systems where these registers are used, peripherals have a relatively small, dedicated range of addresses to which they may perform DMA. Those addresses are remapped, via the mapping registers, into system RAM. Mapping registers have some nice features, including the ability to make several distributed pages appear contiguous in the device’s address space. Not all architectures have mapping registers, however; in particular, the popular PC platform has no mapping registers.

Setting up a useful address for the device may also, in some cases, require the establishment of a bounce buffer. Bounce buffers are created when a driver attempts to perform DMA on an address that is not reachable by the peripheral device—a high-memory address, for example. Data is then copied to and from the bounce buffer as needed. Making code work properly with bounce buffers requires adherence to some rules, as we will see shortly.

The DMA mapping interface defines a new type, dma_addr_t, to represent bus addresses. Variables of type dma_addr_t should be treated as opaque by the driver; the only allowable operations are to pass them to the DMA support routines and to the device itself.

The PCI code distinguishes between two types of DMA mappings, depending on how long the DMA buffer is expected to stay around:

Consistent DMA mappings

These exist for the life of the driver. A consistently mapped buffer must be simultaneously available to both the CPU and the peripheral (other types of mappings, as we will see later, can be available only to one or the other at any given time). The buffer should also, if possible, not have caching issues that could cause one not to see updates made by the other.

Streaming DMA mappings

These are set up for a single operation. Some architectures allow for significant optimizations when streaming mappings are used, as we will see, but these mappings also are subject to a stricter set of rules in how they may be accessed. The kernel developers recommend the use of streaming mappings over consistent mappings whenever possible. There are two reasons for this recommendation. The first is that, on systems that support them, each DMA mapping uses one or more mapping registers on the bus. Consistent mappings, which have a long lifetime, can monopolize these registers for a long time, even when they are not being used. The other reason is that, on some hardware, streaming mappings can be optimized in ways that are not available to consistent mappings.

The two mapping types must be manipulated in different ways; it’s time to look at the details.

Setting up consistent DMA mappings

A driver can set up a consistent mapping with a call to pci_alloc_consistent:

void *pci_alloc_consistent(struct pci_dev *pdev, size_t size,
                           dma_addr_t *bus_addr);

This function handles both the allocation and the mapping of the buffer. The first two arguments are our PCI device structure and the size of the needed buffer. The function returns the result of the DMA mapping in two places. The return value is a kernel virtual address for the buffer, which may be used by the driver; the associated bus address, instead, is returned in bus_addr. Allocation is handled in this function so that the buffer will be placed in a location that works with DMA; usually the memory is just allocated with get_free_pages (but note that the size is in bytes, rather than an order value).

Most architectures that support PCI perform the allocation at the GFP_ATOMIC priority, and thus do not sleep. The ARM port, however, is an exception to this rule.

When the buffer is no longer needed (usually at module unload time), it should be returned to the system with pci_free_consistent:

void pci_free_consistent(struct pci_dev *pdev, size_t size,
                         void *cpu_addr, dma_addr_t bus_addr);

Note that this function requires that both the CPU address and the bus address be provided.
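Putting the two calls together, a driver might manage a small consistent buffer as in this sketch; the one-page size is arbitrary:

void *cpu_addr;
dma_addr_t bus_addr;

/* Allocate a buffer visible to both the CPU and the device */
cpu_addr = pci_alloc_consistent(pdev, PAGE_SIZE, &bus_addr);
if (!cpu_addr)
    return -ENOMEM;
/* ... give bus_addr to the device, use cpu_addr from the driver ... */

/* Later, typically at cleanup time */
pci_free_consistent(pdev, PAGE_SIZE, cpu_addr, bus_addr);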

Setting up streaming DMA mappings

Streaming mappings have a more complicated interface than the consistent variety, for a number of reasons. These mappings expect to work with a buffer that has already been allocated by the driver, and thus have to deal with addresses that they did not choose. On some architectures, streaming mappings can also have multiple, discontiguous pages and multipart “scatter-gather” buffers.

When setting up a streaming mapping, you must tell the kernel in which direction the data will be moving. Some symbols have been defined for this purpose:

PCI_DMA_TODEVICE, PCI_DMA_FROMDEVICE

These two symbols should be reasonably self-explanatory. If data is being sent to the device (in response, perhaps, to a write system call), PCI_DMA_TODEVICE should be used; data going to the CPU, instead, will be marked with PCI_DMA_FROMDEVICE.

PCI_DMA_BIDIRECTIONAL

If data can move in either direction, use PCI_DMA_BIDIRECTIONAL.

PCI_DMA_NONE

This symbol is provided only as a debugging aid. Attempts to use buffers with this “direction” will cause a kernel panic.

For a number of reasons that we will touch on shortly, it is important to pick the right value for the direction of a streaming DMA mapping. It may be tempting to just pick PCI_DMA_BIDIRECTIONAL at all times, but on some architectures there will be a performance penalty to pay for that choice.

When you have a single buffer to transfer, map it with pci_map_single:

dma_addr_t pci_map_single(struct pci_dev *pdev, void *buffer,
                          size_t size, int direction);

The return value is the bus address that you can pass to the device, or NULL if something goes wrong.

Once the transfer is complete, the mapping should be deleted with pci_unmap_single:

void pci_unmap_single(struct pci_dev *pdev, dma_addr_t bus_addr,
                      size_t size, int direction);

Here, the size and direction arguments must match those used to map the buffer.

There are some important rules that apply to streaming DMA mappings:

  • The buffer must be used only for a transfer that matches the direction value given when it was mapped.

  • Once a buffer has been mapped, it belongs to the device, not the processor. Until the buffer has been unmapped, the driver should not touch its contents in any way. Only after pci_unmap_single has been called is it safe for the driver to access the contents of the buffer (with one exception that we’ll see shortly). Among other things, this rule implies that a buffer being written to a device cannot be mapped until it contains all the data to write.

  • The buffer must not be unmapped while DMA is still active, or serious system instability is guaranteed.

You may be wondering why the driver can no longer work with a buffer once it has been mapped. There are actually two reasons why this rule makes sense. First, when a buffer is mapped for DMA, the kernel must ensure that all of the data in that buffer has actually been written to memory. It is likely that some data will remain in the processor’s cache, and must be explicitly flushed. Data written to the buffer by the processor after the flush may not be visible to the device.

Second, consider what happens if the buffer to be mapped is in a region of memory that is not accessible to the device. Some architectures will simply fail in this case, but others will create a bounce buffer. The bounce buffer is just a separate region of memory that is accessible to the device. If a buffer is mapped with a direction of PCI_DMA_TODEVICE, and a bounce buffer is required, the contents of the original buffer will be copied as part of the mapping operation. Clearly, changes to the original buffer after the copy will not be seen by the device. Similarly, PCI_DMA_FROMDEVICE bounce buffers are copied back to the original buffer by pci_unmap_single; the data from the device is not present until that copy has been done.

Incidentally, bounce buffers are one reason why it is important to get the direction right. PCI_DMA_BIDIRECTIONAL bounce buffers are copied before and after the operation, which is often an unnecessary waste of CPU cycles.

Occasionally a driver will need to access the contents of a streaming DMA buffer without unmapping it. A call has been provided to make this possible:

void pci_dma_sync_single(struct pci_dev *pdev, dma_addr_t bus_addr,
                         size_t size, int direction);

This function should be called before the processor accesses a PCI_DMA_FROMDEVICE buffer, and after an access to a PCI_DMA_TODEVICE buffer.
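For example, a driver that wants to peek at incoming data while keeping the mapping in place might do something like the following; process_partial_data is purely hypothetical:

/* Make the device's writes to the buffer visible to the CPU */
pci_dma_sync_single(pdev, bus_addr, size, PCI_DMA_FROMDEVICE);
process_partial_data(buffer);   /* hypothetical helper */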

Scatter-gather mappings

Scatter-gather mappings are a special case of streaming DMA mappings. Suppose you have several buffers, all of which need to be transferred to or from the device. This situation can come about in several ways, including from a readv or writev system call, a clustered disk I/O request, or a list of pages in a mapped kernel I/O buffer. You could simply map each buffer in turn and perform the required operation, but there are advantages to mapping the whole list at once.

One reason is that some smart devices can accept a scatterlist of array pointers and lengths and transfer them all in one DMA operation; for example, “zero-copy” networking is easier if packets can be built in multiple pieces. Linux is likely to take much better advantage of such devices in the future. Another reason to map scatterlists as a whole is to take advantage of systems that have mapping registers in the bus hardware. On such systems, physically discontiguous pages can be assembled into a single, contiguous array from the device’s point of view. This technique works only when the entries in the scatterlist are equal to the page size in length (except the first and last), but when it does work it can turn multiple operations into a single DMA and speed things up accordingly.

Finally, if a bounce buffer must be used, it makes sense to coalesce the entire list into a single buffer (since it is being copied anyway).

So now you’re convinced that mapping of scatterlists is worthwhile in some situations. The first step in mapping a scatterlist is to create and fill in an array of struct scatterlist describing the buffers to be transferred. This structure is architecture dependent, and is described in <asm/scatterlist.h>. It will always contain two fields, however:

char *address;

The address of a buffer used in the scatter/gather operation

unsigned int length;

The length of that buffer

To map a scatter/gather DMA operation, your driver should set the address and length fields in a struct scatterlist entry for each buffer to be transferred. Then call:

int pci_map_sg(struct pci_dev *pdev, struct scatterlist *list,
               int nents, int direction);

The return value will be the number of DMA buffers to transfer; it may be less than nents, the number of scatterlist entries passed in.

Your driver should transfer each buffer returned by pci_map_sg. The bus address and length of each buffer will be stored in the struct scatterlist entries, but their location in the structure varies from one architecture to the next. Two macros have been defined to make it possible to write portable code:

dma_addr_t sg_dma_address(struct scatterlist *sg);

Returns the bus (DMA) address from this scatterlist entry

unsigned int sg_dma_len(struct scatterlist *sg);

Returns the length of this buffer

Again, remember that the address and length of the buffers to transfer may be different from what was passed in to pci_map_sg.

Once the transfer is complete, a scatter-gather mapping is unmapped with a call to pci_unmap_sg:

void pci_unmap_sg(struct pci_dev *pdev, struct scatterlist *list,
                  int nents, int direction);

Note that nents must be the number of entries that you originally passed to pci_map_sg, and not the number of DMA buffers that function returned to you.

Scatter-gather mappings are streaming DMA mappings, and the same access rules apply to them as to the single variety. If you must access a mapped scatter-gather list, you must synchronize it first:

void pci_dma_sync_sg(struct pci_dev *pdev, struct scatterlist *sg,
                     int nents, int direction);
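A complete scatter-gather operation thus follows a pattern like this sketch, where dad_send_segment stands in for whatever mechanism your device uses to accept one segment:

int i, count;

count = pci_map_sg(pdev, sglist, nents, PCI_DMA_TODEVICE);

for (i = 0; i < count; i++)     /* count may be less than nents */
    dad_send_segment(dev, sg_dma_address(&sglist[i]),
                     sg_dma_len(&sglist[i]));

/* ... wait for the device to finish, then ... */
pci_unmap_sg(pdev, sglist, nents, PCI_DMA_TODEVICE);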

How different architectures support PCI DMA

As we stated at the beginning of this section, DMA is a very hardware-specific operation. The PCI DMA interface we have just described attempts to abstract out as many hardware dependencies as possible. There are still some things that show through, however.

M68K, S/390, Super-H

These architectures do not support the PCI bus as of 2.4.0.

IA-32 (x86), MIPS, PowerPC, ARM

These platforms support the PCI DMA interface, but it is mostly a false front. There are no mapping registers in the bus interface, so scatterlists cannot be combined and virtual addresses cannot be used. There is no bounce buffer support, so mapping of high-memory addresses cannot be done. The mapping functions on the ARM architecture can sleep, which is not the case for the other platforms.

IA-64

The Itanium architecture also lacks mapping registers. This 64-bit architecture can easily generate addresses that PCI peripherals cannot use, though. The PCI interface on this platform thus implements bounce buffers, allowing any address to be (seemingly) used for DMA operations.

Alpha, MIPS64, SPARC

These architectures support an I/O memory management unit. As of 2.4.0, the MIPS64 port does not actually make use of this capability, so its PCI DMA implementation looks like that of the IA-32. The Alpha and SPARC ports, though, can do full-buffer mapping with proper scatter-gather support.

The differences listed will not be problems for most driver writers, as long as the interface guidelines are followed.

A simple PCI DMA example

The actual form of DMA operations on the PCI bus is very dependent on the device being driven. Thus, this example does not apply to any real device; instead, it is part of a hypothetical driver called dad (DMA Acquisition Device). A driver for this device might define a transfer function like this:

int dad_transfer(struct dad_dev *dev, int write, void *buffer, 
                 size_t count)
{
    dma_addr_t bus_addr;
    unsigned long flags;

    /* Map the buffer for DMA */
    dev->dma_dir = (write ? PCI_DMA_TODEVICE : PCI_DMA_FROMDEVICE);
    dev->dma_size = count;
    bus_addr = pci_map_single(dev->pci_dev, buffer, count, 
                              dev->dma_dir);
    dev->dma_addr = bus_addr;

    /* Set up the device */
    writeb(DAD_CMD_DISABLEDMA, dev->registers.command);
    writeb(write ? DAD_CMD_WR : DAD_CMD_RD, dev->registers.command);
    writel(cpu_to_le32(bus_addr), dev->registers.addr);
    writel(cpu_to_le32(count), dev->registers.len);

    /* Start the operation */
    writeb(dev->registers.command, DAD_CMD_ENABLEDMA);
    return 0;
}

This function maps the buffer to be transferred and starts the device operation. The other half of the job must be done in the interrupt service routine, which would look something like this:

void dad_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    struct dad_dev *dev = (struct dad_dev *) dev_id;

    /* Make sure it's really our device interrupting */

    /* Unmap the DMA buffer */
    pci_unmap_single(dev->pci_dev, dev->dma_addr, dev->dma_size,
                     dev->dma_dir);

    /* Only now is it safe to access the buffer, copy to user, etc. */
    ...
}

Obviously a great deal of detail has been left out of this example, including whatever steps may be required to prevent attempts to start multiple simultaneous DMA operations.
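One possible approach, shown here only as a sketch, is a busy flag protected by a spinlock; the dma_busy and lock fields are additions we are assuming in struct dad_dev:

int dad_start_transfer(struct dad_dev *dev, int write, void *buffer,
                       size_t count)
{
    unsigned long flags;

    spin_lock_irqsave(&dev->lock, flags);
    if (dev->dma_busy) {                  /* a transfer is in flight */
        spin_unlock_irqrestore(&dev->lock, flags);
        return -EBUSY;                    /* or sleep until it completes */
    }
    dev->dma_busy = 1;
    spin_unlock_irqrestore(&dev->lock, flags);

    return dad_transfer(dev, write, buffer, count);
}

The interrupt handler would then clear dma_busy after unmapping the buffer.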

A quick look at SBus

SPARC-based systems have traditionally included a Sun-designed bus called the SBus. This bus is beyond the scope of this chapter, but a quick mention is worthwhile. There is a set of functions (declared in <asm/sbus.h>) for performing DMA mappings on the SBus; they have names like sbus_alloc_consistent and sbus_map_sg. In other words, the SBus DMA API looks almost exactly like the PCI interface. A detailed look at the function definitions will be required before working with DMA on the SBus, but the concepts will match those discussed earlier for the PCI bus.

DMA for ISA Devices

The ISA bus allows for two kinds of DMA transfers: native DMA and ISA bus master DMA. Native DMA uses standard DMA-controller circuitry on the motherboard to drive the signal lines on the ISA bus. ISA bus master DMA, on the other hand, is handled entirely by the peripheral device. The latter type of DMA is rarely used and doesn’t require discussion here because it is similar to DMA for PCI devices, at least from the driver’s point of view. An example of an ISA bus master is the 1542 SCSI controller, whose driver is drivers/scsi/aha1542.c in the kernel sources.

As far as native DMA is concerned, there are three entities involved in a DMA data transfer on the ISA bus:

The 8237 DMA controller (DMAC)

The controller holds information about the DMA transfer, such as the direction, the memory address, and the size of the transfer. It also contains a counter that tracks the status of ongoing transfers. When the controller receives a DMA request signal, it gains control of the bus and drives the signal lines so that the device can read or write its data.

The peripheral device

The device must activate the DMA request signal when it’s ready to transfer data. The actual transfer is managed by the DMAC; the hardware device sequentially reads or writes data onto the bus when the controller strobes the device. The device usually raises an interrupt when the transfer is over.

The device driver

The driver has little to do: it provides the DMA controller with the direction, bus address, and size of the transfer. It also talks to its peripheral to prepare it for transferring the data and responds to the interrupt when the DMA is over.

The original DMA controller used in the PC could manage four “channels,” each associated with one set of DMA registers. Four devices could store their DMA information in the controller at the same time. Newer PCs contain the equivalent of two DMAC devices:[53] the second controller (master) is connected to the system processor, and the first (slave) is connected to channel 0 of the second controller.[54]

The channels are numbered from 0 to 7; channel 4 is not available to ISA peripherals because it is used internally to cascade the slave controller onto the master. The available channels are thus 0 to 3 on the slave (the 8-bit channels) and 5 to 7 on the master (the 16-bit channels). The size of any DMA transfer, as stored in the controller, is a 16-bit number representing the number of bus cycles. The maximum transfer size is therefore 64 KB for the slave controller and 128 KB for the master.

Because the DMA controller is a system-wide resource, the kernel helps deal with it. It uses a DMA registry to provide a request-and-free mechanism for the DMA channels and a set of functions to configure channel information in the DMA controller.

Registering DMA usage

You should be used to kernel registries—we’ve already seen them for I/O ports and interrupt lines. The DMA channel registry is similar to the others. After <asm/dma.h> has been included, the following functions can be used to obtain and release ownership of a DMA channel:

int request_dma(unsigned int channel, const char *name); 
void free_dma(unsigned int channel);

The channel argument is a number between 0 and 7 or, more precisely, a positive number less than MAX_DMA_CHANNELS. On the PC, MAX_DMA_CHANNELS is defined as 8, to match the hardware. The name argument is a string identifying the device. The specified name appears in the file /proc/dma, which can be read by user programs.

The return value from request_dma is 0 for success and -EINVAL or -EBUSY if there was an error. The former means that the requested channel is out of range, and the latter means that another device is holding the channel.

We recommend that you take the same care with DMA channels as with I/O ports and interrupt lines; requesting the channel at open time is much better than requesting it from the module initialization function. Delaying the request allows some sharing between drivers; for example, your sound card and your analog I/O interface can share the DMA channel as long as they are not used at the same time.

We also suggest that you request the DMA channel after you’ve requested the interrupt line and that you release it before the interrupt. This is the conventional order for requesting the two resources; following the convention avoids possible deadlocks. Note that every device using DMA needs an IRQ line as well; otherwise, it couldn’t signal the completion of data transfer.

In a typical case, the code for open looks like the following, which refers to our hypothetical dad module. The dad device as shown uses a fast interrupt handler without support for shared IRQ lines.

int dad_open (struct inode *inode, struct file *filp)
{
    struct dad_device *my_device;
    int error;

    /* ... */
    if ( (error = request_irq(my_device->irq, dad_interrupt,
                              SA_INTERRUPT, "dad", NULL)) )
        return error; /* or implement blocking open */

    if ( (error = request_dma(my_device->dma, "dad")) ) {
        free_irq(my_device->irq, NULL);
        return error; /* or implement blocking open */
    }
    /* ... */
    return 0;
}

The close implementation that matches the open just shown looks like this:

void dad_close (struct inode *inode, struct file *filp)
{
    struct dad_device *my_device;

    /* ... */
    free_dma(my_device->dma);
    free_irq(my_device->irq, NULL);
    /* ... */
}

As far as /proc/dma is concerned, here’s how the file looks on a system with the sound card installed:

merlino% cat /proc/dma
 1: Sound Blaster8
 4: cascade

It’s interesting to note that the default sound driver gets the DMA channel at system boot and never releases it. The cascade entry shown is a placeholder, indicating that channel 4 is not available to drivers, as explained earlier.

Talking to the DMA controller

After registration, the main part of the driver’s job consists of configuring the DMA controller for proper operation. This task is not trivial, but fortunately the kernel exports all the functions needed by the typical driver.

The driver needs to configure the DMA controller either when read or write is called, or when preparing for asynchronous transfers. This latter task is performed either at open time or in response to an ioctl command, depending on the driver and the policy it implements. The code shown here is the code that is typically called by the read or write device methods.

This subsection provides a quick overview of the internals of the DMA controller so you will understand the code introduced here. If you want to learn more, we’d urge you to read <asm/dma.h> and some hardware manuals describing the PC architecture. In particular, we don’t deal with the issue of 8-bit versus 16-bit data transfers. If you are writing device drivers for ISA device boards, you should find the relevant information in the hardware manuals for the devices.

The DMA controller is a shared resource, and confusion could arise if more than one processor attempts to program it simultaneously. For that reason, the controller is protected by a spinlock, called dma_spin_lock. Drivers should not manipulate the lock directly, however; two functions have been provided to do that for you:

unsigned long claim_dma_lock();

Acquires the DMA spinlock. This function also blocks interrupts on the local processor; thus the return value is the usual “flags” value, which must be used when reenabling interrupts.

void release_dma_lock(unsigned long flags);

Returns the DMA spinlock and restores the previous interrupt status.

The spinlock should be held when using the functions described next. It should not be held during the actual I/O, however. A driver should never sleep when holding a spinlock.

The information that must be loaded into the controller is made up of three items: the RAM address, the number of atomic items that must be transferred (in bytes or words), and the direction of the transfer. To this end, the following functions are exported by <asm/dma.h>:

void set_dma_mode(unsigned int channel, char mode);

Indicates whether the channel must read from the device (DMA_MODE_READ) or write to it (DMA_MODE_WRITE). A third mode exists, DMA_MODE_CASCADE, which is used to release control of the bus. Cascading is the way the first controller is connected to the top of the second, but it can also be used by true ISA bus-master devices. We won’t discuss bus mastering here.

void set_dma_addr(unsigned int channel, unsigned int addr);

Assigns the address of the DMA buffer. The function stores the 24 least significant bits of addr in the controller. The addr argument must be a bus address (see Section 13.4.3 earlier in this chapter).

void set_dma_count(unsigned int channel, unsigned int count);

Assigns the number of bytes to transfer. The count argument represents bytes for 16-bit channels as well; in this case, the number must be even.

In addition to these functions, there are a number of housekeeping facilities that must be used when dealing with DMA devices:

void disable_dma(unsigned int channel);

A DMA channel can be disabled within the controller. The channel should be disabled before the controller is configured, to prevent improper operation (the controller is programmed via eight-bit data transfers, and thus none of the previous functions is executed atomically).

void enable_dma(unsigned int channel);

This function tells the controller that the DMA channel contains valid data.

int get_dma_residue(unsigned int channel);

The driver sometimes needs to know if a DMA transfer has been completed. This function returns the number of bytes that are still to be transferred. The return value is 0 after a successful transfer and is unpredictable (but not 0) while the controller is working. The unpredictability reflects the fact that the residue is a 16-bit value, which is obtained by two 8-bit input operations.

void clear_dma_ff(unsigned int channel);

This function clears the DMA flip-flop. The flip-flop is used to control access to 16-bit registers. The registers are accessed by two consecutive 8-bit operations, and the flip-flop is used to select the least significant byte (when it is clear) or the most significant byte (when it is set). The flip-flop automatically toggles when 8 bits have been transferred; the programmer must clear the flip-flop (to set it to a known state) before accessing the DMA registers.

Using these functions, a driver can implement a function like the following to prepare for a DMA transfer:

int dad_dma_prepare(int channel, int mode, void *buf,
                    unsigned int count)
{
    unsigned long flags;

    flags = claim_dma_lock();
    disable_dma(channel);
    clear_dma_ff(channel);
    set_dma_mode(channel, mode);
    set_dma_addr(channel, virt_to_bus(buf));
    set_dma_count(channel, count);
    enable_dma(channel);
    release_dma_lock(flags);

    return 0;
}

A function like the next one, then, is used to check for successful completion of DMA:

int dad_dma_isdone(int channel)
{
    int residue;
    unsigned long flags = claim_dma_lock ();
    residue = get_dma_residue(channel);
    release_dma_lock(flags);
    return (residue == 0);
}

The only thing that remains to be done is to configure the device board. This device-specific task usually consists of reading or writing a few I/O ports. Devices differ in significant ways. For example, some devices expect the programmer to tell the hardware how big the DMA buffer is, and sometimes the driver has to read a value that is hardwired into the device. For configuring the board, the hardware manual is your only friend.



[52] The word fragmentation is usually applied to disks, to express the idea that files are not stored consecutively on the magnetic medium. The same concept applies to memory, where each virtual address space gets scattered throughout physical RAM, and it becomes difficult to retrieve consecutive free pages when a DMA buffer is requested.

[53] These circuits are now part of the motherboard’s chipset, but a few years ago they were two separate 8237 chips.

[54] The original PCs had only one controller; the second was added in 286-based platforms. However, the second controller is connected as the master because it handles 16-bit transfers; the first transfers only 8 bits at a time and is there for backward compatibility.
