Thus far, we have used kmalloc and kfree for the allocation and freeing of memory. The Linux kernel offers a richer set of memory allocation primitives, however. In this chapter we look at other ways of making use of memory in device drivers and at how to make the best use of your system’s memory resources. We will not get into how the different architectures actually administer memory. Modules are not involved in issues of segmentation, paging, and so on, since the kernel offers a unified memory management interface to the drivers. In addition, we won’t describe the internal details of memory management in this chapter, but will defer them to Section 13.1 in Chapter 13.
The kmalloc allocation engine is a powerful tool, and easily learned because of its similarity to malloc. The function is fast—unless it blocks—and it doesn’t clear the memory it obtains; the allocated region still holds its previous content. The allocated region is also contiguous in physical memory. In the next few sections, we talk in detail about kmalloc, so you can compare it with the memory allocation techniques that we discuss later.
The first argument to kmalloc is the size of the block to be allocated. The second argument, the allocation flags, is much more interesting, because it controls the behavior of kmalloc in a number of ways.
The most-used flag,
GFP_KERNEL, means that the
allocation (internally performed by calling, eventually,
get_free_pages, which is the source of the
GFP_ prefix) is performed on behalf of a process
running in kernel space. In other words, this means that the calling
function is executing a system call on behalf of a process. Using
GFP_KERNEL means that kmalloc
can put the current process to sleep waiting for a page when called in
low-memory situations. A function that allocates memory using
GFP_KERNEL must therefore be reentrant. While the
current process sleeps, the kernel takes proper action to retrieve a
memory page, either by flushing buffers to disk or by swapping out
memory from a user process.
GFP_KERNEL isn’t always the right allocation flag
to use; sometimes kmalloc is called from outside
a process’s context. This type of call can happen, for instance, in
interrupt handlers, task queues, and kernel timers. In this case, the
current process should not be put to sleep, and the
driver should use a flag of
GFP_ATOMIC instead. The
kernel normally tries to keep some free pages around in order to
fulfill atomic allocations. When GFP_ATOMIC is
used, kmalloc can use even the last free page. If
that last page does not exist, however, the allocation will fail.
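The choice between the two flags can be illustrated with a minimal sketch. The structure `demo_dev` and the functions `demo_setup`, `demo_irq_receive`, and `demo_release` are hypothetical names invented for this example. The block under `#ifndef __KERNEL__` holds user-space stand-ins (with placeholder flag values) so that the sketch compiles outside the kernel; a real driver would take kmalloc, kfree, and the flags from the kernel headers instead.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#ifndef __KERNEL__
/* User-space stand-ins, for illustration only. The flag values are
 * placeholders and are NOT the values used by the kernel. */
#define GFP_KERNEL 1
#define GFP_ATOMIC 2
static void *kmalloc(size_t size, int flags) { (void)flags; return malloc(size); }
static void kfree(const void *ptr) { free((void *)ptr); }
#endif

struct demo_dev {            /* hypothetical per-device state */
    char *config;
    char *rx_buf;
};

/* Runs on behalf of a process (for example, from a system call),
 * so the allocation is allowed to sleep: use GFP_KERNEL. */
static int demo_setup(struct demo_dev *dev, size_t len)
{
    dev->config = kmalloc(len, GFP_KERNEL);
    if (!dev->config)
        return -ENOMEM;
    return 0;
}

/* Runs outside process context (for example, from an interrupt
 * handler), so the allocation must not sleep: use GFP_ATOMIC and
 * be prepared for failure. */
static int demo_irq_receive(struct demo_dev *dev, size_t len)
{
    dev->rx_buf = kmalloc(len, GFP_ATOMIC);
    if (!dev->rx_buf)
        return -ENOMEM;      /* caller must cope, e.g., by dropping data */
    return 0;
}

/* Release both buffers; passing NULL to kfree is harmless. */
static void demo_release(struct demo_dev *dev)
{
    kfree(dev->config);
    kfree(dev->rx_buf);
    dev->config = dev->rx_buf = NULL;
}
```

In both cases the return value of kmalloc is checked. The check matters most for the atomic case, since an atomic request cannot wait for memory to be freed and is therefore more likely to fail.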
Other flags can be used in place of or in addition to
GFP_KERNEL and GFP_ATOMIC,
although those two cover most of the needs of device drivers. All the
flags are defined in the kernel headers: individual
flags are prefixed with a double underscore, like
__GFP_DMA; collections of flags lack the
prefix and are sometimes called allocation priorities.
GFP_KERNEL
Normal allocation of kernel memory. May sleep.
GFP_BUFFER
Used in managing the buffer cache, this priority allows the allocator
to sleep. It differs from
GFP_KERNEL in that fewer
attempts will be made to free memory by flushing dirty pages to disk;
the purpose here is to avoid deadlocks when the I/O subsystems
themselves need memory.
GFP_ATOMIC
Used to allocate memory from interrupt handlers and other code outside of a process context. Never sleeps.
Version 2.4 of the kernel knows about three memory zones: DMA-capable memory, normal memory, and high memory. While allocation normally happens in the normal zone, setting either the __GFP_DMA or the __GFP_HIGHMEM bit requires memory to be allocated from a different zone. The idea is that every computer platform that must know about special memory ranges (instead of considering all RAM equivalent) will fall into this abstraction.
DMA-capable memory is the only memory that can be involved in DMA data transfers with peripheral devices. This restriction arises when the address bus used to connect peripheral devices to the processor is limited with respect to the address bus used to access RAM. For example, on the x86, devices that plug into the ISA bus can only address memory from 0 to 16 MB. Other platforms have similar needs, although usually less stringent than the ISA one.
High memory is memory that requires special handling to be accessed. It made its appearance in kernel memory management when support for the x86 Physical Address Extension (PAE) was implemented during 2.3 development to access up to 64 GB of physical memory. High memory is a concept that only applies to the x86 and SPARC platforms, and the two implementations are different.
Whenever a new page is allocated to fulfill the
kmalloc request, the kernel builds a list of
zones that can be used in the search. If
__GFP_DMA is specified, only the DMA zone
is searched: if no memory is available at low addresses, allocation
fails. If no special flag is present, both normal and DMA memory are
searched. If __GFP_HIGHMEM is set, then all
three zones are used to search for a free page.
If the platform has no concept of high memory or it has been disabled
in the kernel configuration, __GFP_HIGHMEM
is defined as
0 and has no effect.
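The zone selection just described can be modeled in a few lines of user-space C. This is only a sketch of the rules stated in the text, not the kernel's code: the `MODEL_` constants, the `model_zone` type, and `model_build_zonelist` are hypothetical names, the flag values are placeholders, and the search order (most specialized zone last) is an assumption of the model. The `has_highmem` argument stands in for the case in which the high-memory flag is defined as 0.

```c
#include <stddef.h>

/* Placeholder flag bits for this model; NOT the kernel's values. */
#define MODEL_GFP_DMA      0x01u
#define MODEL_GFP_HIGHMEM  0x02u

enum model_zone { MODEL_ZONE_DMA, MODEL_ZONE_NORMAL, MODEL_ZONE_HIGHMEM };

/*
 * Fill zones[] (room for three entries) with the zones that may be
 * searched for a free page, and return how many entries were written.
 * has_highmem == 0 models a platform or configuration without high
 * memory, where the high-memory flag has no effect.
 */
static size_t model_build_zonelist(unsigned int flags, int has_highmem,
                                   enum model_zone zones[3])
{
    size_t n = 0;

    if (flags & MODEL_GFP_DMA) {
        /* DMA requested: only the DMA zone is acceptable. */
        zones[n++] = MODEL_ZONE_DMA;
        return n;
    }
    if ((flags & MODEL_GFP_HIGHMEM) && has_highmem)
        zones[n++] = MODEL_ZONE_HIGHMEM;   /* all three zones usable */

    /* No special flag: normal memory, then DMA memory. */
    zones[n++] = MODEL_ZONE_NORMAL;
    zones[n++] = MODEL_ZONE_DMA;
    return n;
}
```

With this model, a request carrying the DMA bit yields a one-entry list, a plain request yields two entries, and a high-memory request yields three entries only when high memory is available.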
The mechanism behind memory zones is implemented in
mm/page_alloc.c, while initialization of the zone
resides in platform-specific files, usually in
mm/init.c within the architecture-specific
part of the source tree. We’ll revisit these topics in Chapter 13.
The kernel manages the system’s physical memory, which is available only in page-sized chunks. As a result, kmalloc looks rather different than a typical user-space malloc implementation. A simple, heap-oriented allocation technique would quickly run into trouble; it would have a hard time working around the page boundaries. Thus, the kernel uses a special page-oriented allocation technique to get the best use from the system’s RAM.
Linux handles memory allocation by creating a set of pools of memory objects of fixed sizes. Allocation requests are handled by going to a pool that holds sufficiently large objects, and handing an entire memory chunk back to the requester. The memory management scheme is quite complex, and the details of it are not normally all that interesting to device driver writers. After all, the implementation can change—as it did in the 2.1.38 kernel—without affecting the interface seen by the rest of the kernel.
The one thing driver developers should keep in mind, though, is that the kernel can allocate only certain predefined fixed-size byte arrays. If you ask for an arbitrary amount of memory, you’re likely to get slightly more than you asked for, up to twice as much. Also, programmers should remember that the minimum memory that kmalloc handles is as big as 32 or 64 bytes, depending on the page size used by the current architecture.
The data sizes available are generally powers of two. In the 2.0 kernel, the available sizes were actually slightly less than a power of two, due to control flags added by the management system. If you keep this fact in mind, you’ll use memory more efficiently. For example, if you need a buffer of about 2000 bytes and run Linux 2.0, you’re better off asking for 2000 bytes, rather than 2048. Requesting exactly a power of two is the worst possible case with any kernel older than 2.1.38—the kernel will allocate twice as much as you requested. This is why scull used 4000 bytes per quantum instead of 4096.
You can find the exact values used for the allocation blocks in
mm/kmalloc.c (with the 2.0 kernel) or
mm/slab.c (in current kernels), but remember that
they can change again without notice. The trick of allocating less
than 4 KB works well for scull with all
2.x kernels, but it’s not guaranteed to be
optimal in the future.
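The effect of fixed-size pools on a request can be sketched with a small user-space model. It assumes pure power-of-two sizes (as in kernels from 2.1.38 onward), a 32-byte minimum, and the 128 KB upper bound of kmalloc; the names `MODEL_MIN_CHUNK`, `MODEL_MAX_CHUNK`, and `model_chunk_size` are hypothetical, and the real block sizes may differ from this model.

```c
#include <stddef.h>

#define MODEL_MIN_CHUNK 32u              /* assumed smallest pool size */
#define MODEL_MAX_CHUNK (128u * 1024u)   /* assumed largest pool size */

/*
 * Return the size of the chunk that a request of `request` bytes
 * would consume in this model, or 0 if the request is too large.
 */
static size_t model_chunk_size(size_t request)
{
    size_t chunk = MODEL_MIN_CHUNK;

    if (request > MODEL_MAX_CHUNK)
        return 0;                        /* cannot be satisfied */

    /* Walk up the power-of-two sizes until the request fits. */
    while (chunk < request)
        chunk <<= 1;
    return chunk;
}
```

In this model a 2000-byte request consumes a 2048-byte chunk, while a 2049-byte request consumes 4096 bytes, which is the "up to twice as much" case mentioned earlier.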
In any case, the maximum size that can be allocated by kmalloc is 128 KB—slightly less with 2.0 kernels. If you need more than a few kilobytes, however, there are better ways than kmalloc to obtain memory, as outlined next.
Note that the 16 MB limit on DMA-capable memory described earlier is only in force for the ISA bus; an x86 device that plugs into the PCI bus can perform DMA with all normal memory.