vmalloc and Friends

The next memory allocation function that we’ll show you is vmalloc, which allocates a contiguous memory region in the virtual address space. Although the pages are not necessarily consecutive in physical memory (each page is retrieved with a separate call to __get_free_page), the kernel sees them as a contiguous range of addresses. vmalloc returns 0 (the NULL address) if an error occurs, otherwise, it returns a pointer to a linear memory area of size at least size.

The prototypes of the function and its relatives (ioremap, which is not strictly an allocation function, will be discussed shortly) are as follows:

#include <linux/vmalloc.h>

void * vmalloc(unsigned long size);
void vfree(void * addr);
void *ioremap(unsigned long offset, unsigned long size);
void iounmap(void * addr);

It’s worth stressing that memory addresses returned by kmalloc and get_free_pages are also virtual addresses. Their actual value is still massaged by the MMU (memory management unit, usually part of the CPU) before it is used to address physical memory.[30] vmalloc is not different in how it uses the hardware, but rather in how the kernel performs the allocation task.

The (virtual) address range used by kmalloc and get_free_pages features a one-to-one mapping to physical memory, possibly shifted by a constant PAGE_OFFSET value; the functions don’t need to modify the page tables for that address range. The address range used by vmalloc and ioremap, on the other hand, is completely synthetic, and each allocation builds the (virtual) memory area by suitably setting up the page tables.

This difference can be perceived by comparing the pointers returned by the allocation functions. On some platforms (for example, the x86), addresses returned by vmalloc are just greater than addresses that kmalloc addresses. On other platforms (for example, MIPS and IA-64), they belong to a completely different address range. Addresses available for vmalloc are in the range from VMALLOC_START to VMALLOC_END. Both symbols are defined in <asm/pgtable.h>.

Addresses allocated by vmalloc can’t be used outside of the microprocessor, because they make sense only on top of the processor’s MMU. When a driver needs a real physical address (such as a DMA address, used by peripheral hardware to drive the system’s bus), you can’t easily use vmalloc. The right time to call vmalloc is when you are allocating memory for a large sequential buffer that exists only in software. It’s important to note that vmalloc has more overhead than __get_free_pages because it must both retrieve the memory and build the page tables. Therefore, it doesn’t make sense to call vmalloc to allocate just one page.

An example of a function that uses vmalloc is the create_module system call, which uses vmalloc to get space for the module being created. Code and data of the module are later copied to the allocated space using copy_from_user, after insmod has relocated the code. In this way, the module appears to be loaded into contiguous memory. You can verify, by looking in /proc/ksyms, that kernel symbols exported by modules lie in a different memory range than symbols exported by the kernel proper.

Memory allocated with vmalloc is released by vfree, in the same way that kfree releases memory allocated by kmalloc.

Like vmalloc, ioremap builds new page tables; unlike vmalloc, however, it doesn’t actually allocate any memory. The return value of ioremap is a special virtual address that can be used to access the specified physical address range; the virtual address obtained is eventually released by calling iounmap. Note that the return value from ioremap cannot be safely dereferenced on all platforms; instead, functions like readb should be used. See Section 8.4.1 in Chapter 8for the details.

ioremap is most useful for mapping the (physical) address of a PCI buffer to (virtual) kernel space. For example, it can be used to access the frame buffer of a PCI video device; such buffers are usually mapped at high physical addresses, outside of the address range for which the kernel builds page tables at boot time. PCI issues are explained in more detail in Section 15.1 in Chapter 15.

It’s worth noting that for the sake of portability, you should not directly access addresses returned by ioremap as if they were pointers to memory. Rather, you should always use readb and the other I/O functions introduced in Section 8.4, in Chapter 8. This requirement applies because some platforms, such as the Alpha, are unable to directly map PCI memory regions to the processor address space because of differences between PCI specs and Alpha processors in how data is transferred.

There is almost no limit to how much memory vmalloc can allocate and ioremap can make accessible, although vmalloc refuses to allocate more memory than the amount of physical RAM, in order to detect common errors or typos made by programmers. You should remember, however, that requesting too much memory with vmalloc leads to the same problems as it does with kmalloc.

Both ioremap and vmalloc are page oriented (they work by modifying the page tables); thus the relocated or allocated size is rounded up to the nearest page boundary. In addition, the implementation of ioremap found in Linux 2.0 won’t even consider remapping a physical address that doesn’t start at a page boundary. Newer kernels allow that by “rounding down” the address to be remapped and by returning an offset into the first remapped page.

One minor drawback of vmalloc is that it can’t be used at interrupt time because internally it uses kmalloc(GFP_KERNEL) to acquire storage for the page tables, and thus could sleep. This shouldn’t be a problem—if the use of __get_free_page isn’t good enough for an interrupt handler, then the software design needs some cleaning up.

A scull Using Virtual Addresses: scullv

Sample code using vmalloc is provided in the scullv module. Like scullp, this module is a stripped-down version of scull that uses a different allocation function to obtain space for the device to store data.

The module allocates memory 16 pages at a time. The allocation is done in large chunks to achieve better performance than scullp and to show something that takes too long with other allocation techniques to be feasible. Allocating more than one page with __get_free_pages is failure prone, and even when it succeeds, it can be slow. As we saw earlier, vmalloc is faster than other functions in allocating several pages, but somewhat slower when retrieving a single page, because of the overhead of page-table building. scullv is designed like scullp. order specifies the “order” of each allocation and defaults to 4. The only difference between scullv and scullp is in allocation management. These lines use vmalloc to obtain new memory:

/* Allocate a quantum using virtual addresses */
if (!dptr->data[s_pos]) {
    dptr->data[s_pos] =
        (void *)vmalloc(PAGE_SIZE << dptr->order);
    if (!dptr->data[s_pos])
        goto nomem;
    memset(dptr->data[s_pos], 0, PAGE_SIZE << dptr->order);
}

And these lines release memory:

/* Release the quantum set */
for (i = 0; i < qset; i++)
    if (dptr->data[i])
        vfree(dptr->data[i]);

If you compile both modules with debugging enabled, you can look at their data allocation by reading the files they create in /proc. The following snapshots were taken on two different systems:

salma% cat /tmp/bigfile > /dev/scullp0; head -5 /proc/scullpmem

Device 0: qset 500, order 0, sz 1048576
  item at e00000003e641b40, qset at e000000025c60000
       0:e00000003007c000
       1:e000000024778000
salma% cat /tmp/bigfile > /dev/scullv0; head -5 /proc/scullvmem

Device 0: qset 500, order 4, sz 1048576
  item at e0000000303699c0, qset at e000000025c87000
       0:a000000000034000
       1:a000000000078000
salma% uname -m
ia64

rudo% cat /tmp/bigfile > /dev/scullp0; head -5 /proc/scullpmem

Device 0: qset 500, order 0, sz 1048576
  item at c4184780, qset at c71c4800
       0:c262b000
       1:c2193000
rudo%  cat /tmp/bigfile > /dev/scullv0; head -5 /proc/scullvmem

Device 0: qset 500, order 4, sz 1048576
  item at c4184b80, qset at c71c4000
       0:c881a000
       1:c882b000
rudo% uname -m
i686

The values show two different behaviors. On IA-64, physical addresses and virtual addresses are mapped to completely different address ranges (0xE and 0xA), whereas on x86 computers vmalloc returns virtual addresses just above the mapping used for physical memory.



[30] Actually, some architectures define ranges of “virtual” addresses as reserved to address physical memory. When this happens, the Linux kernel takes advantage of the feature, and both the kernel and get_free_pages addresses lie in one of those memory ranges. The difference is transparent to device drivers and other code that is not directly involved with the memory-management kernel subsystem.

Get Linux Device Drivers, Second Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.