The mm Directory

The last major directory of kernel source files is devoted to memory management. The files in this directory implement the data structures that are used throughout the system to manage memory. While memory management is founded on registers and features specific to a given CPU, we’ve already seen in Chapter 13 how most of the code has been made platform independent. Interested users can check how asm/arch-arch/mm implements the lowest level for a specific computer platform.

The kmalloc/kfree memory allocation engine is defined in slab.c. This file is a completely new implementation that replaces what used to live in kmalloc.c. The latter file doesn’t exist anymore after version 2.0.

While most programmers are familiar with how an operating system manages memory in blocks and pages, Linux (taking an idea from Sun Microsystems’ Solaris) uses an additional, more flexible concept called a slab. Each slab is a cache that contains multiple memory objects of the same size. Some slabs are specialized and contain structs of a certain type used by a certain part of the kernel; others are more general and contain memory regions of 32 bytes, 64 bytes, and so on. The advantage of using slabs is that structs or other regions of memory can be cached and reused with very little overhead; the more ponderous technique of allocating and freeing pages is invoked less often.
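The caching idea can be illustrated with a small user-space sketch. This is not the code in slab.c: the names toy_cache, toy_cache_alloc, and so on are hypothetical, and malloc stands in for the page allocator. It only shows why reusing freed objects of one fixed size keeps the slower backend out of the common path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy user-space object cache; a sketch of the idea, not the kernel's slab. */
struct toy_cache {
    size_t obj_size;              /* size of every object in this cache */
    void *free_list;              /* linked list of released objects */
    unsigned long backend_allocs; /* how often we fell back to malloc */
};

void toy_cache_init(struct toy_cache *c, size_t obj_size)
{
    /* A free object stores the "next" pointer in its own first bytes,
     * so objects must be at least pointer sized. */
    c->obj_size = obj_size < sizeof(void *) ? sizeof(void *) : obj_size;
    c->free_list = NULL;
    c->backend_allocs = 0;
}

void *toy_cache_alloc(struct toy_cache *c)
{
    void *obj = c->free_list;

    if (obj) {
        /* Fast path: pop a previously freed object off the list. */
        c->free_list = *(void **)obj;
        return obj;
    }
    /* Slow path: ask the backend (malloc here) for fresh memory. */
    c->backend_allocs++;
    return malloc(c->obj_size);
}

void toy_cache_free(struct toy_cache *c, void *obj)
{
    assert(obj != NULL);
    /* Keep the object for reuse instead of returning it to the backend. */
    *(void **)obj = c->free_list;
    c->free_list = obj;
}

void toy_cache_destroy(struct toy_cache *c)
{
    /* Hand every cached object back to the backend. */
    while (c->free_list) {
        void *next = *(void **)c->free_list;
        free(c->free_list);
        c->free_list = next;
    }
}
```

In this sketch, allocating an object, freeing it, and allocating again returns the same memory and touches malloc only once. The real allocator is considerably more elaborate, but the reuse pattern is the same.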

The other important allocation tool, vmalloc, and the function that lies behind them all, get_free_pages, are defined in vmalloc.c and page_alloc.c respectively. Both are pretty straightforward and make interesting reading.

In addition to allocation services, a memory management system must offer memory mappings. After all, mmap is the foundation of many system activities, including the execution of a file. The actual sys_mmap function doesn’t live here, though. It is buried in architecture-specific code, because system calls with more than five arguments need special handling in relation to CPU registers. The function that implements mmap for all platforms is do_mmap_pgoff, defined in mmap.c. The same file implements sys_sendfile and sys_brk. The latter may look unrelated, because brk is used to raise the maximum virtual address usable by a process. Actually, Linux (and most current Unices) creates new virtual address space for a process by mapping pages from /dev/zero.
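The effect of mapping pages from /dev/zero can be reproduced from user space. The following is a minimal sketch, assuming a Linux system where /dev/zero is available; the helper names map_zero_pages and unmap_zero_pages are hypothetical. It shows only the user-visible equivalent, not how the kernel implements brk internally.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map len bytes of private, zero-filled memory backed by /dev/zero.
 * Returns NULL on failure. */
void *map_zero_pages(size_t len)
{
    int fd;
    void *p;

    assert(len > 0); /* mmap rejects zero-length mappings */

    fd = open("/dev/zero", O_RDWR);
    if (fd < 0)
        return NULL;

    /* MAP_PRIVATE: writes go to private copies of the pages,
     * never back to the device. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);

    /* The mapping remains valid after the descriptor is closed. */
    close(fd);

    return p == MAP_FAILED ? NULL : p;
}

/* Release a region obtained from map_zero_pages; returns 0 on success. */
int unmap_zero_pages(void *p, size_t len)
{
    return munmap(p, len);
}
```

Every byte of the returned region reads as zero until the process writes to it, which is the behavior a program expects from freshly obtained address space.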

The mechanisms for mapping a regular file into memory have been placed in filemap.c; the file acts on pretty low-level data structures within the memory management system. mprotect and mremap are implemented in two files of the same names (mprotect.c and mremap.c); memory locking appears in mlock.c.

When a process has several memory maps active, you need an efficient way to look for free areas in its memory address space. To this end, all memory maps of a process are laid out in an Adelson-Velski-Landis (AVL) tree. The software structure is implemented in mmap_avl.c.
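To make the lookup problem concrete, here is a simplified sketch. It deliberately swaps the AVL tree for a sorted array with binary search, which gives the same logarithmic lookup but none of the cheap insertion and removal that a balanced tree provides. The names toy_area, toy_find_area, and toy_find_free are hypothetical, and address overflow is ignored for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* One mapped region covering [start, end). */
struct toy_area {
    unsigned long start;
    unsigned long end;
};

/* Return the index of the first area whose end lies above addr, or n if
 * there is none. The array must be sorted and non-overlapping. */
size_t toy_find_area(const struct toy_area *a, size_t n, unsigned long addr)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (a[mid].end > addr)
            hi = mid;      /* candidate; keep looking to the left */
        else
            lo = mid + 1;  /* area lies entirely below addr */
    }
    return lo;
}

/* Return the lowest address >= hint where len bytes fit without touching
 * any area and without crossing limit, or 0 if no such gap exists. */
unsigned long toy_find_free(const struct toy_area *a, size_t n,
                            unsigned long hint, unsigned long len,
                            unsigned long limit)
{
    unsigned long addr = hint;
    size_t i;

    assert(len > 0);

    for (i = toy_find_area(a, n, addr); i < n; i++) {
        if (addr + len <= a[i].start)
            break;         /* the gap before area i is large enough */
        addr = a[i].end;   /* otherwise retry just past this area */
    }
    return addr + len <= limit ? addr : 0;
}
```

The binary search skips every area below the hint in logarithmic time; only the gaps from that point on are examined one by one.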

Swap file initialization and removal (i.e., the swapon and swapoff system calls) are in swapfile.c. The scope of swap_state.c is the swap cache, and page aging is in swap.c. What is known as swapping is not defined here. Instead, it is part of managing memory pages, implemented by the kswapd thread.

The lowest level of page-table management is implemented by the memory.c file, which still carries the original notes by Linus when he implemented the first real memory management features in December 1991. Everything that happens at lower levels is part of architecture-specific code (often hidden as macros in the header files).

Code specific to high-memory management is in highmem.c, as you may imagine. High memory is the memory beyond that which the kernel can address directly; it is used especially in the x86 world to accommodate more than 4 GB of RAM without abandoning the 32-bit architecture.

vmscan.c implements the kswapd kernel thread. This is the procedure that looks for unused and old pages in order to free them or send them to swap space, as already suggested. It’s a well-commented source file because fine-tuning these algorithms is a key factor in overall system performance. Every design choice in this nontrivial and critical section needs to be well motivated, which explains the generous number of comments.

The rest of the source files found in the mm directory deal with minor but sometimes important details, like the oom_killer, a procedure that elects which process to kill when the system runs out of memory.

Interestingly, the uClinux port of the Linux kernel to MMU-less processors introduces a separate mmnommu directory. It closely replicates the official mm while leaving out any MMU-related code. The developers chose this path to avoid adding a mess of conditional code in the mm source tree. Since uClinux is not (yet) integrated with the mainstream kernel, you’ll need to download a uClinux CVS tree or tarball if you want to compare the two directories (both are included in the uClinux tree).
