The last major directory of kernel source files is devoted to memory
management. The files in this directory implement all the data
structures that are used throughout the system to manage
memory-related issues. While memory management is founded on registers
and features specific to a given CPU, we’ve already seen in Chapter 13
how most of the code has been made platform independent. Interested
users can check how asm/arch-arch/mm implements the lowest level for a
specific computer platform.
The kmalloc/kfree memory allocation engine is defined in slab.c. This
file is a completely new implementation that replaces what used to live
in kmalloc.c. The latter file no longer exists after version 2.0.
While most programmers are familiar with how an operating system manages memory in blocks and pages, Linux (taking an idea from Sun Microsystems’ Solaris) uses an additional, more flexible concept called a slab. Each slab is a cache that contains multiple memory objects of the same size. Some slabs are specialized and contain structs of a certain type used by a certain part of the kernel; others are more general and contain memory regions of 32 bytes, 64 bytes, and so on. The advantage of using slabs is that structs or other regions of memory can be cached and reused with very little overhead; the more ponderous technique of allocating and freeing pages is invoked less often.
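The free-list idea behind a slab can be illustrated in user space. What follows is only a minimal sketch and not the code found in slab.c: the toy_cache type and its functions are hypothetical names invented for this illustration, and malloc stands in for the page allocator. It carves one backing region into equal-size objects and recycles them through a linked list threaded through the free objects themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy cache; none of these names come from the kernel. */
struct toy_cache {
    size_t obj_size;   /* size of every object, rounded up for alignment */
    size_t in_use;     /* objects currently handed out */
    void *slab;        /* one contiguous backing allocation */
    void *free_list;   /* list threaded through the free objects */
};

/* Carve room for `capacity` objects of `obj_size` bytes; 0 on success. */
int toy_cache_init(struct toy_cache *c, size_t obj_size, size_t capacity)
{
    size_t i;

    if (capacity == 0)
        return -1;
    /* A free object stores the "next" pointer, so it must be able to hold one. */
    if (obj_size < sizeof(void *))
        obj_size = sizeof(void *);
    obj_size = (obj_size + sizeof(void *) - 1) / sizeof(void *) * sizeof(void *);

    c->slab = malloc(obj_size * capacity);
    if (c->slab == NULL)
        return -1;
    c->obj_size = obj_size;
    c->in_use = 0;
    c->free_list = NULL;

    /* Push the objects in reverse so that the first one ends up at the head. */
    for (i = capacity; i > 0; i--) {
        void *obj = (char *)c->slab + (i - 1) * obj_size;
        *(void **)obj = c->free_list;
        c->free_list = obj;
    }
    return 0;
}

/* Pop one object off the free list; NULL when the slab is exhausted. */
void *toy_cache_alloc(struct toy_cache *c)
{
    void *obj = c->free_list;

    if (obj == NULL)
        return NULL;   /* a real allocator would obtain another slab here */
    c->free_list = *(void **)obj;
    c->in_use++;
    return obj;
}

/* Return an object to the cache: a pointer push, with no page freed. */
void toy_cache_free(struct toy_cache *c, void *obj)
{
    assert(c->in_use > 0);
    *(void **)obj = c->free_list;
    c->free_list = obj;
    c->in_use--;
}

/* Release the backing region once no object is in use any longer. */
void toy_cache_destroy(struct toy_cache *c)
{
    free(c->slab);
    c->slab = NULL;
    c->free_list = NULL;
}
```

Allocation and release are each a couple of pointer operations, which is the "very little overhead" described above; only toy_cache_init and toy_cache_destroy touch the underlying allocator.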
The other important allocation tool, vmalloc, and the function that
lies behind them all, get_free_pages, are defined in vmalloc.c and
page_alloc.c, respectively. Both are pretty straightforward and make
interesting reading.
In addition to allocation services, a memory management system must
offer memory mappings. After all, mmap is the
foundation of many system activities, including the execution of a
file. The actual sys_mmap function doesn’t live
here, though. It is buried in architecture-specific code, because
system calls with more than five arguments need special handling in
relation to CPU registers. The function that implements mmap for all
platforms is do_mmap_pgoff, defined in mmap.c. The same file implements
sys_sendfile and sys_brk. The latter may look unrelated, because brk is
used to raise the maximum virtual address usable by a process. In fact,
it belongs here: Linux (and most current Unices) creates new virtual
address space for a process by mapping pages from /dev/zero, so growing
the address space is itself a mapping operation.
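The same idea is visible from user space: a private mapping of /dev/zero yields fresh, zero-filled pages. The sketch below assumes a Linux system where /dev/zero is available; map_zero_pages and unmap_zero_pages are hypothetical helper names, and the code shows the user-level analogue rather than what sys_brk does internally.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `len` bytes of private, zero-filled memory backed by /dev/zero.
 * Returns NULL on failure. */
void *map_zero_pages(size_t len)
{
    void *addr;
    int fd;

    assert(len > 0);
    fd = open("/dev/zero", O_RDWR);
    if (fd < 0)
        return NULL;
    addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    return addr == MAP_FAILED ? NULL : addr;
}

/* Give the pages back; returns 0 on success, like munmap. */
int unmap_zero_pages(void *addr, size_t len)
{
    return munmap(addr, len);
}
```

Because the mapping is private, stores into it never reach the device; each page is simply copied on first write, which is what makes this a convenient source of new address space.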
The mechanisms for mapping a regular file into memory have been placed
in filemap.c; the file acts on pretty low-level data structures within
the memory management system. mprotect and mremap are implemented in
two files of the same names; memory locking appears in mlock.c.
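From the caller's side, the services implemented by these files look like the following user-space sketch. It assumes a POSIX system with a writable /tmp; map_file, make_read_only, and file_mapping_demo are hypothetical names introduced only for this example. The code maps a small regular file, modifies it through the mapping, and then uses mprotect to drop write permission.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole regular file, shared and writable. On success the file
 * size is stored in *len_out; on failure NULL is returned. */
void *map_file(const char *path, size_t *len_out)
{
    struct stat st;
    void *addr;
    int fd;

    assert(path != NULL && len_out != NULL);
    fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    addr = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                MAP_SHARED, fd, 0);
    close(fd);   /* the mapping holds its own reference to the file */
    if (addr == MAP_FAILED)
        return NULL;
    *len_out = (size_t)st.st_size;
    return addr;
}

/* Drop write permission on an existing mapping. */
int make_read_only(void *addr, size_t len)
{
    return mprotect(addr, len, PROT_READ);
}

/* Demo driver: returns 0 if every step behaved as expected. */
int file_mapping_demo(void)
{
    char path[] = "/tmp/mapdemoXXXXXX";
    size_t len = 0;
    char *p;
    int ok;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    if (write(fd, "hello", 5) != 5) {
        close(fd);
        unlink(path);
        return -1;
    }
    close(fd);

    p = map_file(path, &len);
    unlink(path);   /* the mapping keeps the file contents alive */
    if (p == NULL)
        return -1;

    ok = (len == 5 && memcmp(p, "hello", 5) == 0);
    p[0] = 'j';                               /* store through the mapping */
    ok = ok && make_read_only(p, len) == 0;
    ok = ok && p[0] == 'j';                   /* reads still succeed */
    munmap(p, len);
    return ok ? 0 : -1;
}
```

After make_read_only succeeds, a further store into the region would raise SIGSEGV, so the demo only reads from it.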
When a process has several memory maps active, you need an efficient
way to look for free areas in its memory address space. To this end,
all memory maps of a process are laid out in an Adelson-Velski-Landis
(AVL) tree. The software structure is implemented in mmap_avl.c.
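The following user-space sketch shows why a balanced tree suits this job. It is not the code in mmap_avl.c: struct area, area_insert, and area_find are hypothetical names, and the regions are assumed never to overlap. Each node describes one mapped region [start, end); insertion rebalances with the usual AVL rotations, and the lookup returns the lowest region that ends above a given address, so the caller can tell whether the address is mapped or falls in a hole.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node: one mapped region [start, end) of a process. */
struct area {
    unsigned long start, end;
    int height;
    struct area *left, *right;
};

static int height(const struct area *n)
{
    return n ? n->height : 0;
}

static void update(struct area *n)
{
    int hl = height(n->left), hr = height(n->right);

    n->height = 1 + (hl > hr ? hl : hr);
}

static struct area *rotate_right(struct area *y)
{
    struct area *x = y->left;

    y->left = x->right;
    x->right = y;
    update(y);
    update(x);
    return x;
}

static struct area *rotate_left(struct area *x)
{
    struct area *y = x->right;

    x->right = y->left;
    y->left = x;
    update(x);
    update(y);
    return y;
}

/* Insert a non-overlapping region keyed by its start address and
 * rebalance on the way back up; returns the new subtree root. */
struct area *area_insert(struct area *root, struct area *n)
{
    int balance;

    assert(n->start < n->end);
    if (root == NULL) {
        n->left = n->right = NULL;
        n->height = 1;
        return n;
    }
    if (n->start < root->start)
        root->left = area_insert(root->left, n);
    else
        root->right = area_insert(root->right, n);
    update(root);

    balance = height(root->left) - height(root->right);
    if (balance > 1) {            /* left-heavy */
        if (height(root->left->left) < height(root->left->right))
            root->left = rotate_left(root->left);
        return rotate_right(root);
    }
    if (balance < -1) {           /* right-heavy */
        if (height(root->right->right) < height(root->right->left))
            root->right = rotate_right(root->right);
        return rotate_left(root);
    }
    return root;
}

/* Return the lowest region whose end lies above addr, or NULL.
 * The address is mapped only if it is also >= the region's start. */
struct area *area_find(struct area *root, unsigned long addr)
{
    struct area *best = NULL;

    while (root != NULL) {
        if (root->end > addr) {
            best = root;          /* candidate; a lower one may exist */
            root = root->left;
        } else {
            root = root->right;
        }
    }
    return best;
}
```

Because the tree stays balanced, a lookup visits a number of nodes proportional to the logarithm of the number of mappings, even when regions are created in strictly ascending address order.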
Swap file initialization and removal (i.e., the swapon and swapoff
system calls) are in swapfile.c. The scope of swap_state.c is the swap
cache, and page aging is in swap.c. What is known as swapping is not
defined here. Instead, it is part of managing memory pages, implemented
by the kswapd thread.
The lowest level of page-table management is implemented by the
memory.c file, which still carries the original
notes by Linus when he implemented the first real memory management
features in December 1991. Everything that happens at lower levels is
part of architecture-specific code (often hidden as macros in the
header files).
Code specific to high-memory management (the memory beyond that which
can be addressed directly by the kernel, especially used in the x86
world to accommodate more than 4 GB of RAM without abandoning the
32-bit architecture) is in highmem.c, as you may imagine.
vmscan.c implements the kswapd kernel thread. This is the procedure
that looks for unused and old pages in order to free them or send them
to swap space, as already suggested. It’s a well-commented source file
because fine-tuning these algorithms is the key factor in overall
system performance. Every design choice in this nontrivial and
critical section needs to be well motivated, which explains the
generous amount of comments.
The rest of the source files found in the mm directory deal with minor
but sometimes important details, like the
oom_killer, a procedure that elects which process
to kill when the system runs out of memory.
Interestingly, the uClinux port of the Linux kernel to MMU-less
processors introduces a separate mmnommu directory. It closely
replicates the official mm while leaving out any MMU-related code. The
developers chose this path to avoid adding a mess of conditional code
in the mm source tree. Since uClinux is not (yet) integrated with the
mainstream kernel, you’ll need to download a uClinux CVS tree or
tarball if you want to compare the two directories (both included in
the uClinux tree).