The last resort in debugging modules is using a debugger to step through the code, watching the value of variables and machine registers. This approach is time-consuming and should be avoided whenever possible. Nonetheless, the fine-grained perspective on the code that is achieved through a debugger is sometimes invaluable.
Using an interactive debugger on the kernel is a challenge. The kernel runs in its own address space on behalf of all the processes on the system. As a result, a number of common capabilities provided by user-space debuggers, such as breakpoints and single-stepping, are harder to come by in the kernel. In this section we look at several ways of debugging the kernel; each of them has advantages and disadvantages.
gdb can be quite useful for looking at the system internals. Proficient use of the debugger at this level requires some confidence with gdb commands, some understanding of assembly code for the target platform, and the ability to match source code and optimized assembly.
The debugger must be invoked as though the kernel were an application.
In addition to specifying the filename for the uncompressed kernel
image, you need to provide the name of a core file on the command
line. For a running kernel, that core file is the kernel core image,
/proc/kcore. A typical invocation of
gdb looks like the following:
gdb /usr/src/linux/vmlinux /proc/kcore
The first argument is the name of the uncompressed kernel executable.
The second argument on the gdb command line
is the name of the core file. Like any file in
/proc, /proc/kcore is
generated when it is read. When the read system
call executes in the
/proc filesystem, it maps to
a data-generation function rather than a data-retrieval one; we’ve
already exploited this feature in Section 4.2.1
earlier in this chapter.
kcore is used to
represent the kernel “executable” in the format of a core file; it
is a huge file because it represents the whole kernel address space,
which corresponds to all physical memory. From within
gdb, you can look at kernel variables by
issuing the standard gdb commands. For
example, p jiffies prints the number of clock
ticks from system boot to the current time.
When you print data from gdb, the kernel is
still running, and the various data items have different values at
different times; gdb, however, optimizes
access to the core file by caching data that has already been read. If
you try to look at the
jiffies variable once again,
you’ll get the same answer as before. Caching values to avoid extra
disk access is a correct behavior for conventional core files, but is
inconvenient when a “dynamic” core image is used. The solution is to
issue the command core-file /proc/kcore whenever
you want to flush the gdb cache; the
debugger prepares to use a new core file and discards any old
information. You won’t, however, always need to issue
core-file when reading a new datum;
gdb reads the core in chunks of a few
kilobytes and caches only chunks it has already referenced.
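The caching behavior just described can be illustrated with a toy model. The sketch below is not how gdb is implemented; the class names (LiveCore, CachingReader), the 4096-byte chunk size, and the two addresses are all assumptions made for illustration. It shows why a second read of the same location returns a stale value until the cache is discarded, and why a read that lands in a chunk not yet referenced sees current data.

```python
CHUNK = 4096  # assumed chunk size; the text says only "a few kilobytes"


class LiveCore:
    """Stand-in for a dynamic core image: contents change between reads."""

    def __init__(self):
        self.memory = {}  # address -> current value

    def read_chunk(self, base):
        # Snapshot every value that currently lives in [base, base + CHUNK).
        return {a: v for a, v in self.memory.items() if base <= a < base + CHUNK}


class CachingReader:
    """Toy debugger front end that caches whole chunks after the first read."""

    def __init__(self, core):
        self.core = core
        self.cache = {}  # chunk base -> snapshot taken at first access

    def read(self, addr):
        base = addr - addr % CHUNK
        if base not in self.cache:
            self.cache[base] = self.core.read_chunk(base)
        return self.cache[base][addr]

    def flush(self):
        # Plays the role of reissuing "core-file /proc/kcore".
        self.cache.clear()


JIFFIES = 0x1000  # hypothetical addresses, for illustration only
OTHER = 0x9000    # lives in a different chunk

core = LiveCore()
core.memory[JIFFIES] = 100
core.memory[OTHER] = 1
reader = CachingReader(core)

first = reader.read(JIFFIES)      # chunk is read and cached
core.memory[JIFFIES] = 250        # the "kernel" keeps running
core.memory[OTHER] = 2
stale = reader.read(JIFFIES)      # served from the cache
fresh_other = reader.read(OTHER)  # this chunk had not been read before
reader.flush()
fresh = reader.read(JIFFIES)      # re-read after the flush
```

In this model, stale equals first even though the underlying value has changed, while fresh_other and fresh both reflect the current contents.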
Numerous capabilities normally provided by gdb are not available when you are working with the kernel. For example, gdb is not able to modify kernel data; it expects to be running a program to be debugged under its own control before playing with its memory image. It is also not possible to set breakpoints or watchpoints, or to single-step through kernel functions.
If you compile the kernel with debugging support
(-g), the resulting
vmlinux file turns out to work better with
gdb than the same file compiled without
-g. Note, however, that a large amount of
disk space is needed to compile the kernel with the
-g option (each object file and the kernel
itself are three or more times bigger than usual).
On non-PC computers, the game is different. On the Alpha,
make boot strips the kernel before creating the
bootable image, so you end up with both the unstripped
vmlinux file and the stripped bootable image.
The former is usable by gdb, and
you can boot from the latter. On the SPARC, the kernel (at least the
2.0 kernel) is not stripped by default.
When you compile the kernel with -g
and run the debugger using
vmlinux together with
/proc/kcore, gdb can
return a lot of information about the kernel internals. You can, for
example, use commands such as p *module_list,
p *module_list->next, and p
*chrdevs->fops to dump structures. To get the best out
of p, you’ll need to keep a kernel map and the
source code handy.
Another useful task that gdb performs on
the running kernel is disassembling functions, via the
disassemble command (which can be abbreviated to
disass) or the “examine instructions”
(x/i) command. The
disassemble command can take as its argument
either a function name or a memory range, whereas
x/i takes a single memory address, also in the
form of a symbol name. You can invoke, for example,
x/20i to disassemble 20 instructions. Note that
you can’t disassemble a module function, because the debugger is
vmlinux, which doesn’t know about your
module. If you try to disassemble a module by address,
gdb is most likely to reply “Cannot access
memory at xxxx.” For the same reason, you can’t look at data items
belonging to a module. They can be read from
/dev/mem if you know the address of your
variables, but it’s hard to make sense of raw data extracted from memory.
If you want to disassemble a module function, you’re better off running the objdump utility on the module object file. Unfortunately, the tool runs on the disk copy of the file, not the running one; therefore, the addresses as shown by objdump will be the addresses before relocation, unrelated to the module’s execution environment. Another disadvantage of disassembling an unlinked object file is that function calls are still unresolved, so you can’t easily tell a call to printk from a call to kmalloc.
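To relate objdump output to a loaded module, the relocation has to be applied by hand: add the address at which the code was loaded to the offset that objdump reports. The following is a minimal sketch of that arithmetic, not a tool that ships with the kernel. The helper names, the symbol offsets, and the load address are all invented for illustration, and the model assumes a single text section loaded contiguously.

```python
def runtime_address(load_base, offset):
    """Return the address of a symbol once its section is loaded at load_base."""
    return load_base + offset


def relocate(symbols, load_base):
    """Map each symbol's pre-relocation offset to its in-memory address."""
    return {name: runtime_address(load_base, off) for name, off in symbols.items()}


# Hypothetical offsets, as objdump might list them for an unlinked object file.
offsets = {"scull_open": 0x0, "scull_read": 0x514, "scull_write": 0x6A0}

# Hypothetical address at which the module's text section was loaded.
base = 0xC8833000

loaded = relocate(offsets, base)
for name, addr in sorted(loaded.items(), key=lambda item: item[1]):
    print(f"{name:12s} {addr:#010x}")
```

The same subtraction works in the other direction: given an address from an oops message, subtracting the load address yields the offset to look up in the objdump listing.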
Many readers may be wondering why the kernel does not have any more advanced debugging features built into it. The answer, quite simply, is that Linus does not believe in interactive debuggers. He fears that they lead to poor fixes, those which patch up symptoms rather than addressing the real cause of problems. Thus, no built-in debuggers.
Other kernel developers, however, see an occasional use for interactive debugging tools. One such tool is the kdb built-in kernel debugger, available as an unofficial patch from oss.sgi.com. To use kdb, you must obtain the patch (be sure to get a version that matches your kernel version), apply it, and rebuild and reinstall the kernel. Note that, as of this writing, kdb works only on IA-32 (x86) systems (though a version for the IA-64 existed for a while in the mainline kernel source before being removed).
Once you are running a kdb-enabled kernel, there are a couple of ways to enter the debugger. Hitting the Pause (or Break) key on the console will start up the debugger. kdb also starts up when a kernel oops happens, or when a breakpoint is hit. In any case, you will see a message that looks something like this:
Entering kdb (0xc1278000) on processor 1 due to Keyboard Entry
kdb>
Note that just about everything the kernel does stops when kdb is running. Nothing else should be running on a system where you invoke kdb; in particular, you should not have networking turned on—unless, of course, you are debugging a network driver. It is generally a good idea to boot the system in single-user mode if you will be using kdb.
As an example, consider a quick scull debugging session. Assuming that the driver is already loaded, we can tell kdb to set a breakpoint in scull_read as follows:
kdb> bp scull_read
Instruction(i) BP #0 at 0xc8833514 (scull_read)
    is enabled on cpu 1
kdb> go
The bp command tells kdb to stop the next time the kernel enters scull_read. We then type go to continue execution. After putting something into one of the scull devices, we can attempt to read it by running cat under a shell on another terminal, yielding the following:
Entering kdb (0xc3108000) on processor 0 due to Breakpoint @ 0xc8833515
Instruction(i) breakpoint #0 at 0xc8833514
scull_read+0x1:   movl   %esp,%ebp
kdb>
We are now positioned at the beginning of scull_read. To see how we got there, we can get a stack trace:
kdb> bt
    EBP       EIP         Function(args)
0xc3109c5c 0xc8833515  scull_read+0x1
0xc3109fbc 0xfc458b10  scull_read+0x33c255fc( 0x3, 0x803ad78, 0x1000, 0x1000, 0x804ad78)
0xbffffc88 0xc010bec0  system_call
kdb>
kdb attempts to print out the arguments to every function in the call trace. It gets confused, however, by optimization tricks used by the compiler. Thus it prints five arguments for scull_read, which only has four.
Time to look at some data. The mds command
manipulates data; we can query the value of the
scull_devices pointer with a command like:
kdb> mds scull_devices 1
c8836104: c4c125c0 ....
Here we asked for one (four-byte) word of data starting at the
location of scull_devices; the answer tells us that
our device array was allocated starting at the address
c4c125c0. To look at a device structure itself we
need to use that address:
kdb> mds c4c125c0
c4c125c0: c3785000 ....
c4c125c4: 00000000 ....
c4c125c8: 00000fa0 ....
c4c125cc: 000003e8 ....
c4c125d0: 0000009a ....
c4c125d4: 00000000 ....
c4c125d8: 00000000 ....
c4c125dc: 00000001 ....
The eight lines here correspond to the eight fields in the
Scull_Dev structure. Thus we see that the memory
for the first device is allocated at c3785000,
that there is no next item in the list, that the quantum is
4000 (hex fa0) and the array size is 1000 (hex 3e8), that there are
154 bytes of data in the device (hex 9a), and so on.
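Decoding such a dump by hand is error-prone, so it can help to let a few lines of code do the hex conversion. The sketch below parses the mds output shown above and labels each word. The labels for the first five words follow the description in the text, but the identifiers themselves are assumptions, and the last three words are given placeholder names because the text does not say what they hold. It also assumes that every field occupies exactly one four-byte word, as the dump suggests.

```python
DUMP = """\
c4c125c0: c3785000 ....
c4c125c4: 00000000 ....
c4c125c8: 00000fa0 ....
c4c125cc: 000003e8 ....
c4c125d0: 0000009a ....
c4c125d4: 00000000 ....
c4c125d8: 00000000 ....
c4c125dc: 00000001 ....
"""

# Assumed labels: the first five follow the text, the rest are placeholders.
FIELDS = ["data", "next", "quantum", "qset", "size",
          "word5", "word6", "word7"]


def parse_mds(text):
    """Return a list of (address, value) pairs from mds-style output."""
    words = []
    for line in text.splitlines():
        addr, value = line.split()[:2]
        words.append((int(addr.rstrip(":"), 16), int(value, 16)))
    return words


words = parse_mds(DUMP)
device = {name: value for name, (_, value) in zip(FIELDS, words)}
address_of = {name: addr for name, (addr, _) in zip(FIELDS, words)}

print(f"data pointer: {device['data']:#x}")
print(f"quantum: {device['quantum']}, qset: {device['qset']}, size: {device['size']}")
```

Keeping the address of each word alongside its value is useful when the next step is to modify one of them from the debugger.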
kdb can change data as well. Suppose we wanted to trim some of the data from the device:
kdb> mm c4c125d0 0x50
0xc4c125d0 = 0x50
A subsequent cat on the device will now return less data than before.
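The reason cat sees less data is that a read can never return more than the size the device has recorded. A minimal sketch of that clamping follows; the function name is hypothetical, and the model deliberately ignores any per-call limits the driver may impose beyond the recorded size.

```python
def readable_bytes(size, pos, count):
    """Bytes a read of `count` at offset `pos` can return from a device of `size` bytes."""
    if pos >= size:
        return 0  # reading at or past the end yields nothing
    return min(count, size - pos)


# A 0x1000-byte request from offset 0, before and after the mm command.
before = readable_bytes(0x9A, 0, 0x1000)
after = readable_bytes(0x50, 0, 0x1000)
print(before, after)
```

With the size word changed from hex 9a to hex 50, the same request returns 80 bytes instead of 154.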
kdb has a number of other capabilities,
including single-stepping (by instructions, not lines of C source
code), setting breakpoints on data access, disassembling code,
stepping through linked lists, accessing register data, and more.
After you have applied the kdb patch, a
full set of manual pages can be found in the
Documentation/kdb directory in your kernel source tree.
A number of kernel developers have contributed to an unofficial patch called the integrated kernel debugger, or IKD. IKD provides a number of interesting kernel debugging facilities. The x86 is the primary platform for this patch, but much of it works on other architectures as well. As of this writing, the IKD patch can be found at ftp://ftp.kernel.org/pub/linux/kernel/people/andrea/ikd. It is a patch that must be applied to the source for your kernel; the patch is version specific, so be sure to download the one that matches the kernel you are working with.
One of the features of the IKD patch is a kernel stack debugger. If you turn this feature on, the kernel will check the amount of free space on the kernel stack at every function call, and force an oops if it gets too small. If something in your kernel is causing stack corruption, this tool may help you to find it. There is also a “stack meter” feature that you can use to see how close to filling up the stack you get at any particular time.
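The idea behind the stack debugger and the stack meter can be modeled in a few lines. This is a sketch of the concept only, not code from the IKD patch: the class and function names, the 8192-byte stack, the 1024-byte safety margin, and the 256-byte frame size are all assumptions. Every simulated function entry checks the remaining space, records the lowest value seen (the "meter"), and raises an exception where the real feature would force an oops.

```python
class StackOverflowOops(Exception):
    """Stands in for the oops that the stack debugger would force."""


class StackModel:
    def __init__(self, size, min_free):
        self.size = size
        self.min_free = min_free
        self.used = 0
        self.low_water = size  # smallest free space seen so far (the "meter")

    def enter(self, frame_bytes):
        # Called on every simulated function entry.
        self.used += frame_bytes
        free = self.size - self.used
        self.low_water = min(self.low_water, free)
        if free < self.min_free:
            raise StackOverflowOops(f"only {free} bytes of stack left")

    def leave(self, frame_bytes):
        self.used -= frame_bytes


def recurse(stack, depth, frame_bytes=256):
    """Simulate `depth` nested calls, each consuming one stack frame."""
    stack.enter(frame_bytes)
    if depth > 1:
        recurse(stack, depth - 1, frame_bytes)
    stack.leave(frame_bytes)


shallow = StackModel(size=8192, min_free=1024)
recurse(shallow, depth=10)
shallow_low = shallow.low_water

deep = StackModel(size=8192, min_free=1024)
try:
    recurse(deep, depth=40)
    tripped = False
except StackOverflowOops:
    tripped = True
```

The shallow run completes and leaves a low-water mark that shows how much headroom remained; the deep run trips the check before the stack is actually exhausted, which is the point of testing against a margin rather than against zero.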
The IKD patch also includes some tools for finding kernel lockups. A “soft lockup” detector forces an oops if a kernel procedure goes for too long without scheduling. It is implemented by simply counting the number of function calls that are made and shutting things down if that number exceeds a preconfigured threshold. Another feature can continuously print the program counter on a virtual console for truly last-resort lockup tracking. The semaphore deadlock detector forces an oops if a process spends too long waiting on a down call.
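The call-counting approach to soft lockup detection is simple enough to sketch directly. As before, this is a model of the idea and not code from the patch: the names are hypothetical, the threshold is arbitrary, and the assumption that the counter is reset whenever the code schedules is ours, inferred from the description above.

```python
class SoftLockup(Exception):
    """Raised in place of the oops that the detector would force."""


class CallCounter:
    def __init__(self, threshold):
        self.threshold = threshold
        self.calls = 0

    def on_function_call(self):
        # Invoked on every function entry in the model.
        self.calls += 1
        if self.calls > self.threshold:
            raise SoftLockup(f"{self.calls} calls without scheduling")

    def on_schedule(self):
        # Assumed behavior: scheduling shows progress, so start counting again.
        self.calls = 0


def run(counter, work_items, yield_every=None):
    """Simulate a kernel procedure; return True if the detector never fired."""
    try:
        for i in range(1, work_items + 1):
            counter.on_function_call()
            if yield_every and i % yield_every == 0:
                counter.on_schedule()
    except SoftLockup:
        return False
    return True


well_behaved = run(CallCounter(threshold=1000), work_items=5000, yield_every=500)
runaway = run(CallCounter(threshold=1000), work_items=5000)
```

Both runs do the same amount of work; only the one that never schedules exceeds the threshold. Counting calls rather than measuring time keeps the check cheap, at the cost of a threshold that has to be tuned.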
Other debugging capabilities in IKD include the kernel trace capability, which can record the paths taken through the kernel code. There are some memory debugging tools, including a leak detector and a couple of “poisoners,” that can be useful in tracking down memory corruption problems.
Finally, IKD also includes a version of the kdb debugger discussed in the previous section. As of this writing, however, the version of kdb included in the IKD patch is somewhat old. If you need kdb, we recommend that you go directly to the source at http://www.oss.sgi.com for the current version.
kgdb is a patch that allows the full use of the gdb debugger on the Linux kernel, but only on x86 systems. It works by hooking into the system to be debugged via a serial line, with gdb running on the far end. You thus need two systems to use kgdb—one to run the debugger and one to run the kernel of interest. Like kdb, kgdb is currently available from http://www.oss.sgi.com.
Setting up kgdb involves installing a
kernel patch and booting the modified kernel. You need to connect the
two systems with a serial cable (of the null modem variety) and to
install some support files on the gdb side
of the connection. The patch places detailed instructions in the file
Documentation/i386/gdb-serial.txt; we won’t
reproduce them here. Be sure to read the instructions on debugging
modules: toward the end there are some nice
gdb macros that have been written for this purpose.
Crash dump analyzers enable the system to record its state when an oops occurs, so that it may be examined at leisure afterward. They can be especially useful if you are supporting a driver for a user at a different site. Users can be somewhat reluctant to copy down oops messages for you, so installing a crash dump system can let you get the information you need to track down a user’s problem without requiring any work from the user. It is thus not surprising that the available crash dump analyzers have been written by companies in the business of supporting systems for users.
There are currently two crash dump analyzer patches available for Linux. Both were relatively new when this section was written, and both were in a state of flux. Rather than provide detailed information that is likely to go out of date, we’ll restrict ourselves to providing an overview and pointers to where more information can be found.
The first analyzer is LKCD (Linux Kernel Crash Dumps). It’s available, once again, from oss.sgi.com. When a kernel oops occurs, LKCD will write a copy of the current system state (memory, primarily) into the dump device you specified in advance. The dump device must be a system swap area. A utility called LCRASH is run on the next reboot (before swapping is enabled) to generate a summary of the crash, and optionally to save a copy of the dump in a conventional file. LCRASH can be run interactively and provides a number of debugger-like commands for querying the state of the system.
LKCD is currently supported for the Intel 32-bit architecture only, and only works with swap partitions on SCSI disks.
Another crash dump facility is available from
http://www.missioncriticallinux.com. This crash dump
subsystem creates crash dump files directly in
/var/dumps and does not use the swap area. That
makes certain things easier, but it also means that the system will be
modifying the file system while in a state where things are known to
have gone wrong. The crash dumps generated are in a standard core
file format, so tools like gdb can be used
for post-mortem analysis. This package also provides a separate
analyzer that is able to extract more information than
gdb from the crash dump files.
User-Mode Linux is an interesting concept. It is structured as a
separate port of the Linux kernel, with its own
arch/um subdirectory. It does not run on a new
type of hardware, however; instead, it runs on a virtual machine
implemented on the Linux system call interface. Thus, User-Mode Linux
allows the Linux kernel to run as a separate, user-mode process on a
Linux system.
Having a copy of the kernel running as a user-mode process brings a number of advantages. Because it is running on a constrained, virtual processor, a buggy kernel cannot damage the “real” system. Different hardware and software configurations can be tried easily on the same box. And, perhaps most significantly for kernel developers, the user-mode kernel can be easily manipulated with gdb or another debugger. After all, it is just another process. User-Mode Linux clearly has the potential to accelerate kernel development.
As of this writing, User-Mode Linux is not distributed with the mainline kernel; it must be downloaded from its web site (http://user-mode-linux.sourceforge.net). The word is that it will be integrated into an early 2.4 release after 2.4.0; it may well be there by the time this book is published.
User-Mode Linux also has some significant limitations as of this writing, most of which will likely be addressed soon. The virtual processor currently works in a uniprocessor mode only; the port runs on SMP systems without a problem, but it can only emulate a uniprocessor host. The biggest problem for driver writers, though, is that the user-mode kernel has no access to the host system’s hardware. Thus, while it can be useful for debugging most of the sample drivers in this book, User-Mode Linux is not yet useful for debugging drivers that have to deal with real hardware. Finally, User-Mode Linux only runs on the IA-32 architecture.
Because work is under way to fix all of these problems, User-Mode Linux will likely be an indispensable tool for Linux device driver programmers in the very near future.
The Linux Trace Toolkit (LTT) is a kernel patch and a set of related utilities that allow the tracing of events in the kernel. The trace includes timing information and can create a reasonably complete picture of what happened over a given period of time. Thus, it can be used not only for debugging but also for tracking down performance problems.
LTT, along with extensive documentation, can be found on the Web at http://www.opersys.com/LTT.
Dynamic Probes (or DProbes) is a debugging tool released (under the GPL) by IBM for Linux on the IA-32 architecture. It allows the placement of a “probe” at almost any place in the system, in both user and kernel space. The probe consists of some code (written in a specialized, stack-oriented language) that is executed when control hits the given point. This code can report information back to user space, change registers, or do a number of other things. The useful feature of DProbes is that once the capability has been built into the kernel, probes can be inserted anywhere within a running system without kernel builds or reboots. DProbes can also work with the Linux Trace Toolkit to insert new tracing events at arbitrary locations.
The DProbes tool can be downloaded from IBM’s open source site: http://www-124.ibm.com/developerworks/oss.