In the previous section, we treated start_kernel as the first kernel function. However, you might be interested in what happens before that point, so we’ll step back to take a quick look at that topic. The uninterested reader can jump directly to the next section.
As suggested, the code that runs before start_kernel is, for the most part, assembly code, but several platforms call library C functions from there (most commonly, inflate, the core of gunzip).
On most common platforms, the code that runs before start_kernel is mainly devoted to moving the kernel around after the computer’s firmware (possibly with the help of a boot loader) has loaded it into RAM from some other storage, such as a local disk or a remote workstation over the network.
It’s not uncommon, though, to find some rudimentary boot loader code inside the boot directory of an architecture-specific tree. For example, arch/i386/boot includes code that can load the rest of the kernel off a floppy disk and activate it. The file bootsect.S that you will find there, however, can run only off a floppy disk and is by no means a complete boot loader (for example, it is unable to pass a command line to the kernel it loads). Nonetheless, copying a new kernel to a floppy is still a handy way to quickly boot it on the PC.
A known limitation of the x86 platform is that the CPU can see only 640 KB of system memory when it is powered on, no matter how large your installed memory is. Dealing with the limitation requires the kernel to be compressed, and support for decompression is available in arch/i386/boot together with other code such as VGA mode setting. On the PC, because of this limit, you can’t boot a vmlinux kernel image directly, and the file you actually boot is called zImage or bzImage; the boot sector described earlier is actually prepended to this file rather than to vmlinux. We won’t spend more time on the booting process on the x86 platform, since you can choose from several boot loaders, and the topic is generally well discussed elsewhere.
On some platforms, the layout of the boot code differs greatly from that of the PC. Sometimes the code must deal with several variations of the same architecture. This is the case, for example, with ARM, MIPS, and M68k. These platforms cover a wide variety of CPU and system types, ranging from powerful servers and workstations down to PDAs or embedded appliances. Different environments require different boot code and sometimes even different ld scripts to link the kernel image. Some of this support is not included in the official kernel tree published by Linus and is available only from third-party Concurrent Versions System (CVS) trees that closely track the official tree but have not yet been merged. Current examples include the SGI CVS tree for MIPS workstations and the LinuxCE CVS tree for MIPS-based palm computers. Nonetheless, we’d like to spend a few words on this topic because we feel it’s an interesting one. Everything from start_kernel onward relies on this extra machinery without being aware of it.
Specific ld scripts and makefile rules are needed especially for embedded systems, and particularly for variants without a memory management unit, which are supported by uClinux. When you have no hardware MMU that maps virtual addresses to physical ones, you must link the kernel to be executed from the physical address where it will be loaded in the target platform. It’s not uncommon in small systems to link the kernel so that it is loaded into read-only memory (usually flash memory), where it is directly activated at power-on time without the help of any boot loader.
When the kernel is executed directly from flash memory, the makefiles, ld scripts, and boot code work in tight cooperation. The ld rules place the code and read-only segments (such as the init calls information) into flash memory, while placing the data segments (data and BSS, short for “block started by symbol”) in system RAM. The result is that the two sets are not consecutive. The makefile, then, offers special rules to coalesce all these sections into consecutive addresses and convert them to a format suitable for upload to the target system. Coalescing is mandatory because the data segment contains initialized data structures whose initial values must be stored in read-only memory; otherwise they would be lost. Finally, assembly code that runs before start_kernel must copy the data segment from flash memory to RAM (to the address where the linker placed it) and zero out the address range associated with the BSS segment. Only after this copying and clearing has taken place can C-language code run.
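The copy-and-clear step described above can be sketched in C. This is only an illustration under stated assumptions: real boot code does this work in assembly, using section boundaries exported by the platform's ld script. The names flash_data_image, ram, setup_data_and_bss, and simulate_power_on are hypothetical, and plain arrays stand in for flash memory and system RAM so that the sketch can be compiled on an ordinary hosted system.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical layout, for illustration only. On real hardware these
 * regions would be delimited by symbols defined in the ld script;
 * here arrays stand in for flash memory and system RAM.
 */
#define DATA_LEN 4   /* size of the initialized data segment */
#define BSS_LEN  4   /* size of the BSS segment */

/* Initial values of the data segment, as stored in read-only flash. */
static const unsigned char flash_data_image[DATA_LEN] = {
    0x11, 0x22, 0x33, 0x44
};

/* Simulated system RAM: the data segment followed by the BSS. */
static unsigned char ram[DATA_LEN + BSS_LEN];

/* Copy the data segment to its run-time address and zero the BSS. */
void setup_data_and_bss(unsigned char *data_dst,
                        const unsigned char *data_src, size_t data_len,
                        unsigned char *bss_start, size_t bss_len)
{
    memcpy(data_dst, data_src, data_len);  /* flash -> RAM */
    memset(bss_start, 0, bss_len);         /* BSS must start out as zeros */
}

/* Simulate power-on: RAM holds garbage until the early setup has run. */
const unsigned char *simulate_power_on(void)
{
    memset(ram, 0xAA, sizeof(ram));        /* undefined content at power-on */
    setup_data_and_bss(ram, flash_data_image, DATA_LEN,
                       ram + DATA_LEN, BSS_LEN);
    return ram;
}
```

After simulate_power_on returns, the first DATA_LEN bytes of ram hold the initial values taken from the flash image and the remaining BSS_LEN bytes are zero. This is the state that C code expects to find for its initialized and uninitialized global variables, which is why this step has to come before start_kernel.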
When you upload a new kernel to the target system, the firmware there retrieves the data file from the network or from a serial channel and writes it to flash memory. The intermediate format used to upload the kernel to a target computer varies from system to system, because it depends on how the actual upload takes place. But in each case, this format is a generic container of binary data used to transfer the compiled image using standardized tools. For example, the BIN format is meant to be transferred over a network, while the S3 format is a hexadecimal ASCII file sent to the target system through a serial cable.[65] Most of the time, when powering on the system, the user can select whether to boot Linux or to type firmware commands.
[65] We are not describing the formats or the tools in detail, because the information is readily available to people researching embedded Linux.