By Daniel P. Bovet, Marco Cesati
http://www.kernel.org)
or check the sources on a Linux CD, you will be able to explore, from
top to bottom, one of the most successful, modern operating systems.
This book, in fact, assumes you have the source code on hand and can
apply what we say to your own explorations.
/). Names must be different within the same
directory, but the same name may be used in different directories.
0x00000000 to
0xffffffff.
cs,
ss, ds, es,
fs, and gs. Although there are
only six of them, a program can reuse the same segmentation register
for different purposes by saving its content in memory and then
restoring it later.
cs
ss
ds
cs register has another important function: it
includes a 2-bit field that specifies the
Current Privilege
Level (CPL) of the CPU. The value 0 denotes the highest privilege
level, while the value 3 denotes the lowest one. Linux uses only
levels 0 and 3, which are respectively called Kernel Mode and User
Mode.
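Because the CPL occupies the two least significant bits of the selector held in cs, it can be recovered with a simple mask. The following is a minimal sketch, not kernel code: cpl_of_selector( ) and selector_is_kernel_mode( ) are hypothetical helpers, and the two selector constants are the values we believe Linux 2.4 uses on the 80x86 for its kernel and user code segments, so treat them as assumptions.

```c
#include <stdint.h>

/* Selector values assumed from the Linux 2.4 i386 headers. */
#define KERNEL_CS_SELECTOR 0x10u  /* assumed value of __KERNEL_CS */
#define USER_CS_SELECTOR   0x23u  /* assumed value of __USER_CS */

/* Hypothetical helper: return the privilege level encoded in the two
 * least significant bits of a segment selector. */
static inline unsigned int cpl_of_selector(uint16_t selector)
{
    return selector & 0x3u;
}

/* Hypothetical helper: nonzero if the selector denotes Kernel Mode
 * (privilege level 0). */
static inline int selector_is_kernel_mode(uint16_t selector)
{
    return cpl_of_selector(selector) == 0;
}
```

Under these assumptions the kernel selector yields level 0 and the user selector yields level 3, matching the two levels Linux uses.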
gdt_table referred to by the
gdt variable.
modify_ldt( ) exists that allows
processes to create their own LDTs. This turns out to be useful to
applications (such as Wine) that execute segment-oriented Microsoft
Windows applications.
Base = 0x00000000
PG flag of a
control register named
cr0. When PG = 0, linear addresses are
interpreted as physical addresses.
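On the 80x86, PG is bit 31 of cr0. Reading cr0 is possible only in Kernel Mode, so the sketch below takes a copy of the register value as an argument; paging_enabled( ) is a hypothetical helper introduced for illustration.

```c
#include <stdint.h>

#define CR0_PG (1u << 31)   /* PG flag: bit 31 of cr0 */

/* Hypothetical helper: given a copy of the cr0 value, report whether
 * the paging unit is enabled. */
static inline int paging_enabled(uint32_t cr0_value)
{
    return (cr0_value & CR0_PG) != 0;
}
```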
cr3 control register in the
descriptor of the process previously in execution and then loads
cr3 with the value stored in the descriptor of the
process to be executed next. Thus, when the new process resumes its
execution on the CPU, the paging unit refers to the correct set of
Page Tables.
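The save-and-load sequence just described can be modeled in ordinary C. This is a deliberately simplified sketch: struct toy_process, simulated_cr3, and switch_page_tables( ) are hypothetical names, and a plain variable stands in for the real cr3 register, which only Kernel Mode code can access.

```c
#include <stdint.h>

/* Hypothetical, stripped-down process descriptor: only the saved cr3. */
struct toy_process {
    uint32_t saved_cr3;   /* physical address of the page directory */
};

/* Stand-in for the real cr3 control register. */
static uint32_t simulated_cr3;

/* Save cr3 for the process being replaced, then load the value stored
 * for the process that will run next. */
static void switch_page_tables(struct toy_process *prev,
                               struct toy_process *next)
{
    prev->saved_cr3 = simulated_cr3;
    simulated_cr3 = next->saved_cr3;
}
```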
task_struct type structure whose fields contain
all the information related to a single process. As the repository of
so much information, the process descriptor is rather complex. In
addition to a large number of fields containing process attributes,
the process descriptor contains several pointers to other data
structures that, in turn, contain pointers to other structures. Figure 3-1 describes the Linux process descriptor
schematically.
state field of the
process descriptor describes what is currently happening to the
process. It consists of an array of flags, each of which describes a
possible process state. In the current Linux version, these states
are mutually exclusive, and hence exactly one flag of
state is set; the remaining flags are cleared. The
following are the possible process states:
TASK_RUNNING
TASK_INTERRUPTIBLE
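Since the states are mutually exclusive, a state value can be sanity-checked with a single bit test. The sketch below is an illustration only: the TOY_ constants mirror the values we recall from the Linux 2.4 headers (in particular, TASK_RUNNING defined as 0, so "one flag set" means "no other bit set" for that state), and both the values and the state_is_well_formed( ) helper should be read as assumptions.

```c
/* State values assumed from the Linux 2.4 headers. */
#define TOY_TASK_RUNNING          0L
#define TOY_TASK_INTERRUPTIBLE    1L
#define TOY_TASK_UNINTERRUPTIBLE  2L
#define TOY_TASK_ZOMBIE           4L
#define TOY_TASK_STOPPED          8L

#define TOY_STATE_MASK (TOY_TASK_INTERRUPTIBLE | TOY_TASK_UNINTERRUPTIBLE | \
                        TOY_TASK_ZOMBIE | TOY_TASK_STOPPED)

/* Hypothetical helper: nonzero if the value names exactly one state,
 * that is, it is zero (running) or has a single known bit set. */
static int state_is_well_formed(long state)
{
    if (state & ~TOY_STATE_MASK)
        return 0;                         /* unknown bits present */
    return (state & (state - 1)) == 0;    /* zero or one bit set */
}
```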
prev local variable refers to the process
descriptor of the process being switched out and
next refers to the one being switched in to
replace it. We can thus define a process switch
as the activity consisting of saving the hardware context of
prev and replacing it with the hardware context of
next. Since process switches occur quite often, it
is important to minimize the time spent in saving and loading
hardware contexts.
far jmp instruction to the selector of the
Task State Segment Descriptor of the next process.
While executing the instruction, the CPU performs a
hardware context switch by automatically saving the old
hardware context and loading a new one. But Linux 2.4 uses software
to perform a process switch for the following reasons:
execve( ) and wipes out the address
space that was so carefully copied.
vfork( ) system call creates a process that
shares the memory address space of its parent. To prevent the parent
from overwriting data needed by the child, the parent's execution is
blocked until the child exits or executes a new program.
We'll learn more about the vfork( ) system call in the following section.
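From User Mode, the blocking behavior of vfork( ) can be observed with a few lines of C. The sketch below assumes a Linux system with glibc; run_child_with_vfork( ) is a hypothetical helper. Because the child borrows the parent's address space, it does nothing except terminate with _exit( ).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* ask glibc to declare vfork() */
#endif
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create a child with vfork(), let it terminate
 * with the given code, and return the exit status seen by the parent
 * (or -1 on error). */
int run_child_with_vfork(int code)
{
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);   /* child: only _exit() or an exec call is safe here */

    /* The parent resumes only after the child has exited (or executed a
     * new program); it then collects the child's status. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```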
clone( ), which
uses four parameters:
exit( )
library function, which releases the
resources allocated by the C library, executes each function
registered by the programmer, and ends up invoking the
_exit( ) system call. The exit( ) function may be inserted
by the programmer explicitly. Additionally, the C compiler always
inserts an exit( ) function call right after the last statement of the
main( ) function.
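The difference between the exit( ) library function and the _exit( ) system call can be made visible with a function registered through atexit( ). The following is a sketch under stated assumptions: it targets a POSIX system, and report_at_exit( ) and bytes_written_at_termination( ) are hypothetical names. A child process registers a function that writes one byte into a pipe; the parent then counts what arrives.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;

/* Registered with atexit(): leaves a one-byte mark in the pipe. */
static void report_at_exit(void)
{
    const char mark = 'x';
    ssize_t written = write(report_fd, &mark, 1);
    (void)written;
}

/* Hypothetical helper: fork a child that terminates through exit()
 * (use_library_exit != 0) or directly through _exit(), and return the
 * number of bytes its registered function wrote, or -1 on error. */
int bytes_written_at_termination(int use_library_exit)
{
    int fds[2];
    char buf;
    ssize_t n;
    pid_t pid;

    if (pipe(fds) < 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                 /* child */
        close(fds[0]);
        report_fd = fds[1];
        atexit(report_at_exit);
        if (use_library_exit)
            exit(0);                /* runs registered functions first */
        _exit(0);                   /* terminates without running them */
    }
    close(fds[1]);                  /* parent keeps only the read end */
    n = read(fds[0], &buf, 1);      /* 0 means end of file: nothing written */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (int)n;
}
```

The registered function runs only on the exit( ) path, which is why the library function is the one that ends up invoking _exit( ) rather than the other way around.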
do_exit( )
function, which removes most references to the terminating process
from kernel data structures. The do_exit( )
function executes the following actions:
PF_EXITING flag in the
flags field of the process descriptor to indicate
that the process is being eliminated.
sem_exit( ) function (see Chapter 19) or from a dynamic timer queue via the
del_timer_sync( ) function (see Chapter 6).
__exit_mm( ), __exit_files( ), __exit_fs( ), and
exit_sighand( ) functions. These functions also
remove each of these data structures if no other processes are sharing
them.
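Two of the actions above, marking the process as exiting and releasing a shared data structure only when its last user goes away, can be sketched with a flag bit and a usage counter. Everything here is hypothetical and simplified: the toy_ names are ours, the flag value is what we recall for PF_EXITING in Linux 2.4, and a plain int replaces the atomic counters a real kernel would need.

```c
#include <stdlib.h>

#define TOY_PF_EXITING 0x00000004UL   /* assumed value of PF_EXITING */

/* Hypothetical shared structure (stand-in for files, fs, mm, ...). */
struct toy_shared {
    int users;                  /* number of processes sharing it */
};

/* Hypothetical, stripped-down process descriptor. */
struct toy_task {
    unsigned long flags;
    struct toy_shared *files;
};

/* Drop one reference; free the structure when no user is left.
 * Returns 1 if the structure was freed, 0 otherwise. */
static int toy_put_shared(struct toy_shared *s)
{
    if (--s->users > 0)
        return 0;
    free(s);
    return 1;
}

/* Mark the task as exiting, then detach it from the shared structure. */
static int toy_exit(struct toy_task *tsk)
{
    int freed;

    tsk->flags |= TOY_PF_EXITING;
    freed = toy_put_shared(tsk->files);
    tsk->files = NULL;
    return freed;
}
```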
int instruction)
for a kernel service.
eip and
cs registers) in the Kernel Mode stack and by
placing an address related to the interrupt type into the program
counter.
eip register that is
saved on the Kernel Mode stack when the CPU control unit raises the
exception.
idtr
register and initialize all the entries of that table. This activity
is done while initializing the system (see Appendix A).
int
instruction allows a User Mode
process to issue an interrupt signal that has an arbitrary vector
ranging from 0 to 255. Therefore, initialization of the IDT must be
done carefully, to block illegal interrupts and exceptions simulated
by User Mode processes via int instructions. This
can be achieved by setting the DPL field of the Interrupt or Trap
Gate Descriptor to 0. If the process attempts to issue one of these
interrupt signals, the control unit checks the CPL value against the
DPL field and issues a "General protection" exception.
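The check performed by the control unit for an int instruction reduces to a comparison of two small integers: the interrupt is allowed only if the CPL is numerically no greater than the DPL of the gate. A minimal model, with int_instruction_allowed( ) as a hypothetical helper:

```c
#define DPL_KERNEL 0u   /* gate reachable only from Kernel Mode */
#define DPL_USER   3u   /* gate reachable from User Mode as well */

/* Hypothetical model of the privilege check made for a software
 * interrupt: nonzero if allowed, zero if the control unit would raise
 * a "General protection" exception instead. */
static int int_instruction_allowed(unsigned int cpl, unsigned int gate_dpl)
{
    return cpl <= gate_dpl;
}
```

A User Mode process (CPL 3) is thus rejected by any gate whose DPL is 0, while a gate whose DPL is 3 remains reachable from User Mode.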
SIGFPE
signal to the current process, which then takes the necessary steps
to recover or (if no signal handler is set for that signal) abort.
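From the process's point of view, recovering means having a handler installed for SIGFPE. The sketch below assumes a POSIX system; on_sigfpe( ) and handle_sigfpe_once( ) are hypothetical names. It sends the signal with raise( ) instead of performing a real division by zero, because returning from a handler after a genuine hardware fault would re-execute the faulting instruction.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t sigfpe_seen;

/* Signal handler: just record that SIGFPE was delivered. */
static void on_sigfpe(int signo)
{
    (void)signo;
    sigfpe_seen = 1;
}

/* Hypothetical helper: install a SIGFPE handler, send the signal to the
 * current process, restore the previous disposition, and return 1 if
 * the handler ran (or -1 if the handler could not be installed). */
int handle_sigfpe_once(void)
{
    struct sigaction sa, old;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigfpe;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGFPE, &sa, &old) < 0)
        return -1;

    sigfpe_seen = 0;
    raise(SIGFPE);                  /* handler runs before raise() returns */
    sigaction(SIGFPE, &old, NULL);
    return sigfpe_seen;
}
```

Without the sigaction( ) call, the default action for SIGFPE would terminate the process, which is the "abort" case mentioned above.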
cr0 register to force the kernel to load the
floating point registers of the CPU with new values. A second case
refers to the Page Fault exception, which is used to defer allocating
new page frames to the process until the last possible moment. The
corresponding handler is complex because the exception may, or may
not, denote an error condition (see Section 8.4).
ret_from_exception( ) function.
trap_init( ) function to insert
the final values—the functions that handle the
exceptions—into all IDT entries that refer to nonmaskable
interrupts and exceptions. This is accomplished through the
set_trap_gate, set_intr_gate,
and set_system_gate macros:
    set_trap_gate(0,&divide_error);
    set_trap_gate(1,&debug);
    set_intr_gate(2,&nmi);
    set_system_gate(3,&int3);
    set_system_gate(4,&overflow);
    set_system_gate(5,&bounds);
    set_trap_gate(6,&invalid_op);
    set_trap_gate(7,&device_not_available);
    set_trap_gate(8,&double_fault);
    set_trap_gate(9,&coprocessor_segment_overrun);
    set_trap_gate(10,&invalid_TSS);
    set_trap_gate(11,&segment_not_present);
    set_trap_gate(12,&stack_segment);
    set_trap_gate(13,&general_protection);
    set_intr_gate(14,&page_fault);
    set_trap_gate(16,&coprocessor_error);
    set_trap_gate(17,&alignment_check);
    set_trap_gate(18,&machine_check);
    set_trap_gate(19,&simd_coprocessor_error);
    set_system_gate(128,&system_call);