At this point, we know the basics of how to write a full-featured char module. Real-world drivers, however, need to do more than implement the necessary operations; they have to deal with issues such as timing, memory management, hardware access, and more. Fortunately, the kernel makes a number of facilities available to ease the task of the driver writer. In the next few chapters we’ll fill in information on some of the kernel resources that are available, starting with how timing issues are addressed. Dealing with time involves the following, in order of increasing complexity:
Understanding kernel timing
Knowing the current time
Delaying operation for a specified amount of time
Scheduling asynchronous functions to happen after a specified time lapse
The first point we need to cover is the timer interrupt, which is the mechanism the kernel uses to keep track of time intervals. Interrupts are asynchronous events that are usually fired by external hardware; the CPU is interrupted in its current activity and executes special code (the Interrupt Service Routine, or ISR) to serve the interrupt. Interrupts and ISR implementation issues are covered in Chapter 9.
Timer interrupts are generated by the system’s timing hardware at regular intervals; this interval is set by the kernel according to the value of HZ, which is an architecture-dependent value defined in the kernel headers. Linux versions define HZ to be 100 for most platforms, but some platforms use 1024, and the IA-64 simulator uses 20. Whatever your preferred platform uses, no driver writer should count on any specific value of HZ.
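Since the tick rate differs across platforms, a driver is better off expressing intervals in terms of HZ than as a literal tick count. The following is a minimal sketch of that idea written so that it can run outside the kernel: ms_to_ticks is a hypothetical helper, and the tick rate is passed in as the hz parameter rather than taken from the kernel's HZ symbol.

```c
/*
 * Hypothetical helper, not a kernel API: convert a delay in
 * milliseconds to timer ticks for a given tick rate.  In a driver,
 * the hz argument would simply be the kernel's HZ value.
 */
static unsigned long ms_to_ticks(unsigned long ms, unsigned long hz)
{
    /* Round up so that a nonzero delay never turns into zero ticks. */
    return (ms * hz + 999) / 1000;
}
```

With this sketch, one second is 100 ticks when hz is 100 and 1024 ticks when hz is 1024, while a 5-millisecond delay rounds up to a single tick at 100 ticks per second. The multiplication can overflow for very large delays, which a real implementation would have to guard against.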
Every time a timer interrupt occurs, the value of the variable jiffies is incremented. jiffies is initialized to 0 when the system boots, and is thus the number of clock ticks since the computer was turned on. It is declared as unsigned long volatile, and will possibly overflow after a long time of continuous system operation (but no platform features jiffy overflow in less than 16 months of uptime). Much effort has gone into ensuring that the kernel operates properly when jiffies overflows. Driver writers do not normally have to worry about jiffies overflows, but it is good to be aware of the possibility.
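To make the overflow issue concrete, the sketch below shows one common way to compare two tick counts so that the answer stays correct across a wraparound. The name tick_after is hypothetical and the code is plain C that runs outside the kernel; it assumes the two values are less than half the counter range apart, which holds for any realistic timeout.

```c
#include <limits.h>

/*
 * Hypothetical helper: returns nonzero if tick count a is later
 * than tick count b, even when the counter wrapped in between.
 * Unsigned subtraction is performed modulo the counter range, so
 * a small positive difference means "a comes after b".
 */
static int tick_after(unsigned long a, unsigned long b)
{
    return a != b && (a - b) <= ULONG_MAX / 2;
}
```

A plain `a > b` test would give the wrong answer for a value of 5 taken shortly after a value of ULONG_MAX - 5, whereas tick_after(5, ULONG_MAX - 5) correctly reports that 5 is the later sample.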
It is possible to change the value of
HZ for those
who want systems with a different clock interrupt frequency. Some
people using Linux for hard real-time tasks have been known to raise
the value of
HZ to get better response times; they
are willing to pay the overhead of the extra timer interrupts to
achieve their goals. All in all, however, the best approach to the
timer interrupt is to keep the default value for
HZ, by virtue of our complete trust in the kernel
developers, who have certainly chosen the best value.
Most modern CPUs include a high-resolution counter register that is steadily incremented once at each clock cycle; this counter may be used to measure time intervals precisely. CPU manufacturers introduced it in response to the extremely high speed of modern processors and the pressing demand for empirical performance figures. Given the inherent unpredictability of instruction timing on most systems (due to instruction scheduling, branch prediction, and the various levels of cache memory), this clock counter is the only reliable way to carry out small-scale timekeeping tasks.
The details differ from platform to platform: the register may or may not be readable from user space, it may or may not be writable, and it may be 64 or 32 bits wide—in the latter case you must be prepared to handle overflows. Whether or not the register can be zeroed, we strongly discourage resetting it, even when hardware permits. Since you can always measure differences using unsigned variables, you can get the work done without claiming exclusive ownership of the register by modifying its current value.
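The point about unsigned differences can be sketched in a few lines. The helper below is hypothetical (cycles_between is not a kernel function) and assumes a 32-bit counter that wraps at most once between the two samples.

```c
#include <stdint.h>

/*
 * Hypothetical helper: cycles elapsed between two samples of a
 * 32-bit counter.  The subtraction is done modulo 2^32, so the
 * result is still correct if the counter wrapped once between the
 * samples; the counter itself is never written or reset.
 */
static uint32_t cycles_between(uint32_t start, uint32_t end)
{
    return (uint32_t)(end - start);
}
```

For instance, a start sample of 0xFFFFFFF0 and an end sample of 0x00000010 yield 0x20 cycles, even though the end value is numerically smaller than the start value.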
The most renowned counter register is the TSC (timestamp counter), introduced in x86 processors with the Pentium and present in all CPU designs ever since. It is a 64-bit register that counts CPU clock cycles; it can be read from both kernel space and user space.
Two macros, rdtsc and rdtscl, read the register. The former atomically reads the 64-bit value into two 32-bit variables; the latter reads the low half of the register into a 32-bit variable and is sufficient in most cases. For example, a 500-MHz system will overflow a 32-bit counter roughly once every 8.6 seconds; you won’t need to access the whole register if the time lapse you are benchmarking reliably takes less time.
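The overflow figure follows from dividing the counter range by the clock frequency. The sketch below works the arithmetic; counter_wrap_seconds is a hypothetical helper introduced only for illustration.

```c
/*
 * Hypothetical helper: seconds needed for a counter of the given
 * width to wrap around when it is incremented clock_hz times per
 * second.  The range 2^bits is built up in floating point to avoid
 * integer overflow for wide counters.
 */
static double counter_wrap_seconds(unsigned int bits, double clock_hz)
{
    double counts = 1.0;
    unsigned int i;

    for (i = 0; i < bits; i++)
        counts *= 2.0;
    return counts / clock_hz;
}
```

For a 32-bit counter at 500 MHz this gives 4294967296 / 500000000, or about 8.59 seconds. The same calculation for the full 64-bit register gives more than a thousand years, which is why overflow matters only for the 32-bit view.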
These lines, for example, measure the execution time of the rdtscl read itself:
    unsigned long ini, end;

    rdtscl(ini);
    rdtscl(end);
    printk("time lapse: %lu\n", end - ini);
Some of the other platforms offer similar functionalities, and kernel headers offer an architecture-independent function that you can use instead of rdtsc. It is called get_cycles, and was introduced during 2.1 development. Its prototype is
    #include <linux/timex.h>
    cycles_t get_cycles(void);
The function is defined for every platform, and it always returns 0 on
the platforms that have no cycle-counter register. The
cycles_t type is an appropriate unsigned type that
can fit in a CPU register. The choice to fit the value in a single
register means, for example, that only the lower 32 bits of the
Pentium cycle counter are returned by get_cycles.
The choice is a sensible one because it avoids the problems with
multiregister operations while not preventing most common uses of the
counter—namely, measuring short time lapses.
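The behavior described here can be imitated in user space, since the x86 counter is readable there. The following is only a sketch of the idea and not the kernel's implementation: sketch_get_cycles, sketch_cycles_t, and SKETCH_HAVE_COUNTER are hypothetical names, the 32-bit return type is chosen to mirror the single-register behavior described above, and the inline assembly assumes gcc or a compatible compiler.

```c
#include <stdint.h>

/* Hypothetical 32-bit cycle type; the kernel's cycles_t is per platform. */
typedef uint32_t sketch_cycles_t;

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define SKETCH_HAVE_COUNTER 1
static sketch_cycles_t sketch_get_cycles(void)
{
    uint32_t lo, hi;

    /* rdtsc leaves the low half in EAX and the high half in EDX. */
    __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
    (void)hi;           /* only the low 32 bits are returned */
    return lo;
}
#else
#define SKETCH_HAVE_COUNTER 0
/* No known cycle counter on this platform: always report 0. */
static sketch_cycles_t sketch_get_cycles(void)
{
    return 0;
}
#endif
```

Callers would take two samples and subtract them as unsigned values, treating a pair of zero results as "no counter available" rather than as a zero-length interval.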
Despite the availability of an architecture-independent function, we’d like to take the chance to show an example of inline assembly code. To this end, we’ll implement an rdtscl macro for MIPS processors that works in the same way as the x86 one.
We’ll base the example on MIPS because most MIPS processors feature a 32-bit counter as register 9 of their internal “coprocessor 0.” To access the register, only readable from kernel space, you can define the following macro that executes a “move from coprocessor 0” assembly instruction:
    #define rdtscl(dest) \
        __asm__ __volatile__("mfc0 %0,$9; nop" : "=r" (dest))
With this macro in place, the MIPS processor can execute the same code shown earlier for the x86.
What’s interesting with gcc inline assembly is that allocation of general-purpose registers is left to the compiler. The macro just shown uses %0 as a placeholder for “argument 0,” which is later specified as “any register (r) used as output (=).” The macro also states that the output register must correspond to the C expression dest. The syntax for inline assembly is very powerful but somewhat complex, especially for architectures that have constraints on what each register can do (namely, the x86 family). The complete syntax is described in the gcc documentation, usually available in the info documentation tree.
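The operand syntax may be easier to follow with both an output and an input operand in play. The sketch below is a hypothetical illustration for x86 with gcc or a compatible compiler; copy_through_register is not part of any kernel header, and on other architectures the code falls back to plain C so that it still compiles.

```c
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
static unsigned int copy_through_register(unsigned int src)
{
    unsigned int dst;

    /*
     * %0 is the first operand listed: an output ("=") placed in any
     * general-purpose register ("r") and tied to the C variable dst.
     * %1 is the second operand: an input in any register, tied to src.
     * The compiler picks the actual registers.  In the AT&T syntax
     * used by gcc, movl copies its first argument into its second.
     */
    __asm__ ("movl %1, %0" : "=r" (dst) : "r" (src));
    return dst;
}
#else
/* Fallback for other architectures: same result without assembly. */
static unsigned int copy_through_register(unsigned int src)
{
    return src;
}
#endif
```

Outputs come after the first colon and inputs after the second, and the placeholders are numbered in the order in which the operands appear.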
The short C-code fragment shown in this section has been run on a K7-class x86 processor and a MIPS VR4181 (using the macro just described). The former reported a time lapse of 11 clock ticks, and the latter just 2 clock ticks. The small figure was expected, since RISC processors usually execute one instruction per clock cycle.
 The trailing nop instruction is required to prevent the compiler from accessing the target register in the instruction immediately following mfc0. This kind of interlock is typical of RISC processors, and the compiler can still schedule useful instructions in the delay slots. In this case we use nop because inline assembly is a black box for the compiler and no optimization can be performed.