Work Queues

Work queues were introduced in Linux 2.6 and replace a similar construct called “task queue” used in Linux 2.4. They allow kernel functions to be activated (much like deferrable functions) and later executed by special kernel threads called worker threads.

Despite their similarities, deferrable functions and work queues are quite different. The main difference is that deferrable functions run in interrupt context while functions in work queues run in process context. Running in process context is the only way to execute functions that can block (for instance, functions that need to access some block of data on disk) because, as already observed in the section "Nested Execution of Exception and Interrupt Handlers" earlier in this chapter, no process switch can take place in interrupt context. Neither deferrable functions nor functions in a work queue can access the User Mode address space of a process. In fact, a deferrable function cannot make any assumption about the process that is currently running when it is executed. On the other hand, a function in a work queue is executed by a kernel thread, so there is no User Mode address space to access.

Work queue data structures

The main data structure associated with a work queue is a descriptor called workqueue_struct, which contains, among other things, an array of NR_CPUS elements, where NR_CPUS is the maximum number of CPUs in the system.[*] Each element is a descriptor of type cpu_workqueue_struct, whose fields are shown in Table 4-12.

Table 4-12. The fields of the cpu_workqueue_struct structure

Field name        Description
lock              Spin lock used to protect the structure
remove_sequence   Sequence number used by flush_workqueue( )
insert_sequence   Sequence number used by flush_workqueue( )
worklist          Head of the list of pending functions
more_work         Wait queue where the worker thread waiting for more work to be done sleeps
work_done         Wait queue where the processes waiting for the work queue to be flushed sleep
wq                Pointer to the workqueue_struct structure containing this descriptor
thread            Process descriptor pointer of the worker thread of the structure
run_depth         Current execution depth of run_workqueue( ) (this field may become greater than one when a function in the work queue list blocks)

The worklist field of the cpu_workqueue_struct structure is the head of a doubly linked list collecting the pending functions of the work queue. Every pending function is represented by a work_struct data structure, whose fields are shown in Table 4-13.

Table 4-13. The fields of the work_struct structure

Field name   Description
pending      Set to 1 if the function is already in a work queue list, 0 otherwise
entry        Pointers to next and previous elements in the list of pending functions
func         Address of the pending function
data         Pointer passed as a parameter to the pending function
wq_data      Usually points to the parent cpu_workqueue_struct descriptor
timer        Software timer used to delay the execution of the pending function
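The layout described by Table 4-13 can be sketched as a user-space C model. This is only an illustration, not the kernel declaration: the list_head stand-in, the timer_expires field that replaces the real software timer, and the init_work( ) and bump_counter( ) helpers are all assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

/* User-space model of work_struct. The real timer type exists only in
 * the kernel, so it is reduced here to an expiry tick (an assumption). */
struct work_struct {
    unsigned long pending;       /* 1 if already in a work queue list */
    struct list_head entry;      /* links in the list of pending functions */
    void (*func)(void *);        /* address of the pending function */
    void *data;                  /* parameter passed to func */
    void *wq_data;               /* usually the parent cpu_workqueue_struct */
    unsigned long timer_expires; /* stand-in for the software timer */
};

/* Hypothetical initializer: prepares a descriptor that is not yet queued. */
void init_work(struct work_struct *work, void (*func)(void *), void *data)
{
    assert(work != NULL && func != NULL);
    work->pending = 0;
    work->entry.next = work->entry.prev = &work->entry; /* self-linked */
    work->func = func;
    work->data = data;
    work->wq_data = NULL;
    work->timer_expires = 0;
}

/* Sample pending function: increments the int that data points to. */
void bump_counter(void *data)
{
    (*(int *)data)++;
}
```

A descriptor initialized this way carries everything a worker thread needs: the function address in func and its single argument in data.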

Work queue functions

The create_workqueue("foo") function receives as its parameter a string of characters and returns the address of a workqueue_struct descriptor for the newly created work queue. The function also creates n worker threads (where n is the number of CPUs effectively present in the system), named after the string passed to the function: foo/0, foo/1, and so on. The create_singlethread_workqueue( ) function is similar, but it creates just one worker thread, no matter what the number of CPUs in the system is. To destroy a work queue the kernel invokes the destroy_workqueue( ) function, which receives as its parameter a pointer to a workqueue_struct descriptor.
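The naming scheme for the per-CPU worker threads can be illustrated in user space. The worker_thread_name( ) helper below is hypothetical and exists only for this sketch; it does not correspond to a kernel function.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not a kernel function: formats the name of the
 * worker thread that serves CPU number cpu of the work queue wq_name.
 * Returns the number of characters the full name requires. */
int worker_thread_name(char *buf, size_t len, const char *wq_name, int cpu)
{
    assert(buf != NULL && wq_name != NULL);
    return snprintf(buf, len, "%s/%d", wq_name, cpu);
}
```

Calling the helper with "foo" and CPU number 1 fills the buffer with the string foo/1.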

queue_work( ) inserts a function (already packaged inside a work_struct descriptor) in a work queue; it receives a pointer wq to the workqueue_struct descriptor and a pointer work to the work_struct descriptor. queue_work( ) essentially performs the following steps:

  1. Checks whether the function to be inserted is already present in the work queue (work->pending field equal to 1); if so, terminates.

  2. Adds the work_struct descriptor to the work queue list, and sets work->pending to 1.

  3. If a worker thread is sleeping in the more_work wait queue of the local CPU’s cpu_workqueue_struct descriptor, the function wakes it up.
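The three steps above can be sketched as a single-threaded user-space model. It omits all locking and interrupt handling, and it represents the more_work wait queue with two plain integers. The names cpu_workqueue, cwq_init( ), and model_queue_work( ) are assumptions of this sketch, as is the return value (1 if the descriptor was queued, 0 if it was already pending).

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

struct work_struct {
    unsigned long pending;   /* 1 while the descriptor sits in a list */
    struct list_head entry;
    void (*func)(void *);
    void *data;
};

/* Per-CPU part of the model; worker_sleeping and wakeups stand in for
 * the more_work wait queue (an assumption of this sketch). */
struct cpu_workqueue {
    struct list_head worklist;
    int worker_sleeping;
    int wakeups;
};

void cwq_init(struct cpu_workqueue *cwq)
{
    assert(cwq != NULL);
    cwq->worklist.next = cwq->worklist.prev = &cwq->worklist;
    cwq->worker_sleeping = 1;
    cwq->wakeups = 0;
}

/* Returns 1 if the work was queued, 0 if it was already pending. */
int model_queue_work(struct cpu_workqueue *cwq, struct work_struct *work)
{
    struct list_head *tail;

    assert(cwq != NULL && work != NULL);
    if (work->pending)              /* step 1: already queued, give up */
        return 0;

    tail = cwq->worklist.prev;      /* step 2: append at the list tail */
    work->entry.next = &cwq->worklist;
    work->entry.prev = tail;
    tail->next = &work->entry;
    cwq->worklist.prev = &work->entry;
    work->pending = 1;

    if (cwq->worker_sleeping) {     /* step 3: wake the sleeping worker */
        cwq->worker_sleeping = 0;
        cwq->wakeups++;
    }
    return 1;
}
```

Queuing the same descriptor twice in a row leaves the list unchanged the second time, which is the effect of the pending check in step 1.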

The queue_delayed_work( ) function is nearly identical to queue_work( ), except that it receives a third parameter representing a time delay in system ticks (see Chapter 6). It is used to ensure a minimum delay before the execution of the pending function. In practice, queue_delayed_work( ) relies on the software timer in the timer field of the work_struct descriptor to defer the actual insertion of the work_struct descriptor in the work queue list. cancel_delayed_work( ) cancels a previously scheduled work queue function, provided that the corresponding work_struct descriptor has not already been inserted in the work queue list.
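The interplay between the timer and the work queue list can be sketched with a small user-space model. Time is an explicit tick counter passed in by the caller, the timer is reduced to two fields, and every name below (delayed_model, model_queue_delayed_work( ), model_timer_tick( ), model_cancel_delayed_work( )) is an assumption of this sketch rather than kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Model of a delayed work item: the software timer is reduced to an
 * "armed" flag plus an expiry tick (assumptions of this sketch). */
struct delayed_model {
    unsigned long pending;   /* 1 once the item has been scheduled */
    int timer_armed;         /* 1 while the timer has not fired yet */
    unsigned long expires;   /* tick at which the timer fires */
    int on_list;             /* 1 once inserted in the work queue list */
};

/* Arms the timer instead of inserting the item right away. */
int model_queue_delayed_work(struct delayed_model *w,
                             unsigned long now, unsigned long delay)
{
    assert(w != NULL);
    if (w->pending)
        return 0;
    w->pending = 1;
    w->timer_armed = 1;
    w->expires = now + delay;
    return 1;
}

/* Timer expiry performs the deferred insertion in the list. */
void model_timer_tick(struct delayed_model *w, unsigned long now)
{
    assert(w != NULL);
    if (w->timer_armed && now >= w->expires) {
        w->timer_armed = 0;
        w->on_list = 1;
    }
}

/* Succeeds only while the timer is still armed, that is, before the
 * item has reached the work queue list. */
int model_cancel_delayed_work(struct delayed_model *w)
{
    assert(w != NULL);
    if (!w->timer_armed)
        return 0;
    w->timer_armed = 0;
    w->pending = 0;
    return 1;
}
```

In this model a cancel request issued before the expiry tick succeeds and makes the item schedulable again, while one issued after the insertion fails, mirroring the restriction stated above.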

Every worker thread continuously executes a loop inside the worker_thread( ) function; most of the time the thread is sleeping and waiting for some work to be queued. Once awakened, the worker thread invokes the run_workqueue( ) function, which essentially removes every work_struct descriptor from the work queue list of the worker thread and executes the corresponding pending function. Because work queue functions can block, the worker thread can be put to sleep and even migrated to another CPU when resumed.[*]
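The draining loop performed by run_workqueue( ) can be sketched in user space as follows. The sketch is single-threaded and ignores locking, sleeping, and the run_depth bookkeeping. It also clears pending before calling the function, so that a function could queue itself again; treat that ordering as an assumption of the model. The list helpers, the work_of macro, and the trace, job, and record_job names exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

struct work_struct {
    unsigned long pending;
    struct list_head entry;
    void (*func)(void *);
    void *data;
};

/* Recover the work_struct that embeds a given list node. */
#define work_of(node) \
    ((struct work_struct *)((char *)(node) - offsetof(struct work_struct, entry)))

void list_init(struct list_head *h) { h->next = h->prev = h; }

void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

void list_del(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = n;
}

/* Removes every descriptor from the list, oldest first, and runs its
 * function. Returns the number of functions executed. */
int model_run_workqueue(struct list_head *worklist)
{
    int executed = 0;

    assert(worklist != NULL);
    while (worklist->next != worklist) {
        struct work_struct *work = work_of(worklist->next);

        list_del(&work->entry);
        work->pending = 0;          /* may be queued again from now on */
        work->func(work->data);
        executed++;
    }
    return executed;
}

/* Sample pending function: records the order in which jobs run. */
struct trace { int ids[8]; int len; };
struct job { int id; struct trace *trace; };

void record_job(void *data)
{
    struct job *j = data;

    if (j->trace->len < 8)
        j->trace->ids[j->trace->len++] = j->id;
}
```

Because descriptors are appended at the tail and removed from the head, the functions run in the order in which they were queued.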

Sometimes the kernel has to wait until all pending functions in a work queue have been executed. The flush_workqueue( ) function receives a workqueue_struct descriptor address and blocks the calling process until all functions that are pending in the work queue terminate. The function, however, does not wait for any pending function that was added to the work queue after flush_workqueue( ) was invoked; the remove_sequence and insert_sequence fields of every cpu_workqueue_struct descriptor are used to recognize the newly added pending functions.
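One plausible way to use the two sequence numbers is sketched below as a user-space model: the flusher takes a snapshot of insert_sequence and waits only until remove_sequence catches up with that snapshot. The scheme and all function names (model_insert( ), model_complete( ), flush_snapshot( ), flush_must_wait( )) are assumptions of this sketch; the text above states only that the two fields serve to recognize newly added functions.

```c
#include <assert.h>
#include <stddef.h>

/* Only the two counters matter for this model. */
struct cwq_seq {
    long insert_sequence;   /* bumped each time a function is queued */
    long remove_sequence;   /* bumped each time a function completes */
};

void model_insert(struct cwq_seq *c)
{
    assert(c != NULL);
    c->insert_sequence++;
}

void model_complete(struct cwq_seq *c)
{
    assert(c != NULL);
    c->remove_sequence++;
}

/* Taken once, when the flush starts. */
long flush_snapshot(const struct cwq_seq *c)
{
    assert(c != NULL);
    return c->insert_sequence;
}

/* The flusher keeps sleeping while functions queued before the
 * snapshot are still outstanding. */
int flush_must_wait(const struct cwq_seq *c, long needed)
{
    assert(c != NULL);
    return needed - c->remove_sequence > 0;
}
```

A function queued after the snapshot raises insert_sequence but not the snapshot value, so the flusher does not wait for it.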

The predefined work queue

In most cases, creating a whole set of worker threads in order to run a function is overkill. Therefore, the kernel offers a predefined work queue called events, which can be freely used by every kernel developer. The predefined work queue is nothing more than a standard work queue that may include functions of different kernel layers and I/O drivers; the address of its workqueue_struct descriptor is stored in the keventd_wq variable. To make use of the predefined work queue, the kernel offers the functions listed in Table 4-14.

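Table 4-14. Helper functions for the predefined work queue

Predefined work queue function      Equivalent standard work queue function
schedule_work(w)                    queue_work(keventd_wq,w)
schedule_delayed_work(w,d)          queue_delayed_work(keventd_wq,w,d) (on any CPU)
schedule_delayed_work_on(cpu,w,d)   queue_delayed_work(keventd_wq,w,d) (on a given CPU)
flush_scheduled_work( )             flush_workqueue(keventd_wq)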

The predefined work queue saves significant system resources when the function is seldom invoked. On the other hand, functions executed in the predefined work queue should not block for a long time: because the execution of the pending functions in the work queue list is serialized on each CPU, a long delay negatively affects the other users of the predefined work queue.
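The helpers of Table 4-14 are thin wrappers that supply the predefined queue as the first argument. The user-space model below shows the pattern for the simplest case. The reduced structures, the queued counter, and the model_queue_work( ) and model_schedule_work( ) names are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced descriptors: just enough state to observe the wrapper. */
struct work_struct { unsigned long pending; };
struct workqueue_struct { const char *name; int queued; };

/* Stand-in for the predefined "events" work queue. */
struct workqueue_struct events_wq = { "events", 0 };
struct workqueue_struct *keventd_wq = &events_wq;

/* Generic model: queue work on the work queue chosen by the caller. */
int model_queue_work(struct workqueue_struct *wq, struct work_struct *work)
{
    assert(wq != NULL && work != NULL);
    if (work->pending)
        return 0;
    work->pending = 1;
    wq->queued++;
    return 1;
}

/* Wrapper model: same operation, always on the predefined queue. */
int model_schedule_work(struct work_struct *work)
{
    return model_queue_work(keventd_wq, work);
}
```

A caller of the wrapper never names a work queue, which is why code that defers only an occasional function does not need worker threads of its own.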

In addition to the general events queue, you’ll find a few specialized work queues in Linux 2.6. The most significant is the kblockd work queue used by the block device layer (see Chapter 14).

[*] The reason for duplicating the work queue data structures in multiprocessor systems is that per-CPU local data structures yield much more efficient code (see the section "Per-CPU Variables" in Chapter 5).

[*] Strangely enough, a worker thread can be executed by every CPU, not just the CPU corresponding to the cpu_workqueue_struct descriptor to which the worker thread belongs. Therefore, queue_work( ) inserts a function in the queue of the local CPU, but that function may be executed by any CPU in the system.
