Windows NT File System Internals: A Developer's Guide
By Rajeev Nagar
1st Edition September 1997
1-56592-249-2, Order Number: 2492
794 pages, $49.95, Includes diskette
Chapter 4.
The NT I/O Manager
In this chapter:
The NT I/O Subsystem
Common Data Structures
I/O Requests: A Discussion
System Boot Sequence
Successfully interfacing with external devices is essential for any computing system. A general-purpose commercial operating system like Windows NT must also interact with a variety of peripherals, the common ones most of us use each day, as well as the more uncommon external devices that might be useful in some specific settings. For example, we expect the NT operating system to provide us with built-in support for our hard disks, keyboard, mouse, and video monitor. If, however, I wish to attach a programmable toaster device to my system (my new invention), and I would like to control this device using my computer, which is running Windows NT, I suspect that I will have to develop a driver to control the device. Furthermore, if I expect to be successful in developing this driver, I will obviously have to look to the operating system to provide an appropriate environment and support structure that makes developing, installing, testing, and using this driver a task that might be difficult but not insurmountable.
Although some might argue that such expectations of support from an operating system are unreasonable, the Windows NT operating system does provide such a framework, so that mere mortals like you and me can develop necessary drivers to control such esoteric devices as a programmable toaster. In fact, the NT operating system provides a consistent, well-defined I/O subsystem within which all code required to interface with external devices can reside. The I/O subsystem is extensive, encompassing file system drivers, intermediate drivers, device drivers, and services to support and interface with such drivers. It is also consistent in its treatment of external devices.
In this chapter, I will present an introduction to the NT I/O Manager, the component responsible for creating, maintaining, and managing the NT I/O subsystem. To develop any kind of driver for the Windows NT operating system, an understanding of the framework provided by the I/O Manager is extremely important. First, I will describe some of the services provided by the I/O Manager. Next, I will present an overview of the components comprising the I/O subsystem, including a discussion of the various types of drivers that can exist within the I/O subsystem. I will then describe some common data structures that kernel-mode developers should be familiar with. Following this is a discussion on some common issues involving I/O requests sent to kernel-mode drivers. Finally, I will present a description of the system boot sequence, with emphasis on the activities of the I/O Manager and the drivers within the kernel.
The NT I/O Subsystem
The NT I/O subsystem is the framework within which all kernel-mode drivers controlling and interfacing with peripheral devices reside. This subsystem is composed of the following components (see Figure 4-1):
Figure 4-1. Kernel-mode components, including the I/O subsystem
- The NT I/O Manager, which defines and manages the entire framework.
- File system drivers that are responsible for local, disk-based file systems.
- Network redirectors that accept I/O requests and issue them over the network to a file server. The redirectors are implemented similarly to other file system drivers.
- Network file servers that accept requests sent to them by redirectors on other nodes, and reissue these requests to local file system drivers. Although file servers do not need to be implemented as kernel-mode drivers, typically they are implemented as such for performance reasons.
- Intermediate drivers, such as SCSI class drivers. These drivers provide generic functionality that is common to a set of devices. Intermediate drivers also include drivers that provide added functionality, such as software mirroring or fault tolerance, by using the services of device drivers.
- Device drivers that interface directly with hardware, such as controller cards, network interface cards, and disk drives. These are typically the lowest-level kernel-mode drivers.
- Filter drivers that insert themselves into the driver hierarchy to perform functionality that is not directly available using the existing set of drivers. For example, a filter driver can layer itself above a file system driver, intercepting all requests that are issued to the file system driver. A filter driver could just as well layer itself below the file system driver, but above a device driver, intercepting all requests targeted to the device driver. Note that conceptually, the only tangible difference between filter drivers and other intermediate drivers is that filter drivers typically intercept requests targeted to some existing device and then provide their own functionality, either in lieu of or in addition to the functionality provided by the driver that was the original recipient of the request.
Functionality Provided by the NT I/O Manager
The NT I/O Manager oversees the NT I/O subsystem. The following is a list of some of the functionality provided by the I/O Manager:
- The I/O Manager defines and supports a framework that allows the operating system to use peripherals connected to the system.
The type and number of peripherals that can potentially be used with a Windows NT system is not limited, since new types of peripheral devices are continuously being designed and developed. Therefore, the I/O subsystem for a commercial operating system like Windows NT must be well-designed and extensible, such that it can easily accommodate the myriad devices, each with its own set of unique characteristics, that could be used.
- The NT I/O Manager provides a comprehensive set of generic system services used by the various subsystems to actually perform I/O or request other services from kernel-mode drivers.
Consider a read request initiated by a user process. This read request is directed to the controlling subsystem, such as the Win32 subsystem. Note that the Win32 subsystem does not actually direct the read request to the file system driver or device driver itself; instead it invokes a system service called NtReadFile(), supplied by the I/O Manager. The NtReadFile() system service then assumes the responsibility for directing the request to the appropriate driver and conveying the results to the Win32 subsystem. Also note that the buffer supplied by the user process requesting the read operation usually cannot be used directly by the kernel-mode drivers that will eventually satisfy the request. The I/O Manager provides the support to automatically perform the necessary operations that would allow the kernel-mode drivers to use a buffer address that is accessible in kernel-mode. Later in this chapter, I will describe this operation of manipulating user-mode buffers in further detail.
Although the native NT system services are very poorly documented (if at all), you can find a detailed description of these services in Appendix A, Windows NT System Services, in this book.
- The NT I/O Manager defines a single I/O model that all drivers in the system must conform to. As mentioned above, this model consists of objects and a set of associated methods used to manipulate the objects. Kernel-mode drivers do not need to be concerned with the originator of an I/O request, since they respond to all I/O requests in the same manner.
This results in a consistent interface provided to users of the I/O subsystem, such as the Win32 or POSIX subsystem, and also protects the kernel-mode drivers from having to worry about the vagaries associated with the particular subsystem that issued the I/O request.
Furthermore, since every kernel-mode driver must conform to this single I/O model, kernel-mode drivers can use services provided by each other, since a kernel-mode driver does not really care whether the I/O request originates in kernel-mode or user-mode. That said, if you do invoke the services of another kernel-mode driver from your kernel-mode driver, there are certain considerations that you must be aware of. These will be described later in this chapter.
Finally, the single I/O model allows for the implementation of layered kernel-mode drivers, which are supported by the NT I/O Manager. Each kernel-mode driver in a layered hierarchy can utilize the services of the underlying driver to complete a specific operation. In turn, the underlying driver can satisfy the issued request without concerning itself with whether the request came to it directly from some user process or from a driver that resides above it in the hierarchy of layered drivers.
- The I/O Manager supports installable file system implementations that use the peripheral devices connected to the system.
The NT operating system includes support for the CD-ROM file system, the NTFS log-based file system, the legacy FAT file system, the LAN Manager File System Redirector, as well as the HPFS file system. In addition to supporting such native local- and network-based file systems, the I/O Manager provides the infrastructure for development of external, installable file systems, i.e., file system implementations from third-party vendors. You can purchase commercial implementations of NFS (the Network File System), DFS (the Distributed File System), and other file system and network redirector implementations.
- The NT I/O Manager supports dynamically loadable kernel-mode drivers.
- The I/O Manager provides support for device-independent services that can be utilized by other components of the NT operating system, as well as by kernel-mode drivers that are implemented by third-party vendors.
If a kernel-mode driver needs to invoke the dispatch routine for another kernel-mode driver, it can use the IoCallDriver() service provided by the I/O Manager. Similarly, if a kernel-mode driver has to allocate a Memory Descriptor List (MDL) structure, the IoAllocateMdl() routine can be used. There are other such services that are commonly used by kernel-mode components (including kernel-mode drivers), provided by the NT I/O Manager. The list of services is available in the Windows NT Device Drivers Kit (DDK).
- The NT I/O Manager interacts with the NT Cache Manager to support virtual block caching of file data.
Later in this book, you will learn more about the functionality provided by the NT Cache Manager.
- The NT I/O Manager interacts with the NT Virtual Memory Manager and file system implementations to support memory-mapped files.
In the next chapter, you will read in detail about memory-mapped files. Support for memory-mapped files is provided jointly by the NT I/O Manager, the NT Virtual Memory Manager, and the appropriate file system driver.
If you wish to develop kernel-mode drivers for Windows NT, your driver must conform to the specifications provided by the NT I/O Manager. This includes creating and maintaining some data structures defined by the I/O Manager and also supplying the methods that manipulate such objects. Furthermore, your driver must respond appropriately to requests issued by the NT I/O Manager, and your driver must return results of each operation back to the I/O Manager. It is extremely unlikely that you can successfully develop a kernel-mode driver that does not use any of the services provided by the NT I/O Manager. Therefore, you will need to understand well the framework provided by the NT I/O Manager. The remainder of this chapter addresses some of these issues in further detail.
Concepts in I/O Manager Design
The design of the NT I/O subsystem exhibits a number of characteristics described in the following sections.
Packet-based I/O
The I/O subsystem is packet-based; i.e., all I/O requests are submitted using I/O Request Packets (IRPs). IRPs are typically constructed by the I/O Manager in response to user requests and sent to the targeted kernel-mode driver. However, any kernel-mode component can create an IRP and issue it to a kernel-mode driver using the IoAllocateIrp() and IoCallDriver() I/O Manager routines described in the DDK.
The I/O Request Packet is the only method you can use to request services from an I/O subsystem driver. By strictly conforming to this packet-based I/O model, the NT I/O Manager ensures consistency across the I/O subsystem and enables the layered driver model, described later in this section.
Each IRP sent to a kernel-mode driver represents a pending I/O request to that driver. An IRP will continue to be outstanding until the recipient of the IRP invokes the IoCompleteRequest() service routine for that particular IRP. Invoking IoCompleteRequest() results in that I/O operation being marked as completed, and the I/O Manager then triggers any post-completion processing that was awaiting completion of the I/O request. A particular IRP can be completed only once; i.e., only one kernel-mode driver can invoke IoCompleteRequest() for any outstanding IRP in the system.
You should be aware that, although packet-based I/O is the rule in Windows NT, the NT I/O Manager, NT Cache Manager, and the various NT file system implementations collaborate to implement functionality called the fast I/O path, which is an exception to this rule. The fast I/O method of I/O operations is only valid for file system drivers. These operations are implemented using direct function calls into the file system drivers and the NT Cache Manager instead of using the normal IRP method. The fast I/O path is described in detail later in this book.
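The complete-once rule for IRPs described above can be illustrated with a small user-mode model. This is only a sketch, not DDK code: `SimIrp`, `sim_init_irp()`, `sim_complete_request()`, and `sim_count_completion()` are hypothetical names invented for illustration, and the real IRP structure and `IoCompleteRequest()` behave quite differently in detail. In particular, the model politely refuses a second completion, whereas a real driver that completes an IRP twice has a serious bug.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for an I/O Request Packet. */
typedef struct SimIrp {
    int  status;                              /* final status, valid once completed */
    bool completed;                           /* becomes true exactly once */
    void (*on_complete)(struct SimIrp *irp);  /* post-completion processing */
    int  completions_seen;                    /* times post-completion processing ran */
} SimIrp;

/* Prepare a request that is outstanding (pending) until it is completed. */
void sim_init_irp(SimIrp *irp, void (*on_complete)(SimIrp *irp))
{
    assert(irp != NULL);
    irp->status = 0;
    irp->completed = false;
    irp->on_complete = on_complete;
    irp->completions_seen = 0;
}

/*
 * Models the role of IoCompleteRequest(): mark the request completed and
 * trigger post-completion processing. Returns false, changing nothing,
 * if the request was already completed.
 */
bool sim_complete_request(SimIrp *irp, int status)
{
    assert(irp != NULL);
    if (irp->completed) {
        return false;               /* a request can be completed only once */
    }
    irp->status = status;
    irp->completed = true;
    if (irp->on_complete != NULL) {
        irp->on_complete(irp);
    }
    return true;
}

/* Example post-completion routine: counts how often it is invoked. */
void sim_count_completion(SimIrp *irp)
{
    irp->completions_seen++;
}
```

A caller would initialize a `SimIrp` with `sim_init_irp(&irp, sim_count_completion)`, and exactly one layer would later call `sim_complete_request(&irp, status)`; any further call returns `false` and leaves the recorded status untouched.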
NT object model
The I/O Manager conforms to the NT Object Model defined and implemented by the Object Manager component of the NT Executive.
Kernel-mode drivers, peripheral devices, controller cards, adapter cards, interrupts, and instances of open files are all represented in memory as objects that can be manipulated. These objects also have a set of methods, a set of operations that can be performed on the object, associated with them. For example, each controller card in the system is represented by a controller object, while each instance of an open file is represented by the file object data structure. The controller object can only be accessed using one of the methods associated with the object. This same restriction also applies to the file object structure, as well as to all other object types defined by the I/O Manager.
Note that kernel-mode drivers developed for Windows NT have to conform to this object-based model along with the rest of the I/O subsystem. All drivers must initialize a driver object structure representing the loaded instance of the device driver itself. In addition, if the driver manages devices or peripherals attached to the system, it must create and initialize one or more device object structures.
Since the I/O Manager uses the NT object model, it can also use the services of the Security Subsystem to control access to objects. The I/O Manager supports named object structures. For example, file objects have a name associated with them indicating the on-disk file that they represent. You can also create other named objects, such as device objects, that can then be opened by other processes or kernel-mode drivers.
Layered drivers
The I/O Manager supports layered kernel-mode drivers. Each driver in the hierarchy accepts an I/O Request Packet, processes it, and then invokes the next driver in the hierarchy.
Drivers lower in the hierarchy are closer to the actual hardware. However, only the lowest drivers typically interact directly with hardware devices or cards. The layered driver model is a boon to designers who wish to provide value-added functionality not supplied with the base operating system. This feature enables intermediate and filter drivers to be inserted into the driver hierarchy whenever required, and therefore allows new functionality to be easily added to the system. Furthermore, since each driver in the hierarchy interacts with drivers above and below it in a consistent fashion, development, debugging, and maintenance of kernel-mode drivers is a lot easier than on most other operating system implementations.
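The layering idea can be sketched in ordinary C as a chain of dispatch routines, each handing the request to the driver below it. This is a user-mode model under stated assumptions, not the NT interfaces: `SimRequest`, `SimDriver`, `sim_call_driver()`, and the two dispatch routines are hypothetical names, and `sim_call_driver()` only imitates the role that `IoCallDriver()` plays.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_MAX_TRACE 8

/* Hypothetical request: a payload plus a record of which drivers saw it. */
typedef struct SimRequest {
    int         value;
    const char *trace[SIM_MAX_TRACE];
    int         trace_len;
} SimRequest;

typedef struct SimDriver SimDriver;
typedef int (*SimDispatch)(SimDriver *self, SimRequest *req);

struct SimDriver {
    const char *name;
    SimDispatch dispatch;   /* entry point invoked for each request */
    SimDriver  *lower;      /* next driver down the hierarchy, NULL for the lowest */
};

/* Imitates the role of IoCallDriver(): invoke a driver's dispatch routine. */
int sim_call_driver(SimDriver *target, SimRequest *req)
{
    assert(target != NULL && target->dispatch != NULL && req != NULL);
    return target->dispatch(target, req);
}

static void sim_record(SimDriver *self, SimRequest *req)
{
    if (req->trace_len < SIM_MAX_TRACE) {
        req->trace[req->trace_len++] = self->name;
    }
}

/* Higher-level or filter-style driver: do some work, then pass the request down. */
int sim_pass_down_dispatch(SimDriver *self, SimRequest *req)
{
    sim_record(self, req);
    req->value += 1;                         /* this layer's own processing */
    if (self->lower == NULL) {
        return -1;                           /* nothing below to satisfy the request */
    }
    return sim_call_driver(self->lower, req);
}

/* Lowest-level driver: satisfies the request itself. */
int sim_device_dispatch(SimDriver *self, SimRequest *req)
{
    sim_record(self, req);
    req->value *= 2;                         /* stands in for touching the hardware */
    return 0;
}
```

Because every layer is reached through the same `dispatch` signature, a filter can be spliced in simply by pointing an upper driver's `lower` field at it; neither the driver above nor the driver below needs to change.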
Asynchronous I/O
The NT I/O Manager supports asynchronous I/O,[1] allowing a thread to request I/O operations and continue performing other computational tasks until the previously requested I/O operations have been completed. This makes for greater parallelism in completing computational tasks as opposed to the purely sequential model in which a thread must wait for an I/O operation to complete before it proceeds with other activity.
Figure 4-2 graphically illustrates the sequence of activities that occur when performing synchronous and asynchronous I/O operations. As you can see from the illustration, the thread using asynchronous I/O can continue performing computational activity in parallel with the servicing of the I/O request that it has initiated. This results in higher performance and higher net throughput for the system. Note that the default I/O mechanism is the synchronous model.
Figure 4-2. Synchronous/asynchronous processing
Preemptible and interruptible
The I/O subsystem is preemptible and interruptible. It is extremely important for all kernel-mode driver developers to understand these two concepts.
Every thread executing in kernel mode executes at a certain system-defined Interrupt Request Level (IRQL). Each IRQL has an interrupt vector assigned to it by the system, and there are a total of 32 different IRQLs defined by Windows NT. Any thread can have its execution interrupted due to an interrupt at a higher IRQL than the IRQL at which that thread is executing. When such an interrupt occurs, the Interrupt Service Routines (ISRs) associated with that particular interrupt are executed in the context of the currently executing thread. This results in a suspension of the current flow of execution so that the thread can execute the ISR code.[2]
IRQ levels range from PASSIVE_LEVEL (defined as numeric value 0), which is the default level at which all user threads and system worker threads execute, to IRQL HIGH_LEVEL (defined as numeric value 31), which is the highest possible hardware IRQL in the system. Most file system dispatch routines are executed at IRQL PASSIVE_LEVEL. However, most lower-level device driver routines (for example, SCSI class driver read/write dispatch entry points) are executed at higher IRQ levels--typically at IRQL DISPATCH_LEVEL (defined as numeric value 2).
Since all code in the I/O subsystem is interruptible, drivers developed for the NT operating system must use appropriate synchronization and protection mechanisms to prevent data corruption for data accessed at different IRQ levels. For example, if your kernel-mode driver accesses a data structure at IRQL PASSIVE_LEVEL in the context of a system worker thread, and if this driver also needs to access this same data structure at IRQL DISPATCH_LEVEL when servicing an interrupt request, the driver will have to use a spin lock that is always acquired at IRQL DISPATCH_LEVEL, which is the highest-level IRQL at which the spin lock could possibly be acquired, to provide mutually exclusive access to the data structure.[3]
Threads executing I/O subsystem code in the kernel are also preemptible. The Windows NT operating system associates execution priorities with threads. These priorities are typically variable, and most user-level threads and system worker threads execute at relatively lower priorities, which allow them to be preempted by the NT scheduling code (in the NT Kernel) when a higher-priority thread is scheduled to run.
The fact that such threads could be preempted while executing kernel-mode code also necessitates synchronization mechanisms to ensure data consistency. This requirement is not present in other operating systems, such as the Windows 3.1 operating environment, or some versions of UNIX (e.g., HP-UX or SunOS), which currently do not allow preemption of threads or processes executing in kernel mode.
Kernel-mode driver designers must be extremely careful when acquiring common resources (e.g., read/write locks, semaphores) from within the context of different threads, because the Windows NT Kernel does not provide any built-in safeguards against programming errors resulting in situations like the priority inversion scenario described in Chapter 1, Windows NT System Components.
If you develop a driver that needs to acquire more than one synchronization resource at an IRQL that is less than or equal to DISPATCH_LEVEL, you must also be careful to define a strict locking hierarchy. For example, assume that your kernel-mode driver has to lock two FAST_MUTEX objects, fast_mutex_1 and fast_mutex_2. You must define the order in which all threads in your driver can acquire both of these mutex objects. This order could be "acquire fast_mutex_1 followed by fast_mutex_2" or vice-versa. The reason for strictly defining and maintaining a locking hierarchy is to avoid a situation like one where thread-a acquired fast_mutex_1, wants to acquire fast_mutex_2, and gets preempted. Thread-b in the meantime gets scheduled to execute, acquires fast_mutex_2, and now needs to acquire fast_mutex_1. This scenario would cause a deadlock condition.
Portable and hardware independent
The I/O subsystem is portable and hardware independent. Kernel-mode drivers developed for Windows NT environments are also required to be portable and hardware independent.
The NT Hardware Abstraction Layer (HAL) is responsible for providing an abstraction of the underlying processor and bus characteristics to the rest of the system. NT drivers must be careful to use the appropriate HAL, NT Executive, and I/O Manager support routines to ensure portability across Alpha, MIPS, PowerPC, and Intel platforms.
The vast majority of the code in the NT I/O subsystem is written in C, a high-level and portable language. NT currently also requires kernel-mode driver developers to write their code in the C language, though it is possible with some extra work to write and link drivers in assembly. However, development in low-level languages, such as assembly, is highly discouraged, because assembly languages are inherently processor/architecture specific, and therefore such drivers cannot execute on more than one type of processor architecture.[4]
Multiprocessor safe
The I/O subsystem is multiprocessor safe. Windows NT was designed from the ground up to be able to execute on symmetric multiprocessing environments.
Execution of NT kernel-mode code and drivers on multiprocessor machines requires careful synchronization by kernel designers to avoid data consistency problems. For example, on uniprocessor machines, a common practice used to avoid data consistency problems while servicing an interrupt is to disable all other interrupts on the same machine (e.g., via a cli assembly instruction on x86 architectures). However, this same mechanism will fail on symmetric multiprocessor systems, because it is possible to encounter an interrupt on another processor, even though all interrupts had been disabled on the current processor. Similarly, on uniprocessor systems, it can be guaranteed (e.g., via usage of a critical section) that only one thread at a time can access a particular data structure. However, on symmetric multiprocessor architectures, even if preemption of a thread from a single processor were temporarily suspended, other threads executing on other processors could conceivably try to simultaneously access the same data structure.
Typically, spin locks and other higher level (Executive) synchronization mechanisms must be used consistently and correctly in Windows NT drivers to ensure correct functionality on multiprocessor systems.
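Using synchronization mechanisms consistently includes honoring a fixed acquisition order, such as the fast_mutex_1-before-fast_mutex_2 hierarchy discussed earlier in connection with preemption. One way a driver writer might enforce such a hierarchy during development is to give each lock a rank and reject any out-of-order acquisition. The sketch below models only that ordering check in portable C; it provides no mutual exclusion, and `SimLockContext`, the rank constants, and both functions are hypothetical names, not DDK facilities.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_HELD 4

/* Hypothetical ranks: a lower rank must always be acquired before a higher one. */
enum {
    SIM_RANK_FAST_MUTEX_1 = 1,
    SIM_RANK_FAST_MUTEX_2 = 2
};

/* Per-thread record of the ranks currently held, in acquisition order. */
typedef struct SimLockContext {
    int held_ranks[SIM_MAX_HELD];
    int depth;
} SimLockContext;

/*
 * Record an acquisition if it respects the hierarchy. Returns false when the
 * thread already holds a lock of equal or higher rank (an ordering violation)
 * or when the bookkeeping array is full.
 */
bool sim_try_acquire_in_order(SimLockContext *ctx, int rank)
{
    assert(ctx != NULL && rank > 0);
    if (ctx->depth == SIM_MAX_HELD) {
        return false;
    }
    if (ctx->depth > 0 && ctx->held_ranks[ctx->depth - 1] >= rank) {
        return false;   /* would invert the agreed order: potential deadlock */
    }
    ctx->held_ranks[ctx->depth++] = rank;
    return true;
}

/* Forget the most recently acquired lock (locks are released in reverse order). */
void sim_release_latest(SimLockContext *ctx)
{
    assert(ctx != NULL && ctx->depth > 0);
    ctx->depth--;
}
```

With this check in place, the thread-b path from the deadlock scenario (holding fast_mutex_2 and then asking for fast_mutex_1) is flagged at the moment of the second request instead of surfacing later as a hang.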
Modular
The NT I/O subsystem is modular. Any driver within the NT I/O subsystem can be easily replaced by another driver that provides support for the same dispatch entry points supported by the original driver. The use of I/O Request Packets to submit I/O requests and an object-based model where all I/O operations are invoked via standard methods (or well-defined dispatch routine entry points) allows easy replacement of one kernel-mode driver with another that responds appropriately to the same dispatch routines.
All drivers also invoke the services of the I/O Manager using a well-defined and consistent set of service and utility functions. Theoretically, therefore, the I/O Manager is also easily replaceable. In practice, however, the I/O Manager is an extremely complex and integral component of the core NT operating system, and would be extremely difficult to replace easily, even by developers at Microsoft itself.
One obvious benefit of the modularity in the I/O subsystem, however, is the relative ease with which I/O Manager support functions and driver functionality can be reimplemented without affecting any clients that use the services of the I/O Manager or such drivers. As long as the interfaces are maintained consistently, the internals of any implementation can be changed whenever required.
Configurable
All components of the I/O subsystem are configurable. The I/O Manager and all components that comprise the I/O subsystem try to maximize run-time configurability. The NT I/O Manager works with the HAL to determine the set of peripherals connected to the system at boot time. It then initializes the appropriate data structures to support these connected devices. This process avoids any requirements for hardcoding device configurations into the operating system. Windows NT does not as yet support true plug-and-play, though it should in the near future.
Kernel-mode drivers can be developed to manipulate devices; each driver is dynamically loadable and unloadable, minimizing unnecessary kernel overhead. The I/O Manager determines the drivers to be loaded, and the order in which they should be loaded, based upon the entries in the Windows NT Registry. I/O Manager configuration parameters, as well as those required by kernel-mode drivers, are obtained from the Windows NT Registry.
Any drivers that you develop should be as configurable as possible. This includes avoiding any hardcoded values in the driver code and instead obtaining these values from the system Registry, maximizing user configurability.
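As a rough illustration of this advice, a driver can look up each tunable by name and fall back to a built-in default only when no configured value exists. The sketch below uses a plain in-memory table as a stand-in for the driver's Registry key; `SimRegValue`, `sim_query_parameter()`, and the parameter names in the usage note are all hypothetical, and a real driver would read the Registry through the support routines documented in the DDK.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one named value under a driver's Registry key. */
typedef struct SimRegValue {
    const char   *name;
    unsigned long value;
} SimRegValue;

/*
 * Return the configured value for `name`, or `fallback` when the value is
 * absent, so the built-in default is a last resort and not a hardcoded choice.
 */
unsigned long sim_query_parameter(const SimRegValue *values, size_t count,
                                  const char *name, unsigned long fallback)
{
    assert(name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (values[i].name != NULL && strcmp(values[i].name, name) == 0) {
            return values[i].value;
        }
    }
    return fallback;
}
```

For example, given a table containing only a made-up `"MaxTransferSize"` entry, `sim_query_parameter(table, 1, "RetryCount", 3UL)` yields the default of 3, while a lookup of `"MaxTransferSize"` returns whatever the administrator configured.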
Process and Thread Context
Before discussing other details specific to the I/O Manager and the I/O subsystem, it would be useful for you to understand the concepts underlying thread/process contexts and to realize why a good grasp of these concepts is essential to understanding the operation of the various components in the Windows NT Kernel. To design and develop kernel-mode drivers under Windows NT successfully, you will need a solid grasp of these issues.
Every process in a Windows NT operating environment is represented by a process object structure and has an execution context that is unique to that process. The execution context for the process includes the process virtual address space (described in greater detail in the next chapter), a set of resources visible to that process, and a set of threads that belong to the process. Examples of resources owned by a process include file handles for files opened by that process, any synchronization objects created by that process, and any other objects that are created either by the process or on behalf of that process. Each process has at least one thread that is created and belongs to the process, although the process certainly could have numerous threads that belong to it. Note that in Windows NT, the fundamental schedulable entity is a thread object and not the process object.
Each process is described internally by the Windows NT Kernel by a Process Environment Block (PEB) structure, which is opaque to the rest of the system. The PEB contains process global context, such as startup parameters, image base address, synchronization objects for process-wide synchronization, and loader data structures. Upon creation, the process is also assigned an access token called the primary token of the process. This token is used, by default, by threads associated with the process to validate themselves when they access any Windows NT object.
An object table is created for each new process object structure. This object table is either empty or a clone of the parent process object table, depending upon the arguments supplied to the system's create process routine and the inheritance attributes (OBJ_INHERIT) for each of the objects contained within the object table for the parent process. The default access token and the base priority for a new process are the same as those of the parent process.
A thread object is the entity that actually executes program code and is scheduled for execution by the Windows NT Kernel. Every thread object is associated with a process object; several threads can be associated with a single process object, which enables concurrent execution of multiple threads in a single address space. On uniprocessor systems, threads can never be executed concurrently; however, on multiprocessor systems, concurrent execution is possible and does occur.
Each thread object has a thread context unique to it. This context is architecture-dependent and is typically composed of the following:
- Distinct user and kernel stacks for the thread, identified by a user stack pointer and a kernel stack pointer
- Program counter
- Processor status
- Integer and floating-point registers
- Architecture-dependent registers
You will notice that object handles and other related information about open object structures stored in the process' object table are global to all threads associated with the process. Therefore, all threads in a process can access all open handles for the process, even those opened by other threads within the process. Threads belonging to other processes can only access objects that belong to the process to which they are affiliated; any attempt to access a resource owned by another process will result in an error returned by the Object Manager component in Windows NT.[5]
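The scoping of handles to a process, and not to a thread, can be modeled with a per-process table that every thread reaches through its owning process. This is a deliberately tiny user-mode sketch: `SimProcess`, `SimThread`, and the two functions are hypothetical names, handle values are simply table indexes plus one, and none of this reflects the actual layout of the NT object table.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_MAX_HANDLES 8

/* Hypothetical process: owns the table that maps handle values to objects. */
typedef struct SimProcess {
    void *objects[SIM_MAX_HANDLES];   /* slot i backs handle i + 1; NULL means free */
} SimProcess;

/* Hypothetical thread: carries no handles of its own, only its owning process. */
typedef struct SimThread {
    SimProcess *process;
} SimThread;

/* Open a handle in the calling thread's process. Returns 0 if the table is full. */
int sim_open_handle(SimThread *thread, void *object)
{
    assert(thread != NULL && thread->process != NULL && object != NULL);
    for (int i = 0; i < SIM_MAX_HANDLES; i++) {
        if (thread->process->objects[i] == NULL) {
            thread->process->objects[i] = object;
            return i + 1;
        }
    }
    return 0;
}

/*
 * Resolve a handle in the context of the calling thread's process.
 * Returns NULL if the handle is not valid in that process.
 */
void *sim_reference_by_handle(const SimThread *thread, int handle)
{
    assert(thread != NULL && thread->process != NULL);
    if (handle < 1 || handle > SIM_MAX_HANDLES) {
        return NULL;
    }
    return thread->process->objects[handle - 1];
}
```

A handle opened by one thread resolves to the same object from any sibling thread in the same `SimProcess`, while the same numeric value presented by a thread of a different process resolves to nothing, which is the error case the Object Manager reports.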
Threads are typically referred to as user-mode or kernel-mode threads. Note that there is no difference in the internal representation of such threads, as far as the Windows NT operating system is concerned. The only conceptual difference between such threads is the mode of the processor when the thread typically executes code, and the virtual address range that is therefore accessible by the thread. For example, a Win32 application process contains threads that execute code while the processor is in user mode and therefore are referred to as user-mode threads. On the other hand, there is a global pool of worker threads created by the Windows NT Executive in the context of a special system process that are used to execute operating system or driver code when the processor is in kernel mode; these threads are typically referred to as kernel-mode threads.
Although user-mode threads typically execute code with the processor in user mode, they often request system services, such as file I/O, which result in the processor executing a trap and entering kernel mode to execute the file system code that will service the I/O request. Notice that the user-mode thread is now executing operating system (file system driver) code with the processor in kernel mode, with all the rights and privileges that exist while the processor is in this state. While executing in kernel mode, the thread can access kernel virtual addresses and perform operations that are otherwise always denied while the processor is in user mode.
Execution contexts
Consider a kernel-mode driver that you develop. The fact that this is a kernel-mode driver tells us that, while the code is being executed, the processor will be in kernel mode and will therefore be able to access the kernel virtual address range. You might wonder which set of threads will execute the code that you develop. Will it be some special thread that you would have to create, or will it be a user-mode thread that requests services from your driver, or will it be a thread on loan from the pool of system worker threads I referred to earlier?
The answer is, it depends. Your driver might always execute code in the context of a special thread that you may have created at driver initialization time, or it might execute code in the context of a user thread that has requested I/O services, or it might be invoked in the context of system worker threads. It is quite possible that, if you develop a file system driver, your driver will execute code in the context of all three types of threads. Furthermore, if you develop device drivers or other lower-level drivers that have their dispatch routines invoked in response to interrupts, your code will execute in the context of whichever thread was executing on that processor at the particular instant when the interrupt occurs. This is referred to as execution of code in the context of an arbitrary thread, i.e., a thread whose context is unknown to your driver. The operating system temporarily "borrows" the execution context of this thread to execute your driver routines simply because this thread happened to be executing code on the processor at the time the interrupt occurred.
As a kernel-mode driver designer, you must, therefore, always be aware of the execution context in which your code will execute. This execution context is always one of the following:
- The context of a user-mode thread that has requested system services
- If you develop a file system driver or a filter driver that resides above the file system in the driver hierarchy, then your code will often execute in the context of the user-mode thread that requested, say, a read operation. Your code will then be able to access the kernel virtual address range, as well as the virtual addresses in the lower 2GB of the virtual address space belonging to the user-mode process to which the requesting user-mode thread belongs.[6]
- Typically, only file system drivers or filter drivers that intercept file system requests should expect that their dispatch routines[7] will be executed directly in the context of user-mode threads. Other drivers cannot expect this, simply because higher-level drivers might have posted the user request to be executed asynchronously in the context of a worker thread, or your driver code might be executed in response to an interrupt as discussed previously.
- The context of a dedicated worker thread created by your driver or by some kernel-mode component (typically a component belonging to the I/O subsystem)
- File system drivers sometimes create special threads in the context of the system process (using the PsCreateSystemThread() system service routine described in the DDK) that they subsequently use to perform operations that cannot otherwise be performed in the context of user-mode threads requesting I/O services. Filter drivers might also choose to create such dedicated worker threads; or for that matter, any kernel-mode component can choose to create one or more worker threads.
- If you write a file system driver, you might occasionally request that certain operations be carried out by such threads created by you. Your code will then execute in the context of your special threads. If, however, you write lower-level drivers, and if the file system uses a special thread to process I/O requests, your driver might now be invoked in the context of the special thread created by the file system driver. Either way, you can see that the code executes in the context of specially created threads belonging to the system process.
- The context of system worker threads specially created by the I/O Manager to serve I/O subsystem components
- It is possible for certain I/O operations to be performed in the context of system worker threads that are created by the I/O Manager. These worker threads are often used by file system driver implementations, or by device drivers or other kernel-mode components that need thread context to perform their operations. For example, consider asynchronous I/O requests from user-mode applications. Typically, a file system driver will service such a request by "posting" the request to be picked up and handled by a system worker thread. Control is immediately returned to the calling application once the request has been posted, and the I/O Manager will notify the application once the request has been serviced in the context of the system worker thread. In such a situation, all lower-level drivers will have their dispatch routines invoked in the context of the system worker thread. Note that a system worker thread belongs to the system process, just like the dedicated worker threads created by kernel-mode components described earlier.
- The important point to note here is that once the request has been posted to the system worker thread, the virtual address space now accessible in the context of the system worker thread is not the same as the virtual address space that was accessible in the context of the original, user-mode thread that requested the I/O operation. Similarly, the resources that were valid in the context of the original user-mode thread are no longer valid in the context of the system worker thread. The reason for this is obvious: the system worker thread executes in the context of the system process, and the user-mode thread that requested the I/O operation belongs to a distinct application process with its own object table, virtual address space, and process environment block.
- The context of some arbitrary thread
- Consider now a device driver able to service one IRP at any given point in time. Typically, most device drivers respond to I/O requests by queuing the IRP for delayed processing, and by returning control immediately to the driver above it in the hierarchy. The IRP will be processed later when the driver can get to it, which is when I/O Request Packets before it in the queue have been processed.
- So how is an IRP taken off the queue? Once the current I/O operation is completed by the target device, the device informs the operating system via a hardware interrupt. The operating system responds to this interrupt by invoking the Interrupt Service Routines that various drivers have associated with that specific interrupt. One of these Interrupt Service Routines will be the ISR specified by your driver. As part of ISR execution, the current IRP will complete, and the next IRP will be taken off the device queue and scheduled for actual I/O.[8]
- The point to note here is that the ISR is executed asynchronously, in the context of the currently executing thread--an arbitrary thread. Therefore, when responding to such an interrupt, the driver cannot assume that the virtual address space accessible to it is the same as that of the user thread that requested the IRP now being completed. Resources associated with that thread are not available to the driver code either, because the driver does not know which thread's context is being borrowed to execute the ISR code.
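The queue-and-dequeue cycle described in the last case can be modelled in ordinary user-mode C. The following is only a sketch under stated assumptions, not NT code: the types `sim_irp` and `sim_device` and the functions `sim_start_packet`, `sim_start_next_packet`, and `sim_isr` are hypothetical stand-ins for an IRP, a device that services one request at a time, the dispatch path, and the interrupt path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an IRP. */
typedef struct sim_irp {
    int id;
    int completed;
    struct sim_irp *next;      /* link in the pending queue */
} sim_irp;

/* Hypothetical device that can service one IRP at a time. */
typedef struct sim_device {
    sim_irp *current;          /* IRP the "hardware" is working on, or NULL */
    sim_irp *head, *tail;      /* FIFO of pending IRPs */
} sim_device;

/* Dispatch path: start the IRP if the device is idle, otherwise queue it. */
void sim_start_packet(sim_device *dev, sim_irp *irp)
{
    irp->next = NULL;
    irp->completed = 0;
    if (dev->current == NULL) {
        dev->current = irp;
        return;
    }
    if (dev->tail != NULL)
        dev->tail->next = irp;
    else
        dev->head = irp;
    dev->tail = irp;
}

/* Pop the next pending IRP, if any, and make it the current one. */
void sim_start_next_packet(sim_device *dev)
{
    sim_irp *irp = dev->head;
    if (irp != NULL) {
        dev->head = irp->next;
        if (dev->head == NULL)
            dev->tail = NULL;
        irp->next = NULL;
    }
    dev->current = irp;
}

/* Interrupt path: complete the current IRP and start the next one.
 * In the real system this runs in whatever thread was interrupted. */
void sim_isr(sim_device *dev)
{
    if (dev->current != NULL) {
        dev->current->completed = 1;
        sim_start_next_packet(dev);
    }
}
```

Nothing in `sim_isr` refers to the thread that issued the request. That is the property the text asks you to keep in mind: the completion path has only the device and the IRP to work with.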
Importance of thread and process contexts
Your kernel-mode driver code will be invoked in one of the execution contexts described previously. The code you develop should be aware of the execution context in which it will be invoked, since that determines the restrictions under which your driver must operate.
Consider the case where you develop a kernel-mode driver that needs to open some object; for example, your driver may perform file I/O itself and may therefore open a file and receive a file handle in return.[9] If you open this file in your driver initialization code (the DriverEntry() routine that every kernel-mode driver must have), you should be aware that this handle will only be valid in the context of the system process and the threads associated with the system process. So, if you use this handle in the context of system worker threads, the handle will be valid. However, if you attempt to use the handle in the context of a user thread, or an arbitrary thread context, your handle will not be valid. Similarly, if your driver opens an object while servicing a read request in the context of a user thread, the handle can be used only in the context of the process to which that thread belongs. Any attempt to use the handle in the context of a system worker thread, for example, will result in an error.
You must also be aware of when you can safely use the user buffer address, passed to your driver, for a read or write I/O operation. The user specifies a virtual address pointer that is perfectly valid in the context of that particular user thread. However, if the I/O operation is not performed in the context of that user thread (e.g., the I/O operation is performed asynchronously), the virtual address passed in by the user application will no longer be valid and therefore cannot be used by the kernel-mode driver. The I/O Manager provides support for accessing user buffers in other contexts besides that of the requesting thread. I will discuss this support in detail later in this chapter.
As discussed above, there are certain restrictions on the resources that can be used by your driver, depending on the thread context in which your code executes. This thread context depends on the circumstances under which your code is invoked, and this context will determine the resources that your driver can utilize.
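To see why a handle is tied to the context in which it was opened, it can help to picture each process as owning a private table that maps handle values to objects. The sketch below is a user-mode model built on that assumption. `sim_process`, `sim_open`, and `sim_lookup` are hypothetical names, and the table layout is invented for illustration; it is not how the Object Manager is implemented.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_MAX_HANDLES 8

/* Hypothetical model: each process owns a private handle table. */
typedef struct sim_process {
    const char *name;
    void *table[SIM_MAX_HANDLES];   /* slot 0 is reserved so that 0 means "no handle" */
} sim_process;

/* Open an object in the context of proc; returns a handle, or 0 if the table is full. */
int sim_open(sim_process *proc, void *object)
{
    int h;
    for (h = 1; h < SIM_MAX_HANDLES; h++) {
        if (proc->table[h] == NULL) {
            proc->table[h] = object;
            return h;
        }
    }
    return 0;
}

/* Resolve a handle in the context of proc; NULL models an invalid-handle error. */
void *sim_lookup(const sim_process *proc, int handle)
{
    if (handle <= 0 || handle >= SIM_MAX_HANDLES)
        return NULL;
    return proc->table[handle];
}
```

In this model a handle opened in one process resolves to nothing in another process whose table is empty. In a real system the same numeric value could instead resolve to an unrelated object in the other process, which is arguably worse than a clean failure.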
Objects and handles
All objects created by kernel-mode components in the Windows NT Executive can be referred to in two ways, either by using an object handle returned by the NT Object Manager when the object is created or opened, or by using a pointer to the object. Note that the pointer to an object allocated by a kernel-mode component will typically be valid in all execution contexts, because the virtual address referring to the object will be from the kernel virtual address range (more on this in the next chapter). However, as mentioned earlier, object handles are specific to the execution context in which the handle is obtained and hence are valid only in that particular execution context.
Remember that each object created by the NT Object Manager has a reference count associated with it. When the object is initially created, this reference count is set to 1. The reference count is incremented whenever a kernel-mode component requests the Object Manager to do so, typically via an invocation of ObReferenceObjectByHandle(), which is described in the DDK. The reference count is decremented whenever a close operation is performed on the object handle. Kernel-mode drivers use the ZwClose() system service routine to close a handle to any system-created object. The reference count is also decremented when a kernel-mode component invokes ObDereferenceObject(), which requires the object pointer to be passed in. When the reference count goes to zero, the object will be deleted by the NT Object Manager.
In the course of this book, you will often find places where we open an object and receive a handle, then obtain a pointer to the object and stash it away someplace (possibly in global memory), reference the object, and close the handle. This gives us two advantages:
- By saving a pointer to the object, we can always reobtain a handle to the same object in the context of a thread other than the one that originally opened the object. You can find concrete examples of this later in the book.
- By referencing the object and closing the original handle, we are assured the object will not be deleted (until we finally dereference it for the last time), yet we are also assured that, once the last dereference operation is performed, the object will automatically be deleted.
Keep the above discussion in mind as you go through the discussion and code presented throughout this book. You will probably use this methodology of working with objects and object handles extensively when you develop your own kernel-mode driver.
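The open, reference, close, and dereference sequence can be traced with a small user-mode model. This is a sketch only. It follows the simplified description above and keeps a single counter per object; `sim_object`, `sim_create`, `sim_reference`, and `sim_dereference` are hypothetical names, and `sim_dereference` stands in for both closing a handle and dereferencing a pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a reference-counted object; not the real NT layout. */
typedef struct sim_object {
    int ref_count;
    int *deleted_flag;    /* set to 1 when the object is destroyed, so callers can observe it */
} sim_object;

/* Create the object. The initial reference stands for the handle given to the creator. */
sim_object *sim_create(int *deleted_flag)
{
    sim_object *obj = malloc(sizeof *obj);
    if (obj == NULL)
        return NULL;
    obj->ref_count = 1;
    obj->deleted_flag = deleted_flag;
    *deleted_flag = 0;
    return obj;
}

/* Take an additional reference, as a driver does before stashing the pointer away. */
void sim_reference(sim_object *obj)
{
    obj->ref_count++;
}

/* Drop one reference and delete the object when the count reaches zero. */
void sim_dereference(sim_object *obj)
{
    if (--obj->ref_count == 0) {
        *obj->deleted_flag = 1;
        free(obj);
    }
}
```

Calling `sim_create`, then `sim_reference`, then `sim_dereference` once (the "close") leaves the count at 1 and the object alive. Only the final `sim_dereference` deletes it.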
Common Data Structures
Data structures are the heart of any computer application or operating system. The NT I/O Manager defines certain data structures that are important to kernel-mode driver designers and developers. Often, your driver will have to create and maintain one or more instances of these data structures to provide driver functionality. In this section, I will briefly discuss the structure and uses of some of the data structures that are important to file system driver and filter driver developers. Note that all of these structures are well documented in the Windows NT DDK. However, our objective here is to understand the reason for creating and working with these data structures, as well as to get a good understanding of the important fields that comprise these data structures.
Driver Object
The DRIVER_OBJECT structure represents an instance of a loaded driver in memory. Note that a kernel-mode driver can only be loaded once; i.e., multiple instances of the same driver will not be loaded by the Windows NT I/O Manager. The driver object structure is defined as follows:
    typedef struct _DRIVER_OBJECT {
        CSHORT Type;
        CSHORT Size;
        /* a linked list of all device objects created by the driver */
        PDEVICE_OBJECT DeviceObject;
        ULONG Flags;
        PVOID DriverStart;
        ULONG DriverSize;
        PVOID DriverSection;
        /* the following field is provided only in NT Version 4.0 and later */
        PDRIVER_EXTENSION DriverExtension;
        /* the following field is only provided in NT Version 3.51 and before */
        ULONG Count;
        UNICODE_STRING DriverName;
        PUNICODE_STRING HardwareDatabase;
        PFAST_IO_DISPATCH FastIoDispatch;
        PDRIVER_INITIALIZE DriverInit;
        PDRIVER_STARTIO DriverStartIo;
        PDRIVER_UNLOAD DriverUnload;
        PDRIVER_DISPATCH MajorFunction[IRP_MJ_MAXIMUM_FUNCTION + 1];
    } DRIVER_OBJECT;

Earlier in this chapter, I discussed the NT packet-based I/O model. Each I/O Request Packet describes an I/O request. The major function of an I/O request packet is to request functionality from a driver.
We know that the IRPs will have to be dispatched to some I/O driver routines. If you examine the driver object structure, you will notice that it contains memory allocated for an array of function pointers called the MajorFunction array. It is the responsibility of the kernel-mode driver to initialize the contents of this array for each major function that the kernel-mode driver supports. There are no restrictions on the number of functions that your driver must support, nor are there any restrictions specifying that each function pointer should point to a unique function; you could initialize the entry points for all major functions to point to a single routine and this would work perfectly (as long as your driver routine handled all the IRPs that would be directed to it). If you develop a kernel-mode driver, you will probably support at least one major function and should therefore initialize the function pointers appropriately.
The DriverStartIo and the DriverUnload fields are also left for the driver to initialize. Lower-level Windows NT drivers typically provide a StartIo function, which is invoked either when an IRP is dispatched to the driver, or when an IRP has just been popped off a queue. The DriverStartIo field is initialized by lower-level drivers to point to this driver-supplied StartIo function. Typically, as you will see in code presented later in this book, file system drivers and filter drivers will not need a DriverStartIo routine, because such drivers manage their pending I/O Request Packets via other internal queue management implementations. The DriverUnload field should point to a routine that is executed just before the driver is unloaded. This allows your kernel-mode driver an opportunity to ensure that any on-disk information is in a consistent state, as well as to allow lower-level drivers to put the device(s) they control into a known state. Note that it is not required that your driver be unloadable; in particular, file system drivers are extremely difficult to design so that they can be unloaded on demand. If your driver cannot be unloaded, you must not initialize the DriverUnload field in the driver object structure (the field is initialized to NULL by the I/O Manager and therefore your driver entry routine need not do anything to this field).
Many kernel-mode drivers create one or more device object structures. These structures are linked in the DeviceObject field in the driver object structure. At driver load time, this linked list is empty. However, the NT I/O Manager fills the list with pointers to device objects created by your driver as such device objects are created using the IoCreateDevice() service routine.
To load a driver, the I/O Manager executes an internal routine called IopLoadDriver(). This routine performs the following steps:
- Determines the name of the driver to be loaded and checks whether the driver has already been loaded by the system.
The I/O Manager checks to see whether the driver has already been loaded by examining a global linked list of loaded kernel modules. If the driver is already loaded, the I/O Manager immediately returns success; otherwise, it continues with the process of loading the driver. To have your driver loaded, your installation utility must have created an appropriate entry in the Registry. See Part 3 for more information on how the Registry must be configured for kernel-mode file system and filter drivers.
- If the driver is not loaded, the I/O Manager requests the Virtual Memory Manager (VMM) to map in the driver executable. As part of mapping in the driver code, the VMM checks to see that the file contains a valid Windows NT executable format. If the driver was built incorrectly, the VMM will fail the map request and the I/O Manager, in turn, will fail the driver load request.
- Now the I/O Manager invokes the Object Manager, requesting that a new driver object be created. Note that the DRIVER_OBJECT type is an I/O Manager-defined object type, which was previously created by the I/O Manager at system initialization time; it is therefore recognized as a valid object type by the NT Object Manager. Note also that the returned driver object structure is allocated from nonpaged system memory and is, therefore, accessible at all IRQ levels.
- The I/O Manager zeroes out the driver object structure returned by the Object Manager. Each entry in the MajorFunction array is initialized to IopInvalidDeviceRequest(). This is the default dispatch routine for the various entry points. This routine simply sets a return status of STATUS_INVALID_DEVICE_REQUEST and returns control to the calling process.
- The I/O Manager initializes the DriverInit field to refer to the initialization routine in your driver (the DriverEntry routine). DriverSection is initialized to the section object pointer[10] for the mapped executable, DriverStart is initialized to the base address to which the driver image was mapped, and DriverSize is initialized to the size of the driver image.
- The I/O Manager requests that the object be inserted into the linked list of driver objects maintained by the NT Object Manager. In return, the I/O Manager gets a handle to the object. This handle is referenced by the I/O Manager and closed, thereby ensuring that the object will be deleted when dereferenced at driver unload time.
- The HardwareDatabase field is initialized with a pointer to the Configuration Manager's hardware configuration information; this field could be used by lower-level drivers to determine the hardware configuration for the current boot cycle. The I/O Manager also initializes the DriverName field so that it can be used by the error logging component when required.
- Finally, the I/O Manager invokes the driver initialization routine, which is where your driver gets the opportunity to initialize itself, including initializing the function pointers in the driver object structure. You should note that your driver initialization routine is always invoked at IRQL PASSIVE_LEVEL, allowing you to use pretty much all of the system services available. Furthermore, your initialization routine will be invoked in the context of the system process; this is especially important to keep in mind if you open any objects or create any objects resulting in a handle being returned to you. Any such handles will only be valid in the context of the system process. In order to be able to use such objects in the context of other threads, you will have to use the methodology described earlier in the chapter, where you obtain a pointer to the object and then subsequently obtain handles in the context of other threads as and when required.
If your driver's initialization routine fails, the driver will automatically be unloaded by the Windows NT I/O Manager. Remember to deallocate any allocated memory prior to returning control to the I/O Manager and also to close and dereference any open objects, or else you will leave a trail behind you that could lead to degraded or impaired system behavior.
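The part of the load sequence that touches the dispatch table can be sketched in user-mode C. Everything here is a hypothetical model, not the I/O Manager's code: `sim_driver_object` keeps only three of the fields discussed above, `SIM_MJ_MAX` is an invented small bound, and `sim_load_driver` covers only the zeroing, default-filling, and initialization-call steps.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_MJ_CREATE 0
#define SIM_MJ_READ   1
#define SIM_MJ_MAX    3   /* hypothetical; the real IRP_MJ_MAXIMUM_FUNCTION is larger */

#define SIM_STATUS_SUCCESS                  0
#define SIM_STATUS_INVALID_DEVICE_REQUEST (-1)

struct sim_driver_object;
typedef int (*sim_dispatch)(struct sim_driver_object *drv, int major);

/* Hypothetical, heavily reduced driver object. */
typedef struct sim_driver_object {
    int (*driver_init)(struct sim_driver_object *drv);
    void (*driver_unload)(struct sim_driver_object *drv);
    sim_dispatch major_function[SIM_MJ_MAX + 1];
} sim_driver_object;

/* Default entry installed by the loader for every major function. */
int sim_invalid_device_request(sim_driver_object *drv, int major)
{
    (void)drv;
    (void)major;
    return SIM_STATUS_INVALID_DEVICE_REQUEST;
}

/* A single driver routine may serve several major functions. */
int sim_my_dispatch(sim_driver_object *drv, int major)
{
    (void)drv;
    (void)major;
    return SIM_STATUS_SUCCESS;
}

/* Plays the role of DriverEntry: override only the supported entry points. */
int sim_driver_entry(sim_driver_object *drv)
{
    drv->major_function[SIM_MJ_CREATE] = sim_my_dispatch;
    drv->major_function[SIM_MJ_READ]   = sim_my_dispatch;
    /* driver_unload is left NULL: this model driver is not unloadable. */
    return SIM_STATUS_SUCCESS;
}

/* Plays the role of the loader: zero, fill in defaults, then call the init routine. */
int sim_load_driver(sim_driver_object *drv, int (*entry)(sim_driver_object *))
{
    sim_driver_object zero = {0};
    int i;

    *drv = zero;
    for (i = 0; i <= SIM_MJ_MAX; i++)
        drv->major_function[i] = sim_invalid_device_request;
    drv->driver_init = entry;
    return drv->driver_init(drv);
}
```

Because the loader fills every slot before the initialization routine runs, a request for an unsupported major function reaches the default routine and fails cleanly instead of jumping through a null pointer.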
The driver entry routine is the initialization routine for a kernel-mode driver and is invoked by the I/O Manager. Each kernel-mode driver can also register a re-initialization routine that is invoked after all other drivers have been loaded and the rest of the I/O subsystem, as well as other kernel-mode components, have been initialized. In NT 3.51 and earlier, the Count field in the driver object structure contained a count of the number of times the reinitialization routine had been invoked.
Beginning with NT 4.0, the NT I/O Manager allocates an additional structure that is an extension of the original driver object structure. This driver extension structure is defined below and contains fields to support plug-and-play for lower-level drivers that manage hardware devices and peripherals. The Count field has been moved to the driver extension structure with the new release; however, it still provides the same functionality as it did in earlier releases. Plug-and-play support is provided by lower-level drivers and will not be covered in this book.

    typedef struct _DRIVER_EXTENSION {
        // back pointer to driver object
        struct _DRIVER_OBJECT *DriverObject;
        // driver routine invoked when new device added
        PDRIVER_ADD_DEVICE AddDevice;
        ULONG Count;
        UNICODE_STRING ServiceKeyName;
    } DRIVER_EXTENSION, *PDRIVER_EXTENSION;

Finally, notice that there is a pointer to a fast I/O dispatch table in the driver object structure. Currently, only file system driver implementations provide support via the fast I/O path. Essentially, the fast path is simply a way to avoid the abstract, clean, modular, yet relatively slow method of using packet-based I/O. Using the function pointers provided by the file system driver in this structure, the NT I/O Manager can either directly invoke the file system dispatch routines or call directly into the NT Cache Manager to request I/O without having to set up an IRP structure. The FastIoDispatch field should be initialized by the driver entry routine to refer to an appropriate structure containing initialized file system entry points. In the coverage of the NT Cache Manager, provided later in this book, you will see a detailed discussion of the entry points that comprise the fast I/O method of I/O.
Device Object
Device object structures are created by kernel-mode drivers to represent logical, virtual, or physical devices. For example, a physical device, such as a disk drive, is represented in memory by a device object. Similarly, consider the situation where you develop an intermediate driver that presents a large physical disk as three smaller disks or partitions. Now, there will be one device object, representing a large physical disk, that is created by the lower-level disk driver, and your intermediate driver should create three additional device objects, each of which represents a virtual disk. Finally, a driver might choose to create a device object to represent a logical device; for example, the file system drivers create a device object to represent the file system implementation. This device object can be opened by other processes and can be used to send specific commands targeted to the file system driver itself.
Without a device object, a kernel-mode driver will not receive any I/O requests, since there must be a target device for every I/O request dispatched by the I/O Manager. For example, if you develop a disk driver and do not create a device object structure representing this particular disk device, no user process can access this disk. Once you do create a device object for the disk, however, file system drivers can potentially mount any volumes present on the physical media and user-mode processes can try to read and write data from the disk.
Unnamed device objects are rarely created by kernel-mode drivers, since such device objects are not easily accessible to other kernel-mode or user-mode components. If you create an unnamed device object, none of the other components in the system will be able to open it, and therefore, no component will direct any I/O to it. However, one common example of unnamed device objects is the set created by file system drivers to represent mounted file system volumes. In this case, there is a device object, created by the disk driver, representing the physical or virtual disk on which the file system volume resides, and a Volume Parameter Block (VPB) structure (described later) performs the association between the named physical disk device object and the unnamed logical volume device object created by the file system driver. I/O requests are sent to the device object representing the physical disk. However, the I/O Manager checks to see whether the disk has a mounted volume on it (mounted volumes are identified by an appropriate flag in the VPB structure for the device object that represents the physical disk), and if so, it redirects the I/O to the unnamed device object representing the instance of the mounted volume.
When your driver issues a call to IoCreateDevice() to request creation of a device object, it can specify an additional amount of nonpaged memory to be allocated and associated with the newly created device object. The reason is to have a global memory area reserved for and associated with that particular device object. This memory is called the device object extension and will be allocated by the I/O Manager on behalf of your driver. The I/O Manager initializes the DeviceExtension field to point to this allocated memory. There are no constraints mandated by the I/O Manager on how this memory object should be used by your driver. You may wonder what the difference is between requesting a device extension and declaring global static variables. The answer can be summed up as potentially cleaner code design. Another important benefit is that device-specific global variables stored in a device object extension become logically associated with the device object immediately, and therefore you can avoid unnecessary acquisition of synchronization resources before accessing this device-object-specific data.
Any static variables declared by your kernel-mode driver are global to the entire Windows NT operating system. They are also not logically associated with any particular device object, so if your driver creates and manages multiple device object structures, you will have to design some method where the global structures can be associated with specific device objects. Note, however, that both statically declared global variables and the device extensions reside in nonpaged memory, although you can request that your static variables be made pageable (typically, this is never done). Many kernel-mode drivers make use of both statically declared global variables that are required by the entire driver, and a device extension containing global variables that are specific to the context of a certain device object structure.
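The relationship between a device object and its extension can be sketched in user-mode C. The names below are hypothetical (`sim_device_object`, `sim_disk_extension`, `sim_create_device`). The layout, with the extension placed directly after the object in a single allocation, is an assumption made for illustration rather than a statement about what IoCreateDevice() does internally.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified device object. */
typedef struct sim_device_object {
    size_t size;              /* total bytes allocated, including the extension */
    void *device_extension;   /* per-device driver state, or NULL if none was requested */
} sim_device_object;

/* Per-device state a disk driver might keep in its extension (illustrative only). */
typedef struct sim_disk_extension {
    unsigned long sector_size;
    unsigned long requests_serviced;
} sim_disk_extension;

/* Allocate a zeroed device object followed by extension_size bytes of driver-private
 * storage. Returns NULL on allocation failure; the caller releases it with free(). */
sim_device_object *sim_create_device(size_t extension_size)
{
    size_t total = sizeof(sim_device_object) + extension_size;
    sim_device_object *dev = calloc(1, total);

    if (dev == NULL)
        return NULL;
    dev->size = total;
    /* In this model the extension starts right after the object itself. */
    dev->device_extension = extension_size ? (void *)(dev + 1) : NULL;
    return dev;
}
```

Two devices created this way each carry their own `sim_disk_extension`, so a routine that is handed a device object pointer reaches the right per-device state through `device_extension` without consulting any global lookup structure.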
The device object structure is defined as follows:
    typedef struct _DEVICE_OBJECT {
        CSHORT Type;
        USHORT Size;
        LONG ReferenceCount;
        struct _DRIVER_OBJECT *DriverObject;
        struct _DEVICE_OBJECT *NextDevice;
        struct _DEVICE_OBJECT *AttachedDevice;
        struct _IRP *CurrentIrp;
        PIO_TIMER Timer;
        ULONG Flags;
        ULONG Characteristics;
        PVPB Vpb;
        PVOID DeviceExtension;
        DEVICE_TYPE DeviceType;
        CCHAR StackSize;
        union {
            LIST_ENTRY ListEntry;
            WAIT_CONTEXT_BLOCK Wcb;
        } Queue;
        ULONG AlignmentRequirement;
        KDEVICE_QUEUE DeviceQueue;
        KDPC Dpc;
        ULONG ActiveThreadCount;
        PSECURITY_DESCRIPTOR SecurityDescriptor;
        KEVENT DeviceLock;
        USHORT SectorSize;
        USHORT Spare1;
        /* the following fields only exist in NT 4.0 and later */
        struct _DEVOBJ_EXTENSION *DeviceObjectExtension;
        PVOID Reserved;
        /* the following field only exists in NT 3.51 and earlier versions */
        LARGE_INTEGER Spare2;
    } DEVICE_OBJECT;

Any kernel-mode driver can direct the I/O Manager to create a device object using the IoCreateDevice() routine. This routine, if successful, will return a pointer to the device object structure that is allocated from nonpaged memory. Many of the fields in the device object structure are reserved for use by the I/O Manager. A brief description of the important fields is given below:
- As long as the ReferenceCount field is nonzero, two invariants hold true. First, the device object will never be deleted. Second, the driver object representing the driver that created this device object will never be deleted (i.e., the driver will never be unloaded as long as any of the device objects created by the driver has a positive reference count). The ReferenceCount field is manipulated at various times by the I/O Manager and can also be manipulated by the driver.[11] An example of this field being incremented by the I/O Manager is whenever a new file stream is opened on a mounted volume; the reference count for the device object representing the mounted volume is incremented by 1 to ensure that the volume is not dismounted as long as any file is open. This also ensures that the file system driver is not unloaded as long as any file is open, since unloading the driver could lead to a system crash. Similarly, whenever a new volume is mounted, the device object representing the logical volume has its reference count incremented to ensure that both the device object and the corresponding driver object are not deleted.
- The I/O Manager initializes the DriverObject field to refer to the driver object representing the loaded instance of the kernel-mode driver that invoked the IoCreateDevice() routine.
- All device objects created by a kernel-mode driver are linked together using the NextDevice field in the device object. Note that there is no particular order in which a kernel-mode driver, traversing this linked list, should expect to find created device objects. As it happens, the I/O Manager adds new device objects to the head of the linked list; therefore, you will probably find the last device object inserted at the beginning of the list.
- In this chapter, as well as in Chapter 12, Filter Drivers, you will be exposed to more detail about how filter drivers can be developed for Windows NT environments. These filter drivers are intermediate-level drivers that intercept I/O requests targeted to certain device objects by interjecting themselves into the driver hierarchy and by attaching themselves to the target device objects. The concept of attaching to a device object is simple, as illustrated in Figure 4-3.
Figure 4-3. Illustration of a device object being attached to another
When a device object is attached to another (via the I/O-Manager-provided IoAttachDevice() or the IoAttachDeviceByPointer() routines), the AttachedDevice field in the device being attached to (device object #1 in Figure 4-3) will be set to the address of the device object being attached (device object #2).
- The CurrentIrp field is of interest to designers of device drivers or other lower-level drivers. Such drivers typically use the I/O-Manager-supplied IoStartPacket() and IoStartNextPacket() routines to queue and dequeue an IRP from the driver queue of pending IRPs. Once the I/O Manager dequeues a new IRP, it makes the dequeued IRP the current IRP to be processed by the driver. To do this, it inserts the IRP pointer in the CurrentIrp field of the device object. The I/O Manager subsequently passes the IRP pointer stored in DeviceObject->CurrentIrp when invoking the device driver StartIo() dispatch routine.
This field is typically not of much interest to higher-level drivers.
- The Timer field is initialized when the driver invokes IoInitializeTimer(). This allows the I/O Manager to invoke the driver-supplied timer routine every second.
- The device object Characteristics field describes some additional attributes for the physical, logical, or virtual device that the object represents. The possible values are FILE_REMOVABLE_MEDIA, FILE_READ_ONLY_DISK, FILE_FLOPPY_DISK, FILE_WRITE_ONCE_MEDIA, FILE_REMOTE_DEVICE, FILE_DEVICE_IS_MOUNTED, or FILE_VIRTUAL_VOLUME. This field is manipulated by the I/O Manager, as well as by the file system or kernel-mode driver that manages the device object.
- The DeviceLock is a synchronization-type event object allocated by the I/O Manager. Currently, this object is acquired by the I/O Manager prior to dispatching a mount request to a file system driver. This allows synchronization of multiple requests to mount the volume. You should only be concerned with this event object if you design a file system driver that uses the I/O-Manager-supplied IoVerifyVolume() routine (described in Part 3). In that case, you should be careful not to invoke that routine when you get a mount request from the I/O Manager, since the DeviceLock will already have been acquired by the I/O Manager before it sent you the mount IRP; invoking the verify routine would cause the I/O Manager to try to reacquire this resource and cause a deadlock.
- The I/O Manager allocates memory for the device extension and initializes the DeviceExtension field to point to this allocated memory.
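The head-insertion behavior described above for the NextDevice field can be modeled in a few lines of user-mode C. This is only a sketch under stated assumptions: MODEL_DEVICE, MODEL_DRIVER, ModelLinkDevice(), and ModelFindDevice() are hypothetical stand-ins invented for illustration, not DDK definitions, and the code does not run in kernel mode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-mode stand-ins; these are NOT the DDK structures. */
typedef struct MODEL_DEVICE {
    int Id;
    struct MODEL_DEVICE *NextDevice;   /* plays the role of NextDevice */
} MODEL_DEVICE;

typedef struct MODEL_DRIVER {
    MODEL_DEVICE *DeviceObject;        /* head of the driver's device list */
} MODEL_DRIVER;

/* Link a new device at the head of the list, as the text says the
 * I/O Manager happens to do. */
static void ModelLinkDevice(MODEL_DRIVER *Driver, MODEL_DEVICE *Device)
{
    assert(Driver != NULL && Device != NULL);
    Device->NextDevice = Driver->DeviceObject;
    Driver->DeviceObject = Device;
}

/* Walk the whole list instead of relying on any particular order. */
static MODEL_DEVICE *ModelFindDevice(const MODEL_DRIVER *Driver, int Id)
{
    MODEL_DEVICE *Device;

    assert(Driver != NULL);
    for (Device = Driver->DeviceObject; Device != NULL; Device = Device->NextDevice) {
        if (Device->Id == Id) {
            return Device;
        }
    }
    return NULL;
}
```

Because the lookup scans every entry, it keeps working even if the insertion order were to change, which is the safer assumption for a driver to make.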
I/O Request Packets (IRP)
As described earlier, the Windows NT I/O subsystem is packet-based. Kernel-mode drivers that comprise the I/O subsystem receive I/O Request Packets (IRP), which contain details of the operation being requested. The recipient of the IRP is responsible for processing the IRP, and either forwarding it on to another kernel-mode driver for additional processing, or completing the IRP, indicating that processing of the request described in the IRP has been terminated.
IRP allocation
All I/O requests are routed through the NT I/O Manager. Most often, a user process executes a Win32- or other subsystem-specific I/O request (e.g., CreateFile()), and this request gets translated to an NT system service call to the I/O Manager. Upon receiving the I/O request, the I/O Manager identifies the driver that should service the I/O request. Most likely, this will be a file system driver that will have mounted the file system on the physical device to which the I/O request is targeted.
To dispatch the request to the kernel-mode driver, the I/O Manager allocates an I/O Request Packet using the routine IoAllocateIrp().[12] This structure is always allocated from nonpaged pool. The method of allocation differs slightly in the various versions of Windows NT.
NOTE: A zone is a system-defined structure supported by the Windows NT Executive and is used to efficiently manage allocation and deallocation of fixed-sized chunks of memory. Allocating and freeing memory using zones is more efficient than asking for small chunks of memory from the VMM, which could also lead to some internal memory fragmentation. Using a zone requires your driver to perform two steps: first, allocate the memory that will comprise the zone and inform the NT Executive about this allocated pool, as well as the size of entries you will allocate from the zone; second, use the available ExAllocateFromZone() and other related support routines to allocate and free entries using the zone. Read Chapter 2, File System Driver Development, for a discussion on how to use zones in your driver.
In NT version 3.51 and earlier, the I/O Manager first attempts to allocate the IRP from a zone composed of fixed-sized IRP structures. As you will read later in this discussion of IRPs, the size of the IRP depends upon the number of stack locations that are required for the IRP. Therefore, the I/O Manager keeps two zones available, one for IRPs with relatively few stack locations, and the other for IRPs with a larger number of stack locations. If the zone from which allocation is attempted is found empty (this can happen in high-load situations where an extremely large amount of concurrent I/O is in progress), the I/O Manager requests memory for the IRP directly from the VMM (actually, the I/O Manager uses the ExAllocatePool() support routine provided by the NT Executive). For I/O requests that originate in user mode, if no memory is currently available, an error is returned to the user application indicating that the system is out of available resources. However, for I/O requests that originate in kernel mode, the I/O Manager attempts to allocate memory for the IRP from the NonPagedPoolMustSucceed memory pool. If this memory allocation request does not succeed, the attempt will result in a system bugcheck.
The methodology used in NT version 4.0 is similar with one slight variation: instead of zones, the I/O Manager uses lookaside lists, a structure introduced in this release to manage fixed-sized pools of memory. The reason for this new structure is to gain some efficiency, because lookaside lists do not always use spin locks to perform synchronization; instead they use an atomic 8-byte compare-exchange instruction on architectures where such support is available.
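The NT 3.51 fallback order described above (zone first, then general nonpaged pool, then the must-succeed pool for kernel-mode requestors only) can be summarized as a small decision function. The following is a user-mode model under stated assumptions: the counters stand in for real memory, and every name in it (MODEL_MEMORY, ModelAllocateIrp(), and so on) is hypothetical rather than part of the DDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these names exist in the DDK. */
typedef enum {
    IRP_FROM_ZONE,
    IRP_FROM_POOL,
    IRP_FROM_MUST_SUCCEED,
    IRP_NONE
} MODEL_IRP_SOURCE;

typedef enum { MODEL_USER_MODE, MODEL_KERNEL_MODE } MODEL_MODE;

typedef struct {
    int ZoneEntriesFree;        /* entries left in the IRP zone */
    int PoolBlocksFree;         /* blocks left in nonpaged pool */
    int MustSucceedBlocksFree;  /* blocks left in the must-succeed pool */
} MODEL_MEMORY;

/* Consume one unit from a counter if any is left. */
static bool ModelTake(int *Counter)
{
    if (*Counter > 0) {
        --*Counter;
        return true;
    }
    return false;
}

/* Returns where the IRP came from. *BugCheck is set to true only in the
 * case where the text says the system would bugcheck. */
static MODEL_IRP_SOURCE ModelAllocateIrp(MODEL_MEMORY *Memory,
                                         MODEL_MODE RequestorMode,
                                         bool *BugCheck)
{
    assert(Memory != NULL && BugCheck != NULL);
    *BugCheck = false;

    if (ModelTake(&Memory->ZoneEntriesFree)) {
        return IRP_FROM_ZONE;
    }
    if (ModelTake(&Memory->PoolBlocksFree)) {
        return IRP_FROM_POOL;
    }
    if (RequestorMode == MODEL_USER_MODE) {
        return IRP_NONE;  /* the application sees an out-of-resources error */
    }
    if (ModelTake(&Memory->MustSucceedBlocksFree)) {
        return IRP_FROM_MUST_SUCCEED;
    }
    *BugCheck = true;     /* kernel-mode request and nothing left at all */
    return IRP_NONE;
}
```

The model makes the asymmetry visible: a user-mode request can fail gracefully, while a kernel-mode request either succeeds or brings the system down.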
Other kernel-mode components besides the I/O Manager can use the I/O-Manager-supplied routine IoAllocateIrp() to request a new IRP structure. This IRP can subsequently be used to send an I/O request to a kernel-mode driver. Other routines provided by the I/O Manager that also use IoAllocateIrp() to obtain a new IRP structure, and then return the newly allocated IRP after initializing certain fields, are IoMakeAssociatedIrp(), IoBuildSynchronousFsdRequest(), IoBuildDeviceIoControlRequest(), and IoBuildAsynchronousFsdRequest(). Consult the DDK for more information on these routines. Part 3 also uses some of these routines in implementing filter drivers.
IRP structure
Logically, each I/O Request Packet is composed of the following:
- The IRP header
- I/O Stack Locations
The IRP header contains general information about the I/O request, useful to the I/O Manager as well as to the kernel-mode driver that is the target of the request. Many of the fields in the IRP header can be accessed by a kernel-mode driver; other fields exist solely for the convenience of the I/O Manager and should be considered off-limits by the drivers processing the IRP.
Here is a brief explanation of important fields that comprise the IRP header:
MdlAddress
- A Memory Descriptor List (MDL) is a system-defined structure that describes a buffer in terms of the physical memory pages that back up the virtual address range comprising the buffer. There are different ways in which buffers used for I/O request handling can be passed down to the kernel-mode driver. Descriptions for the three methods will appear shortly. Remember for now, though, that if the direct I/O method is used, the MdlAddress field will contain a pointer to the MDL structure that can then be used in data transfer operations.
AssociatedIrp
- This field is a union of three elements, defined as follows:
    union {
        struct _IRP *MasterIrp;
        LONG IrpCount;
        PVOID SystemBuffer;
    } AssociatedIrp;
- Any IRP structure that has been allocated can be categorized as either a master IRP or an associated IRP. An associated IRP is, by definition, associated with some master IRP, and can be created only by a higher-level kernel-mode driver. By creating one or more associated IRPs, the highest-level driver can split up the original I/O request and send each associated IRP to lower-level drivers in the hierarchy for further processing.
- For example, higher-level drivers sometimes execute the following loop:
    while (more processing is required) {
        create an associated IRP using IoMakeAssociatedIrp();
        send the associated IRP to a lower-level driver using IoCallDriver();
        if (STATUS_PENDING is returned) {
            wait on an event for the completion of the associated IRP;
        } else {
            associated IRP was completed;
            check result and determine whether to continue;
        }
    }
- For an associated IRP, the union described here contains a pointer to the master IRP. For a master IRP, however, this union contains the count of the number of associated IRPs for this master IRP; or, if no associated IRPs have been created, the SystemBuffer pointer might be initialized to a buffer allocated in kernel virtual address space for data transfer. System buffers are allocated by the I/O Manager when a kernel-mode driver requests buffered I/O (described later in this book).
- Note that the IrpCount field is manipulated under the protection of an internal I/O Manager resource. Therefore, external kernel-mode drivers must not attempt to manipulate or access the contents of this field directly.
ThreadListEntry
- This field is typically manipulated by the I/O Manager. Before invoking a driver dispatch routine via IoCallDriver(), all I/O Manager routines insert the IRP into a linked list of IRPs for the thread in whose context the I/O operation is taking place. For example, if a user thread invokes a read request, the I/O Manager will allocate a new IRP structure, and insert it into the list of IRPs being processed by the user thread prior to invoking the file system read dispatch routine.
NOTE: There is a field in each thread structure called IrpList, which serves as the head of a linked list of pending I/O Request Packets. The ThreadListEntry field, described earlier, is used to queue the IRP to this linked list. This list is used to track all pending I/O Request Packets for the thread in question; this is especially useful when the I/O subsystem tries to cancel IRPs for a particular thread.
Note that the IoAllocateIrp() routine does not queue the returned IRP to the linked list of outstanding IRPs for the current thread. Therefore, when a cancel request is posted, that IRP will not be found among the list of IRPs for the thread.
IoStatus
- This field should be appropriately updated by your kernel-mode driver before completing the I/O Request Packet. A description of the structure is provided later in this chapter. Note that this field is part of the IRP structure, and not part of the I/O status block structure passed in to the I/O Manager by the thread requesting the I/O operation. It is the I/O Manager's responsibility to transfer the results of the I/O operation from this field to the I/O status structure submitted by the requesting thread. This operation is performed by the I/O Manager as part of the postprocessing of the IRP, once the IRP has been completed by kernel-mode drivers.
RequestorMode
- When code in your driver is executed, it is useful to know whether the caller is a user-mode thread (e.g., an application requesting an I/O operation), or a kernel component (some other driver requesting your services in the context of a system worker thread).
- You may wonder why such information could be useful. Think about the case where the caller is a user-mode thread; you know then that you cannot blindly assume that the arguments passed in to your driver are legitimate. If your driver uses the direct I/O method of passing buffer pointers (explained later), you will need to convert the passed-in addresses to something usable by your kernel-mode code. This is especially true if the request will be handled asynchronously by your driver.
- On the other hand, if your driver is invoked from a system worker thread, you could bypass these argument checks, because you could assume that addresses passed in to you are legitimate and usable directly by your driver.
- Similarly, the NT I/O Manager, as well as other kernel components such as the Virtual Memory Manager, need to identify and differentiate whether clients of their services are executing kernel-mode (operating system) code, or whether the request came from a user-space component. This information is used to check the legitimacy of the arguments passed in to these kernel-mode components.[13]
- The solution used throughout the NT Executive is to identify the processor mode in which the calling thread executed prior to invoking the services of the kernel-mode component. Note that the key concept here is that the previous mode of the calling thread is important; the very fact that the thread is executing kernel-mode code at the instant when the check is made tells us that the current mode will always be kernel mode. To obtain the previous mode information, the I/O Manager directly accesses a field in the thread structure. The ExGetPreviousMode() function, declared in the DDK, provides the same functionality to third-party driver developers. This routine returns the previous mode of the thread being checked: user or kernel mode.
- The I/O Manager puts the information about the previous mode of the requesting thread into the RequestorMode field prior to invoking the IoCallDriver() routine, which, in turn, invokes one of your driver dispatch routines. You should use this information both internally in your driver, as well as in invocations to system service routines such as MmProbeAndLockPages().
PendingReturned
- Each IRP is typically handled by more than one driver in the hierarchy. To process an IRP asynchronously, a kernel-mode driver must execute the following steps:
- Mark the IRP pending by invoking the IoMarkIrpPending() function.
- Queue the IRP internally.
Lower-level drivers may use a StartIo() function instead.
- Return a status code of STATUS_PENDING.
The IoMarkIrpPending() call (implemented as a macro) simply sets the SL_PENDING_RETURNED flag in the Control field of the current I/O stack location.[14]
At the time of IRP completion processing, during the execution of the IoCompleteRequest() function, the I/O Manager traverses each stack location that had been used by drivers in the hierarchy, looking for any completion routines that may need to be invoked. This traversal of stack locations happens in reverse order from that used in processing the IRP. The most recently used stack location is processed first (the one used by the lowest-level driver in the hierarchy that processed the IRP), followed by the next one, and so on.
As each stack location is unwound, the I/O Manager notes whether the SL_PENDING_RETURNED flag had been set in the I/O stack location, and sets the PendingReturned field to TRUE if the flag had been set. However, if the flag was not set in the stack location, the I/O Manager sets the PendingReturned field to FALSE.
WARNING: The value of the PendingReturned field may change as the I/O stack locations are being traversed, while the I/O Manager looks for completion routines that may need to be invoked.
So why is the value of this field important? Well, later on in the IoCompleteRequest() function, the I/O Manager checks the value of the PendingReturned field to determine whether or not to queue a special kernel Asynchronous Procedure Call (APC) to the thread that originally requested the I/O operation. Your file system or filter driver will have to cooperate with the I/O Manager to ensure that the right course of action is adopted. You will see how your driver's actions affect the behavior of the I/O Manager later in this chapter.
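The way PendingReturned tracks the flag of each stack location during unwinding can be illustrated with a small user-mode model. It is a sketch under stated assumptions and follows only the behavior described in the text: the structures, the 1-based location numbering, the eight-entry array, and the names ModelMarkIrpPending() and ModelCompleteRequest() are hypothetical, and completion routines are not modeled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SL_PENDING_RETURNED 0x01  /* stand-in for SL_PENDING_RETURNED */
#define MODEL_MAX_STACK 8               /* arbitrary limit for the model */

typedef struct {
    unsigned char Control;
} MODEL_STACK_LOCATION;

typedef struct {
    bool PendingReturned;
    int StackCount;        /* number of stack locations in use by the model */
    int CurrentLocation;   /* 1-based; 1 belongs to the lowest-level driver */
    MODEL_STACK_LOCATION Stack[MODEL_MAX_STACK];
} MODEL_IRP;

/* Models IoMarkIrpPending(): set the flag in the current stack location. */
static void ModelMarkIrpPending(MODEL_IRP *Irp)
{
    assert(Irp != NULL);
    assert(Irp->CurrentLocation >= 1 && Irp->CurrentLocation <= Irp->StackCount);
    Irp->Stack[Irp->CurrentLocation - 1].Control |= MODEL_SL_PENDING_RETURNED;
}

/* Models the unwinding step only: visit locations from the most recently
 * used one back toward the first one used, overwriting PendingReturned with
 * each location's flag. Returns the value left after the last location. */
static bool ModelCompleteRequest(MODEL_IRP *Irp)
{
    int Location;

    assert(Irp != NULL && Irp->StackCount <= MODEL_MAX_STACK);
    for (Location = Irp->CurrentLocation; Location <= Irp->StackCount; Location++) {
        Irp->PendingReturned =
            (Irp->Stack[Location - 1].Control & MODEL_SL_PENDING_RETURNED) != 0;
        /* A real completion routine for this location would run here. */
    }
    return Irp->PendingReturned;
}
```

In this model, if only the lowest-level driver marks the IRP pending, the value left at the end is FALSE, because the locations above it overwrite the field. That is one way to see why drivers higher in the hierarchy have to cooperate.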
Cancel, CancelIrql, and CancelRoutine
Kernel-mode drivers that process I/O Request Packets that might potentially require an indefinite time interval to be completed should provide appropriate IRP cancellation support. Our perspective is that of a file system driver or that of a filter driver. We would need to provide this functionality if we do not pass on IRPs to lower-level disk or network drivers but perform our own processing instead. Note that all three fields listed above are manipulated by either the driver or the I/O Manager to provide the capability to cancel pending I/O Request Packets when required.
ApcEnvironment
When an IRP is completed, the I/O Manager performs postprocessing on the IRP, the details of which are given below. The ApcEnvironment field is used internally by the I/O Manager in performing postprocessing on the IRP in the context of the thread that originally requested the I/O operation. This field is initialized by the I/O Manager when allocating the IRP and should not be accessed by driver designers.
Zoned/AllocationFlags
The Zoned field was replaced with the AllocationFlags field in NT version 4.0. Fundamentally, the field (called by whatever name) records internal bookkeeping information used by the I/O Manager during IRP completion to determine whether the IRP was allocated from a zone/lookaside list, or from system nonpaged pool, or from system nonpaged-must-succeed pool. This information is not useful from the kernel driver's perspective, except when debugging the driver and trying to locate all IRP structures allocated out of the global lookaside list or zone.
Caller-supplied arguments
The following are part of the IRP:
    PIO_STATUS_BLOCK UserIosb;
    PKEVENT UserEvent;
    union {
        struct {
            PIO_APC_ROUTINE UserApcRoutine;
            PVOID UserApcContext;
        } AsynchronousParameters;
        LARGE_INTEGER AllocationSize;
    } Overlay;
The UserIosb field in the IRP is set by the I/O Manager to point to the I/O status block supplied by the thread requesting I/O. As part of the postprocessing performed by the NT I/O Manager upon completion of an IRP, the I/O Manager copies the contents of the IoStatus field to the I/O status block pointed to by the UserIosb field.
Most NT I/O system service routines (documented in Appendix A) accept an optional event argument. This argument (if supplied by the caller) is initialized by the NT I/O Manager to the not-signaled state and is set to the signaled state by the I/O Manager upon completion of I/O. The I/O Manager fills in the UserEvent field with the address of the caller-supplied event object.
The AllocationSize field in the Overlay structure is only valid for file create requests. The user is allowed to specify an optional initial size for a file being created. The I/O Manager initializes the AllocationSize field with this caller-supplied size prior to invoking the file system driver create/open dispatch routine.
Many of the NT system services provided for I/O operations by the NT I/O Manager allow asynchronous operations. The caller thread can request that I/O be performed asynchronously and can also specify an APC to be invoked upon completion of the IRP. For these system services, the I/O Manager dutifully invokes the user-supplied APC, passing it the supplied APC context, as part of the postprocessing performed by the I/O Manager upon completion of the IRP by a kernel-mode driver. The I/O Manager stores the calling-thread-supplied APC function pointer in the UserApcRoutine field. The context is stored in the UserApcContext field. Some examples of asynchronous system services are directory control, read, write, and lock operations. Note that create/open requests are always processed synchronously, which is why the AllocationSize field and the AsynchronousParameters structure can share storage in the Overlay union.
For I/O operations that involve transferring data, the caller supplies a data buffer. This buffer might serve as an input buffer, an output buffer, or both. In any case, the I/O Manager initializes the UserBuffer field with the caller-supplied buffer pointer before invoking IoCallDriver(). Upon IRP completion, if there is any data that needs to be copied back to the caller's buffer, the I/O Manager performs this function as part of postprocessing done on the IRP. If your driver does not specify either direct I/O or buffered I/O as the preferred method of user buffer manipulation, the I/O Manager will assume that you will handle the user-supplied buffer yourself and will therefore not allocate an MDL, or supply your driver with a system address. Your driver can subsequently use the buffer pointer in the UserBuffer field directly.[15]
Tail
An IRP has a Tail structure defined as follows:
    union {
        struct {
            KDEVICE_QUEUE_ENTRY DeviceQueueEntry;
            PETHREAD Thread;
            PCHAR AuxiliaryBuffer;
            LIST_ENTRY ListEntry;
            struct _IO_STACK_LOCATION *CurrentStackLocation;
            PFILE_OBJECT OriginalFileObject;
        } Overlay;
        KAPC Apc;
        ULONG CompletionKey;
    } Tail;
This structure consists of fields that are manipulated and accessed directly only by the NT I/O Manager. It is not recommended that your driver try to directly access the contents of these fields.
The DeviceQueueEntry field is used to queue IRPs for a specific lower-level driver. Most lower-level drivers allow the NT I/O Manager to maintain a list of pending I/O Request Packets. The I/O Manager uses the DeviceQueueEntry field to queue the packet for the target device object, if the device object is found to be busy when IoStartPacket() is invoked by the device driver dispatch routine. The DDK describes the IoStartPacket(), IoStartNextPacket(), and IoStartNextPacketByKey() support routines, which manipulate this field. Kernel-mode drivers should not try to directly access or manipulate the contents of the DeviceQueueEntry field.
Before dispatching an IRP, the I/O Manager initializes the Thread field to point to the thread in whose context the dispatch will occur. This field is subsequently used by both lower-level drivers and file system drivers.
Consider the situation when a hard error occurs. File systems use the IoRaiseInformationalHardError() call to place a pop-up message box on the system console to notify the user of the error situation. This call is blocking and it displays the error by delivering a special kernel APC to the target thread. The problem is that the thread in whose context the message box is displayed is blocked until a user physically dismisses the error message from the system console. If, however, no thread is specified in the argument list to the IoRaiseInformationalHardError() routine, the error message is delivered in the context of a special (single) system worker thread.
Typically, if an error occurs, a kernel-mode driver will examine the Overlay.Thread field to determine if the thread is a system worker thread. If it is, then the driver will pass a NULL Thread argument to IoRaiseInformationalHardError(), because blocking system worker threads for an indefinite amount of time is clearly unacceptable.
Another instance when the Thread field assumes importance is in the handling of removable media. If a user-induced error occurs when reading/writing removable media, the lower-level device driver uses the IoSetHardErrorOrVerifyDevice() routine to indicate that something unexpected has occurred and that higher-level drivers should either report an error to the user or verify that the media in the drive is correct. In response to this call, the I/O Manager simply stores the device object to be verified in the DeviceToVerify field for the thread object pointed to by the Overlay.Thread field in the IRP. The higher-level (file system) driver subsequently invokes IoGetDeviceToVerify(), supplying the thread object pointer obtained from the Overlay.Thread field, and the I/O Manager, in response, hands back the stored device object pointer.
Note that the IoAllocateIrp() I/O Manager service routine does not set the Thread field in the returned IRP. This is the responsibility of the caller of this routine.
The AuxiliaryBuffer exists supposedly to pass additional information to a kernel-mode driver that is not contained elsewhere in the IRP. However, at this point, none of the I/O Manager routines use this field to pass information to a kernel-mode driver.[16]
The CurrentStackLocation field is simply a pointer to the current stack location for the IRP. Stack locations are discussed later in this chapter. The important point to note for kernel-mode drivers is to always use I/O-Manager-provided access functions to get the pointer to the current and the next stack locations in the IRP. To maintain portability, your driver should never try to access the contents of this field directly.
The OriginalFileObject field is initialized by the I/O Manager to the address of the file object to which an I/O operation is being targeted. The same information is available to the highest-level driver (typically, the file system driver) to which the I/O operation is sent from the current stack location. However, the I/O Manager keeps this information in the IRP header and can therefore access it independently of the manner in which stack locations are manipulated by lower-level drivers. The file object is used in the postprocessing of the IRP after it has been completed. For example, if the file object pointer is not NULL (i.e., the OriginalFileObject field is initialized at IRP allocation), the I/O Manager checks whether it needs to send a message to a completion port,[17] or dereference any event objects, or perform any similar notification or cleanup operation related to that file object. It is legitimate for this field to be NULL, in which case the I/O Manager will skip some of the postprocessing that it would otherwise perform.
The Apc field is used internally by the I/O Manager after the IRP has been completed, to queue an APC request for final postprocessing of the IRP in the context of the thread that issued the I/O request.
As mentioned earlier, each I/O Request Packet is composed of the IRP header, and the stack locations for that IRP. Some of the fields in the IRP structure such as StackCount, CurrentLocation, and CurrentStackLocation are related to stack location manipulation. IRP stack locations are discussed next.
Stack locations and reusing IRP structures
Windows NT I/O request packets are reusable. In a layered driver environment, such as in the Windows NT I/O subsystem, each higher-level driver in the hierarchy invokes the next lower-level driver, until some driver actually completes the original IRP. It is quite possible, and is often the case, that the same IRP is passed down from driver to driver until it is completed.
Completing the IRP requires invoking IoCompleteRequest(); after such a call is issued, no component, other than the I/O Manager, can touch that IRP, since it can be deallocated at any time.
So how can a single IRP structure be reused cleanly? The solution provided by the NT I/O Manager is to use stack locations that contain descriptions of the I/O requests to the target device objects. When initially dispatching the IRP to a kernel-mode driver, the I/O Manager fills in one stack location with the parameters for the desired operation. Later, the driver to which the IRP is sent determines whether it can complete the IRP itself, or whether it needs to invoke another driver lower in the hierarchy. If it needs to invoke a lower-level driver, the current holder of the IRP can simply initialize the next IRP stack location, and then invoke the lower-level driver via IoCallDriver(), passing it the IRP. This process is repeated until a driver in the chain performs all of the required processing and decides to complete the IRP.
The NT I/O Manager allocates space for multiple associated stack locations when an IRP structure is allocated. Each of these stack locations can contain a complete description of an I/O request. For example, an IRP allocated for a read request should contain the following information:
- A function code, which will be examined by the kernel-mode driver to determine the type of request issued. In this example, the function code indicates a read request.
- An offset from which data should be read.
- The number of bytes that are requested.
- A pointer to the output buffer.
In addition to the above, other information relevant to the read request might also be passed to the driver that manages the device object that is the target of the read operation. All of this information is encapsulated into a single stack location structure.
The number of stack locations allocated for an IRP depends upon the StackSize field in the target device object to which the IRP is being issued. The StackSize field is initialized to 1 when the device object is created; it can then be set to any value by the driver managing the device object. The StackSize field is also changed when a device object is attached to another device object. As part of the attach process, the StackSize value is set to the value obtained from the device object being attached to, incremented by 1. The logic here is simple: an IRP sent to a device object needs one stack location for the initial target device object; it also needs one stack location for each filter and/or driver in the hierarchy that will perform some processing on the I/O Request Packet.
As shown in Figure 4-4, if a read request is sent to the file system driver that has a volume mounted on disk A, the I/O Manager will allocate four stack locations when creating the read IRP. These stack locations are used in reverse order, similar to the last-in-first-out usage of a stack structure. When invoking a driver, the I/O Manager always pushes the stack location pointer to point to the next stack location; when the called driver releases the IRP, the stack location pointer is popped to once again point to the previous stack location. Therefore, when invoking the filter driver dispatch routine in Figure 4-4, the I/O Manager uses stack location #4, the last stack location allocated.
Figure 4-4. IRP stack locations used for a driver hierarchy
![]()
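The StackSize bookkeeping performed during an attach, as described above, can be modeled as follows. This is a user-mode sketch under stated assumptions: MODEL_DEVOBJ, ModelInitDevice(), and ModelAttachDevice() are hypothetical names, and the model reaches a depth of four purely by chaining attachments, which is a simplification of the hierarchy in Figure 4-4 (a driver may also set StackSize itself).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two device object fields involved. */
typedef struct MODEL_DEVOBJ {
    int StackSize;
    struct MODEL_DEVOBJ *AttachedDevice;  /* device attached on top of this one */
} MODEL_DEVOBJ;

/* A newly created device object starts with a StackSize of 1. */
static void ModelInitDevice(MODEL_DEVOBJ *Device)
{
    assert(Device != NULL);
    Device->StackSize = 1;
    Device->AttachedDevice = NULL;
}

/* Attach Source on top of Target: Target records who is attached to it,
 * and Source needs one more stack location than Target does. */
static void ModelAttachDevice(MODEL_DEVOBJ *Source, MODEL_DEVOBJ *Target)
{
    assert(Source != NULL && Target != NULL);
    Target->AttachedDevice = Source;
    Source->StackSize = Target->StackSize + 1;
}
```

An IRP aimed at the topmost device in such a chain would need as many stack locations as that device's StackSize, one for each layer that may process the request.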
The NT I/O Manager initializes the StackCount field in the IRP header with the total number of stack locations allocated for that IRP. The CurrentLocation field in the IRP header is initialized by the I/O Manager to (StackCount + 1). This value is decremented each time a driver dispatch routine is invoked via IoCallDriver().
Therefore, if the StackCount is 4, the initial value of CurrentLocation is set to 5, which is an invalid stack location pointer value. The reason for this, however, is that to dispatch an IRP to the next driver in the hierarchy, the kernel component must always get a pointer to the next stack location and then fill in appropriate parameters for the request.
When an IRP is being dispatched to the first driver in the hierarchy, the next stack location will be (CurrentLocation - 1), which equals 4, the correct value for the stack location used for the filter driver above.
The I/O Manager often performs sanity checks using this value to ensure that the IRP is being routed correctly through the I/O subsystem. For example, in IoCallDriver(), the I/O Manager first decrements the CurrentLocation field (since a new driver is being invoked, it requires the next IRP stack location), then checks to see if the CurrentLocation value is less than or equal to 0. If the value does become less than or equal to 0, it is obvious that IoCallDriver() is being invoked once too often for the number of stack locations that were initially allocated (or that there is some stray pointer corrupting memory), and therefore the I/O Manager performs a bugcheck with the error code of NO_MORE_IRP_STACK_LOCATIONS.
NOTE: The reason for a bugcheck is that, by the time IoCallDriver() is invoked, critical damage may have already been done, since the caller will in all likelihood have filled in the contents of the next stack location for the use of the driver being called. However, in this situation, the next stack location is some unallocated memory at the end of the IRP structure, which could literally be anything. Continuing execution at this time could lead to all sorts of problems, including the possible corruption of user data.
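The CurrentLocation arithmetic and the sanity check described above can be traced with a short user-mode model. This is a sketch under stated assumptions: MODEL_IRP_HEADER, ModelInitIrp(), and ModelCallDriver() are hypothetical names, and the model returns false at the point where the text says the real I/O Manager would bugcheck with NO_MORE_IRP_STACK_LOCATIONS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two IRP header fields involved. */
typedef struct {
    int StackCount;
    int CurrentLocation;
} MODEL_IRP_HEADER;

/* CurrentLocation starts one past the last valid stack location. */
static void ModelInitIrp(MODEL_IRP_HEADER *Irp, int StackCount)
{
    assert(Irp != NULL && StackCount > 0);
    Irp->StackCount = StackCount;
    Irp->CurrentLocation = StackCount + 1;
}

/* Models the check made on each dispatch: decrement first, then verify
 * that a stack location is still available. */
static bool ModelCallDriver(MODEL_IRP_HEADER *Irp)
{
    assert(Irp != NULL);
    Irp->CurrentLocation--;
    if (Irp->CurrentLocation <= 0) {
        return false;  /* the real I/O Manager would bugcheck here */
    }
    return true;       /* the driver would now run with this location */
}
```

With a StackCount of 4, the first dispatch moves CurrentLocation from 5 to 4, matching the stack location used for the filter driver, and a fifth dispatch would trip the check.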
The I/O Manager maintains a pointer to the current stack location, in addition to the CurrentLocation value mentioned previously. This pointer is maintained in the CurrentStackLocation field in the Tail.Overlay structure that is contained in the IRP header. Kernel-mode drivers should never try to manipulate the contents of either the CurrentLocation or the CurrentStackLocation fields themselves.[18] The I/O Manager does provide routines for this purpose: IoGetCurrentIrpStackLocation() returns a pointer to the current stack location; IoGetNextIrpStackLocation() returns a pointer to the next stack location, so that the driver can set up its contents appropriately for the next driver in the hierarchy; and, in rare cases, IoSetNextIrpStackLocation() can be used to set the stack location value.
The stack location structure defined in the NT DDK includes some fields that are independent of the nature of the I/O request being described by the stack location. Here are these fields:
MajorFunction
- The NT I/O Manager defines a set of major functions, each of which identifies a generic function that a kernel-mode driver can implement. Functions are identified by function codes or numbers, and the set of functions is deliberately comprehensive, since the function codes serve all types of NT kernel-mode drivers, including file system drivers, intermediate drivers, device drivers, and other lower-level drivers.
- When an IRP is delivered to a kernel-mode driver, the driver must examine the MajorFunction field in the current stack location to find out the functionality expected from the driver. The possible major function codes are shown below:
#define IRP_MJ_CREATE                   0x00
#define IRP_MJ_CREATE_NAMED_PIPE        0x01
#define IRP_MJ_CLOSE                    0x02
#define IRP_MJ_READ                     0x03
#define IRP_MJ_WRITE                    0x04
#define IRP_MJ_QUERY_INFORMATION        0x05
#define IRP_MJ_SET_INFORMATION          0x06
#define IRP_MJ_QUERY_EA                 0x07
#define IRP_MJ_SET_EA                   0x08
#define IRP_MJ_FLUSH_BUFFERS            0x09
#define IRP_MJ_QUERY_VOLUME_INFORMATION 0x0a
#define IRP_MJ_SET_VOLUME_INFORMATION   0x0b
#define IRP_MJ_DIRECTORY_CONTROL        0x0c
#define IRP_MJ_FILE_SYSTEM_CONTROL      0x0d
#define IRP_MJ_DEVICE_CONTROL           0x0e
#define IRP_MJ_INTERNAL_DEVICE_CONTROL  0x0f
#define IRP_MJ_SHUTDOWN                 0x10
#define IRP_MJ_LOCK_CONTROL             0x11
#define IRP_MJ_CLEANUP                  0x12
#define IRP_MJ_CREATE_MAILSLOT          0x13
#define IRP_MJ_QUERY_SECURITY           0x14
#define IRP_MJ_SET_SECURITY             0x15
#define IRP_MJ_QUERY_POWER              0x16
#define IRP_MJ_SET_POWER                0x17
#define IRP_MJ_DEVICE_CHANGE            0x18
#define IRP_MJ_QUERY_QUOTA              0x19
#define IRP_MJ_SET_QUOTA                0x1a
#define IRP_MJ_PNP_POWER                0x1b
#define IRP_MJ_MAXIMUM_FUNCTION         0x1c
Function codes from IRP_MJ_DEVICE_CHANGE upward were introduced in NT version 4.0. Also, not all of the major function codes are implemented yet; for example, the quota-related function codes do not yet have any support from native NT file system drivers.
None of the major functions listed above is mandatory for a kernel-mode driver to implement, except for the ability to open and close objects managed by the driver. Open and close operations are very important because, if open operations fail, no I/O requests can be submitted, since no object exists that could be the target of the requests. Similarly, if opens succeed, the close operations will eventually be invoked, and close operations cannot fail (the I/O Manager does not check the return code from a close operation). Therefore, if you do not implement a close operation to complement your open, the system might eventually run out of resources, depending on what operations were performed during the open and on the data structures created by it.
The major function codes in the context of a file system driver and a filter driver are discussed in Part 3.
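The way a driver examines the MajorFunction field can be sketched with a small user-mode model. The function code values are taken from the listing above; the ToyStackLocation structure and the toy_dispatch() function are hypothetical names used only for illustration, and a real driver would work with the stack location structure from the NT DDK.

```c
#include <stddef.h>

/* A few of the major function codes from the listing above. */
#define IRP_MJ_CREATE 0x00
#define IRP_MJ_CLOSE  0x02
#define IRP_MJ_READ   0x03
#define IRP_MJ_WRITE  0x04

/* Hypothetical, cut-down stack location: only the field being examined. */
typedef struct ToyStackLocation {
    unsigned char MajorFunction;
} ToyStackLocation;

/* Returns a label for the operation the driver is being asked to perform,
   or NULL for a function code this toy driver does not handle. */
const char *toy_dispatch(const ToyStackLocation *sp)
{
    switch (sp->MajorFunction) {
    case IRP_MJ_CREATE: return "open";
    case IRP_MJ_CLOSE:  return "close";
    case IRP_MJ_READ:   return "read";
    case IRP_MJ_WRITE:  return "write";
    default:            return NULL;
    }
}
```

Note that the model handles both the open and the close codes, in keeping with the point above that the two operations must be implemented as a pair.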
MinorFunction
- Minor function codes provide more information specific to the major function code in the I/O stack location. For example, consider the IRP_MJ_DIRECTORY_CONTROL major function code above. An IRP containing this major function code is sent by the I/O Manager to file system drivers. The intent is to perform some file directory operation. The question, however, is what directory control operation does the I/O Manager want the file system driver to perform?
- The available operations include obtaining information about directory contents (IRP_MN_QUERY_DIRECTORY) and notifying the I/O Manager when certain attributes of files or directories contained within the target directory change (IRP_MN_NOTIFY_CHANGE_DIRECTORY).
- Currently, only a few of the major functions have minor functions associated with them. For those few, however, the kernel-mode driver must examine this field to determine correctly the functionality it is expected to provide.
Flags
- The Flags field also provides additional information that qualifies the functionality expected from the target driver. For example, consider the IRP_MJ_DIRECTORY_CONTROL major function code previously discussed. If the minor function is IRP_MN_QUERY_DIRECTORY, the Flags field could contain additional information that might cause the file system to behave differently when returning the contents of the directory being queried.
- For example, if the SL_RESTART_SCAN flag is set, the file system driver will restart the scan from the beginning of the directory being queried. Or, if the SL_RETURN_SINGLE_ENTRY flag is set, the file system driver will return only the first entry matching the specified search criteria.
- Lower-level drivers also have an interest in the settings of this field. For example, a removable media driver will perform a read request dispatched to it from a file system driver, even after a media change, if the SL_OVERRIDE_VERIFY_VOLUME flag has been set. If, however, the flag has not been set, and the device driver has recognized a media change (and informed the file system about it), it will fail all I/O requests, including all read requests.
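The effect of the two directory-query flags can be illustrated with a user-mode model of a directory scan. Everything here is a sketch under stated assumptions: ToyDirScan and toy_query_directory() are hypothetical names, the directory is just an array of strings with no search pattern applied, and the numeric flag values are those I believe the DDK headers use, so check them against your copy of ntddk.h.

```c
#include <stddef.h>

/* Flag values assumed from the DDK headers; verify before relying on them. */
#define SL_RESTART_SCAN        0x01
#define SL_RETURN_SINGLE_ENTRY 0x02

/* Hypothetical per-open scan state: index of the next entry to return. */
typedef struct ToyDirScan {
    size_t next_index;
} ToyDirScan;

/* Models an IRP_MN_QUERY_DIRECTORY request against an in-memory directory.
   Copies entry names into out[] and returns how many were copied. */
size_t toy_query_directory(ToyDirScan *scan, unsigned char flags,
                           const char *const *entries, size_t entry_count,
                           const char **out, size_t out_cap)
{
    size_t copied = 0;

    if (flags & SL_RESTART_SCAN)
        scan->next_index = 0;   /* start again from the first entry */
    if ((flags & SL_RETURN_SINGLE_ENTRY) && out_cap > 1)
        out_cap = 1;            /* caller wants one entry at most */

    while (copied < out_cap && scan->next_index < entry_count)
        out[copied++] = entries[scan->next_index++];
    return copied;
}
```

Without either flag, each call simply continues from where the previous one stopped, which is why a caller that wants to enumerate the directory a second time must ask for the scan to be restarted.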
Control
- When a kernel-mode driver must process an IRP asynchronously, the driver can queue the IRP, mark it "pending" via a call to IoMarkIrpPending(), and subsequently return control to the caller. The call to IoMarkIrpPending() simply sets the SL_PENDING_RETURNED flag in the Control field for the current stack location. Any kernel-mode driver can examine the Control field for the existence of this flag.
- This field is also used internally by the NT I/O Manager to record whether a completion routine associated with the current stack location should be invoked when the return code supplied at IRP completion indicates a success, a failure, or a cancel operation. These flags are designated SL_INVOKE_ON_SUCCESS, SL_INVOKE_ON_ERROR, and SL_INVOKE_ON_CANCEL. Kernel-mode drivers typically should not need to be directly concerned with the state of these flags.
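How the Control bits might be consulted at completion time can be sketched as follows. This is a user-mode model and not the I/O Manager's actual logic: toy_mark_irp_pending() and toy_should_invoke_completion() are hypothetical functions, the way the three conditions are combined is my assumption, and the flag values (with the failure case spelled SL_INVOKE_ON_ERROR, as it appears in the DDK headers) should be checked against ntddk.h.

```c
/* Control flag values assumed from the DDK headers; verify before use. */
#define SL_PENDING_RETURNED  0x01
#define SL_INVOKE_ON_CANCEL  0x20
#define SL_INVOKE_ON_SUCCESS 0x40
#define SL_INVOKE_ON_ERROR   0x80

/* Models the effect of IoMarkIrpPending() on a stack location's
   Control field: the pending bit is added, other bits are untouched. */
void toy_mark_irp_pending(unsigned char *control)
{
    *control |= SL_PENDING_RETURNED;
}

/* Returns 1 if a completion routine registered with the given Control
   bits should run for an IRP that finished with the given outcome. */
int toy_should_invoke_completion(unsigned char control,
                                 int succeeded, int cancelled)
{
    if (succeeded && (control & SL_INVOKE_ON_SUCCESS))
        return 1;
    if (!succeeded && (control & SL_INVOKE_ON_ERROR))
        return 1;
    if (cancelled && (control & SL_INVOKE_ON_CANCEL))
        return 1;
    return 0;
}
```

Because each condition occupies its own bit, marking an IRP pending does not disturb the completion-routine settings stored in the same field.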
DeviceObject
- This field is set by the NT I/O Manager as part of the processing performed in the IoCallDriver() routine. The contents are set to the device object pointer for the target device object (i.e., the device object to which the IRP is being dispatched).
FileObject
- The I/O Manager sets this field to point to the file object that is the target of an I/O operation. Note that just calling IoAllocateIrp() from your driver will not result in this field being set. If you intend to use the returned IRP for an operation on a specific file object, your driver must set the field itself.
CompletionRoutine
- The contents of this field are set by the I/O Manage