How do I keep my driver from running out of kernel-mode stack?
Driver development tip: The size of the kernel-mode stack varies among different hardware platforms, but it is always a scarce resource. Here are some tips for understanding and managing your driver's use of the kernel-mode stack.
The kernel-mode stack is a limited storage area that is often used for information that is passed from one function to another as well as for local variable storage. Although the stack is mapped into system space, it is considered part of the thread context of the original calling routine, not part of the driver itself. This means that the stack is guaranteed to be resident whenever the calling thread is running, but can be swapped out along with the thread. Code running on any kernel-mode thread (whether it is a system thread or a thread created by a driver) uses that thread's kernel-mode stack unless the code is a DPC, in which case it uses the processor's DPC stack on certain platforms (more about this later).
The size of the kernel-mode stack varies among different hardware platforms. For example:
- On x86-based platforms, the kernel-mode stack is 12K.
- On x64-based platforms, the kernel-mode stack is 24K. (x64-based platforms include systems with processors using the AMD64 architecture and processors using the Intel EM64T architecture.)
- On Itanium-based platforms, the kernel-mode stack is 32K with a 32K backing store. (If the processor runs out of registers from its register file, it uses the backing store to hold the contents of registers until the allocating function returns. This doesn't affect stack allocations directly, but the operating system uses more registers on Itanium-based platforms than on other platforms, which makes relatively more stack available to drivers.)
Each thread is allocated a kernel-mode stack, so increasing the size of the stack by even a small amount would drastically increase the memory footprint of the system. Therefore, the size of the kernel-mode stack on a given platform is set by the operating system and cannot be modified.
Guidelines for using the kernel-mode stack
Drivers should use the kernel-mode stack conservatively and avoid deeply nested or recursive calls. Heavily recursive functions that are passed many bytes of data can quickly deplete stack space. Instead of passing large amounts of data on the stack, the driver should allocate system-space memory (from paged or non-paged pool, depending on where the data will be used) and pass a pointer to the data. For recursive functions, the driver should limit the number of recursive calls that can occur.
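As a rough illustration of limiting recursion, the following sketch walks a tree and refuses to descend past a fixed depth. The NODE type, MAX_WALK_DEPTH, and CountNodes are hypothetical names invented for this example, and the code is written as portable C so that it can be compiled outside a driver. A real driver would more likely return an NTSTATUS error than -1.

```c
#include <stddef.h>

/* Hypothetical tree node, used only for this sketch. */
typedef struct _NODE {
    struct _NODE *Left;
    struct _NODE *Right;
} NODE;

/* Assumed limit; derive it from the per-call frame size and your stack budget. */
#define MAX_WALK_DEPTH 32

/*
 * Adds the number of nodes under Root to *Count.
 * Returns 0 on success, or -1 if the tree is deeper than MAX_WALK_DEPTH,
 * in which case *Count holds only a partial total.
 * Each frame holds nothing but pointers and a small counter.
 */
int CountNodes(const NODE *Root, unsigned Depth, unsigned long *Count)
{
    if (Root == NULL) {
        return 0;
    }
    if (Depth >= MAX_WALK_DEPTH) {
        return -1;                      /* refuse to recurse any further */
    }
    *Count += 1;
    if (CountNodes(Root->Left, Depth + 1, Count) != 0) {
        return -1;
    }
    return CountNodes(Root->Right, Depth + 1, Count);
}
```

The caller passes 0 as the starting depth and treats a nonzero return as a failed request, so a malformed or unexpectedly deep input costs a bounded amount of stack instead of exhausting it.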
Wherever possible, design functions to take pointers to single structures rather than individual variables as parameters. If you need to pass a large number of stack-based parameters from one function to another, group the variables into a local structure and pass a pointer to that structure to the target function. This saves kernel stack space on the subsequent call. For example, passing a pointer to the following structure places only a single PVOID on the stack for the call, rather than the six ULONGs that would be needed to pass the variables individually:
    typedef struct _COMPUTE_AXIS_COUNT_PARAMS {
        ULONG x;
        ULONG y;
        ULONG z;
        ULONG xcount;
        ULONG ycount;
        ULONG zcount;
    } COMPUTE_AXIS_COUNT_PARAMS;

    COMPUTE_AXIS_COUNT_PARAMS params;
    ComputeAxisCount(&params);
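To show both sides of such a call, here is a minimal, self-contained sketch. The source does not say what ComputeAxisCount actually computes, so the function body, the CallerExample routine, and the field values are all hypothetical. ULONG is defined locally as a stand-in for the WDK type so that the example builds outside a driver.

```c
#include <stdint.h>

typedef uint32_t ULONG;     /* stand-in for the WDK's ULONG */

typedef struct _COMPUTE_AXIS_COUNT_PARAMS {
    ULONG x;
    ULONG y;
    ULONG z;
    ULONG xcount;
    ULONG ycount;
    ULONG zcount;
} COMPUTE_AXIS_COUNT_PARAMS;

/*
 * Hypothetical body: a weighted sum of the three axes.
 * The callee receives one pointer instead of six separate arguments.
 */
ULONG ComputeAxisCount(const COMPUTE_AXIS_COUNT_PARAMS *Params)
{
    return Params->x * Params->xcount
         + Params->y * Params->ycount
         + Params->z * Params->zcount;
}

/* Hypothetical caller: fills the structure once, then passes its address. */
ULONG CallerExample(void)
{
    COMPUTE_AXIS_COUNT_PARAMS params = { 1, 2, 3, 10, 20, 30 };

    return ComputeAxisCount(&params);
}
```

The structure itself still occupies space in the caller's frame, so the saving comes from not copying each field again for every nested call that needs the same values.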
Whether or not a function makes nested or recursive calls, the driver should minimize kernel-mode stack usage by declaring only pointers or simple counters as local variables. Avoid declaring a local variable as a byte or string array to serve as a local buffer for the function. Instead, declare a pointer to a buffer that has been allocated in paged or nonpaged pool. (Remember that nonpaged pool is also a limited resource, and use it sparingly.) If you must declare a local copy of a structure on the stack, make sure the structure is relatively small. Avoid declaring local copies of large structures or aggregate structures such as C++ classes on the stack.
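The sketch below shows one way to follow this advice: the function keeps only a pointer and a counter on the stack and obtains its scratch buffer from an allocator. AllocScratch, FreeScratch, SCRATCH_SIZE, and SumViaScratch are hypothetical names, and the allocator stand-ins call malloc and free so that the code runs outside the kernel. In a driver, those two helpers would wrap pool routines such as ExAllocatePoolWithTag and ExFreePoolWithTag.

```c
#include <stdlib.h>
#include <string.h>

#define SCRATCH_SIZE 4096   /* hypothetical scratch buffer size, in bytes */

/* Stand-ins for the driver's pool allocation and free calls (assumption). */
void *AllocScratch(size_t Bytes) { return malloc(Bytes); }
void FreeScratch(void *Buffer)   { free(Buffer); }

/*
 * Copies Src into a scratch buffer and sums its bytes.
 * Returns 0 on success, or -1 if Length is too large or allocation fails.
 * A declaration such as "unsigned char scratch[SCRATCH_SIZE];" would have
 * placed all 4096 bytes on the stack; here only the pointer lives there.
 */
int SumViaScratch(const unsigned char *Src, size_t Length, unsigned long *Sum)
{
    unsigned char *scratch;
    size_t i;

    if (Length > SCRATCH_SIZE) {
        return -1;
    }
    scratch = (unsigned char *)AllocScratch(SCRATCH_SIZE);
    if (scratch == NULL) {
        return -1;                      /* allocation can fail; always check */
    }
    memcpy(scratch, Src, Length);
    *Sum = 0;
    for (i = 0; i < Length; i++) {
        *Sum += scratch[i];
    }
    FreeScratch(scratch);               /* release on every exit path */
    return 0;
}
```

The trade-off is an allocation that can fail and must be freed on every path, which is why small, fixed-size locals are still reasonable to keep on the stack.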
It is especially important to minimize kernel-mode stack usage in a driver's interrupt service routine (ISR). Functions called from an ISR are subject to the same stack limits as functions called in any thread. However, an ISR runs in an arbitrary thread context and thus uses the kernel-mode stack of the thread on which the ISR happens to be running. This might be the current thread's stack or, if a DPC is running, the processor's DPC stack. In any case, very little stack might be available to the ISR, depending on other users of the stack on that thread.
A driver can be a bit more liberal in its use of kernel-mode stack in a deferred procedure call (DPC). DPCs use a per-processor kernel-mode stack on all but Itanium-based platforms. (On Itanium-based platforms, DPCs use the stack of the current thread.) For each processor, the operating system allocates a single kernel-mode stack for use by any DPC running on that processor. Only one DPC runs at a time on a given processor, so a DPC effectively has its own stack.
To determine whether enough stack space remains to call a function to perform a task, a driver can call the IoGetStackLimits and IoGetRemainingStackSize routines. If not enough stack space is available, the driver can queue the task to a work item, which runs in a separate thread and therefore has its own kernel-mode stack. Keep in mind, however, that the work item runs in a system worker thread at PASSIVE_LEVEL. Also keep in mind that on the original (RTM) release of Windows Server 2003 and on earlier versions of Windows, IoGetStackLimits and IoGetRemainingStackSize must be called at IRQL PASSIVE_LEVEL or APC_LEVEL, so on those versions they cannot be called from any routine that runs at DISPATCH_LEVEL or higher (such as a DPC routine). Starting with Windows Server 2003 SP1, both routines can be called at any IRQL.
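The check-then-defer pattern described above can be modeled as follows. This is a user-mode simulation, not driver code: RemainingStack stands in for IoGetRemainingStackSize, QueueTask and DrainQueue stand in for queuing a work item and for the worker thread that later runs it, and TASK_STACK_NEEDED is an assumed worst-case figure. Every name in the block is hypothetical.

```c
#include <stddef.h>

typedef void (*TASK_ROUTINE)(void *Context);

/* Assumed worst-case stack need of the task and its callees, in bytes. */
#define TASK_STACK_NEEDED 2048u

/* Simulation state (assumption): a settable stack reading and a one-slot queue. */
size_t g_SimRemainingStack = 8192;
TASK_ROUTINE g_QueuedRoutine = NULL;
void *g_QueuedContext = NULL;

/* Stand-in for IoGetRemainingStackSize. */
size_t RemainingStack(void) { return g_SimRemainingStack; }

/* Stand-in for queuing a work item. Returns 0 on success, -1 if the slot is full. */
int QueueTask(TASK_ROUTINE Routine, void *Context)
{
    if (g_QueuedRoutine != NULL) {
        return -1;
    }
    g_QueuedRoutine = Routine;
    g_QueuedContext = Context;
    return 0;
}

/* Stand-in for the worker thread, which runs the task on its own stack. */
void DrainQueue(void)
{
    if (g_QueuedRoutine != NULL) {
        TASK_ROUTINE routine = g_QueuedRoutine;

        g_QueuedRoutine = NULL;
        routine(g_QueuedContext);
    }
}

/* Returns 1 if the task ran inline, 0 if it was deferred, -1 if queuing failed. */
int RunOrDefer(TASK_ROUTINE Routine, void *Context)
{
    if (RemainingStack() >= TASK_STACK_NEEDED) {
        Routine(Context);
        return 1;
    }
    return (QueueTask(Routine, Context) == 0) ? 0 : -1;
}

/* Trivial task used to exercise the sketch. */
void IncrementTask(void *Context) { *(int *)Context += 1; }
```

When the task is deferred, the context must outlive the calling function, so in a driver it would normally point to pool memory rather than to a local variable on the stack that is about to be unwound.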
Important: A driver should not allocate memory "on the side" and use it as a kernel-mode stack. This has never been a recommended practice for any platform, because it affects the stability and reliability of the operating system. On x64-based systems, if the operating system detects an unauthorized kernel-mode stack, it will generate a bug check and shut down the system.
Debugging kernel-mode stack usage in your driver
PREfast with driver-specific rules, a static source code analysis tool provided with the Windows Driver Kit (WDK), can be used to find functions that are using an excessive amount of kernel-mode stack. Use the PREfast build command option /STACKHOGTHRESHOLD to change PREfast's default stack usage threshold.
Running out of stack space will cause the operating system to crash with one of several possible bug checks. These might include the following:
- 0x7F: UNEXPECTED_KERNEL_MODE_TRAP with parameter 1 set to EXCEPTION_DOUBLE_FAULT, which is caused by running off the end of a kernel stack.
- 0x1E: KMODE_EXCEPTION_NOT_HANDLED, 0x7E: SYSTEM_THREAD_EXCEPTION_NOT_HANDLED, or 0x8E: KERNEL_MODE_EXCEPTION_NOT_HANDLED, with an exception code of STATUS_ACCESS_VIOLATION, which indicates a memory access violation.
- 0x2B: PANIC_STACK_SWITCH, which usually occurs when a kernel-mode driver uses too much stack space.
To debug these problems, use the kf (Display Stack Backtrace) debugger command to display the amount of stack consumed by each function. If the stack trace appears incomplete, you can attempt to determine the stack trace manually by examining raw stack memory for symbols that might indicate return addresses. To establish bounds for your search, use the !thread debugger extension to find out the limits of the stack, then use the dps (Display Words and Symbols) debugger command to walk through memory and attempt to resolve symbols for each pointer-sized quantity. (Be aware that symbols found on the stack do not necessarily indicate valid return addresses.)
Another technique for checking stack usage on x86-based platforms is to look at the disassembly of the function and see how much the stack pointer moves on function entry. (This technique can also be used on x64-based platforms, although the disassembly is slightly different.) For example, the following typical function prologue indicates function local stack usage of 140 bytes (0x8c), not counting any additional stack consumed by subroutines that the function might call:
    1e3bf1de 8bff            mov     edi,edi
    1e3bf1e0 55              push    ebp
    1e3bf1e1 8bec            mov     ebp,esp
    1e3bf1e3 81ec8c000000    sub     esp,0x8c
You can use the uf (Unassemble Function) debugger command or the WinDBG Disassembly window to display function code in assembly language.
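For a sense of scale, the arithmetic below estimates how many nested calls to a function with the prologue shown above would fit in a 12K x86 kernel-mode stack. It is a back-of-the-envelope sketch under stated assumptions: each call costs its locals plus 4 bytes for the saved ebp and 4 bytes for the return address, while pushed arguments and stack already consumed by callers are ignored. FrameBytes and MaxNestedCalls are hypothetical helper names.

```c
#include <stddef.h>

#define X86_KERNEL_STACK_BYTES (12u * 1024u)    /* 12K x86 kernel-mode stack */

/* Per-call cost on x86: locals + saved ebp (4) + return address (4). */
size_t FrameBytes(size_t LocalBytes)
{
    return LocalBytes + 4 + 4;
}

/* Upper bound on nesting depth for an otherwise empty stack. */
size_t MaxNestedCalls(size_t StackBytes, size_t LocalBytes)
{
    return StackBytes / FrameBytes(LocalBytes);
}
```

With 0x8c (140) bytes of locals, FrameBytes returns 148 and MaxNestedCalls(X86_KERNEL_STACK_BYTES, 0x8c) returns 83. The real limit is lower, because the stack is never empty when a driver routine is entered.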
What should you do?
To minimize stack usage in your driver:
- Declare only local variables that are pointers or simple counters.
- Never allocate large structs or other aggregate structures such as C++ classes on the stack.
- Do not declare a local variable as a byte or string array to serve as a local buffer for the function. Instead, allocate a buffer in paged or nonpaged pool and declare a pointer to that buffer.
- Avoid deeply nested or recursive calls.
- Instead of passing large amounts of data on the stack to another function, allocate system-space memory and pass a pointer to the data.
- In a recursive function, limit the number of recursive calls that can occur.
- Wherever possible, design functions to take pointers to single structures rather than individual variables as parameters.
- Use the IoGetStackLimits and IoGetRemainingStackSize routines to determine whether enough stack space remains to call a function to perform a task and, if not, queue the task to a work item.
To analyze and debug kernel-mode stack usage in your driver:
- Use PREfast to find functions that use an excessive amount of kernel-mode stack.
- Use the kf debugger command to display the amount of stack consumed by each function. If the trace appears incomplete, use !thread and dps to identify possible return addresses.
- On x86-based and x64-based platforms, check the disassembly of the function to see how much the stack pointer moves on function entry.
Related topics
Memory Management: What Every Driver Writer Needs to Know
Patching Policy for x64-Based Systems