ISRs and DPCs, The Silent Killers
Hardware interrupt service routines and deferred procedure calls can be the silent killers of system performance. ISRs and DPCs are the highest priority code that runs in the system - they cannot be pre-empted by the OS and run to completion. ISRs and DPCs that run too long - or too often - can eat up significant amounts of CPU time and cause overall system performance to suffer. They can cause audio and video glitches, mouse freezes, UI hangs, and any number of other system wide problems.
From a user's standpoint, problems with long or frequent ISRs and DPCs often first show up as audio pops and drops (usually just called 'glitches'). This has long been a problem for PC audio: problematic ISRs or DPCs can cause enough delays in applications or in the audio engine so that audio data simply fails to get to the hardware in time causing audible glitches. Even Vista's new audio sub-system isn't immune to interference from ISR's and DPCs. The human ear is very sensitive to even the slightest problem in the audio stream. Chances are, if you are hearing regular audio glitches, long ISRs or DPCs are the problem.
We've even seen a few extreme cases of really awful DPC behavior; one bug in a new WDDM graphics driver has been particularly persistent. We first noticed it a few months ago when users reported periodic mouse freezes on some laptops. Taking a trace on the problem system quickly helped us find the problem: the graphics driver was periodically waking up to check for an external monitor change. This specific hardware doesn't provide an interrupt when a monitor is attached or removed, so the driver needs to periodically poll the adapter for the monitor state. It did this every few seconds. This would normally be fine, but the driver was spending several hundred milliseconds in a DPC every time it did this. Users saw this as a mouse freeze - in reality, it was a system freeze as the CPU was spending all its time doing nothing but polling for a monitor for a few hundred milliseconds. We worked with the vendor to get this fixed, only to have it show up a couple of more times as the vendor worked on the architecture of their driver.
Network interface cards (NICs) are a frequent culprit of ISR and DPC problems. Their type and age tend to vary widely – even between the same models of computers. Some NICs are well behaved, only interrupting the OS when absolutely necessary, such as when receive buffers are getting full. This is called interrupt aggregation. This lets the OS handle received packets with as few interrupts as possible. Other NICs are very badly behaved, interrupting the CPU for every packet received. Luckily, newer NICs support interrupt aggregation. The "bad NICs" we see tend to be older, circa 2003, 2004.
I call ISRs and DPCs problems the 'silent killers' for three reasons
The OS has no control over their execution. ISR's are triggered by physical hardware signals from devices to the CPU. When a device signals the CPU that it needs attention, the CPU immediately jumps to the driver's interrupt service routine. DPCs are scheduled by interrupt handlers and run at a priority only exceed by hardware interrupt service routines.
ISR and DPC activity usually increases with system activity. As a system becomes more active, interrupts and DPCs will generally become more frequent, taking up more CPU time. This can get to the point where they visibly (or audibly) affect system performance. In these cases, no single ISR or DPC routine is the problem - it is their cumulative usage of CPU time.
It is rarely obvious that ISRs and DPCs are causing performance problems. There is really only one place in the OS where information about them is visible surfaced, in the performance monitor. This UI is really useful, but a little hard to get to.
ISR's and DPC's can cause other secondary performance problems as well such as dirtying the processor caches and increasing interrupt and DPC latency.
Tracking down Interrupt and DPC problems is further complicated because their behavior is highly dependent on hardware configuration and drivers. A problem seen on one system may only be reproducible on that particular system.
The performance monitor UI is a bit buried, but once found provides a good run-time summary of ISR and DPC execution. My favorite way to get to the performance monitor is by entering 'perfmon' into the system run command dialog. Just press the Win-R key and type 'perfmon'. You can then use the 'performance monitor' tab to monitor ISR and DPC activity using these counters: % Interrupt Time, % DPC Time, Interrupts/Sec, DPC Rate, and DPCs Queued/Sec (see). This is quite handy as it is 'in-box' - available on every Windows XP and Vista system. This UI can help spot patently bad ISR and DPC activity relatively easily.
However, the performance counters only tell part of the story. They don't identify the modules that are generating the ISRs and DPCs, and they can’t be used to look at ISR and DPC execution times. The best way to get a clear picture of the CPU time spent in ISRs and DPCs is to use the kernel's ETW events which can be enabled and analyzed using ETW based performance tools.
Event Tracing for Windows (ETW) is a fundamental operating system feature that provides a very efficient method for logging events, mechanisms for controlling event providers (components that generate events, also often called 'loggers' ) and collecting those events for post processing. ETW is the primary tool we use to measure system performance. Probably the most important logger is the kernel logger itself which can generate events for DPCs, ISRs, context switches, disk I/O, hard and soft faults, process start/stop, thread start/stop, file I/O, disk I/O, TCP/IP and UDP traffic, registry access, and image (executable) loads. All of these can be controlled by the tracelog tool provided in the Windows Vista WDK.
Using the kernel logger to look at ISR and DPC execution statistics is very easy - here's how:
- Run the following command from an elevated command prompt window:
tracelog -start "NT Kernel Logger" -f krnl.etl -dpcisr -nodisk -nonet -b 1024 -min 4 -max 16 -ft 10 –UsePerfCounter
- Start, run then stop your test
- Run the following command to stop the kernel logger and save the data to 'krnl.etl'
- Use the following trace report command to generate a nicely formatted report of the ISR and DPC activity during your test
trcerpt krnl.etl -report isrdpc.xml
The isrdpc.xml report file will contain a nicely formatted report that lists the % utilization, counts, and histograms of the ISR and DPC execution time by module. This data is really helpful in finding problematic ISR and DPC activity.
Driver developers can take things a step further and add ETW based event logging directly to their driver. This is straight forward to do. Even better, ETW is very light weight and the logging features can be shipped in production (non-debug) drivers. This makes it much easier to debug performance problems as driver tracing can be enabled on any system. Since developers can define their own events, the driver can log additional information such as internal driver or hardware state.
It may even be appropriate for a driver developer to publish the drivers trace GUID, event flags and event structures so others can independently diagnose problems.
I’ve included some useful related links and other information below.
- The tracelog.exe tool is available in the Windows Vista driver development kit (The WDK). This article assumes the use of this version, other earlier versions don't support the kernel logger switches.
- Prior to vista shipping, you can get the WDK by singing up for the WDK Beta program here.
- The tracerpt.exe tool is built into Windows.
Scheduling, Thread Context, and IRQL: This document describes all the gory detail behind how hardware and software interrupts are scheduled and handled by the operating system.
Event Tracing from the WDK docs: Here is a good place to start on understanding the fundamental ETW infrastructure in windows.
CPU Performance Counters : This page describes the various CPU performance counters in windows.
Measuring ISR and DPC Time: A related article.
Windows Vista Display Driver Model: A detailed discussion of the virtues of WDDM.
WDDM In the WDK: The WDDM SDK - how to write a WDDM driver.
RATTV3 : This is a tool that we provided during WinHEC 2005 that allows 24 x 7 logging of ISR and DPC activity. Note, this currently only works on XP, we need to get this updated for Vista.