Debugging with NDISKD
Chapter three of a beginner’s guide to debugging with NDISKD
In Part 1 of the series, we set up a kernel debugger. In the second installment, we took a closer look at ndiskd’s output for miniports. Today, we will use what we know to debug an actual network issue.
The symptoms are thus:
- The network seems to work ok – you can download files fine.
- When you open Task Manager, most tabs work ok.
- As soon as you switch to the Network tab in Task Manager, the whole Task Manager window locks up.
- Task Manager can’t be closed, even with taskkill /f.
The fourth symptom suggests the problem is in kernel mode. The third symptom tells us to start the investigation with the network stack. The first symptom hints that the datapath is fine, and PNP state is probably also fine as well.
While taskmgr.exe is hung, we look for its UI thread in the debugger. The !process and .thread commands can find the stuck thread.
kd> !process 0 2 taskmgr.exe
PROCESS fffffa800b3a2360
SessionId: 1 Cid: 087c Peb: 7fffffdf000 ParentCid: 0354
DirBase: 14e05000 ObjectTable: fffff8a0015869f0 HandleCount: 127.
Image: taskmgr.exe
THREAD fffffa800bafaa60 Cid 087c.0884 Teb: 000007fffffdd000 Win32Thread: fffff900c061ac30 WAIT: (Suspended) KernelMode Non-Alertable
SuspendCount 1
FreezeCount 2
fffffa800bafad38 Semaphore Limit 0x2
kd> .thread fffffa800bafaa60
Implicit thread is now fffffa80`0bafaa60
kd> kn
*** Stack trace for last set context - .thread/.cxr resets it
# Child-SP RetAddr Call Site
00 fffff880`02bdbbe0 fffff800`0307f052 nt!KiSwapContext+0x7a
01 fffff880`02bdbd20 fffff800`030811af nt!KiCommitThreadWait+0x1d2
02 fffff880`02bdbdb0 fffff800`03056920 nt!KeWaitForSingleObject+0x19f
03 fffff880`02bdbe50 fffff800`03054969 nt!KiSuspendThread+0x54
04 fffff880`02bdbe90 fffff800`0307f25d nt!KiDeliverApc+0x211
05 fffff880`02bdbf10 fffff800`030811af nt!KiCommitThreadWait+0x3dd
06 fffff880`02bdbfa0 fffff800`03514711 nt!KeWaitForSingleObject+0x19f
07 fffff880`02bdc040 fffff880`014231e2 nt!VerifierKeWaitForSingleObject+0x151
08 fffff880`02bdc0c0 fffff880`01424a53 ndis!ndisQuerySetMiniportEx+0x162
09 fffff880`02bdc130 fffff880`01420769 ndis!ndisIfGetMiniportStatistics+0x163
0a fffff880`02bdc360 fffff880`014219e3 ndis!ndisIfQueryObject+0x389
0b fffff880`02bdc4f0 fffff880`0142261d ndis!ndisNsiGetInterfaceRodInformation+0x1c3
0c fffff880`02bdc590 fffff880`0153453d ndis!ndisNsiGetAllInterfaceInformation+0x42e
0d fffff880`02bdc650 fffff880`038029d6 NETIO!NsiGetAllParametersEx+0x44d
0e fffff880`02bdc750 fffff880`03804902 nsiproxy!NsippGetAllParameters+0x2b2
0f fffff880`02bdc940 fffff880`038049db nsiproxy!NsippDispatchDeviceControl+0x8a
10 fffff880`02bdc980 fffff800`0351dc16 nsiproxy!NsippDispatch+0x4b
11 fffff880`02bdc9b0 fffff800`033903a7 nt!IovCallDriver+0x566
Notice that the thread is waiting on something. The wait came from ndisQuerySetMiniportEx in frame 08. Hmm, it looks like NDIS is waiting for an OID request to come back. Let’s check out the miniport. Using what we learned last time, we’ll run !ndiskd.miniport once to get a list of all miniports.
kd> !ndiskd.miniport
MiniDriver Miniport Name _
fffffa800bd34560 fffffa800a4ac1a0 RAS Async Adapter
fffffa800acf72d0 fffffa800acd31a0 WAN Miniport (SSTP)
fffffa800acf5ce0 fffffa800ad0a1a0 WAN Miniport (PPTP)
fffffa800acf1ce0 fffffa800ad0c1a0 WAN Miniport (PPPOE)
fffffa800aceb120 fffffa800ad081a0 WAN Miniport (IPv6)
fffffa800aceb120 fffffa800ad061a0 WAN Miniport (IP)
fffffa800aceb120 fffffa800ad041a0 WAN Miniport (Network Monitor)
fffffa800acdc6e0 fffffa800acce1a0 WAN Miniport (L2TP)
fffffa800acba6e0 fffffa800ad001a0 MAC Bridge Miniport
fffffa800ac99ab0 fffffa800ac9c1a0 WAN Miniport (IKEv2)
fffffa800ac75020 fffffa800ac771a0 Microsoft Virtual Machine バス ネットワーク アダプター
fffffa800ac4a500 fffffa800ac5e1a0 Teredo Tunneling Pseudo-Interface
fffffa800ac4a500 fffffa800ac5c1a0 Microsoft ISATAP Adapter #6
fffffa800ac4a500 fffffa800ac5a1a0 Microsoft ISATAP Adapter #5
fffffa800ac4a500 fffffa800ac581a0 Microsoft ISATAP Adapter #2
fffffa800ac4a500 fffffa800ac531a0 Microsoft ISATAP Adapter
fffffa800ac4a500 fffffa800ac511a0 Microsoft 6to4 Adapter #2
fffffa800ac4a500 fffffa800ac4e1a0 Microsoft 6to4 Adapter
That’s a lot of miniports – where do we start? Well, usually a machine only has one or two miniports that are actually used for real network traffic. In my case, that’d be the VM NIC. But we are lucky – we don’t need to guess. Dump the hung thread with kv to see function parameters.
kd> kv
*** Stack trace for last set context - .thread/.cxr resets it
Child-SP RetAddr : Args to Child : Call Site
fffff880`02bdc0c0 fffff880`01424a53 : fffffa80`0ac771a0 fffffa80`0b2f3980 fffffa80`0bd3a0d8 00000000`00000000 : ndis!ndisQuerySetMiniportEx+0x162
(It’s a little hard to read the verbose stack dump; it helps if you have a wide screen). Notice that the first “Args to Child” is fffffa80`0ac771a0 – this value is familiar since it’s the same as the NDIS miniport handle for the VM NIC. Now we are confident the problem is that the VM NIC isn’t completing the OID. Let’s look at it.
kd> !ndiskd.miniport fffffa800ac771a0
<snip>
STATE
Miniport Running
Device PnP Started
Datapath Normal
Interface Up
Media Connected
Power D0
References 0n26
User Handles 2
Total Resets 0
Pending OID None
Flags 2c452000
↑ NOT_BUS_MASTER, DEFAULT_PORT_ACTIVATED, SUPPORTS_MEDIA_SENSE,
DOES_NOT_DO_LOOPBACK, MEDIA_CONNECTED
PnPFlags 00210000 ← RECEIVED_START, HARDWARE_DEVICE
Wait a minute – the status output shows that the Pending OID is None. (It’s even printed in green lettering, and green means “good”!) So our theory was wrong – the miniport did not lose an OID request.
Maybe the OID request is stuck somewhere else? Let’s start walking up the binding stack.
BINDINGS
Filter List Filter Filter Driver Context _
WFP LightWeight Filter-0000
fffffa800b2f3980 fffffa800ab13840 fffffa800b2f44f0
Buggy Filter Driver 1-0000
fffffa800b2e1c80 fffffa800ac1fc60 fffffa800b2e3d70
The WFP filter is a standard filter that comes with Windows. For now, let’s assume it is not the guilty party. We can get more details of the other filter instance by clicking on its NDIS Filter Instance handle. (In this example, the value of the handle is fffffa800b2e1c80.)
kd> !ndiskd.filter fffffa800b2e1c80
FILTER
Microsoft Virtual Machine バス ネットワーク アダプター-Buggy Filter Driver 1-0000
Ndis Handle fffffa800b2e1c80
Filter Driver fffffa800ac1fc60 - Buggy Filter Driver 1
Driver Context fffffa800b2e3d70
Miniport fffffa800ac771a0 - Microsoft Virtual Machine バス ネットワーク アダプター
State Running
Datapath Normal
References 1
Flags 00000810 ← RUNNING, PROCESSING_REQUEST
Pending OID OID_IP4_OFFLOAD_STATS
Higher Filter fffffa800b2f3980 - Microsoft Virtual Machine バス ネットワーク アダプター-WFP LightWeight Filter-0000
Lower Filter [None]
→ Driver handlers
Ah-ha, this filter instance has a pending OID. Now our new theory is that this filter is not completing the OID in a timely manner, which is causing other OIDs to languish in the OID queue. (Recall that a filter instance or miniport adapter instance can only have at most one outstanding OID request – all others are automatically queued by NDIS until the pending request is completed). We can confirm our theory by looking at the topmost filter to see its OID queue.
kd> !ndiskd.filter fffffa800b2f3980
FILTER
Microsoft Virtual Machine バス ネットワーク アダプター-WFP LightWeight Filter-0000
Ndis Handle fffffa800b2f3980
Filter Driver fffffa800ab13840 - WFP LightWeight Filter
Driver Context fffffa800b2f44f0
Miniport fffffa800ac771a0 - Microsoft Virtual Machine バス ネットワーク アダプター
State Running
Datapath Normal
References 1
Flags 00004810
↑ RUNNING, PROCESSING_REQUEST, IS_MANDATORY
Pending OID OID_IP4_OFFLOAD_STATS
More OIDs are queued → Show queued OIDs
Higher Filter [None]
Lower Filter fffffa800b2e1c80 - Microsoft Virtual Machine バス ネットワーク アダプター-Buggy Filter Driver 1-0000
→ Driver handlers
Indeed, the topmost filter (the WFP filter) is also processing an OID_IP4_OFFLOAD_STATS request, which is the same as the OID stuck in the other filter. Furthermore, we see that “More OIDs are queued”. Let’s click the “Show queued OIDs” link to see the full queue.
kd> !ndiskd.oid
ALL PENDING OIDs
Filter fffffa800b2e1c80 - Microsoft Virtual Machine バス ネットワーク アダプター-Buggy Filter Driver 1-0000
Current OID OID_IP4_OFFLOAD_STATS
Filter fffffa800b2f3980 - Microsoft Virtual Machine バス ネットワーク アダプター-WFP LightWeight Filter-0000
Current OID OID_IP4_OFFLOAD_STATS
Queued OIDs OID_GEN_STATISTICS
OID_GEN_STATISTICS
OID_GEN_STATISTICS
OID_802_3_MULTICAST_LIST
OID_GEN_STATISTICS
OID_GEN_NETWORK_LAYER_ADDRESSES
OID_GEN_STATISTICS
OID_GEN_STATISTICS
OID_GEN_STATISTICS
OID_GEN_XMIT_OK
OID_802_3_CURRENT_ADDRESS
Yikes, it’s no wonder that usermode is all clogged up. There are about a dozen OID requests queued behind the OID_IP4_OFFLOAD_STATS request, which in turn is stuck in a buggy filter driver.
We’ve found the culprit filter driver, and the next step is to code review its OID request handler, especially for OID_IP4_OFFLOAD_STATS, to see if there are any bugs that result in the driver forgetting to complete the OID. Meanwhile, we have one last ndiskd tip for you today.
We actually could have saved some debugging steps with the !ndiskd.oid command. This command, when run without parameters, scans the entire system for any pending or queued OID requests on any miniport or filter instance. Since it’s easy to scan the system, it’s not usually necessary to manually check each miniport or filter one-by-one. As always, you can find several other useful commands, like !ndiskd.oid, on the !ndiskd.help menu.
Comments
Anonymous
May 19, 2011
Hello, I have a question about "current OID" and "pending OID" for a miniport. I have met a problem where "current OID" for a miniport is null, while "pending OID" for this miniport is "OID_GEN_CURRENT_PACKET_FILTER". Could you explain more on this? And the Flags for the miniport has a "REQUEST_TIMEOUT" flag, could you explain when "REQUEST_TIMEOUT" will happen? Thanks very much!Anonymous
May 26, 2011
> I have met a problem where "current OID" for a miniport is null, while "pending OID" for this miniport is "OID_GEN_CURRENT_PACKET_FILTER". Could you explain more on this? When an OID is issued, it is first queued on the miniport. Then, if the miniport is ready to receive OID requests, the first OID is removed from the queue and delivered to the miniport. So there are two reasons that you might see no OID is pending while some OIDs are queued:
- The OID was recently issued, and it hasn't had a chance to be sent to the miniport yet;
- The miniport is not ready to receive OID requests (e.g., the miniport is currently being reset). > And the Flags for the miniport has a "REQUEST_TIMEOUT" flag, could you explain when "REQUEST_TIMEOUT" will happen? The "REQUEST_TIMEOUT" flag is not good. It means that the miniport has spent too long to process an OID request. NDIS detected the timeout and will attempt to reset the miniport. This would probably explain you first question -- NDIS isn't delivering new OIDs because your miniport is being reset.