[UNRESOLVED] Win2008R2: STOP 0x9E netft!NetftWatchdogTimerDpc+b9 (USER_MODE_HEALTH_MONITOR) related to iSCSI

Status: Unresolved, customer discontinued.

Opening the dump shows:

BugCheck 9E, {fffffa807d98bb30, 4b0, 0, 0}
Probably caused by : netft.sys ( netft!NetftWatchdogTimerDpc+b9 )
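
The bugcheck arguments can be decoded with a few lines of Python. This is a minimal sketch, not part of the original analysis. It assumes the parameter layout Microsoft documents for bug check 0x9E: argument 1 is the process that failed to satisfy the health check, and argument 2 is the health monitoring timeout in seconds. The helper name parse_bugcheck is hypothetical.

```python
import re

def parse_bugcheck(line):
    """Parse a WinDbg 'BugCheck XX, {a, b, c, d}' line into (code, [args])."""
    m = re.match(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}", line)
    if m is None:
        raise ValueError("not a BugCheck line: %r" % line)
    code = int(m.group(1), 16)
    # WinDbg prints the four arguments as bare hex without a 0x prefix.
    args = [int(a.strip(), 16) for a in m.group(2).split(",")]
    return code, args

code, args = parse_bugcheck("BugCheck 9E, {fffffa807d98bb30, 4b0, 0, 0}")
process_address = args[0]   # assumed: process that missed the health check
timeout_seconds = args[1]   # assumed: health monitoring timeout in seconds
print(hex(code), hex(process_address), timeout_seconds)
```

Under that assumption, 0x4b0 is a timeout of 1200 seconds, i.e. 20 minutes without a successful health check.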

Note that 0x9E indicates a timeout, raised here by the netft driver (the failover cluster virtual adapter). This is most likely a symptom of an underlying condition rather than the problem itself, and the root causes of a 0x9E can be very diverse. The stack shows:

12: kd> knL
 # Child-SP RetAddr Call Site
00 fffff880`0230a518 fffff880`04cb76a5 nt!KeBugCheckEx
01 fffff880`0230a520 fffff800`018d2fa6 netft!NetftWatchdogTimerDpc+0xb9
02 fffff880`0230a570 fffff800`018d2326 nt!KiProcessTimerDpcTable+0x66
03 fffff880`0230a5e0 fffff800`018d2e7e nt!KiProcessExpiredTimerList+0xc6
04 fffff880`0230ac30 fffff800`018d2697 nt!KiTimerExpiration+0x1be
05 fffff880`0230acd0 fffff800`018cf6fa nt!KiRetireDpcList+0x277
06 fffff880`0230ad80 00000000`00000000 nt!KiIdleLoop+0x5a

...but again, in this case this is not very useful. Further debugging shows that a Persistent Reservation request for one of the iSCSI LUNs is not handled correctly:

12: kd> db fffffa8083aa0810+48 l8
fffffa80`83aa0858 5f 06 00 00 00 00 00 00 _....... // 5f is SCSIOP_PERSISTENT_RESERVE_OUT
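
The dumped bytes can be read as the start of a SCSI command descriptor block (CDB). The sketch below is an illustration under two assumptions: that offset +48 is the start of the CDB, as the annotation above implies, and that the service action codes follow the SPC-3 definition of PERSISTENT RESERVE OUT (low five bits of byte 1). The function name decode_pr_out is hypothetical.

```python
# PERSISTENT RESERVE OUT service actions as defined in SPC-3 (assumed table).
PR_OUT_SERVICE_ACTIONS = {
    0x00: "REGISTER",
    0x01: "RESERVE",
    0x02: "RELEASE",
    0x03: "CLEAR",
    0x04: "PREEMPT",
    0x05: "PREEMPT AND ABORT",
    0x06: "REGISTER AND IGNORE EXISTING KEY",
    0x07: "REGISTER AND MOVE",
}

SCSIOP_PERSISTENT_RESERVE_OUT = 0x5F

def decode_pr_out(hex_bytes):
    """Decode opcode and service action from the first CDB bytes."""
    cdb = bytes.fromhex(hex_bytes)  # spaces between byte pairs are accepted
    opcode = cdb[0]
    if opcode != SCSIOP_PERSISTENT_RESERVE_OUT:
        raise ValueError("not a PERSISTENT RESERVE OUT CDB: 0x%02x" % opcode)
    service_action = cdb[1] & 0x1F  # service action is the low 5 bits of byte 1
    name = PR_OUT_SERVICE_ACTIONS.get(service_action, "UNKNOWN")
    return opcode, service_action, name

opcode, service_action, name = decode_pr_out("5f 06 00 00 00 00 00 00")
print(hex(opcode), hex(service_action), name)
```

If those assumptions hold, the second byte (06) would make this a REGISTER AND IGNORE EXISTING KEY request.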

For now, this appears to be addressed in a later version of msiscsi.sys. The customer will test SP1 sometime this week. The current msiscsi version is:

12: kd> lmtm msiscsi
start end module name
fffff880`04d97000 fffff880`04dd2000 msiscsi Sat Oct 24 06:17:51 2009 (4AE27FEF)
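
The value in parentheses is the raw build timestamp of the module. As a small sketch, assuming it is the usual PE header TimeDateStamp (seconds since 1970-01-01 UTC), it can be converted as follows. The function name pe_timestamp_to_utc is hypothetical.

```python
from datetime import datetime, timezone

def pe_timestamp_to_utc(hex_stamp):
    """Convert a hex timestamp as printed by lm into a UTC datetime."""
    seconds = int(hex_stamp, 16)  # assumed: seconds since the Unix epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

built = pe_timestamp_to_utc("4AE27FEF")
print(built.isoformat())
```

This yields 2009-10-24 04:17:51 UTC. The 06:17:51 shown by the debugger is the same instant rendered in the local time zone of the debugging machine.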

This msiscsi.sys build, dated October 2009, is quite outdated. If the update to SP1 does not resolve the issue, we will enable iSCSI ETW tracing, as discussed for example here.