BSOD on 4 servers in Hyper-V cluster

Anonymous
2024-03-12T02:42:28+00:00

I'm encountering some server crashes and hoping to get some insights from the community.

Problem:

  • Four servers have experienced crashes.
  • The crashes seem related to the FAT file system driver (fastfat.sys), which handles FAT-formatted volumes.
  • However, our servers use the NTFS file system.

Details:

  • SRV04: Crashed twice with bug check FAT_FILE_SYSTEM (23). Because all of our other volumes are NTFS, this suggests fastfat might be interacting with the EFI System Partition (ESP), which is FAT-formatted.
  • SRV02 and SRV03: Crashed with bug check DPC_WATCHDOG_VIOLATION (133). The stack trace includes calls into fastfat (fastfat!FatFsdClose and fastfat!FatCommonClose).
  • SRV01: Crashed with a different bug check, RESOURCE_NOT_OWNED (e3), meaning a thread tried to release a resource it did not own. This could point to a resource access conflict.

Troubleshooting done:

  • The servers have been patched, but the crashes persist.

Questions:

  • Has anyone faced similar crashes with the fastfat module?
  • Could a mounted ESP be causing fastfat to load even though our data volumes use NTFS?
  • Are there any recommendations for further troubleshooting or potential solutions?

Additional Information:

  • OS: Windows Server 2022 Datacenter
  • System Model: Lenovo ThinkSystem SR630 V2
  • SAN: Storwize V3700

Details of BSOD

The SRV01 server experienced a Blue Screen of Death (BSOD) that appears to have been triggered inside a specific kernel function, nt!ExpReleaseResourceSharedForThreadLite. As its name suggests, this function releases a resource held for shared access, and the bug check was raised during that release.

RESOURCE_NOT_OWNED (e3)

A thread tried to release a resource it did not own.

Arguments:
Arg1: ffffb387895e6bf8, Address of resource
Arg2: ffffb38704a41040, Address of thread
Arg3: 0000000000000000, Address of owner table if there is one
Arg4: 0000000000000002
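RESOURCE_NOT_OWNED enforces a simple rule: only a thread that currently owns a resource may release it. The toy Python model below is a hypothetical illustration of that rule only; the class and method names are made up, and it does not reflect how the kernel's ERESOURCE is actually implemented. `release_for_thread` takes an explicit thread ID, loosely analogous to Arg2 (the address of the thread doing the release).

```python
import threading


class ResourceNotOwned(Exception):
    """Plays the role of bug check 0xE3 in this toy model."""


class ToySharedResource:
    """Hypothetical model of a resource acquired for shared access.

    It tracks which thread IDs hold the resource and refuses a release
    from any thread that is not an owner, which is the condition
    RESOURCE_NOT_OWNED reports. Not the kernel's ERESOURCE implementation.
    """

    def __init__(self):
        self._guard = threading.Lock()  # protects the owner table
        self._owners = {}               # thread id -> recursive acquire count

    def acquire_shared(self):
        tid = threading.get_ident()
        with self._guard:
            self._owners[tid] = self._owners.get(tid, 0) + 1

    def release_for_thread(self, tid):
        # Release on behalf of an explicit thread id (cf. Arg2, "Address of thread").
        with self._guard:
            count = self._owners.get(tid, 0)
            if count == 0:
                raise ResourceNotOwned(f"thread {tid:#x} does not own this resource")
            if count == 1:
                del self._owners[tid]
            else:
                self._owners[tid] = count - 1

    def release(self):
        # Release on behalf of the calling thread.
        self.release_for_thread(threading.get_ident())
```

Releasing twice, or releasing from a thread that never acquired the resource, raises `ResourceNotOwned`, which is the situation the 0xE3 bug check reports for a real kernel resource.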

28: kd> u nt!ExpReleaseResourceSharedForThreadLite+22552b
nt!ExpReleaseResourceSharedForThreadLite+0x22552b:
fffff803`668627eb cc int 3
fffff803`668627ec 488b9424b8000000 mov rdx,qword ptr [rsp+0B8h]
fffff803`668627f4 498bcf mov rcx,r15
fffff803`668627f7 e860e90f00 call nt!KiReleaseQueuedSpinLockInstrumented (fffff803`6696115c)
fffff803`668627fc 90 nop
fffff803`668627fd e98eadddff jmp nt!ExpReleaseResourceSharedForThreadLite+0x2d0 (fffff803`6663d590)
fffff803`66862802 80792001 cmp byte ptr [rcx+20h],1
fffff803`66862806 0f879dadddff ja nt!ExpReleaseResourceSharedForThreadLite+0x2e9 (fffff803`6663d5a9)
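The branch targets printed in parentheses in the disassembly can be checked by hand, which is also a quick way to confirm that addresses copied out of WinDbg are intact. For `call` (E8), `jmp` (E9), and `ja` (0F 87) with a 32-bit displacement, the target is the address of the next instruction plus the signed little-endian displacement. A small Python sketch using the three branches from the SRV01 listing (the helper names are hypothetical, not part of any debugger tooling):

```python
def rel32_target(insn_addr: int, insn_len: int, disp_bytes: bytes) -> int:
    """Target of an x86-64 near branch with a 32-bit displacement.

    The displacement is a signed little-endian int32 that is added to the
    address of the next instruction (insn_addr + insn_len).
    """
    disp = int.from_bytes(disp_bytes, "little", signed=True)
    return (insn_addr + insn_len + disp) & 0xFFFFFFFFFFFFFFFF


def windbg_addr(addr: int) -> str:
    """Format a 64-bit address the way WinDbg does: high`low."""
    return f"{addr >> 32:08x}`{addr & 0xFFFFFFFF:08x}"


# (mnemonic, instruction address, instruction length, displacement bytes)
# taken from the SRV01 listing: e8 60e90f00, e9 8eadddff, 0f 87 9dadddff
BRANCHES = [
    ("call", 0xFFFFF803668627F7, 5, bytes.fromhex("60e90f00")),
    ("jmp", 0xFFFFF803668627FD, 5, bytes.fromhex("8eadddff")),
    ("ja", 0xFFFFF80366862806, 6, bytes.fromhex("9dadddff")),
]

for mnemonic, addr, length, disp in BRANCHES:
    print(mnemonic, windbg_addr(addr), "->", windbg_addr(rel32_target(addr, length, disp)))
```

It prints the targets fffff803`6696115c, fffff803`6663d590, and fffff803`6663d5a9, matching nt!KiReleaseQueuedSpinLockInstrumented and ExpReleaseResourceSharedForThreadLite+0x2d0 and +0x2e9 as shown by WinDbg.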

SRV02 and SRV03 experienced a BSOD with bug check DPC_WATCHDOG_VIOLATION (133). This bug check indicates that the DPC watchdog executed, either because it detected a single long-running deferred procedure call (DPC), or because the system spent a prolonged time at an interrupt request level (IRQL) of DISPATCH_LEVEL or above.
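In the SRV02/SRV03 stack trace, the nt!KeBugCheckEx frame shows 0x133, 1, 0x1e00, fffff8017e30f328. Those "Args to Child" columns normally hold the first four parameters passed to KeBugCheckEx, i.e. the bug check code followed by Arg1 to Arg3; Arg4 is not visible there. A minimal sketch that labels them, paraphrasing the parameter table on the Microsoft Learn page for bug check 0x133 (the function name is hypothetical):

```python
def describe_dpc_watchdog(arg1: int, arg2: int, arg3: int) -> str:
    """Interpret Arg1-Arg3 of DPC_WATCHDOG_VIOLATION (0x133).

    Meanings paraphrased from the Microsoft Learn page for bug check 0x133.
    """
    if arg1 == 0:
        # A single DPC or ISR ran past its time allotment.
        return (f"single DPC/ISR timeout: ran {arg2} ticks, "
                f"allotment {arg3} ticks")
    if arg1 == 1:
        # Cumulative time at IRQL >= DISPATCH_LEVEL exceeded the watchdog period.
        return (f"cumulative DISPATCH_LEVEL timeout: watchdog period {arg2} "
                f"ticks ({arg2:#x}); triage block at {arg3:#018x}")
    return f"Arg1 = {arg1:#x}: not covered by this sketch"


# Values from the nt!KeBugCheckEx frame of the SRV02/SRV03 stack trace.
print(describe_dpc_watchdog(0x1, 0x1E00, 0xFFFFF8017E30F328))
```

With Arg1 = 1, the watchdog fired because of cumulative time at DISPATCH_LEVEL or above rather than a single long DPC. That fits the trace: the clock interrupt that raised the bug check arrived while nt!KxWaitForLockChainValid was waiting inside nt!ExReleaseResourceLite, called from fastfat!FatCommonClose.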

TRAP_FRAME: ffffa60c46891c90 -- (.trap 0xffffa60c46891c90)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000000 rbx=0000000000000000 rcx=f534581101050000
rdx=00000000f8b4d000 rsi=0000000000000000 rdi=0000000000000000
rip=fffff8017d82e7b3 rsp=ffffa60c46891e20 rbp=0000000000000000
r8=0000000000000000 r9=ffffa60c46892030 r10=0000000000000000
r11=ffffa60c46891f88 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl nz na po nc
nt!KxWaitForLockChainValid+0x23:
fffff801`7d82e7b3 488b07 mov rax,qword ptr [rdi] ds:00000000`00000000=????????????????

STACK_TEXT:

ffff9581dc7acda8 fffff8017d886c01 : 0000000000000133 0000000000000001 0000000000001e00 fffff8017e30f328 : nt!KeBugCheckEx
ffff9581dc7acdb0 fffff8017d884ab4 : 000ca51cdf7f9699 00000000000012b8 ffffab8e5ce50000 fffff8017d9aca02 : nt!KeAccumulateTicks+0x541
ffff9581dc7ace20 fffff8017d88471a : 000000000e1c09f4 ffff9581dc7492b8 0000000000000000 fffff8017d9258ef : nt!KiUpdateRunTime+0x64
ffff9581dc7aceb0 fffff8017d8845a4 : ffffab8e5b57ecc0 0000000000000000 ffffab8e5b57ecc0 0000000000000000 : nt!KeClockInterruptNotify+0x10a
ffff9581dc7acf40 fffff8017d852ce0 : 0000000000000000 ffff60b5846bcfd5 0000000000000000 0000000000010032 : nt!HalpTimerClockIpiRoutine+0x14
ffff9581dc7acf70 fffff8017da222ea : ffffa60c46891d10 ffffab8e5b57ecc0 0000000000000000 0000000000000000 : nt!KiCallInterruptServiceRoutine+0xa0
ffff9581dc7acfb0 fffff8017da22b97 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiInterruptSubDispatchNoLockNoEtw+0xfa
ffffa60c46891c90 fffff8017d82e7b3 : ffffa60c46891ef9 0000000000010008 ffffa60c46891ed0 fffff8017dc87385 : nt!KiInterruptDispatchNoLockNoEtw+0x37
ffffa60c46891e20 fffff8017d83cf65 : 00000000f8b4d3a3 ffffb389cdbdbc08 ffffa60c46891fb0 0000000000000000 : nt!KxWaitForLockChainValid+0x23
ffffa60c46891e50 fffff8017d83d206 : 0000000000000000 ffffb389a2508040 0000000000000000 0000000000000000 : nt!ExpReleaseResourceExclusiveForThreadLite+0x535
ffffa60c46891f30 fffff8019a63b096 : ffffb389bebc1b90 ffffb389cdbdba10 0000000000000000 0000000000000000 : nt!ExReleaseResourceLite+0x146
ffffa60c46891f90 fffff8019a63a680 : ffffb389cdbdba10 0000000000000000 0000000000000000 fffff8017d848b00 : fastfat!FatCommonClose+0x466
ffffa60c468920a0 fffff8017d849025 : 0000000000000000 ffffb389a171fbe0 0000000000000001 0000000000000000 : fastfat!FatFsdClose+0x1b0
ffffa60c46892140 fffff8017dcbcc5f : ffffb389d7d04440 ffffb389bebc1b90 ffffb389d7d04440 ffffb389d7d04440 : nt!IofCallDriver+0x55
ffffa60c46892180 fffff8017dca7740 : ffffab8e5b5f76c0 ffffb389dddf3110 ffffb389d7d04410 0000000000000000 : nt!IopDeleteFile+0x14f
ffffa60c46892200 fffff8017d8360a7 : 0000000000000000 0000000000000000 ffffa60c46892300 ffffb389d7d04440 : nt!ObpRemoveObjectRoutine+0x80
ffffa60c46892260 fffff8017d960362 : 0000000000000000 ffffb389dddf3110 ffffb389dddf3110 fffff80100000000 : nt!ObfDereferenceObjectWithTag+0xc7
ffffa60c468922a0 fffff8017d8d7b41 : ffffb3898f60a2b0 ffffb389a2508040 ffff9581dc44d380 fffff80100000000 : nt!CcGetDeviceGuidAsync+0xb2
ffffa60c46892320 fffff8017d957925 : ffffb389a2508040 0000000000000001 ffffb389a2508040 0000000000000080 : nt!ExpWorkerThread+0x161
ffffa60c46892530 fffff8017da25198 : ffff9581dc840180 ffffb389a2508040 fffff8017d9578d0 0000000000000000 : nt!PspSystemThreadStartup+0x55
ffffa60c46892580 0000000000000000 : ffffa60c46893000 ffffa60c4688c000 0000000000000000 0000000000000000 : nt!KiStartSystemThread+0x28
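To see which drivers appear in a trace like this without reading every frame, the STACK_TEXT lines can be split into module, symbol, and offset. A minimal sketch, assuming the frame layout shown in this post (Child-SP, RetAddr, four arguments, call site, separated by colons); the function names are hypothetical and this is not a WinDbg extension:

```python
import re

# One frame as printed in this post's STACK_TEXT:
#   Child-SP RetAddr : four "Args to Child" : Call Site
FRAME_RE = re.compile(
    r"^(?P<sp>[0-9a-f`]+)\s+(?P<ret>[0-9a-f`]+)\s*:"
    r"\s*(?P<args>[^:]+):\s*(?P<site>\S+)\s*$"
)


def parse_frames(stack_text):
    """Return (module, symbol, offset) per frame; module is None for bare addresses."""
    frames = []
    for line in stack_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if not m:
            continue  # blank lines, the "STACK_TEXT:" header, etc.
        site = m.group("site")
        module, sep, rest = site.partition("!")
        if not sep:
            frames.append((None, None, site))  # e.g. 0x00007ffe`a973ff14
            continue
        symbol, _, offset = rest.partition("+")
        frames.append((module, symbol, offset or "0x0"))
    return frames


def module_runs(frames):
    """Collapse consecutive frames from the same module (top of stack first)."""
    runs = []
    for module, _, _ in frames:
        name = module if module is not None else "<no symbol>"
        if not runs or runs[-1] != name:
            runs.append(name)
    return runs
```

Fed the SRV02/SRV03 STACK_TEXT, `module_runs` gives nt, fastfat, nt: the only non-kernel frames are fastfat!FatCommonClose and fastfat!FatFsdClose, sitting between the file-object teardown (nt!IopDeleteFile) and the resource release that was interrupted.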

SRV04 (AITHV04) experienced a Blue Screen of Death (BSOD) with bug check FAT_FILE_SYSTEM (23).

FAT_FILE_SYSTEM (23)

If you see FatExceptionFilter on the stack then the 2nd and 3rd parameters are the exception record and context record. Do a .cxr on the 3rd parameter and then kb to obtain a more informative stack trace.

Arguments:
Arg1: 00000000001c0345
Arg2: 0000000000000000
Arg3: 0000000000000000
Arg4: 0000000000000000
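For FAT_FILE_SYSTEM, the Microsoft Learn page for bug check 0x23 describes Arg1 as source file and line information: the high 16 bits identify the source file by number and the low 16 bits give the line in that file. A minimal sketch that decodes it and checks whether the `.cxr` advice above applies; the helper names are hypothetical, and the file-ID-to-filename mapping is not something this sketch can provide:

```python
from typing import Iterable, Tuple


def decode_fat_arg1(arg1: int) -> Tuple[int, int]:
    """Split FAT_FILE_SYSTEM (0x23) Arg1 into (source file ID, source line).

    Layout as described on Microsoft Learn for bug check 0x23: high 16 bits
    = source file identifier, low 16 bits = line number in that file.
    """
    return (arg1 >> 16) & 0xFFFF, arg1 & 0xFFFF


def cxr_hint_applies(call_sites: Iterable[str], arg2: int, arg3: int) -> bool:
    """True when the '.cxr on the 3rd parameter' advice is usable.

    That advice assumes FatExceptionFilter is on the stack, with Arg2/Arg3
    holding the exception record and context record addresses.
    """
    on_stack = any("FatExceptionFilter" in site for site in call_sites)
    return on_stack and arg2 != 0 and arg3 != 0


file_id, line = decode_fat_arg1(0x1C0345)
print(f"source file ID {file_id:#x}, line {line}")
print(cxr_hint_applies(["fastfat!FatDeleteVcb+0x241", "fastfat!FatCheckForDismount+0xea"], 0, 0))
```

For SRV04 this gives source file ID 0x1c, line 837 (0x345), and the `.cxr` hint does not apply: Arg2 and Arg3 are zero and the stack goes straight from fastfat!FatDeleteVcb to nt!KeBugCheckEx. The dump therefore appears to record a bug check raised directly by FatDeleteVcb during FatCheckForDismount, reached while fastfat was mounting a volume (FatMountVolume) on behalf of an NtCreateFile call.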

Stack trace showing the sequence of function calls that led to the crash:

STACK_TEXT:

ffffbe0a8381a758 fffff8055040a919 : 0000000000000023 00000000001c0345 0000000000000000 0000000000000000 : nt!KeBugCheckEx
ffffbe0a8381a760 fffff805503c5d46 : ffff9c8ebbbf9a10 ffff9c8ebbbf9a10 00000000c0000101 0000000000000000 : fastfat!FatDeleteVcb+0x241
ffffbe0a8381a7a0 fffff80550402e30 : 0000000000000000 ffffbe0a8381ab00 ffffbe0a8381ab69 0000000000000000 : fastfat!FatCheckForDismount+0xea
ffffbe0a8381a7e0 fffff80550402187 : 0000000000000000 ffff9c8eab134800 ffff948200000001 0000000000000000 : fastfat!FatMountVolume+0xc74
ffffbe0a8381aa50 fffff805504020d2 : ffff9482521994d0 ffff9482719bbb01 ffff9482719bbb01 ffff9c8e7a5bba01 : fastfat!FatCommonFileSystemControl+0x57
ffffbe0a8381aa80 fffff80535041185 : 0000000000000000 ffff9482521994d0 ffff948252199401 ffff9482719bbbe0 : fastfat!FatFsdFileSystemControl+0xb2
ffffbe0a8381aac0 fffff80530c504c4 : ffff9c8e793bc010 ffff948200000000 0000000000000000 0000000000000000 : nt!IofCallDriver+0x55
ffffbe0a8381ab00 fffff80530c489ed : ffff9482313bcd40 ffff9482521994d0 ffff9482227dc820 ffff9c8ebeb40a70 : FLTMGR!FltpFsControlMountVolume+0x1f0
ffffbe0a8381abd0 fffff80535041185 : ffffbe0a8381ad31 ffff9482313bcd40 ffffbe0a8381ad31 fffff80535a51060 : FLTMGR!FltpFsControl+0x11d
ffffbe0a8381ac30 fffff805354fe387 : ffffbe0a8381ad31 ffff94822345f050 ffff9482313bcd40 0000000000000000 : nt!IofCallDriver+0x55
ffffbe0a8381ac70 fffff80535040f45 : 0000000000000000 ffff9c8e904e6640 0000000000000000 0000000000000000 : nt!IopMountVolume+0x3af
ffffbe0a8381ad90 fffff8053547be12 : ffff9482e9c37080 0000000000000000 ffffbe0a8381b0b0 0000000000000f25 : nt!IopCheckVpbMounted+0x205
ffffbe0a8381adf0 fffff80535481a85 : ffff94822339e060 fffff8053547b8e0 0000000000000000 ffff9481ddbfd7a0 : nt!IopParseDevice+0x532
ffffbe0a8381afb0 fffff80535480f21 : ffffa98697408bf0 ffffbe0a8381b1e0 0000000000000040 ffff9481ddbfdd20 : nt!ObpLookupObjectName+0x625
ffffbe0a8381b150 fffff8053551ca3f : ffff948100000000 0000000000000001 ffff9c8ed01e5af0 000000c57ecfef58 : nt!ObOpenObjectByNameEx+0x1f1
ffffbe0a8381b280 fffff8053551c619 : 000000c57ecfef18 000000c57ecfeaf8 000000c57ecfef58 000000c57ecfef20 : nt!IopCreateFile+0x40f
ffffbe0a8381b320 fffff80535231085 : 0000000000000001 000000c57ecff220 0000000000000000 000000c57ecff3e8 : nt!NtCreateFile+0x79
ffffbe0a8381b3b0 00007ffea973ff14 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : nt!KiSystemServiceCopyEnd+0x25
000000c57ecfee98 0000000000000000 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000 : 0x00007ffe`a973ff14


This question was migrated from the Microsoft Support Community. To protect privacy, user profiles for migrated questions are anonymized.


2 answers

  1. Anonymous
    2024-03-14T08:52:46+00:00

    Hello

    Thank you for posting in Microsoft Community forum.

    From the dump information you provided, we can see three different bug check codes across the four servers. Unfortunately, what you posted is not a complete dump analysis, so we cannot get further information, but the following official documents may help:

    Bug Check 0xE3 RESOURCE_NOT_OWNED - Windows drivers | Microsoft Learn

    Bug Check 0x133 DPC_WATCHDOG_VIOLATION - Windows drivers | Microsoft Learn

    Bug Check 0x23 FAT_FILE_SYSTEM - Windows drivers | Microsoft Learn

    We understand that these BSODs may be affecting your work and causing inconvenience. We suggest contacting Microsoft Support for further help.

    Best Regards,

    Zack Lu

  2. Anonymous
    2024-05-22T08:12:53+00:00

    Hi,

    I have the same problem with NetApp storage on a Dell server.

    Neither Microsoft nor the vendor has a solution for this.
