MS Bug from latest January Update
ReFS filesystem not responding periodically
Environment:
- Windows 11
- Storage Spaces, Simple (no parity), Thin provisioning
- Virtual disk larger than 100 TB
- Tens of TB used
- ReFS 3.7 formatted in Windows 11
The filesystem stops responding for about 1 minute roughly every 25 minutes, according to the Microsoft-Windows-ReFS/Operational event log.
Message:
An IO took more than 30000 ms to complete.
Event ID: 147
Process Id: (any process accessing the volume)
Process name:
File name: (any file)
File offset:
IO Type: (Read, Write, Open, etc.)
IO Size: (varies)
Latency: (30 to 64 seconds)
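For anyone who wants to check the ~25 minute cadence on their own system, here is a rough sketch in Python. It assumes the timestamps of the Event ID 147 entries have already been exported from the Microsoft-Windows-ReFS/Operational log (for example with Get-WinEvent) and parsed into datetime objects. The function names and the 2-minute grouping window are my own assumptions, not anything from ReFS or the event log itself.

```python
from datetime import datetime, timedelta


def stall_episodes(timestamps, max_gap=timedelta(minutes=2)):
    """Group event times into hang episodes.

    Events that follow the previous event by at most max_gap are treated
    as part of the same hang (max_gap is an assumed value, tune as needed).
    """
    episodes = []
    for ts in sorted(timestamps):
        if episodes and ts - episodes[-1][-1] <= max_gap:
            episodes[-1].append(ts)   # same hang as the previous event
        else:
            episodes.append([ts])     # a new hang starts here
    return episodes


def episode_intervals(episodes):
    """Return minutes between the starts of consecutive hang episodes."""
    starts = [ep[0] for ep in episodes]
    return [(b - a).total_seconds() / 60 for a, b in zip(starts, starts[1:])]
```

If the intervals returned by episode_intervals cluster around 25, that supports the idea of a periodic background task rather than load-dependent stalls.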
A thread in the System process uses ~100% of one core while the filesystem is hung.
I took a stack trace using Process Hacker:
ntoskrnl.exe!KiDeliverApc+0x1b6
ntoskrnl.exe!KiApcInterrupt+0x328
ReFS.SYS!CmsRotatingSkipList<_RANGE,SmsAllocationRegionEx,OrderByStartOfRange,RegionLockPolicies>::InlineRebalance+0x98
ReFS.SYS!CmsRotatingSkipList<_RANGE,SmsAllocationRegionEx,OrderByStartOfRange,RegionLockPolicies>::Add+0x1d0
ReFS.SYS!CmsAllocator::SplitRangeOnlyRegion+0x386
ReFS.SYS!CmsAllocator::PinBitmapRegion+0x3176c
ReFS.SYS!CmsAllocator::TryProtectRangeIfDurablyFree+0x9b
ReFS.SYS!CmsThinProvisioning::UnmapWorkItemMethod+0x2e4
ReFS.SYS!<lambda_9a1cd484752ca8e9f6e914453bf80744>::<lambda_invoker_cdecl>+0x15
ReFS.SYS!MspWorkerRoutine+0x46
ntoskrnl.exe!ExpWorkerThread+0x14f
ntoskrnl.exe!PspSystemThreadStartup+0x55
ntoskrnl.exe!KiStartSystemThread+0x34
2 answers
Limitless Technology 39,356 Reputation points
2022-01-24T15:50:33.69+00:00
Hi there,
DPM uses loopback-mounted VHDs. These appear as normal disks to the OS, so they are displayed in Windows Explorer, Diskmgmt, and other GUI tools. These tools periodically poll the disks to make sure that they are functioning correctly.
This causes IOs to be sent down the loopback stack to the ReFS volume. If the ReFS volume is busy, these IOs will have to wait. Therefore, when ReFS performs a long-duration operation, such as flushing or a large block-clone call, these IOs will have to wait longer.
Here is a link that may help:
ReFS volume using DPM becomes unresponsive on Windows Server 2016
https://learn.microsoft.com/en-us/troubleshoot/windows-server/backup-and-storage/refs-volume-dpm-unresponsive
--If the reply is helpful, please Upvote and Accept it as an answer--