Hyper-V 2016, Event 5120: CSV has entered a paused state due to STATUS_IO_TIMEOUT

Ryan 1 Reputation point
2021-10-30T02:54:47.27+00:00

Looking for help: our issue has recurred every 1-2 days for the past 2 months. We have already enabled jumbo frames on the iSCSI ports, switches and SAN, and enabled VLT on our two Dell S4048T switches. Firmware and patches are up to date. No luck resolving it so far.
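
One thing that can be checked here is whether jumbo frames actually pass end to end on the iSCSI path. A minimal sketch, assuming an IP MTU of 9000 (the post does not state the configured value): it computes the largest ICMP payload that fits in one frame and builds a Windows `ping` command with the Don't Fragment flag set. The target address is a documentation placeholder, not an address from this environment.

```python
IP_HEADER_BYTES = 20    # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header


def icmp_payload_for_mtu(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IP_HEADER_BYTES - ICMP_HEADER_BYTES


def ping_command(target, mtu=9000):
    """Build a Windows ping command: -f sets Don't Fragment, -l sets payload size."""
    return ["ping", "-f", "-l", str(icmp_payload_for_mtu(mtu)), target]


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder; substitute a SAN iSCSI portal address.
    print(" ".join(ping_command("192.0.2.10")))
```

If a ping of that size fails from a host's iSCSI interface while a smaller one succeeds, some hop on the path is probably not passing jumbo frames.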

Issue: Every few days our Hyper-V hosts become unstable, and our guest VMs sporadically reboot and become unstable as well.

Environment:
Cluster – Non S2D
Node – 4 Nodes
VM – Hyper-V
Storage – iSCSI SAN (Dell Compellent)
OS – Windows Server 2016
Hardware – Dell R730s
AV – Defender
Backups – MABS/DPM (guest VM level only)

Event IDs 5120, 5142, 1069, 1146 and 1230. Mainly 5120, with the CSVs entering a paused state due to STATUS_IO_TIMEOUT.

We also see this in the cluster log: Microsoft Failover Cluster Virtual Adapter (NetFT) has missed more than 40 percent of consecutive heartbeats
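
To see whether these events cluster around particular hours (a backup window, for example), the System log can be exported to CSV and tallied per host and hour. A minimal sketch, assuming an export with `TimeCreated`, `Id` and `MachineName` columns and ISO 8601 timestamps; the sample rows and host names are made up for illustration.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Event IDs named in this post.
WATCHED_IDS = {5120, 5142, 1069, 1146, 1230}


def tally_events(csv_text):
    """Count watched events per (host, hour, event ID)."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        event_id = int(row["Id"])
        if event_id not in WATCHED_IDS:
            continue
        # Assumes ISO 8601 timestamps; adjust the parsing for your export format.
        when = datetime.fromisoformat(row["TimeCreated"])
        hour = when.replace(minute=0, second=0, microsecond=0)
        counts[(row["MachineName"], hour, event_id)] += 1
    return counts


# Made-up sample rows, for illustration only.
SAMPLE = """TimeCreated,Id,MachineName
2021-10-28T02:10:05,5120,HV01
2021-10-28T02:14:40,5120,HV01
2021-10-28T02:15:02,1069,HV02
2021-10-28T09:30:00,7036,HV01
"""

if __name__ == "__main__":
    for (host, hour, event_id), n in sorted(tally_events(SAMPLE).items()):
        print(f"{host}  {hour:%Y-%m-%d %H}:00  event {event_id}  x{n}")
```

Buckets that repeat at the same hours across days would point toward a scheduled job rather than a random fault.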

No solution so far; the only thing that remediates the issue is powering down guest VMs to remove load from the hosts (note that our hosts are underutilized from a memory and CPU standpoint).


1 answer

  1. Limitless Technology 40,076 Reputation points
    2021-12-20T17:17:18.303+00:00

    Hello @Ryan

    1. There could be latency between your CSV storage network and the Hyper-V network.
    2. Please check whether you have any file server, SQL Server, or other application server that accesses storage frequently, which can lead to an I/O bottleneck.
    3. Please temporarily disable any antivirus program you may have.
    4. Please check whether any QoS policy at the firewall, switch, or Dell storage level is restricting traffic flow between storage, hosts, and VMs.
    5. Please try to schedule DPM backups only during non-working hours or on weekends.
    6. Please run the Hyper-V cluster validation wizard to check that all cluster configurations are identical; there should be no warnings or errors in the cluster report.

    Please have a look at the Microsoft article below to troubleshoot the I/O issue in the Hyper-V cluster.

    https://techcommunity.microsoft.com/t5/failover-clustering/troubleshooting-cluster-shared-volume-auto-pauses-8211-event/ba-p/371994


