Hyper-V cluster 2022: WARN Cluster Shared Volume 'Volume2' ('Cluster Disk 3') has entered a paused state because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'. All I/O will temporarily be queued until a path to the volume is reestablished.

Nguyen Thanh Hieu 61 Reputation points
2024-03-11T07:43:54.94+00:00

Hi Everyone,

I am facing quite a headache of an issue. I have spent many hours searching Google but still cannot fix it. It is happening in our production Hyper-V environment, so the impact has been quite high.

I have 2 Hyper-V clusters (2 nodes each), both running Windows Server 2022 Enterprise. The issue happens on both clusters with the same symptoms.

Let me describe the detailed settings of each cluster:

  • Nodes: 2
  • Network on each node:
      - 2 NICs for the Production network: cluster and client use allowed
      - 2 NICs for the Backup network (no gateway): no cluster or client use
      - 2 NICs for heartbeat and Live Migration: cluster use only
  • Cluster storage:
      - 1 Disk Witness
      - 4 Cluster Shared Volumes
  • Virtual machines:
      - All VMs store their OS disk on a CSV
      - Some VMs use a virtual HBA to connect to external storage
      - 54 VMs per cluster

The clusters seemed to work fine after creation, while they hosted fewer VMs. But after we added many VMs (migrated from other, older clusters), the problems started.

Problem 1:

  • Starting about 4 months ago, a cluster node intermittently cannot access the other node in the cluster. Example: node 02b can't access node 02a via WinRM. The only fix so far is to reboot both nodes, but the problem recurs after about one month.
  • No firewall is blocking traffic between the nodes

Problem 2:

  • CSVs randomly enter a "paused" state (about once per month). All VMs on the paused CSV hang and cannot be live or quick migrated to the other node. The only temporary workaround is to stop the Cluster service on one node and keep the remaining node running; everything then works fine (without HA). If we start the Cluster service on both nodes, the issue happens again and again. (It looks like a "split brain", but I am not sure.)

These are some cluster log lines:

[System] 00003ad4.00003e10::2023/12/15-10:17:13.493 WARN Cluster Shared Volume 'Volume3' ('Cluster Disk 2') has entered a paused state because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'. All I/O will temporarily be queued until a path to the volume is reestablished. This error is usually caused by an infrastructure failure. For example, losing connectivity to storage or the node owning the Cluster Shared Volume being removed from active cluster membership.

`Line     5752: [System] 00003ad4.00003e10::2023/12/15-10:17:13.505 WARN  Cluster Shared Volume 'Volume2' ('Cluster Disk 3') has entered a paused state because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'. All I/O will temporarily be queued until a path to the volume is reestablished.`

`	Line    23330: [System] 00003b3c.0000436c::2024/01/25-08:08:20.510 WARN  Cluster Shared Volume 'Volume4' ('Cluster Disk 5') has entered a paused state because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'. All I/O will temporarily be queued until a path to the volume is reestablished.`

`	Line    23331: [System] 00003b3c.0000436c::2024/01/25-08:08:23.865 WARN  Cluster Shared Volume 'Volume1' ('Cluster Disk 4') has entered a paused state because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'. All I/O will temporarily be queued until a path to the volume is reestablished.`

`	Line    23332: [System] 00003b3c.0000436c::2024/01/25-08:14:22.629 WARN  Cluster Shared Volume 'Volume4' ('Cluster Disk 5') has entered a paused state because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'. All I/O will temporarily be queued until a path to the volume is reestablished.`

`	Line    23339: [System] 00003b3c.0000436c::2024/01/25-08:18:52.256 WARN  Cluster Shared Volume 'Volume1' ('Cluster Disk 4') has entered a paused state because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'. All I/O will temporarily be queued until a path to the volume is reestablished.`

`	Line    23340: [System] 00003b3c.0000436c::2024/01/25-08:20:33.189 WARN  Cluster Shared Volume 'Volume1' ('Cluster Disk 4') has entered a paused state because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'. All I/O will temporarily be queued until a path to the volume is reestablished.`

`	Line    23351: [System] 00003b3c.0000436c::2024/01/25-08:26:01.025 WARN  Cluster Shared Volume 'Volume4' ('Cluster Disk 5') has entered a paused state because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'. All I/O will temporarily be queued until a path to the volume is reestablished.`
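To see how the pauses group in time and across volumes, the warnings quoted above can be tallied with a short script. This is only a rough sketch: the regular expression is an assumption derived from the sample lines in this post (not from any official log format description), and the function names are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Pattern inferred from the CSV pause warnings quoted in this thread (assumption).
PAUSE_RE = re.compile(
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+WARN\s+"
    r"Cluster Shared Volume '(?P<volume>[^']+)' \('(?P<disk>[^']+)'\) "
    r"has entered a paused state because of '(?P<status>[^']+)'"
)

def parse_pause_events(lines):
    """Yield (timestamp, volume, disk, status) for each CSV pause warning."""
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y/%m/%d-%H:%M:%S.%f")
            yield ts, match.group("volume"), match.group("disk"), match.group("status")

def pauses_per_day(events):
    """Count pause events per (date, volume)."""
    return Counter((ts.date().isoformat(), volume) for ts, volume, _, _ in events)

# Lines copied from the post, trimmed after the status code.
SAMPLE = [
    "[System] 00003ad4.00003e10::2023/12/15-10:17:13.493 WARN Cluster Shared Volume 'Volume3' ('Cluster Disk 2') has entered a paused state because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'.",
    "Line    23330: [System] 00003b3c.0000436c::2024/01/25-08:08:20.510 WARN  Cluster Shared Volume 'Volume4' ('Cluster Disk 5') has entered a paused state because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'.",
    "Line    23332: [System] 00003b3c.0000436c::2024/01/25-08:14:22.629 WARN  Cluster Shared Volume 'Volume4' ('Cluster Disk 5') has entered a paused state because of 'STATUS_CONNECTION_DISCONNECTED(c000020c)'.",
    "[System] 00004fb4.000042a8::2023/12/08-10:21:06.625 ERR   Cluster network name resource failed to modify the DNS registration.",
]

if __name__ == "__main__":
    for (day, volume), count in sorted(pauses_per_day(parse_pause_events(SAMPLE)).items()):
        print(day, volume, count)
```

Running the same functions over the full cluster log from each node would show whether several volumes pause within seconds of each other, which is what the 2024/01/25 entries above suggest.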

**Problem 3:**

A DNS name issue recurs randomly, even after it has been fixed. DNS verification shows no problems:

- nslookup forward/reverse

- Ping via DNS name

`Cluster Network name: 'Cluster Name'`

`Error code: 'DNS bad key.`

`Guidance:`

`Ensure that the network adapters associated with dependent IP address resources are configured with access to at least one DNS server.`

`[System] 00004fb4.000042a8::2023/12/08-10:21:06.625 ERR   Cluster network name resource failed to modify the DNS registration.`

**Some other things to consider:**

- Acronis Backup software backs up the cluster hosts and VMs (daily incremental, weekly full)

- Bitdefender anti-virus software

- No network firewall is blocking traffic between the nodes

- The firewall on the cluster nodes is turned off completely.

Cluster log uploaded:

[https://1drv.ms/u/s!AgFJBZyCAor4jYkd859skYMHXyTbXg?e=hyujti](https://1drv.ms/u/s!AgFJBZyCAor4jYkd859skYMHXyTbXg?e=hyujti)

Please give me your suggestions and ideas for fixing this issue, based on your experience and knowledge. Many thanks.

Br,

Hieu


Accepted answer
  1. Ian Xue 37,621 Reputation points Microsoft Vendor
    2024-03-13T09:31:31.3233333+00:00

    Hi Nguyen Thanh Hieu,

    Thanks for your post. One possible cause: when a CSV volume is accessed from a passive (non-coordinator) node, the disk I/O to the owning (coordinator) node is routed through a 'preferred' network adapter and requires SMB to be enabled on that network adapter. For SMB connections to work on these network adapters, the following protocols must be enabled:

    • Client for Microsoft Networks
    • File and Printer Sharing for Microsoft Networks

    You can refer to the following article to see whether its solution applies: "Unable to access ClusterStorage folder - Windows Server | Microsoft Learn"
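
As a rough illustration of that check (not part of the linked article), the sketch below flags adapters on which either protocol is disabled. It assumes the bindings were exported on each node with something like `Get-NetAdapterBinding | Select-Object Name, ComponentID, Enabled | ConvertTo-Json`, and that `ms_msclient` and `ms_server` are the component IDs for the two protocols listed above. The adapter names in the sample are hypothetical.

```python
import json

# Component IDs assumed to correspond to the two protocols named above.
REQUIRED = {
    "ms_msclient": "Client for Microsoft Networks",
    "ms_server": "File and Printer Sharing for Microsoft Networks",
}

def missing_smb_bindings(bindings_json):
    """Return {adapter name: [required protocols that are not enabled]}."""
    records = json.loads(bindings_json)
    if isinstance(records, dict):  # ConvertTo-Json emits a bare object for one record
        records = [records]
    enabled_by_adapter = {}
    for rec in records:
        components = enabled_by_adapter.setdefault(rec["Name"], set())
        if rec.get("Enabled"):
            components.add(rec["ComponentID"])
    missing = {}
    for name, components in enabled_by_adapter.items():
        absent = [label for cid, label in REQUIRED.items() if cid not in components]
        if absent:
            missing[name] = absent
    return missing

# Hypothetical export: adapter names are placeholders, not taken from the thread.
SAMPLE_JSON = """
[
  {"Name": "Cluster-Team", "ComponentID": "ms_msclient", "Enabled": true},
  {"Name": "Cluster-Team", "ComponentID": "ms_server", "Enabled": false},
  {"Name": "Prod-Team", "ComponentID": "ms_msclient", "Enabled": true},
  {"Name": "Prod-Team", "ComponentID": "ms_server", "Enabled": true}
]
"""

if __name__ == "__main__":
    for adapter, protocols in missing_smb_bindings(SAMPLE_JSON).items():
        print(adapter, "is missing:", ", ".join(protocols))
```

An empty result for the adapters that carry cluster traffic would suggest the bindings are not the cause.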

    Best Regards,

    Ian Xue


    If the Answer is helpful, please click "Accept Answer" and upvote it.


5 additional answers

  1. Net Runner 615 Reputation points
    2024-03-11T11:22:41.7333333+00:00

    Problem 1:

    Unfortunately, those kinds of problems are quite tricky to find since these can be related to WinRM or Cluster services being stuck one way or another. One of the first steps I would recommend doing is trying to use Windows Admin Center https://aka.ms/windows-admin-center for node/cluster management purposes and see if it still works when common Server Manager or Failover Cluster Manager starts showing that error. That can be a temporary workaround and give you clues on where to dig further.

    Do you reboot your cluster nodes on a regular schedule that is more frequent than once a month? If you don't, and the error occurs after approximately a month of uptime, a scheduled weekend reboot of the cluster (which may or may not include updates and patching) is also an option to work around the problem.

    If none of the above helps, I would seriously think about redeploying the cluster nodes from scratch.

    Problem 2:

    Since you did not specify what kind of storage you are using, I assume that it is some sort of hardware SAN. The I/O errors you are seeing tend to happen when storage performance is too low for the workload you have. The performance bottleneck may be on the disk side or in the network connection bandwidth. The fact that these problems started after you moved more VMs to the present cluster is another argument for my assumption. To check that idea, you may temporarily shut down some of the most performance-intensive virtual machines to see if the problem goes away. Additionally, you may want to run Dell Live Optics https://www.liveoptics.com/ or Veeam ONE Report https://www.veeam.com/virtual-server-management-one-free.html to see the performance numbers your workload generates and compare them with the performance that your storage system offers.

    If your servers have some fast internal storage present, you can use Storage Spaces Direct https://aka.ms/s2d or StarWind Virtual SAN https://www.starwindsoftware.com/vsan software to present directly attached disks as Cluster Shared Volumes. That configuration may host your most performance-critical virtual machines and free up some storage resources from the main array.

    Problem 3:

    That looks like a damaged Active Directory / DNS record for your cluster name. Try re-adding the IP address and DNS name of the cluster in cluster core resources management panel and see if that changes anything. You may find the following thread helpful: https://learn.microsoft.com/en-us/answers/questions/864201/windows-cluster-dns-error-event-id-1260-dns-bad-ke.


  2. Nguyen Thanh Hieu 61 Reputation points
    2024-03-13T02:28:08.53+00:00

    This is very urgent, please help me!

  3. Nguyen Thanh Hieu 61 Reputation points
    2024-03-14T03:20:36.3566667+00:00

    Hi Ian Xue,

    Thanks for your answer.

    I checked; the details are in the picture below.

    User's image

    Do you think I need to enable those functions on the individual interfaces of the teamed interface?

    User's image

    Thank you so much.

  4. Nguyen Thanh Hieu 61 Reputation points
    2024-03-18T02:32:10.53+00:00

    Hi,

    Please help me. I can't rebuild the system from scratch because it is in production use.

    Thanks!
