Unplanned failover 2 node cluster VM role is booted

Dominique Vorbrodt 21 Reputation points
2022-04-07T12:30:52.393+00:00

Dear all

I have exactly the same setup as in the question "Unplanned failover 2 node cluster fails",
see https://learn.microsoft.com/en-us/answers/questions/786126/unplanned-failover-2-node-cluster-fails.html

That is:

I have the simplest possible 2-node cluster (Windows Server 2022 Standard) with the simplest possible iSCSI shared storage.

I have 2 networks:

  • iSCSI (10.0.26.0/24)
  • Cluster & Client (192.168.26.0/24)

I have 1 CSV (C:\ClusterStorage on both nodes)

(Of course I have an AD DC.)

I have only 1 Hyper-V VM as clustered role. All files for the VM reside on the CSV.

I have configured the cluster for immediate failover, using (Get-Cluster).ResiliencyLevel = 1.
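
For reference, a minimal sketch of how this setting and the related compute-resiliency properties can be inspected before changing them. It assumes the FailoverClusters PowerShell module is available and that the commands are run on a cluster node. As far as I understand, these properties only control how quickly the cluster reacts to a node that stops responding; they do not preserve the memory state of a running VM.

```powershell
# Assumption: run on a cluster node (or a machine with the failover clustering RSAT tools).
Import-Module FailoverClusters

# Show the compute-resiliency settings currently in effect on the local cluster.
Get-Cluster |
    Format-List Name, ResiliencyLevel, ResiliencyDefaultPeriod, QuarantineThreshold, QuarantineDuration

# The setting described in this post.
(Get-Cluster).ResiliencyLevel = 1
```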

Planned live migration of the VM initiated in Cluadmin works without problems!

However, unplanned live migration of the VM (by switching off the cluster node currently hosting the VM) does not work.

The VM is not live migrated but simply powered-on / booted on the still working node.

Can anyone help please?

Thank you.
Sincerely
D.
Zurich, Switzerland.

Hyper-V
Windows Server Clustering

2 answers

  1. Alex Bykovskyi 1,681 Reputation points
    2022-04-07T15:07:35.333+00:00

    Hey,

    If a node is powered off unexpectedly (rather than shut down gracefully), the cluster cannot live migrate the VM. As you have described, the VM is booted on the remaining node instead. Live migration is not possible because the VM's compute state (CPU, RAM) went offline together with the failed node, so there is nothing left to migrate. A failover cluster ensures high availability of the cluster roles, which here means a restart on a surviving node rather than uninterrupted operation.
    https://learn.microsoft.com/en-us/troubleshoot/windows-server/high-availability/high-availability-overview
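
To confirm what happened after the node failure, something along these lines may help. It is only a sketch: it assumes the FailoverClusters and Hyper-V PowerShell modules are installed, and `NODE2` is a hypothetical name standing in for the surviving node. A very short uptime right after the failover would suggest that the VM was cold-booted rather than live migrated.

```powershell
# Assumption: FailoverClusters and Hyper-V modules are available on this machine.
Import-Module FailoverClusters

# List the clustered VM roles with their current owner node and state.
Get-ClusterGroup |
    Where-Object { $_.GroupType -eq 'VirtualMachine' } |
    Select-Object Name, OwnerNode, State

# 'NODE2' is a hypothetical placeholder for the surviving node's name.
Get-VM -ComputerName 'NODE2' |
    Select-Object Name, State, Uptime
```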
    If you want your VM/resource to be always online, you would need it to be fault tolerant. This might be helpful:
    https://www.starwindsoftware.com/best-practices/starwind-virtual-san-best-practices/

    Cheers,

    Alex Bykovskyi

    StarWind Software

    Note: Posts are provided “AS IS” without warranty of any kind, either expressed or implied, including but not limited to the implied warranties of merchantability and/or fitness for a particular purpose.


  2. Dominique Vorbrodt 21 Reputation points
    2022-04-09T08:24:05.88+00:00

    Alex

    Thank you for your valuable feedback.

    However, I believe it is perfectly possible to live-migrate a VM even in the case of a hardware failure of a cluster node, as this is exactly what the cluster is for.

    I know for sure that it works with VMware. It is called vMotion there. See: https://kb.vmware.com/s/article/1013428

    Here is a description of how it works:

    "In general terms, a second virtual machine is created to work in tandem with the virtual machine on which you have enabled Fault Tolerance. This virtual machine resides on a different host in the cluster and runs in virtual lockstep with the primary virtual machine. When a failure is detected, the second virtual machine takes the place of the first one with the least possible interruption of service."

    In the Linux KVM world, people speak of pre-copy vs. post-copy migration technologies.

    Thank you.
    Sincerely
    D.
    Zurich, Switzerland.