SCVMM 2019 Cluster High Availability

Paul Hutchins 1 Reputation point
2020-11-18T16:42:53.1+00:00

Hi,

I set up a couple of Server 2019 Core servers as Hyper-V hosts, then created a cluster with SCVMM 2019. I had problems creating the storage: I had to create the cluster first, then connect the storage via the iSCSI initiator on each Hyper-V host. When I then looked at the cluster properties, I could see the iSCSI volume as available storage and could convert it into a CSV. We don't have an iSCSI SAN that supports SMI-S, so I am unable to add it to the SCVMM fabric.

Everything has been working during testing so far. A VM deployed onto the CSV can be live-migrated back and forth between the hosts with no problems. I have noticed that, because the cluster was created in SCVMM, if I try to launch Failover Cluster Manager on either Hyper-V host I get a message that it's not a supported cluster type and cannot be used.

The issue I am having is that I wanted to simulate a host failure. To do this, I powered off one of the two hosts immediately via the Dell iDRAC console. The VM on that host did not restart on the other host, and there were also errors on the two VMs that were running on the host that was left running. I left it in this state for 10 minutes to see whether some sort of HA process would kick in, but nothing happened. I eventually powered the host back on and got the VMs running again. I can't find anything on high availability that is specific to SCVMM; everything I find refers to using Failover Cluster Manager.

Thanks,
Paul.

System Center Virtual Machine Manager
Hyper-V

6 answers

  1. Mico Mi 1,921 Reputation points
    2020-11-19T07:41:27.943+00:00

    Hi,
    Do you mean that when you shut down host1, the VM on host1 cannot fail over to host2, and the VMs running on host2 also show errors?

    1. Please check whether there are any error codes under Applications and Services Logs > Microsoft > Windows > FailoverClustering / Hyper-V-VMMS / Hyper-V-Worker. If so, please provide the detailed error information.
    2. Please run the cluster validation test and view the validation report. (Note: The storage may be unavailable during the test, so please run it during a maintenance window.)
    3. Please check whether the nodes are online in cluster manager.
    4. Please check whether the nodes are fully updated; if not, please update them.

    Thanks for your time!
    Best Regards,
    Mico Mi


  2. Paul Hutchins 1 Reputation point
    2020-11-19T09:01:09.97+00:00

    Hi,

    Thanks, regarding your understanding of what happened, that is correct. I'll have a look at those things and post back.

    Thanks,
    Paul.


  3. Paul Hutchins 1 Reputation point
    2020-11-19T10:43:14.167+00:00

    Hi,

    In answer to your questions:

    1) There are a number of different errors at the time on the host that was still running (removing details of names):

    Event ID 12635 Hyper-V-SynthStor - vhdx' received a resiliency status notification. Current status: Disconnected
    Event ID 12636 Hyper-V-SynthStor - vhdx' has detected a recoverable error. Current status: Disconnected

    Followed by entries that the VM was paused.

    There are a number of other entries under FailoverClustering:

    Event 2050 - lost quorum (status = 5925), executing OnStop
    Event 2051 - [RES] Physical Disk: Failed to open key HKLM\Cluster\Quorum, status 2
    Event 2051 - [RES] Physical Disk <Cluster Disk 4>: Terminate: Failed to release disk 4, Error 5050
    Event 2050 - [QUORUM] An attempt to form cluster failed due to insufficient quorum votes. Try starting additional cluster node(s) with current vote or as a last resort use Force Quorum option to start the cluster. Look below for quorum information,
    Event 2050 - [QUORUM] To achieve quorum cluster needs at least 1 of quorum votes. There is only 0 quorum votes running
    Event 16000 Hyper-V-VMMS - The Hyper-V Virtual Machine Management service encountered an unexpected error: Catastrophic failure (0x8000FFFF).
    Event 1177 - The Cluster service is shutting down because quorum was lost
    Event 7031 - The Cluster Service service terminated unexpectedly

    Event 1653 - Cluster node 'this is the host still running' failed to join the cluster because it could not communicate over the network with any other node in the cluster

    With regard to the one above, we only have two hosts in the cluster at present.

    Event 2051 - [RCM] s_RcmRpcGetGroupState: (5908)' because of ''SCVMM "VM name" Resources' is owned by node 2, not 1.'

    2) I will run this in a few minutes (we only have test machines on this cluster) and report back.

    3) Both nodes are showing online at present.

    4) host1 is only missing the following:
    Security Intelligence Update for Microsoft Defender Antivirus - KB2267602 (Version 1.327.1185.0)

    host2 is only missing the following:
    Security Intelligence Update for Microsoft Defender Antivirus - KB2267602 (Version 1.327.1185.0)
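
A minimal Python sketch of how the event lines listed under 1) could be grouped by event ID and filtered for quorum-related entries. The sample lines are copied from this post; `parse_event` and `quorum_related` are hypothetical helper names, and the keyword match on "quorum" is only a rough filter, not an official classification.

```python
import re
from collections import Counter

# Sample lines copied from the event list in this post (names were already removed).
EVENT_LINES = [
    "Event 2050 - lost quorum (status = 5925), executing OnStop",
    "Event 2051 - [RES] Physical Disk: Failed to open key HKLM\\Cluster\\Quorum, status 2",
    "Event 2050 - [QUORUM] To achieve quorum cluster needs at least 1 of quorum votes. "
    "There is only 0 quorum votes running",
    "Event 1177 - The Cluster service is shutting down because quorum was lost",
    "Event 7031 - The Cluster Service service terminated unexpectedly",
    "Event ID 12635 Hyper-V-SynthStor - vhdx' received a resiliency status notification. "
    "Current status: Disconnected",
]

# The post uses both "Event 2050 - ..." and "Event ID 12635 ...", so "ID" and "- " are optional.
EVENT_RE = re.compile(r"^Event(?: ID)? (\d+)\s+(?:- )?(.*)$")


def parse_event(line):
    """Return (event_id, message) for a recognised line, else None."""
    match = EVENT_RE.match(line.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def quorum_related(events):
    """Keep only events whose message mentions quorum (hypothetical, rough filter)."""
    return [(eid, msg) for eid, msg in events if "quorum" in msg.lower()]


events = [e for e in map(parse_event, EVENT_LINES) if e is not None]
by_id = Counter(eid for eid, _ in events)
quorum_events = quorum_related(events)

for eid, msg in quorum_events:
    print(eid, msg)
```

In this sample most of the FailoverClustering entries mention quorum, which is why the quorum configuration is the first thing worth checking.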


  4. Paul Hutchins 1 Reputation point
    2020-11-19T10:49:10.1+00:00

    I also wanted to add that when I set up the cluster via SCVMM, connecting the storage during the wizard failed every time; the cluster wizard only completed after I removed the shared storage. To explain in more detail, I had added two iSCSI volumes to both hosts (the same volumes on each): one 5GB volume for quorum and one 500GB volume for the CSV.

    At the storage step, the wizard detects both disks and states that it will use the smaller disk for quorum, and I can tick the CSV box and choose to format the larger disk. Validation never found any issues, apart from a warning at the time about a minor Windows update mismatch. Yet if I do not update and simply remove the storage, I can then complete the cluster creation.

    When I added the storage back in, I initially added only one volume and converted it to a CSV, without adding a smaller disk for quorum, and nothing at that point stated that a quorum disk was required. Later I added a second volume and converted it to a CSV in the cluster properties. Is the issue that I haven't correctly configured a quorum?
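
As a rough illustration of why a witness matters in a two-host cluster, the Python sketch below applies a plain majority rule to the configured votes. It is a simplification that ignores dynamic quorum, the function names are hypothetical, and the witness is assumed to contribute exactly one vote.

```python
def votes_needed(total_votes):
    """Strict majority: more than half of all configured votes."""
    return total_votes // 2 + 1


def failures_tolerated(total_votes):
    """Number of voters that can be lost while a majority still remains."""
    return total_votes - votes_needed(total_votes)


# Two hosts and no witness, as described in this post.
TWO_NODES = 2
# Two hosts plus one witness vote (assumption: for example a small quorum disk).
TWO_NODES_PLUS_WITNESS = 3

for label, total in [("no witness", TWO_NODES), ("with witness", TWO_NODES_PLUS_WITNESS)]:
    print(f"{label}: {total} votes, {votes_needed(total)} needed, "
          f"{failures_tolerated(total)} failure(s) tolerated")
```

Under this simple model, two voters cannot lose either one and keep a majority, while a third vote lets one voter fail.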

    Thanks,
    Paul.


  5. Paul Hutchins 1 Reputation point
    2020-11-19T11:01:45.593+00:00

    Hi,

    I have run the cluster validation tool, and it is flagging that there is no quorum witness:

    Witness Type: No Witness Configured
    Witness Resource: No Witness Configured
    The cluster is not configured with a quorum witness. As a best practice, configure a quorum witness to help achieve the highest availability of the cluster.
    Cluster managed voting: Enabled

    Voter Name   State   Assigned Vote   Current Vote
    HOST1        Up      1               0
    HOST2        Up      1               1

    This quorum will be able to sustain simultaneous failures of 0 nodes.
    This quorum configuration can be changed using the Configure Cluster Quorum wizard. This wizard can be started from the Failover Cluster Manager console by selecting the cluster name in the left hand pane, then in the right "actions" pane selecting "More Actions..." and then selecting "Configure Cluster Quorum Settings...".

    However if I try to use Failover Cluster Manager on one of the hosts I receive the following:

    The cluster to which you are attempting to connect is not a version supported by this version of Failover Cluster Manager.
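
The voter table in the validation report can be read with a small Python sketch. It assumes that quorum requires a strict majority of the Current Vote values and that an abrupt power-off gives the cluster no chance to move votes first. The function name is hypothetical. The report was taken after the host was powered back on, so the vote assignment during the test may have been different, and the post does not say which host was powered off.

```python
# Current Vote values as shown in the validation report.
current_votes = {"HOST1": 0, "HOST2": 1}


def survives_sudden_loss(votes, failed_node):
    """True if the remaining nodes still hold a strict majority of current votes.

    Simplified model: no witness, and no vote rebalancing before the failure.
    """
    total = sum(votes.values())
    running = sum(v for node, v in votes.items() if node != failed_node)
    return running > total // 2


results = {node: survives_sudden_loss(current_votes, node) for node in current_votes}

for node, ok in results.items():
    print(f"sudden loss of {node}: cluster {'keeps' if ok else 'loses'} quorum")
```

With these values, losing the host whose current vote is 1 leaves 0 running votes where 1 is needed, which has the same shape as the "needs at least 1 of quorum votes. There is only 0 quorum votes running" event quoted earlier in the thread.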

    I also have the following under Validate Resource Status for both Disk 1 and Disk 2:

    Validating cluster resource Cluster Disk 1.
    This resource is marked with a state of 'Offline'. The functionality that this resource provides is not available while it is in the offline state. The resource may be put in this state by an administrator or program. It may also be a newly created resource which has not been put in the online state or the resource may be dependent on a resource that is not online. Resources can be brought online by choosing the 'Bring this resource online' action in Failover Cluster Manager.

    This is the last error message in the validation report:

    Validating network name resource Name: Cluster01 for Active Directory issues.

    The cluster network name CLUSTER01 does not have Create Computer Objects permissions on the Organizational Unit. This can result in issues during the creation of additional network names in this OU.
