Exchange 2016 DAG okay, but rebuilt node missing in Failover Cluster

Howard Gyton
2021-03-16T13:55:10.123+00:00

Hi,

We recently rebuilt one of our Exchange servers and have run into an issue with Windows Failover Clustering, rather than with the Exchange side of things. Once the server had been rebuilt, we added that node back into the DAG via the Exchange console. We then re-seeded the passive database copies. All of that worked, but we get failures when we test replication health.

It looks like the process added the clustering service but was waiting for a server restart to complete. We weren't told about the pending restart, so we didn't do it. I suspect that is why Windows Failover Clustering shows only a single node. When I try to add the newly built node to that cluster, it fails, stating that the node is already part of the cluster.
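One way to confirm the pending-restart suspicion is to check the install state of the Failover Clustering feature on the rebuilt node. The helper below is a hypothetical sketch, not something from the thread: `Test-FailoverClusteringRestartPending` is an illustrative name, and it assumes Windows Server's ServerManager module (which provides `Get-WindowsFeature`) and an elevated session.

```powershell
# Hypothetical helper: returns $true when the Failover-Clustering feature on the
# local server is staged but still waiting for a restart (InstallState = InstallPending).
# Assumes Windows Server with the ServerManager module and an elevated session.
function Test-FailoverClusteringRestartPending {
    [CmdletBinding()]
    param()

    $feature = Get-WindowsFeature -Name Failover-Clustering
    Write-Verbose ("Failover-Clustering install state: {0}" -f $feature.InstallState)
    return ($feature.InstallState -eq 'InstallPending')
}

# Example, on the rebuilt node (SERVER1 in this thread):
# if (Test-FailoverClusteringRestartPending -Verbose) { Restart-Computer -Confirm }
```

A `$true` result would be consistent with the skipped restart described above.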

Running the following command shows:

cluster /cluster:DAG02 /add /node:SERVER1

Configuring node SERVER1

12% Validating cluster state on node SERVER1.This phase encountered an error for Cluster object 'Node SERVER1 appears to be a member of a cluster. It is either a member of an existing cluster or the node was not cleaned up after being evicted from a cluster. If you are sure this is not a member of a cluster run the Remove-ClusterNode cmdlet with the -Force parameter to clean up the cluster information from the node and then try to add it to the cluster again.' but will continue. The error status is 5065 (0x000013C9).
This phase has failed for Cluster object 'SERVER1' with an error status of 5065 (0x000013C9).
This phase has failed for Cluster object 'SERVER1' with an error status of 5065 (0x000013C9).
Cleaning up SERVER1.

System error 5065 has occurred (0x000013c9).
The cluster node is already a member of the cluster.

cluster node

Listing status for all available nodes:

Node     Node ID  Status
SERVER2  2        Up
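The node listing shows only SERVER2, even though the DAG has two members. A small hypothetical helper can make that mismatch explicit by comparing the DAG's member list with the cluster's node list. The function name is illustrative, and the commented usage assumes the Exchange Management Shell and the FailoverClusters module are both available in the session.

```powershell
# Hypothetical helper: lists DAG member servers that are missing from the
# Failover Cluster's node list (PowerShell's -notcontains is case-insensitive).
function Get-DagServersMissingFromCluster {
    param(
        [Parameter(Mandatory)] [string[]] $DagServers,
        [Parameter(Mandatory)] [AllowEmptyCollection()] [string[]] $ClusterNodes
    )

    foreach ($server in $DagServers) {
        if ($ClusterNodes -notcontains $server) {
            $server   # emit each missing member to the pipeline
        }
    }
}

# Example (assumes Exchange Management Shell plus the FailoverClusters module):
# $dagServers   = (Get-DatabaseAvailabilityGroup -Identity DAG02).Servers | ForEach-Object { $_.Name }
# $clusterNodes = Get-ClusterNode -Cluster DAG02 | ForEach-Object { $_.Name }
# Get-DagServersMissingFromCluster -DagServers $dagServers -ClusterNodes $clusterNodes
```

With the membership in this thread (DAG members SERVER1 and SERVER2, cluster node SERVER2 only), it would report SERVER1.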

Checking the database copy status on SERVER1:

Get-MailboxDatabaseCopyStatus -Server SERVER1

Name               Status   CopyQueueLength  ReplayQueueLength  LastInspectedLogTime  ContentIndexState
EDB AC 01\SERVER1  Healthy  0                0                  16/03/2021 09:50:05   Healthy
EDB DG 01\SERVER1  Healthy  0                0                  16/03/2021 09:50:21   Healthy
EDB HJ 01\SERVER1  Healthy  0                0                  16/03/2021 09:49:47   Healthy
EDB KM 01\SERVER1  Healthy  0                0                  16/03/2021 09:49:11   Healthy
EDB NR 01\SERVER1  Healthy  0                0                  16/03/2021 09:47:09   Healthy
EDB SZ 01\SERVER1  Healthy  0                0                  16/03/2021 09:49:48   Healthy

And on SERVER2:

Get-MailboxDatabaseCopyStatus -Server SERVER2

Name               Status   CopyQueueLength  ReplayQueueLength  LastInspectedLogTime  ContentIndexState
EDB DG 01\SERVER2  Mounted  0                0                                        Healthy
EDB AC 01\SERVER2  Mounted  0                0                                        Healthy
EDB HJ 01\SERVER2  Mounted  0                0                                        Healthy
EDB KM 01\SERVER2  Mounted  0                0                                        Healthy
EDB NR 01\SERVER2  Mounted  0                0                                        Healthy
EDB SZ 01\SERVER2  Mounted  0                0                                        Healthy

I'm not sure how to proceed here.

I don't know whether it would be safe to run the suggested command, "Remove-ClusterNode SERVER1 -force", to clean up the metadata and then try to rejoin the node to the failover cluster without upsetting anything on the Exchange side.

I also don't know whether running "Clear-ClusterNode" on the affected node would help and allow me to add this node back into the "DAG02" cluster.


Accepted answer
  1. Howard Gyton
    2021-03-17T09:17:10.033+00:00

    It turned out to be much simpler than we thought. For some reason, when I added the rebuilt server to the DAG, it was not automatically joined to the Failover Cluster, as you suggested. While I was investigating and found the message about the pending reboot, I noticed that the Cluster service was Disabled. I switched it to Automatic after the reboot, and then found that manually adding the node to the cluster failed.

    A colleague found that if you switch the service back to Disabled, the node can then join the Failover Cluster! It looks like the message I saw about it being a member of an existing cluster is bogus; it should really report that the service is not Disabled.
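The workaround in the accepted answer can be sketched roughly as follows. This is a hypothetical illustration, not the poster's script: `Invoke-DagNodeRejoin` is an illustrative name, `ClusSvc` is the service name of the Windows Cluster Service, and because the answer does not say which command was used for the join, `Add-ClusterNode` (the PowerShell counterpart of the `cluster /add` command in the question) is an assumption. It supports `-WhatIf` so the steps can be previewed first.

```powershell
# Hypothetical sketch of the accepted workaround; run elevated on the rebuilt node.
# Assumptions: the cluster is DAG02 (from the thread), the FailoverClusters module
# is installed, and Add-ClusterNode stands in for the join command actually used.
function Invoke-DagNodeRejoin {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [string] $Cluster = 'DAG02'
    )

    $node = $env:COMPUTERNAME

    # ClusSvc is the service name of the Windows Cluster Service.
    if ($PSCmdlet.ShouldProcess($node, 'Set ClusSvc startup type to Disabled')) {
        Set-Service -Name ClusSvc -StartupType Disabled
    }

    # Retry the join that previously failed with error 5065.
    if ($PSCmdlet.ShouldProcess($Cluster, "Add node $node")) {
        Add-ClusterNode -Cluster $Cluster -Name $node
    }

    # Check that both nodes are listed and how the service is now configured.
    Get-ClusterNode -Cluster $Cluster | Select-Object Name, Id, State
    Get-Service -Name ClusSvc | Select-Object Name, Status, StartType
}

# Afterwards, from the Exchange Management Shell:
# Test-ReplicationHealth -Identity SERVER1
```

Running it first with `-WhatIf` shows what would change without touching the service or the cluster.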


2 additional answers

  1. Anonymous
    2021-03-17T06:18:08.627+00:00

    Hi @Howard Gyton,

    Good day!

    Please run the following cmdlets to check the DAG membership. If Server1 is listed, remove it from the DAG and then add it again; if it is not listed, just add it.

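    # List the DAGs and their member servers
    Get-DatabaseAvailabilityGroup
    # If Server1 is listed, remove it from the DAG
    Remove-DatabaseAvailabilityGroupServer -Identity "DAGName" -MailboxServer Server1
    # Add Server1 (back) to the DAG
    Add-DatabaseAvailabilityGroupServer -Identity "DAGName" -MailboxServer Server1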

    If that doesn't work, run Remove-ClusterNode. You don't have to worry about data loss: this cmdlet only removes the node from the cluster, much like removing a member from a DAG.

    You should then be able to add the server again after removing the node.
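The conditional steps in this suggestion can be wrapped in a small script. This is a hypothetical sketch: `Reset-DagServerMembership` is an illustrative name, the defaults come from the names in the question, and the comment about removing database copies first reflects our understanding of Exchange's requirements for leaving a DAG, which is worth confirming against Microsoft's DAG membership documentation before running it.

```powershell
# Hypothetical sketch of the steps suggested above; run in the Exchange Management Shell.
# 'DAG02' and 'SERVER1' are the names used in the question.
function Reset-DagServerMembership {
    param(
        [string] $Dag    = 'DAG02',
        [string] $Server = 'SERVER1'
    )

    # Member servers are returned as AD object IDs; keep just their names.
    $members = @((Get-DatabaseAvailabilityGroup -Identity $Dag).Servers | ForEach-Object { $_.Name })

    if ($members -contains $Server) {
        # Check before running: Exchange expects the server's replicated database
        # copies to be removed before the server can leave the DAG.
        Remove-DatabaseAvailabilityGroupServer -Identity $Dag -MailboxServer $Server
    }

    Add-DatabaseAvailabilityGroupServer -Identity $Dag -MailboxServer $Server
}
```

Remove-DatabaseAvailabilityGroupServer prompts for confirmation by default, which gives a last chance to stop before the server leaves the DAG.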

    Regards,
    Lou




  2. Howard Gyton
    2021-03-17T09:17:28.287+00:00

    Both DAG, and failover cluster are now healthy!

