Exchange 2016 DAG node crashes

2024-11-12T18:35:38.2233333+00:00

Hi!
I need some help with my Exchange 2016 DAG.
Over the last few months we have run into a problem with our DAG nodes.
We have one DAG (ExDAG) with two nodes: Ex01 and Ex02.

The situation: for some unknown reason, Ex01 crashes periodically, and the event logs give us no explanation. That alone is not the worst part, since we have a DAG. The worst part is that sometimes, after Ex01 crashes, the node does not come back up in the cluster:
Get-ClusterNode returns: Ex01 - state - down
Get-ClusterNetwork returns: Ex01 - state - down
Get-ClusterNetworkInterface returns: Ex01 - state - Unavailable
At the same time, networking on Ex01 works without any problems: ping, telnet, and other tools all work fine.
We have tried many solutions and nothing helps, except one: changing Ex01's IP address to any other address brings the node back up in the cluster. After that, everything works perfectly.
Can anyone help me understand what is happening? Why does only changing the IP address help? I tried running
netsh int ip reset
netsh winsock reset
but had no luck.
Any help will be appreciated!


  Answer from Jake Zhang-MSFT (Microsoft Vendor)
    2024-11-13T06:08:54.8+00:00

    Hi @Евгений Котляревский,

    Welcome to the Microsoft Q&A platform!

    Based on your description, you are dealing with a problem in a Database Availability Group (DAG). Here are some potential causes and solutions for Exchange 2016 DAG node problems:

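    1. Even if your network tools show no problems, there may be an IP address conflict or a stale ARP cache entry for Ex01's address. Changing the IP address would sidestep such a conflict, which could explain why it is the only change that helps. Check the other devices on the network, including switches and routers, for a duplicate address or for an entry that maps Ex01's original IP to a different MAC address.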
    2. Use the Get-ClusterNetwork and Get-ClusterNetworkInterface cmdlets to check the cluster network configuration and status, and make sure that the cluster network name is online. You can also try disabling and re-enabling the affected network interface in Failover Cluster Manager.
    3. Sometimes, a network interface card (NIC) can cause problems even if it appears to be functioning properly. Updating the NIC driver or replacing the NIC may help resolve the problem.
    4. There may be a problem with the cluster service on Ex01. Restart the Cluster service on Ex01 and see if it fixes the problem:
    
    # Run in an elevated PowerShell session on Ex01
    Stop-Service ClusSvc
    Start-Service ClusSvc
    
    
    5. Event log and cluster log: although the event log showed nothing, it may help to enable more detailed logging for the Cluster service, which can reveal issues the event log does not record. Use the Get-ClusterLog cmdlet to generate a detailed cluster log for further analysis.
    6. Ensure that the network used for the DAG is isolated from other traffic as much as possible, to prevent interference from other network activity.
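
    For the IP conflict and cluster network checks above, a read-only sketch like the following could be run in an elevated PowerShell session on Ex02, the healthy node. It assumes the FailoverClusters module is available there, and 192.168.1.11 is a hypothetical placeholder for Ex01's original IP address:

    ```powershell
    # Read-only checks; run elevated on Ex02 (the healthy DAG member).
    Import-Module FailoverClusters

    # How the cluster currently sees each node, network and interface.
    Get-ClusterNode | Format-Table Name, State
    Get-ClusterNetwork | Format-Table Name, State, Role, Address
    Get-ClusterNetworkInterface | Format-Table Node, Network, Name, State

    # Hypothetical placeholder: replace with Ex01's original IP address.
    $ex01Ip = '192.168.1.11'

    # Which MAC address does this host currently map to that IP?
    Get-NetNeighbor -IPAddress $ex01Ip -ErrorAction SilentlyContinue |
        Format-Table IPAddress, LinkLayerAddress, State

    # To drop a stale entry on this host only (uncomment to use):
    # Remove-NetNeighbor -IPAddress $ex01Ip -Confirm:$false
    ```

    If LinkLayerAddress does not match Ex01's own MAC address, another device may be answering for that IP, or this host is holding a stale entry.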
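
    On Ex01 itself, the NIC and its addresses can be reviewed along these lines. This is only a sketch; the driver version still has to be compared by hand against what the NIC vendor currently ships:

    ```powershell
    # Run on Ex01. Adapter status, MAC address and driver details.
    Get-NetAdapter |
        Format-Table Name, InterfaceDescription, Status, MacAddress, DriverVersion, DriverDate

    # An AddressState of 'Duplicate' means Windows detected another device
    # using the same IPv4 address on that interface.
    Get-NetIPAddress -AddressFamily IPv4 |
        Format-Table IPAddress, InterfaceAlias, AddressState
    ```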
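
    For the cluster log suggestion, one possible sequence is shown below, run elevated on a DAG member. C:\Temp is an assumed destination folder that must already exist, and the 120-minute window is only an example; adjust it so it covers the time of the next crash:

    ```powershell
    # Raise cluster log verbosity (levels 0-5; 3 is the default, 5 the most detailed).
    Set-ClusterLog -Level 5

    # After the next crash: write cluster.log for each reachable node,
    # covering the last 120 minutes, to C:\Temp (placeholder folder).
    Get-ClusterLog -Destination C:\Temp -TimeSpan 120 -UseLocalTime

    # Return to the default level when troubleshooting is finished.
    Set-ClusterLog -Level 3
    ```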
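
    To review DAG network separation from the Exchange side, the read-only Exchange Management Shell sketch below lists the DAG networks and the DAG's own settings. ExDAG is the DAG name from your question:

    ```powershell
    # Exchange Management Shell. Which DAG networks carry replication and MAPI traffic?
    Get-DatabaseAvailabilityGroupNetwork |
        Format-Table Name, Subnets, ReplicationEnabled, MapiAccessEnabled, IgnoreNetwork

    # DAG members, which of them are operational, and whether networks are managed manually.
    Get-DatabaseAvailabilityGroup -Identity ExDAG -Status |
        Format-List Name, Servers, OperationalServers, ManualDagNetworkConfiguration, DatabaseAvailabilityGroupIpv4Addresses
    ```

    If a network needs to be adjusted, Set-DatabaseAvailabilityGroupNetwork can change settings such as ReplicationEnabled.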

    Feel free to reply with any updates. If this helps, please don't forget to mark it as the answer.

    Best,

    Jake Zhang

