Cluster resource 'x' of type 'SQL Server Availability Group' in clustered role 'x' failed

Sandro Alves 41 Reputation points
2022-11-05T04:21:42.3+00:00

Hi,

Today I have:

  • Server A with primary and secondary databases (synchronous commit), in the datacenter on the same 10 Gbps network
  • Server B with primary and secondary databases (synchronous commit), in the datacenter on the same 10 Gbps network
  • Server C with secondary databases (asynchronous commit), in Azure on a different network

These are SQL Server 2016 virtual machines with 2 virtual sockets, 20 vCPUs, 180 GB of memory, and all-flash disks, following good practices such as separate disks for MDF, LDF, and TempDB files.

I randomly get event 35206 for the server that is in Azure, on another network, in asynchronous mode.

  • A connection timeout has occurred on a previously established connection to availability replica 'SERVIDOR C' with id [X]. Either a networking or a firewall issue exists or the availability replica has transitioned to the resolving role.

Since we do have some connectivity problems in that scenario, we get this warning and consider it normal.

Note that for this configuration we set the session timeout to 30 seconds. I think the right value would actually be about 60 seconds, but the current value has already greatly reduced the number of events.

Today I was surprised by the same event, but between my two servers that are on the same 10 Gbps network.
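For reference, the session timeout is a per-replica setting changed with `ALTER AVAILABILITY GROUP ... MODIFY REPLICA ON ... WITH (SESSION_TIMEOUT = n)`, run on the primary replica. Below is a minimal Python sketch that only builds that statement as text, so it can be reviewed before it is executed. The helper name `build_session_timeout_statement` is hypothetical, and the group name 'X' and replica name 'SERVIDOR C' are the placeholders used in this post.

```python
def build_session_timeout_statement(ag_name: str, replica_server: str,
                                    timeout_seconds: int) -> str:
    """Build the T-SQL text that sets SESSION_TIMEOUT for one replica.

    Hypothetical helper: it only returns a string and never connects
    to SQL Server. The statement has to be run on the primary replica.
    """
    if not isinstance(timeout_seconds, int) or timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be a positive integer")
    # Escape the identifier and the string literal so that unusual
    # names cannot break out of the quoting.
    ag = ag_name.replace("]", "]]")
    replica = replica_server.replace("'", "''")
    return (
        f"ALTER AVAILABILITY GROUP [{ag}] "
        f"MODIFY REPLICA ON N'{replica}' "
        f"WITH (SESSION_TIMEOUT = {timeout_seconds});"
    )


if __name__ == "__main__":
    # Placeholder names from this thread; 60 is the value being considered.
    print(build_session_timeout_statement("X", "SERVIDOR C", 60))
```

Building the text separately makes it easy to generate one statement per replica and compare them before applying anything.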

  • Cluster resource 'X' of type 'SQL Server Availability Group' in clustered role 'X' failed.
  • Cluster resource 'Y' of type 'SQL Server Availability Group' in clustered role 'Y' failed.

So I investigated whether there was a physical failure on the servers, and I found NOTHING. All statistics show normal behavior, consistent with what has always happened. In fact, I see heavier load on other days that did not cause any problems.

I continued digging into the SQL logs and found:

  • availability_group_lease_expired: LeaseNotValid (lease_interval - 10000) for availability_group_name X and Y.

In the Windows event log I find errors like these:

  • Replication-Replication Distribution Subsystem: agent XPTO failed. The DDL change has been replicated.
  • Report Server Windows Service (MSSQLSERVER) cannot connect to the report server database.

And then I see these messages:

EventID 41093: Always On: The local replica of availability group 'X' is going offline because the corresponding resource in the Windows Server Failover Clustering (WSFC) cluster is no longer online. This is an informational message only. No user action is required.

EventID 35206: A connection timeout has occurred on a previously established connection to availability replica 'SERVIDOR B' with id [X]. Either a networking or a firewall issue exists or the availability replica has transitioned to the resolving role.

EventID 19406: The state of the local availability replica in availability group 'X' has changed from 'RESOLVING_NORMAL' to 'SECONDARY_NORMAL'. The state changed because the availability group state has changed in Windows Server Failover Clustering (WSFC). For more information, see the SQL Server error log, Windows Server Failover Clustering (WSFC) management console, or WSFC log.

EventID 1480: The availability group database "X" is changing roles from "RESOLVING" to "SECONDARY" because the mirroring session or availability group failed over due to role synchronization. This is an informational message only. No user action is required.
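To compare an incident like this one with earlier or later ones, the exported messages can be tallied by event ID. The following is a minimal Python sketch that assumes the log text uses the `EventID <number>: <message>` layout shown above. The function names `parse_events` and `count_by_id` are hypothetical, and real exports from Event Viewer or the SQL Server error log may need a different pattern.

```python
import re

# Matches lines such as "EventID 35206: A connection timeout ..."
EVENT_LINE = re.compile(r"^EventID\s+(\d+):\s*(.*)$")


def parse_events(text: str) -> list:
    """Return (event_id, message) tuples in the order they appear."""
    events = []
    for line in text.splitlines():
        match = EVENT_LINE.match(line.strip())
        if match:
            events.append((int(match.group(1)), match.group(2)))
    return events


def count_by_id(events: list) -> dict:
    """Count how many times each event ID occurs."""
    counts = {}
    for event_id, _message in events:
        counts[event_id] = counts.get(event_id, 0) + 1
    return counts


if __name__ == "__main__":
    # Shortened sample built from the messages quoted in this post.
    sample = """
    EventID 41093: Always On: The local replica of availability group 'X' is going offline
    EventID 35206: A connection timeout has occurred on a previously established connection
    EventID 19406: The state of the local availability replica has changed
    EventID 1480: The availability group database "X" is changing roles
    """
    events = parse_events(sample)
    print([event_id for event_id, _ in events])
    print(count_by_id(events))
```

Keeping the original order matters here: an AG going offline (41093) before the connection timeout (35206) points in a different direction than a timeout that comes first.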

Any suggestions on what else to look at or monitor?

Thanks.

257319-screenshot-2022-11-05-011348.png

257359-screenshot-2022-11-05-011246.png

257431-screenshot-2022-11-05-010850.png

SQL Server

1 answer

  1. YufeiShao-msft 7,076 Reputation points
    2022-11-07T07:55:03.413+00:00

    Hi @Sandro Alves ,

    A connection timeout has occurred on a previously established connection to availability replica 'SERVIDOR C' with id [X]. Either a networking or a firewall issue exists or the availability replica has transitioned to the resolving role.

    This problem might occur only on very powerful computers when SQL Server is very busy, for example on systems with more than 24 CPU cores running a highly transactional workload. A simple workaround is to restart the SQL Server service on the secondary replica where you are getting this error, which can fix the issue for the time being. You should also make sure your SQL Server version is SQL Server 2016 RTM CU5, SP1 CU1, or later.

    According to: KB3213703 - FIX: An Always On secondary replica goes into a disconnecting state

    Cluster resource 'X' of type 'SQL Server Availability Group' in clustered role 'X' failed.

    Judging from this event, an automatic failover was triggered on the SQL Server instance. If the failover is not successful, the secondary replica does not transition to the primary role. What state is your availability replica in? Is it RESOLVING?

    You can then try a forced manual failover of the availability group to see what happens.

    Replication-Replication Distribution Subsystem: agent XPTO failed. The DDL change has been replicated.

    If you added a new column on the publisher, you would see this message. You may need to compare the publisher and subscriber table schemas.

    Report Server Windows Service (MSSQLSERVER) cannot connect to the report server database.

    This can happen because the SQL Server Database Engine service is not running, remote connections or the TCP/IP protocol are not enabled, the report server database is not configured correctly, or the service account is misconfigured or no longer has permissions on the report server database. Please check each of these, and use the Reporting Services Configuration tool to configure the report server database and service account.
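
The minimum build mentioned in the answer (SQL Server 2016 RTM CU5 or SP1 CU1) can be checked against the value returned by `SELECT SERVERPROPERTY('ProductVersion')`. The sketch below is a minimal Python comparison. The build numbers in it (13.0.2197.0 for RTM CU5, 13.0.4001.0 for SP1, 13.0.4411.0 for SP1 CU1) are assumptions taken from memory of the public build list and should be verified against Microsoft's list before relying on them. The function names are hypothetical.

```python
# Assumed build numbers; verify against Microsoft's SQL Server 2016 build list.
RTM_CU5 = (13, 0, 2197, 0)   # SQL Server 2016 RTM CU5 (assumption)
SP1_RTM = (13, 0, 4001, 0)   # SQL Server 2016 SP1 (assumption)
SP1_CU1 = (13, 0, 4411, 0)   # SQL Server 2016 SP1 CU1 (assumption)


def parse_build(version: str) -> tuple:
    """Turn a ProductVersion string like '13.0.4411.0' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def meets_minimum_build(version: str) -> bool:
    """Return True if a SQL Server 2016 build is at or above the required CU.

    Builds on the RTM branch are compared with RTM CU5. Builds at or
    above SP1 are compared with SP1 CU1.
    """
    build = parse_build(version)
    if build[:2] != (13, 0):
        raise ValueError("not a SQL Server 2016 (13.0.x) build: " + version)
    if build >= SP1_RTM:
        return build >= SP1_CU1
    return build >= RTM_CU5


if __name__ == "__main__":
    # Value as returned by SELECT SERVERPROPERTY('ProductVersion')
    print(meets_minimum_build("13.0.4411.0"))
```

The two-branch comparison is needed because a plain SP1 build has a higher number than RTM CU5 but would still be below SP1 CU1.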

