Troubleshoot SQL Server Always On issues

Article
2023-06-27

This article helps you resolve the common problem about Always On configuration on SQL Server.

Note

For a guided walk through experience of this article, see Troubleshooting SQL Server Always On Issues.

Original product version: SQL Server 2012 Enterprise, SQL Server 2014 Enterprise, SQL Server 2016 Enterprise
Original KB number: 10179

Important notes

Microsoft CSS data indicates that a significant percentage of customer issues is often previously addressed in a released CU, but not applied proactively and hence recommends ongoing, proactive installation of CUs as they become available. For more information, see Announcing updates to the SQL Server Incremental Servicing Model (ISM).

To check the latest CUs that may be available for your version, see How to determine the version, edition and update level of SQL Server and its components.
You can see Useful Tools for Troubleshooting and Monitoring Always On Availability Groups in Always On Availability Groups Troubleshooting and Monitoring Guide to learn more about the tools that you can use for diagnosing different types of issues and for monitoring availability groups. The guide also has additional scenarios that may not be covered in this guided walk through.
The parent node for Always On Availability Groups documentation and provides a one stop reference for various questions, see Always On Availability Groups (SQL Server).

I need pointers on setting up and configuring Always On Availability groups

If you are looking for documentation on setting up Always On configuration, please see the following documents:

Getting Started with Always On Availability Groups (SQL Server) - The document provides answers to many questions you may have about Availability groups and setup. Following all the steps in this article and reviewing Prerequisites, Restrictions, and Recommendations for Always On Availability Groups (SQL Server) will help prevent many issues that you may run into with setting up and maintaining availability groups in your environment.

Additional resources

If this information isn't helpful, see More information about Always On Availability Groups.

I am having problems configuring Always On Availability groups

Typical configuration problems include Always On Availability Groups are disabled, accounts are incorrectly configured, the database mirroring endpoint doesn't exist, the endpoint is inaccessible (SQL Server Error 1418), network access doesn't exist, and a join database command fails (SQL Server Error 35250). Review the following document for help on troubleshooting these issues:

Troubleshoot Always On Availability Groups Configuration (SQL Server)

Additional link: Fix: Error 41009 when you try to create multiple availability groups

If the issue still exists, see More information about Always On Availability Groups.

I am having issues with Listener configuration (19471, 19476, and other errors)

One of the most common configuration issues customers encounter is availability group listener creation. The errors are similar to the following:

Msg 19471, Level 16, State 0, Line 2The WSFC cluster could not bring the Network Name resource with DNS name '' online. The DNS name may have been taken or have a conflict with existing name services, or the WSFC cluster service may not be running or may be inaccessible. Use a different DNS name to resolve name conflicts, or check the WSFC cluster log for more information.
Msg 19476, Level 16, State 4, Line 2The attempt to create the network name and IP address for the listener failed. The WSFC service may not be running or may be inaccessible in its current state, or the values provided for the network name and IP address may be incorrect. Check the state of the WSFC cluster and validate the network name and IP address with the network administrator.

The majority of time, listener creation failure resulting in the previous messages are due to a lack of permissions for the Cluster Name Object (CNO) in Active Directory to create and read the listener computer object. For troubleshooting this problem, please review the following articles:

If the issue still exists, see More information about Always On Availability Groups.

Automatic Failover isn't working as expected

If you notice that the automatic failover isn't working as expected either during testing or in production, see: Troubleshooting automatic failover problems in SQL Server 2012 Always On environments.

Improper configuration of Maximum failures in the specified period is one of the leading causes for primary not automatically failing over to the secondary. The default value for this setting is N-1, where N is the number of replicas. For more information, see Failover cluster (group) maximum failures limit.

If the issue still exists, see More information about Always On Availability Groups.

I am having issues connecting to Always On Availability groups

After you configure the availability group listener for an Always On Availability Group in SQL Server 2012, you may be unable to ping the listener or connect to it from an application. You may get an error that's similar to the following:

Sqlcmd: Error: Microsoft SQL Native Client : Login timeout expired.

To troubleshoot this and similar errors, review the following:

I am having issues configuring Always On Availability groups in my Azure VM (IaaS)

Lot of issues related to Always On occur due to improper configuration of the listener. If you are having connection issues to the listener,
1. Make sure you read all the limitations of ILB listener and followed all the steps documented in the following article paying particular attention to dependency configuration, IP address, and various other parameters in the PowerShell script.
2. If unsure, you may want to delete and recreate the listener as per the above document.
If you recently moved your VM to a different service or if the IP addresses changed, you need to update the value of the IP address resource to reflect the new address and you need to recreate the load balanced endpoint for your AG. You can update the IP address using the Get or Set commands as follows:

Console
```
Get-ClusterResource "IPResourceName" | Set-ClusterParameter -name Address -value "w.x.y.z"
```

It takes a long time to failover from primary to secondary or vice-versa

After an automatic failover or a planned manual failover without data loss on an availability group, you may find that the failover time exceeds your recovery time objective (RTO). To troubleshoot the causes and potential resolutions, see Troubleshoot: Availability Group Exceeded RTO.

If the issue still exists, see More information about Always On Availability Groups.

Changes on the Primary Replica are either not reflected on or slow to replicate to the Secondary Replica

You may notice that changes on primary replica are not getting propagated to secondary in a timely manner. To troubleshoot and resolve these problems, try the following:

For SQL Server 2012 and SQL Server 2014 environments, see FIX: Slow synchronization when disks have different sector sizes for primary and secondary replica log files in SQL Server AG and Logshipping environments.
Check if the secondary nodes are in a Paused state in the Cluster administrator.
See Troubleshoot: Changes on the Primary Replica are not Reflected on the Secondary Replica.

If the issue still exists, see More information about Always On Availability Groups.

How to manage the size of transaction log for my AG databases

You can reduce the transaction log sizes by configuring regular backups at either primary or secondary servers.

Review the following topics for additional information:

If this information isn't helpful, see More information about Always On Availability Groups.

Primary or Secondary Servers struck in Resolving State or you experience unexpected failovers

Check System and Application event logs for hardware issues and other errors and work with the vendor to fix them.
If you are using Virtual machines, check their knowledge base to see if there are any recently reported issues that may be contributing to the problem. For example, Large packet loss at the guest operating system level on the VMXNET3 vNIC in ESXi (2039495) has caused issues with AG configuration in some cases.
More information:
- Diagnose Unexpected Failover or Availability Group in RESOLVING State
- External link: SQL SERVER - Always On Availability Group Stuck in Resolving State For Long time

If the issue still exists, see More information about Always On Availability Groups.

Not able to bring resources online

Check if the databases are taking a long time to recover by reviewing the messages in the SQL ErrorLog.

If the issue still exists, see More information about Always On Availability Groups.

Frequently asked questions

Is it possible to have two Listeners for one availability group?

Yes, you can set up multiple listeners for the same availability group. See How to create multiple listeners for same availability group (Goden Yao).
Is it possible to have a separate NIC card for always on traffic and Client connectivity?

Yes, you can have dedicated NIC card for Always On traffic. See Configure Availability Group to Communicate on a Dedicated Network.
What editions support Always On failover cluster instances?

This topic in SQL Server Books Online has more information: Editions and Supported Features for SQL Server 2016.
How to recover in case of a failure on all nodes of your cluster?

See WSFC Disaster Recovery through Forced Quorum (SQL Server).
Where can I find information on support for distributed transactions in AG configurations?

See Transactions - availability groups and database mirroring.
How to update Always On configurations?

See Upgrading Always On Availability Group Replica Instances.
How to add TDE (Transparent Data Encryption) enabled database to AG configuration?

To add TDE enabled DB to AG, see How to configure Always On for a TDE database.

How to configure alerts for checking if the secondary is lagging behind the primary?

You can use the following script:

SQL

SELECT ag.name AS ag_name, ar.replica_server_name AS ag_replica_server,
dr_state.database_id AS database_id,
is_ag_replica_local = CASE
    WHEN ar_state.is_local = 1 THEN N'LOCAL'
    ELSE 'REMOTE'
    END,
ag_replica_role = CASE
    WHEN ar_state.role_desc IS NULL THEN N'DISCONNECTED'
    ELSE ar_state.role_desc
    END,
dr_state.last_hardened_lsn, dr_state.last_hardened_time,
datediff(s,last_hardened_time, getdate()) AS 'seconds behind primary'
FROM (( sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
    ON ag.group_id = ar.group_id)
JOIN sys.dm_hadr_availability_replica_states AS ar_state
    ON ar.replica_id = ar_state.replica_id)
JOIN sys.dm_hadr_database_replica_states dr_state
    ON ag.group_id = dr_state.group_id AND dr_state.replica_id = ar_state.replica_id

How to get alerted if the state of the database is other than synchronized?

You can use the following script:

SQL

SELECT ag.name AS ag_name, ar.replica_server_name AS ag_replica_server,
dr_state.database_id AS database_id,
is_ag_replica_local = CASE
    WHEN ar_state.is_local = 1 THEN N'LOCAL'
    ELSE 'REMOTE'
    END,
ag_replica_role = CASE
    WHEN ar_state.role_desc IS NULL THEN N'DISCONNECTED'
    ELSE ar_state.role_desc
    END,
ar_state.connected_state_desc, ar.availability_mode_desc, dr_state.synchronization_state_desc
FROM (( sys.availability_groups AS ag
JOIN sys.availability_replicas AS ar
    ON ag.group_id = ar.group_id )
JOIN sys.dm_hadr_availability_replica_states AS ar_state
    ON ar.replica_id = ar_state.replica_id)
JOIN sys.dm_hadr_database_replica_states dr_state
    ON ag.group_id = dr_state.group_id AND dr_state.replica_id = ar_state.replica_id

You can also review the following links for additional methods to monitor Always On groups:

Share via

Troubleshoot SQL Server Always On issues

Important notes

I need pointers on setting up and configuring Always On Availability groups

Additional resources

I am having problems configuring Always On Availability groups

I am having issues with Listener configuration (19471, 19476, and other errors)

Automatic Failover isn't working as expected

I am having issues connecting to Always On Availability groups

I am having issues configuring Always On Availability groups in my Azure VM (IaaS)

It takes a long time to failover from primary to secondary or vice-versa

Changes on the Primary Replica are either not reflected on or slow to replicate to the Secondary Replica

How to manage the size of transaction log for my AG databases

Primary or Secondary Servers struck in Resolving State or you experience unexpected failovers

Not able to bring resources online

Frequently asked questions

More information about Always On Availability Groups

Feedback

Additional resources