Flexible Failover Policy for Automatic Failover of an Availability Group (SQL Server)
A flexible failover policy provides granular control over the conditions that cause automatic failover for an availability group. By changing the failure conditions that trigger an automatic failover and the frequency of health checks, you can increase or decrease the likelihood of an automatic failover to support your SLA for high availability.
The flexible failover policy of an availability group is defined by its failure-condition level and its health-check timeout threshold. When the resource DLL of the availability group detects that the availability group has exceeded its failure-condition level or its health-check timeout threshold, it reports the failure to the Windows Server Failover Clustering (WSFC) cluster. The WSFC cluster then initiates an automatic failover to the secondary replica that is configured for automatic failover.
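To make the two settings concrete, the following sketch models the policy as a small configuration record. The class name `FlexibleFailoverPolicy` and its field names are hypothetical; they stand in for the Transact-SQL `FAILURE_CONDITION_LEVEL` and `HEALTH_CHECK_TIMEOUT` options of `ALTER AVAILABILITY GROUP`. The defaults (level 3, 30 seconds) follow this topic, and the 15,000-millisecond lower bound reflects the minimum that `HEALTH_CHECK_TIMEOUT` accepts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlexibleFailoverPolicy:
    """Hypothetical model of the two settings that define a flexible failover policy."""

    # 1 = least restrictive (OnServerDown) ... 5 = most restrictive; 3 is the default.
    failure_condition_level: int = 3
    # Health-check timeout threshold in milliseconds; the default is 30 seconds.
    health_check_timeout_ms: int = 30_000

    def __post_init__(self) -> None:
        # Reject levels outside the five documented failure-condition levels.
        if not 1 <= self.failure_condition_level <= 5:
            raise ValueError("failure_condition_level must be between 1 and 5")
        # Mirror the 15,000 ms minimum of HEALTH_CHECK_TIMEOUT.
        if self.health_check_timeout_ms < 15_000:
            raise ValueError("health_check_timeout_ms must be at least 15000")


default_policy = FlexibleFailoverPolicy()
strict_policy = FlexibleFailoverPolicy(failure_condition_level=5, health_check_timeout_ms=60_000)
```

Raising the level or shortening the timeout makes an automatic failover more likely; lowering the level or lengthening the timeout makes it less likely.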
Important
If an availability group exceeds its WSFC failure threshold, the WSFC cluster does not attempt an automatic failover for the availability group. Furthermore, the WSFC resource group of the availability group remains in a failed state until either the cluster administrator manually brings the failed resource group online or the database administrator performs a manual failover of the availability group. The WSFC failure threshold is the maximum number of failures allowed for the availability group within a given time period. By default, the period is six hours and the maximum number of failures during this period is n-1, where n is the number of WSFC nodes. To change the failure-threshold values for a given availability group, use the Failover Cluster Manager console.
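The failure-threshold rule can be sketched as a sliding-window count. This is a simplified model, not the WSFC implementation: the function name `failover_allowed` is hypothetical, and it assumes that a failure occurring now is counted together with earlier failures inside the period, and that WSFC still attempts failover only while that count is at most the maximum (n-1 by default).

```python
from datetime import datetime, timedelta
from typing import Sequence


def failover_allowed(
    failure_times: Sequence[datetime],
    now: datetime,
    node_count: int,
    period: timedelta = timedelta(hours=6),
) -> bool:
    """Hypothetical model: would WSFC still attempt automatic failover for a failure at `now`?"""
    max_failures = node_count - 1  # default maximum: n - 1, where n is the number of WSFC nodes
    window_start = now - period
    # Count earlier failures that fall inside the period, plus the failure happening now.
    failures_in_period = sum(1 for t in failure_times if window_start < t <= now) + 1
    return failures_in_period <= max_failures
```

With a two-node cluster the default maximum is 1, so in this model a second failure within six hours leaves the resource group in a failed state instead of triggering another automatic failover.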
This topic contains the following sections:
Health-Check Timeout Threshold
Failure-Condition Level
Related Tasks
Related Content
Health-Check Timeout Threshold
The WSFC resource DLL of the availability group performs a health check of the primary replica by calling the sp_server_diagnostics stored procedure on the instance of SQL Server that hosts the primary replica. sp_server_diagnostics returns results at an interval equal to one-third of the health-check timeout threshold for the availability group. The default health-check timeout threshold is 30 seconds, which causes sp_server_diagnostics to return results every 10 seconds. If sp_server_diagnostics is slow or is not returning information, the resource DLL waits for the full health-check timeout threshold before determining that the primary replica is unresponsive. If the primary replica is unresponsive, an automatic failover is initiated, provided that the availability group currently supports automatic failover (for example, its automatic-failover secondary replica is synchronized).
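The arithmetic in this paragraph can be checked with a short model. The function names are hypothetical, times are in milliseconds to match the unit that `HEALTH_CHECK_TIMEOUT` uses, and treating the primary replica as unresponsive only once the elapsed time exceeds the threshold is an assumption about the exact boundary.

```python
def diagnostics_interval_ms(health_check_timeout_ms: int) -> float:
    """Hypothetical model: sp_server_diagnostics reports every 1/3 of the health-check timeout."""
    return health_check_timeout_ms / 3


def primary_is_unresponsive(ms_since_last_result: float, health_check_timeout_ms: int) -> bool:
    """Hypothetical model: the resource DLL waits out the full timeout before giving up.

    A slow or silent sp_server_diagnostics is tolerated until the elapsed time since
    its last result exceeds the whole threshold, not just one reporting interval.
    """
    return ms_since_last_result > health_check_timeout_ms


# Default threshold of 30 seconds gives a 10-second reporting interval.
interval = diagnostics_interval_ms(30_000)
```

Missing one or two 10-second reports therefore does not, by itself, mark the primary replica as unresponsive under the default threshold.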
Important
sp_server_diagnostics does not perform health checks at the database level.
Failure-Condition Level
Whether the diagnostic data and health information returned by sp_server_diagnostics warrant an automatic failover depends on the failure-condition level of the availability group. The failure-condition level specifies which failure conditions trigger an automatic failover. There are five failure-condition levels, ranging from the least restrictive (level one) to the most restrictive (level five). Each level includes the conditions of all less restrictive levels. Thus the most restrictive level, five, includes the conditions of levels one through four, level four includes those of levels one through three, and so forth.
Important
Damaged databases and suspect databases are not detected by any failure-condition level. Therefore, a database that is damaged or suspect (whether due to a hardware failure, data corruption, or other issue) never triggers an automatic failover.
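The cumulative rule reduces to a single comparison if each detected condition is tagged with the lowest level that covers it. In this hypothetical sketch, `None` stands for conditions that no level detects, such as a damaged or suspect database, so they never trigger an automatic failover.

```python
from typing import Optional


def condition_triggers_failover(condition_level: Optional[int], configured_level: int) -> bool:
    """Hypothetical model of cumulative failure-condition levels.

    condition_level: lowest level (1-5) whose conditions cover the detected problem,
    or None for problems that no level detects (damaged or suspect databases).
    configured_level: the failure-condition level of the availability group.
    """
    if condition_level is None:
        return False  # not detected by any level, so never triggers failover
    # A level includes every less restrictive level, so any condition at or below it counts.
    return condition_level <= configured_level
```

For example, at the default level three a critical internal error (level three) triggers failover, while a persistent out-of-memory condition (level four) does not.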
The following table describes the failure conditions that correspond to each level.
Level | Failure Condition | Transact-SQL Value | PowerShell Value |
---|---|---|---|
One | On server down. Specifies that an automatic failover is initiated when either of the following occurs: the SQL Server service is down, or the lease of the availability group for connecting to the WSFC cluster expires because no ACK is received from the server instance. This is the least restrictive level. | 1 | OnServerDown |
Two | On server unresponsive. Specifies that an automatic failover is initiated when either of the following occurs: the instance of SQL Server does not connect to the cluster and the health-check timeout threshold of the availability group is exceeded, or the availability replica is in a failed state. | 2 | OnServerUnresponsive |
Three | On critical server error. Specifies that an automatic failover is initiated on critical SQL Server internal errors, such as orphaned spinlocks, serious write-access violations, or excessive dumping. This is the default level. | 3 | OnCriticalServerError |
Four | On moderate server error. Specifies that an automatic failover is initiated on moderate SQL Server internal errors, such as a persistent out-of-memory condition in the SQL Server internal resource pool. | 4 | OnModerateServerError |
Five | On any qualified failure conditions. Specifies that an automatic failover is initiated on any qualified failure condition, including exhaustion of SQL Server worker threads and detection of an unsolvable deadlock. This is the most restrictive level. | 5 | OnAnyQualifiedFailureConditions |
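Because the Transact-SQL and PowerShell columns name the same five levels, the mapping can be sketched as an enumeration whose member names are the PowerShell values and whose integers are the Transact-SQL values. The helper functions are hypothetical. The last one encodes a consequence of the table under the stated assumption that an expired health-check timeout is classified as an on-server-unresponsive (level two) condition: at level one, a health-check timeout alone does not trigger an automatic failover.

```python
from enum import IntEnum


class FailureConditionLevel(IntEnum):
    # Member names match the PowerShell values; integer values match the Transact-SQL values.
    OnServerDown = 1
    OnServerUnresponsive = 2
    OnCriticalServerError = 3
    OnModerateServerError = 4
    OnAnyQualifiedFailureConditions = 5


def to_powershell_value(tsql_value: int) -> str:
    """Translate a Transact-SQL level (1-5) to its PowerShell name; raises ValueError otherwise."""
    return FailureConditionLevel(tsql_value).name


def to_tsql_value(powershell_value: str) -> int:
    """Translate a PowerShell level name to its Transact-SQL integer; raises KeyError otherwise."""
    return int(FailureConditionLevel[powershell_value])


def timeout_triggers_failover(configured_level: int) -> bool:
    """Assumption: an expired health-check timeout is a level-two (server unresponsive) condition."""
    return configured_level >= FailureConditionLevel.OnServerUnresponsive
```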
Note
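A lack of response by an instance of SQL Server to client requests is not relevant to availability groups: it is not one of the failure conditions listed above and does not, by itself, trigger an automatic failover.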
Related Tasks
To configure automatic failover
Change the Availability Mode of an Availability Replica (SQL Server): automatic failover requires synchronous-commit availability mode
Change the Failover Mode of an Availability Replica (SQL Server)
Related Content
See Also
Reference
sp_server_diagnostics (Transact-SQL)
Concepts
Overview of AlwaysOn Availability Groups (SQL Server)
Availability Modes (AlwaysOn Availability Groups)
Failover and Failover Modes (AlwaysOn Availability Groups)