Monitoring AlwaysOn Health with PowerShell - Part 2: Advanced Cmdlet Usage

Part 2 – Advanced Usage of AlwaysOn Health Cmdlets

In this part, I’ll cover some advanced features of the AlwaysOn health cmdlets. Please see Part 1 for a basic overview of these cmdlets. We’ll look at how we can drill down into unhealthy objects to discover individual policy failures, and how to include custom policies in health evaluation. We’ll also talk about some performance considerations. Please note that all PowerShell commands demonstrated below should be executed within the SQLPS shell.

Recall the AlwaysOn health cmdlets are:

  • Test-SqlAvailabilityGroup
  • Test-SqlAvailabilityReplica
  • Test-SqlDatabaseReplicaState

AlwaysOn Health Model

Read this blog for an in depth look at the AlwaysOn health model. To briefly summarize: to compute the health of an availability group, availability replica, or availability database, we execute a collection of Policy-Based Management (PBM) policies that are installed along with SQL Server 2012. These policies test various aspects of your high availability configuration, and based on the results we compute an overall health state of the object in question. Some policies are categorized as more critical than others, so that one policy failure might result in a warning and another in a critical error. Users can add their own policies to this collection to extend the health model.

Viewing Policy Evaluation Details

Using our AlwaysOn health cmdlets, you can obtain policy evaluation details in two ways:

  1. Every health result returned by a Test-Sql* cmdlet has a field, PolicyEvaluationDetails. This field stores an array of policy evaluation details, one entry for each policy executed.
  2. Every Test-Sql* cmdlet has a –ShowPolicyDetails parameter. If you specify this parameter, the cmdlet will enumerate the policy evaluation details instead of returning a single result.

Here are some examples. Last time we were investigating the availability group ‘VLEApplication’ which had a critical error.

PS > cd SQLSERVER:\SQL\WSNAVELY1-SH21Q\DEFAULT\AvailabilityGroups
PS > Test-SqlAvailabilityGroup .\VLEApplication

HealthState Name
----------- ----
Error VLEApplication

We know that the availability group is in a critical state, but we can’t immediately see the policy failures responsible for this. We’ll first demonstrate method (1) from above for getting this information, namely accessing the PolicyEvaluationDetails field on the object returned by the Test-SqlAvailabilityGroup cmdlet.

PS > $result = Test-SqlAvailabilityGroup .\VLEApplication
PS > $result.PolicyEvaluationDetails | ft Result,Name -AutoSize

Result Name
------ ----
True AlwaysOnAgOnlineStateHealthPolicy
True AlwaysOnAgWSFClusterHealthPolicy
False AlwaysOnAgAutomaticFailoverHealthPolicy
True AlwaysOnAgReplicasConnectionHealthPolicy
False AlwaysOnAgReplicasDataSynchronizationHealthPolicy
True AlwaysOnAgReplicasRoleHealthPolicy
False AlwaysOnAgSynchronousReplicasDataSynchronizationHealthPolicy

Note I apply some simple formatting to the output (ft, alias for Format-Table). We see that three policies failed (returned a result of False):

  • AlwaysOnAgAutomaticFailoverHealthPolicy
  • AlwaysOnAgReplicasDataSynchronizationHealthPolicy
  • AlwaysOnAgSynchronousReplicasDataSynchronizationHealthPolicy

At this point, I can connect to the server instance with SQL Server Management Studio and determine the conditions monitored by these policies, and take appropriate action.

Now I’ll demonstrate method (2) from above, using the –ShowPolicyDetails parameter.

PS > Test-SqlAvailabilityGroup .\VLEApplication -ShowPolicyDetails | ft Result,Name -AutoSize

Result Name
------ ----
True AlwaysOnAgOnlineStateHealthPolicy
True AlwaysOnAgWSFClusterHealthPolicy
False AlwaysOnAgAutomaticFailoverHealthPolicy
True AlwaysOnAgReplicasConnectionHealthPolicy
False AlwaysOnAgReplicasDataSynchronizationHealthPolicy
True AlwaysOnAgReplicasRoleHealthPolicy
False AlwaysOnAgSynchronousReplicasDataSynchronizationHealthPolicy

Again I apply some simple formatting to the result. We see that the output of method (1) and (2) is identical.

Including User Policies

You are able to extend the default AlwaysOn health model with your own policies. I won’t go into details here about how to create these custom policies, but please refer to this post for more information. By default, the AlwaysOn health cmdlets won’t evaluate user policies; however you can force them to do so by specifying the AllowUserPolicies parameter, supported by all three cmdlets. This is a simple switch parameter:

Test-SqlAvailabilityGroup .\VLEApplication -ShowPolicyDetails -AllowUserPolicies

Performance Considerations

The AlwaysOn health cmdlet follow this general pattern: accept an object from the pipeline, refresh its properties, then execute a collection of PBM policies against the object. The objects in this case are part of the SQL Server Management Object (SMO) framework. Test-SqlAvailabilityGroup accepts objects of type AvailabilityGroup, Test-SqlAvailabilityReplica objects of type AvailabilityReplica, and Test-SqlDatabaseReplicaState objects of type DatabaseReplicaState. If you are monitoring many objects – say, you have an availability group with 100 databases – this process will be very chatty, since each object refresh will result in a query to the server.

In this scenario, you can suppress the refresh behavior of these cmdlets: simply specify the –NoRefresh parameter. In this case you will be responsible for ensuring that the SMO objects passed to the cmdlets have recent data. Let’s consider the scenario mentioned: our availability group “LargeAvailabilityGroup” has 100 databases across two replicas, for a total of 200 databases, and you want to evaluate the health of all these databases. First let’s use the naïve approach.

PS > $serverObj = New-Object Microsoft.SqlServer.Management.Smo.Server("WSNAVELY1-SH21Q")
PS > $ag = $serverObj.AvailabilityGroups["LargeAvailabilityGroup"]
PS > Measure-Command { $ag.DatabaseReplicaStates | Test-SqlDatabaseReplicaState }
Days : 0
Hours : 0
Minutes : 0
Seconds : 23
Milliseconds : 430

This took about 20 seconds to execute, rather a long time. Now we’ll demonstrate an optimized approach. First, we’ll want to turn off the auto-refresh behavior of the cmdlet using the –NoRefresh parameter as mentioned. Next, we need to figure out an efficient way to load the data we need from the server. Fortunately, SMO provides a mechanism for efficiently loading data for collections of objects, the “SetDefaultInitFields” method on the SMO Server object. For a detailed explanation of how this mechanism works, please refer to these blog posts by Michiel Wories:

Here’s a complete example with our cmdlet:

PS > $ServerObj = New-Object Microsoft.SqlServer.Management.Smo.Server("WSNAVELY1-SH21Q")
PS > $ServerObj.SetDefaultInitFields([Microsoft.SqlServer.Management.Smo.AvailabilityGroup], $true)
PS > $ServerObj.SetDefaultInitFields([Microsoft.SqlServer.Management.Smo.AvailabilityReplica], $true)
PS > $ServerObj.SetDefaultInitFields([Microsoft.SqlServer.Management.Smo.DatabaseReplicaState], $true)
PS > Measure-Command { $ag.DatabaseReplicaStates | Test-SqlDatabaseReplicaState -NoRefresh }
Days : 0
Hours : 0
Minutes : 0
Seconds : 7
Milliseconds : 317

This time the evaluation took about 7 seconds, much faster. The primary difference here is the number of queries issued. The first example issues a query for each database, while the second example issues a single query that fetches all the required data.

We've now covered most of the advanced usage scenarios for these cmdlets. In Part 3 we will combine what we've learned into a simple script.