OpsMgr: Alerting on agents without a failover MS with a PowerShell-based Rule
Creating a custom PowerShell script-based rule to alert on agents without a failover management server in a management group.
(Note: This article was written back in 2012 for OpsMgr 2007 R2 and is reposted to this MSDN blog.)
In this post, a PowerShell script that returns a list of agents without a failover management server is featured. Then, the steps taken to include it in a custom rule of a management pack are briefly illustrated. For further details, please refer to this earlier post.
Here are some features of the PowerShell script, SCOMFindFailoverServer.ps1:
- uses OpsMgr 2007 Cmdlets.
- takes in the name of the Root Management Server (RMS) as its first parameter, loads the OpsMgr Powershell snap-in and connects to the management group via the RMS.
- returns a list of agents that have not been assigned a failover management server (mostly for agents in untrusted domains or DMZ).
- for each agent listed, the list will also include the other management servers in the same domain that are not its current management server, as potential candidates to become its failover management server.
- accepts as its second parameter a list of FQDNs of managed agents to exclude from further queries and commands in the script ($exclusionList). The input string format can be like: "FQDN1,FQDN2,FQDN3".
- if the value of the third input parameter ($autoSwitch) is "true", then the script will automatically assign a failover management server of the same domain to each agent that is found without one.
* Note that the auto-assignment will only work for agents with a single potential failover candidate. For agents with more than one potential candidate, the auto-assignment will not be applied but instead the available management servers per agent will all be listed in the script output.
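As a side note on the exclusion list format, the script checks each agent name with a substring test against the raw "FQDN1,FQDN2,FQDN3" string. The following is a minimal sketch of an exact-match alternative, not part of the original script: the function name Test-AgentExcluded and the host names are made up for illustration.

```powershell
# Hypothetical helper (not in the original script): exact-match exclusion test.
function Test-AgentExcluded {
    param([string] $AgentName, [string] $ExclusionList)

    # Split "FQDN1,FQDN2,FQDN3" on commas, trim whitespace, drop empty entries.
    $excluded = @($ExclusionList.Split(',') |
        ForEach-Object { $_.Trim() } |
        Where-Object { $_ -ne '' })

    # -contains compares whole array elements (case-insensitively),
    # so a partial name does not match a longer FQDN.
    return ($excluded -contains $AgentName)
}

# Example calls with made-up host names
Test-AgentExcluded -AgentName 'web01.dmz.example.com' -ExclusionList 'web01.dmz.example.com, db01.dmz.example.com'
Test-AgentExcluded -AgentName 'web0' -ExclusionList 'web01.dmz.example.com'
```

The first call returns True and the second returns False, whereas a substring test on the raw string would also match the partial name 'web0'.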
To include the script in a custom rule, the PowerShell script was modified to return its output in a propertybag, as shown below. The main logic of the script remains unchanged.
////////////////////////////////////////////////////////// Sample PS script start ////////////////////////////////////////////////////////
param([string] $RMS, [string] $exclusionList, [string] $autoSwitch)
# Load OpsMgr snap-in
Add-PSSnapin Microsoft.EnterpriseManagement.OperationsManager.Client -ErrorAction SilentlyContinue -ErrorVariable Err
# Connect to OpsMgr
New-ManagementGroupConnection $RMS -ErrorAction SilentlyContinue -ErrorVariable Err
Set-Location "OperationsManagerMonitoring::" -ErrorVariable errSnapin;
$API = new-object -comObject "MOM.ScriptAPI"
$PropertyBag = $API.CreatePropertyBag()
$ScriptName = "SCOMFindFailoverServer.ps1"
$newline = "`r`n"
$tab = " " + "`t"
if($autoSwitch -eq "true") {
    $FullList="List of Agents without Failover Management Server (AUTO ALLOCATION SWITCH IS ON):" + $newline
}
else {
    $FullList="List of Agents without Failover Management Server:" + $newline
}
$managementservers = get-managementserver | where{!$_.IsRootManagementServer}
$myagents = get-agent | where-object{(!$_.getfailovermanagementservers())}
# Null check: only proceed if at least one agent has no failover management server
if($myagents) {
    foreach ($agent in $myagents) {
        if(! $exclusionList.contains($agent.name)) {
            $FullList = $FullList + $agent.name + $tab
            # Candidates: management servers in the agent's domain other than its primary MS
            $alternateFailoverMS = $managementservers | where{$_.Domain -eq $agent.Domain -and $_.PrincipalName -ne $agent.primarymanagementservername}
            $alternatives = @($alternateFailoverMS).count #@() forces an array, so count is 1 when only 1 item is found
            if($alternateFailoverMS) {
                $FullList = $FullList + "Other Available MS: "
                $i = 1
                if($alternatives -eq 1) {
                    # Exactly one candidate: list it, and assign it if auto-switch is on
                    $FullList = $FullList + $alternateFailoverMS.PrincipalName
                    if($autoSwitch -eq "true") {
                        $FullList = $FullList + " (Allocated) " + $newline
                        ##################################################################
                        $primaryMS = Get-ManagementServer | where {$_.Name -eq $agent.primarymanagementservername}
                        Set-ManagementServer -AgentManagedComputer: $agent -PrimaryManagementServer: $primaryMS -FailoverServer: $alternateFailoverMS
                        ##################################################################
                    }
                    else {
                        $FullList = $FullList + $newline
                    }
                }
                else {
                    # Several candidates: list them all separated by " / ", no auto-assignment
                    while($i -lt $alternatives) {
                        $FullList = $FullList + $alternateFailoverMS[$i-1].PrincipalName + " / "
                        $i++
                    }
                    $FullList = $FullList + $alternateFailoverMS[$i-1].PrincipalName + $newline
                }
            }
            else {
                $FullList = $FullList + "No Available Failover MS Identified. " + $newline
            }
        }
    }
}
else {
    $FullList= ""
}
if($FullList -eq "") {
    $PropertyBag.AddValue("State","OK")
    $PropertyBag.AddValue("Description", "All Managed Agents contain at least 1 failover management server.")
}
else {
    $PropertyBag.AddValue("State","ERROR")
    $PropertyBag.AddValue("Description", $FullList)
}
$PropertyBag
////////////////////////////////////////////////////////// Sample PS script end ////////////////////////////////////////////////////////
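The per-agent branching in the script (no candidate, exactly one candidate, several candidates) can be tried out without an OpsMgr connection. Below is a minimal sketch under that assumption: the function Get-FailoverDecision and the server names are hypothetical, and it only builds the text that the script would append for one agent; it does not call Set-ManagementServer.

```powershell
# Hypothetical helper (not in the original script): returns the text the script
# would report for one agent, given the names of its candidate failover servers.
function Get-FailoverDecision {
    param([string[]] $Candidates, [string] $AutoSwitch)

    # Normalise $null or empty input to an array so Count is always usable.
    $list = @($Candidates | Where-Object { $_ })

    if ($list.Count -eq 0) {
        return 'No Available Failover MS Identified.'
    }
    if ($list.Count -eq 1) {
        # A single candidate is the only case where auto-assignment applies.
        if ($AutoSwitch -eq 'true') { return ($list[0] + ' (Allocated)') }
        return $list[0]
    }
    # More than one candidate: never auto-assigned, only listed.
    return [string]::Join(' / ', $list)
}

# Example calls with made-up server names
Get-FailoverDecision -Candidates 'ms02.example.com' -AutoSwitch 'true'
Get-FailoverDecision -Candidates 'ms02.example.com','ms03.example.com' -AutoSwitch 'true'
```

With one candidate and the switch on, the result is "ms02.example.com (Allocated)". With two candidates the result is "ms02.example.com / ms03.example.com" regardless of the switch, which mirrors the note above that auto-assignment only applies to agents with a single potential candidate.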
The custom rule that uses the Powershell script above was created in a new management pack: Take-A-Wei Demo Failover Management Server Monitoring Management Pack.
This new management pack references the sealed Take.A.Wei.MP.Demo.Library library pack, so that the Powershell.PropertyBag.DataSource module in the library pack is accessible to the custom rule as its data source.
Here is a summary on how this custom rule, Failover Management Server Monitoring and Alerting Rule, was created and configured:
The rule has the following ID and display name (Failover Management Server Monitoring and Alerting Rule), and targets the Microsoft.SystemCenter.RootManagementServer class:
The rule was configured with 1 data source module, 1 condition detection module, and 1 write action module:
For the data source module, the following parameters were added by editing the XML.
The data source module runs the PowerShell script on a schedule, with an interval of 600 seconds, and takes the target's principal name variable as its parameter value. In a production environment, the schedule would ideally be set to run once a week (intervalSeconds = 604800) to reduce overhead on the OpsMgr SDK. The value of the autoSwitch parameter is set to "false" to disable the auto-assignment feature of the PowerShell script described above. This data source module's output is then evaluated by the condition detection module.
The System.ExpressionFilter module was used as the condition detection module, with an expression that looks for a particular value from the propertybag that the data source outputs:
The System.Health.GenerateAlert write action module was selected to generate an alert whenever the condition detection module evaluates to true. The name of the value in the propertybag was specified in the alert description:
Here are some screenshots of the test results in the OpsMgr 2007 R2 Operations Console:
Imported the library and the Failover Management Server Monitoring management packs
Alert generated with a list of agents without a failover management server.
To use this approach for a monitor, a custom monitor type will need to be created. Note that separate workflows for each health state will have to be defined for the monitor type.