Operations Manager Management Pack Authoring - Rules
This document is part of the Operations Manager Management Pack Authoring Guide. The Microsoft System Center team has validated this procedure as of the original version. We will continue to review any changes and periodically provide validations on later revisions as they are made. Please feel free to make any corrections or additions to this procedure that you think would assist other users.
Rules in System Center Operations Manager 2007 use the same Data Sources as monitors but provide different functionality. While a monitor changes state, a rule performs one of the following three functions:
- Create an alert that is not related to health state.
- Collect performance or event data for analysis and reporting.
- Run a script or command.
Alert Rules
Alert rules create alerts in response to particular conditions detected in a data source. This is the same kind of alert that is created by a monitor when it changes state. Monitors and alert rules use the same data sources and typically the same kind of logic for determining whether an error has occurred.
A monitor that creates an alert is generally preferred over an alert rule identifying the same issue for the following reasons:
- Monitors set a health state on the target object in addition to creating the alert. This identifies the category of the detected problem, the application component affected, and its effect on the overall health of the application. The health state is also recorded for availability reports that provide a historical record of the availability of the application.
- Alerts created by monitors can be automatically resolved when the application returns to a healthy state. Alerts created by rules cannot be automatically resolved because there is no method of determining the healthy state.
The situations when a rule should be used instead of a monitor to create an alert are as follows:
- The problem being detected does not relate to the health of the application. For example, an application may perform an automated nightly backup. If the backup fails, then an alert should be created to inform users of this condition. The application, though, is still completely healthy and should not record a negative health state.
- A monitor cannot determine when the detected problem has been resolved. One option in this situation is to use a rule instead of a monitor to create the alert. This situation is discussed, along with additional options, in the Event Monitors section of this guide.
Event Alert Rules
Alert rules can be created for each event data source. The criteria that determine when an alert is created are the same as the criteria for a state change in the event monitors.
Performance Alert Rules
The Authoring Console provides no wizards for creating an alert rule based on a performance counter. A monitor should be used instead because a success condition is usually detectable from a performance counter and is usually related to some health state of the target class. Alert rules based on a performance counter can be created, although they must be done with a custom rule.
Scripting Alert Rules
The Authoring Console provides no wizards for creating an alert rule based on a script. A monitor should be used instead because a script typically provides a return value for both an error and a healthy state, so a success condition is usually detectable and related to some health state of the target class. Alert rules based on a script can be created, although they must be done with a custom rule.
Alerts from Rules
Alert Name
The name of the alert is a line of static text that allows variables through use of replacement strings.
Alert Description
The alert description may have several lines of text that include static text or variables. The most common kind of variable in the alert description is a $Data variable, which includes information from the rule’s data source in the description of the alert. The properties that are available depend on the kind of data source being used. Each section of Data Sources includes a list of the properties available for different data sources. The following table provides syntax and examples of variables in rule alerts created from different data sources:
Data Source | Syntax | Example |
Windows Event | $Data/ $ | $Data/EventDescription$ |
Windows Event | $Data/Params/Param[#]$ | $Data/Params/Param[2]$ |
Text Log | $Data/EventData/DataItem/ $ | $Data/EventData/DataItem/LogFileName$ |
Text Log | $Data/EventData/DataItem/Params/Param[1]$ | $Data/EventData/DataItem/Params/Param[1]$ |
Delimited Text Log | $Data/EventData/DataItem/ $ | $Data/EventData/DataItem/LogFileName$ |
Delimited Text Log | $Data/EventData/DataItem/Params/Param[#]$ | $Data/EventData/DataItem/Params/Param[2]$ |
WMI Event | $Data/EventData/DataItem/Collection[@Name='']/Property[@Name=' ']$ | $Data/EventData/DataItem/Collection[@Name='TargetInstance']/Property[@Name='Name']$ |
Syslog Event | $Data/EventData/DataItem/ $ | $Data/EventData/DataItem/Facility$ |
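The way a $Data variable is replaced with a value from the data source can be illustrated with a short Python sketch. This is not how Operations Manager implements replacement; it only mimics the idea. The XML layout of the sample data item, the function name expand_data_variables, and the sample values are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical data item. The element layout is assumed for illustration
# and is not taken from a real Operations Manager data source.
SAMPLE_ITEM = """
<DataItem>
  <EventDescription>The backup job failed.</EventDescription>
  <Params>
    <Param>BackupJob01</Param>
    <Param>0x80070005</Param>
  </Params>
</DataItem>
"""

# Matches tokens of the form $Data/<path>$ and captures the path.
_VARIABLE = re.compile(r"\$Data/([^$]+)\$")


def expand_data_variables(template: str, data_item: ET.Element) -> str:
    """Replace each $Data/...$ token with the text of the matching element."""

    def lookup(match: re.Match) -> str:
        # ElementTree supports the small XPath subset used here,
        # including 1-based positions such as Param[2].
        node = data_item.find(match.group(1))
        return node.text if node is not None and node.text else ""

    return _VARIABLE.sub(lookup, template)


item = ET.fromstring(SAMPLE_ITEM)
description = expand_data_variables(
    "Job $Data/Params/Param[1]$ failed with code $Data/Params/Param[2]$: "
    "$Data/EventDescription$",
    item,
)
print(description)
```

A variable that does not match any element expands to an empty string in this sketch; that behavior is a choice made here, not a statement about the product.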
Priority and Severity
The alert severity defines the alert as either Information, Warning, or Critical. This severity does not have to match the severity of the health state triggering the alert. The severity of the alert is identified by an icon in the Operations console and is used by views and notification subscriptions. The alert priority is accessible in the Operations console but is used primarily for notification subscriptions.
Alert Suppression
Alert suppression refers to logic that is defined on alert rules to suppress the creation of an alert when a corresponding alert is still open. This prevents alert storms where multiple alerts are created for the same issue. Because the issue has already been identified with an open alert, additional alerts create unnecessary noise with minimal value. When the condition for an alerting rule is met but a matching alert is already open, suppression increases the repeat count of the existing alert instead of creating an additional alert.
In order to define suppression on an alerting rule, the fields must be specified that identify a matching alert. Before an alerting rule creates a new alert, it will check whether an open alert exists with values for the fields that are defined for suppression that match the fields of the new alert. If an alert with matching values for each of these fields is open, then a new alert is not created.
The minimum number of fields that uniquely identify the alert should be specified for alert suppression. This is typically the computer name in addition to the fields used for the criteria of the rule. For example, suppression on event rules can frequently be achieved by using the following fields:
- Logging Computer
- Event Source
- Event Number
If the rule is targeted at a class that has multiple instances on the agent, however, then a parameter might be required to uniquely identify the event in the criteria of the rule. If this is the case, then the same parameter should be specified in the alert suppression.
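The suppression logic described above can be sketched in Python as a lookup keyed on the suppression fields. This is a minimal model of the behavior, not the product's implementation; the class names Alert and AlertStore, the method names, and the field names used as dictionary keys are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    name: str
    fields: dict
    repeat_count: int = 0
    is_open: bool = True


class AlertStore:
    """Tracks open alerts and suppresses duplicates (illustrative only)."""

    def __init__(self, suppression_fields):
        # The fields whose values must all match for an alert to be suppressed.
        self.suppression_fields = tuple(suppression_fields)
        self._open = {}

    def _key(self, fields):
        return tuple(fields.get(name) for name in self.suppression_fields)

    def raise_alert(self, name, fields):
        key = self._key(fields)
        existing = self._open.get(key)
        if existing is not None:
            # A matching alert is still open: count the repeat, create nothing.
            existing.repeat_count += 1
            return existing
        alert = Alert(name, dict(fields))
        self._open[key] = alert
        return alert

    def close(self, alert):
        # Once closed, the next matching condition creates a new alert.
        alert.is_open = False
        self._open.pop(self._key(alert.fields), None)


store = AlertStore(["LoggingComputer", "EventSource", "EventNumber"])
event = {"LoggingComputer": "srv01", "EventSource": "Backup", "EventNumber": 100}
first = store.raise_alert("Backup failed", event)
second = store.raise_alert("Backup failed", event)
print(first is second, first.repeat_count)
```

If the rule targets a class with multiple instances on the agent, the same model applies with one more entry in the list of suppression fields, such as the event parameter that identifies the instance.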
Automatic Alert Resolution
Automatic resolution cannot be performed on alerts that are created from a rule since a rule has no mechanism for determining that the problem has been resolved.
Collection Rules
Collection rules are used to collect data into the Operations Manager database for analysis in views and into the Data Warehouse for use in reports. Even though a particular data source might be used for a monitor or alerting rule, a collection rule is still required if its information is to be available for this analysis. For example, a monitor may sample a performance counter on a regular interval and compare the value to a threshold in order to set its state. It will not store this information, however, so a collection rule using the same performance counter is also required if the data is to be available in views and reports.
Event Collection Rules
Events in System Center Operations Manager 2007 can be collected from any of the data sources detailed in the Events section of this guide. For most data sources, all that is required are the criteria that determine which events are to be collected. These criteria are specified using the properties available to the particular data source being used.
Script Based Event Collection
Script based event collection runs a monitoring script on a regular schedule and stores the results as an event. The monitoring script returns a property bag as described in the Monitoring Scripts section of this guide. The event collection rule maps values of the property bag into properties of the event by using $Data variables referring to different values in the property bag.
Event properties that the rule may populate are listed in the following table:
Property | Description |
Computer | The computer logging the event. Typically uses the PrincipalName property of the target object’s host computer. |
Event source | The source of the event. Typically either a static string or a value from the property bag. |
Event log | The name of the event log. Typically either a static string or a value from the property bag. |
Event ID | The number of the event. Typically either a static string or a value from the property bag. |
Category | The category of the event. |
Level | The level of the event. Typically selected from the list of available options. |
Parameters | Multiple values that contain data that does not fit into the other properties. Typically one or more values from the property bag. |
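The mapping from a property bag to event properties can be pictured with a small Python sketch. It treats the property bag as a plain dictionary and fills each event property either with a static value or with a value from the bag. The function name build_event, the property bag keys, and the static values are all hypothetical; an actual rule expresses this mapping with $Data variables rather than code.

```python
from typing import Mapping


def build_event(property_bag: Mapping[str, str], computer: str) -> dict:
    """Map values from a script's property bag onto event properties."""
    return {
        # Typically the PrincipalName of the target object's host computer.
        "Computer": computer,
        # Static strings chosen for this example (hypothetical).
        "EventSource": "BackupScript",
        "EventLog": "Application",
        "Category": 0,
        # Values taken from the property bag (hypothetical key names).
        "EventID": int(property_bag["EventNumber"]),
        "Level": property_bag.get("Level", "Information"),
        # Data that does not fit the other properties goes into parameters.
        "Parameters": [property_bag["JobName"], property_bag["Result"]],
    }


bag = {"EventNumber": "1001", "JobName": "Nightly", "Result": "Failed"}
event = build_event(bag, "srv01.contoso.com")
print(event["EventID"], event["Parameters"])
```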
Performance Collection Rules
Performance data can be collected in System Center Operations Manager 2007 from any of the data sources detailed in the Performance Data section of this guide. This section provides the details required for specifying the source of the data and the properties available for specifying the criteria for the data to be collected.
Script Based Performance Collection
Script based performance collection runs a monitoring script on a regular schedule and stores the results as performance data. The monitoring script returns a property bag as described in the Monitoring Scripts section of this guide. The performance collection rule maps values of the property bag into properties of the performance data by using $Data variables referring to different values in the property bag.
Performance data properties that the rule must populate are listed in the following table:
Property | Description |
Object | Name of the performance object. This is typically a static value but may be a $Data variable to retrieve a value from the property bag returned from the script. |
Counter | Name of the performance counter. This is typically a static value but may be a $Data variable to retrieve a value from the property bag returned from the script. |
Instance | Name of the instance if it is specified. If the target of the rule has a single instance on the agent, the instance name does not need to be specified. If the target of the rule has multiple instances on the agent, an instance name should be specified by using a $Target variable to retrieve the value of a unique property to identify the target object. |
Value | The numeric value to store. This is typically a $Data variable to retrieve a value from the property bag returned from the script. |
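As a rough illustration of the table above, the following Python sketch builds one performance sample from a property bag. The object and counter names are static, the value comes from the bag, and the instance name is supplied only when the target has multiple instances. The names PerformanceSample, build_sample, MyApplication, and QueueLength are hypothetical and stand in for whatever the rule and script actually define.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class PerformanceSample:
    object_name: str
    counter_name: str
    instance_name: Optional[str]
    value: float


def build_sample(
    property_bag: Mapping[str, str],
    target_instance: Optional[str] = None,
) -> PerformanceSample:
    """Map a script's property bag onto the performance data properties."""
    return PerformanceSample(
        object_name="MyApplication",    # static value (hypothetical)
        counter_name="QueueLength",     # static value (hypothetical)
        # Stands in for a $Target variable that uniquely identifies the
        # target object; None when the target has a single instance.
        instance_name=target_instance,
        # The stored value must be numeric, so convert the bag's text value.
        value=float(property_bag["QueueLength"]),
    )


sample = build_sample({"QueueLength": "42"}, target_instance="Queue-A")
print(sample)
```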
Optimized Collection
Performance collection rules based on Windows performance counters can be configured to perform optimized collection. Optimized collection reduces the space that is required by storing only those sampled values that differ significantly from a previously sampled value.
When optimized collection is specified, a tolerance must be specified that indicates the amount by which the sampled data must differ from the previously sampled value for the data to be stored. This tolerance can be either an absolute number or a percentage. An absolute tolerance evaluates the difference between the current value and the previous value. A percentage tolerance evaluates the difference as a percentage of the previously sampled value.
An example of optimized collection is the Microsoft.Windows.Server.2008.LogicalDisk.FreeMB.Collection rule in the Windows Server 2008 Operating System (Monitoring) management pack. This rule collects the free space in MB on a logical disk every 5 minutes. Free disk space is a value that typically changes gradually, and under most conditions collecting it with this frequency would create an excess number of sampled counters with minimal value. Decreasing the frequency of collection, though, would introduce the chance of missing those periods where a sudden change in the value did occur.
This rule uses optimized collection specifying an absolute value of 100 MB. This means that the counter is sampled every 5 minutes, but the value is only stored if it differs from the last stored value by 100 MB. This still lets the rule perform its sampling at a fairly frequent rate but significantly reduces its storage requirements by reducing the number of unnecessary data points.
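The effect of a tolerance can be sketched in Python as a filter over a series of sampled values. This is a simplified model under stated assumptions, not the product's algorithm: the first sample is always stored, each later sample is compared with the last stored value, and a difference equal to the tolerance counts as large enough. The function name optimized_samples and the sample values are hypothetical.

```python
def optimized_samples(samples, tolerance, percentage=False):
    """Return only the samples that differ enough from the last stored value.

    tolerance is an absolute amount, or a percentage of the last stored
    value when percentage is True.
    """
    stored = []
    last = None
    for value in samples:
        if last is None:
            keep = True  # assumption: the first sample is always stored
        else:
            delta = abs(value - last)
            threshold = abs(last) * tolerance / 100 if percentage else tolerance
            # Unchanged values are never stored, even with a zero threshold.
            keep = delta > 0 and delta >= threshold
        if keep:
            stored.append(value)
            last = value
    return stored


# Free space in MB sampled every 5 minutes, absolute tolerance of 100 MB.
free_mb = [50000, 49990, 49950, 49890, 49880, 40000]
print(optimized_samples(free_mb, 100))

# The same idea with a 10 percent tolerance.
print(optimized_samples([100, 104, 111, 112, 90], 10, percentage=True))
```

With the absolute tolerance, the gradual changes are dropped while the sudden fall to 40000 MB is still recorded, which is the trade-off the rule is aiming for.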