Alerts Management

01/04/2010

Potential issue on Alerts management process

Operations teams frequently run into problems with the OpsMgr2007 alert views.

When an alert is raised, an operator acknowledges the alert, manages the issue, and closes the alert if not already done.

The following chapters explain why this can be an issue, and how to manage it.

Alert concept in OpsMgr2007

OpsMgr2007 manages two types of alert:

Ø Alert generated by Rule

Ø Alert generated by Monitor

Alerts generated by Rule

This type of alert is generated by an OpsMgr2007 rule which doesn’t affect the health of the target object (Alert Source).

In the alert details window, a link to the rule appears under Alert Rule:

An alert generated by a rule can be configured to consolidate repeated occurrences of the same problem. In that case, the Repeat Count field is updated:

The Alert Context tab contains the last event that was used to generate the alert:

The auto-resolve process only affects alerts generated by a rule that are still in the New resolution state:

Rules are generally used to collect information such as events or performance counters. This information is used for troubleshooting, analysis, capacity planning, reporting, and so on.

Rules are also sometimes used for proactive monitoring; in this case the rule is configured to generate an alert.

Converted management packs can also contain rules that generate reactive alerts, because MOM2005 had no monitor concept.

This type of alert is resolved automatically only by the OpsMgr2007 auto-resolve process, either when the alert is still in the “New” resolution state or when the alert source is healthy.

Summary

Ø Health of target object is not affected

Ø Alert context contains the last event

Ø Auto-resolved if the resolution state is still New, or if the alert source is healthy

Ø Not auto-resolved when the issue is solved
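The consolidation behavior summarized above can be illustrated with a small Python model. This is only a sketch, not OpsMgr code: the `RuleAlert` class, the `raise_rule_alert` helper, and the choice of (source, name) as the consolidation key are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class RuleAlert:
    """Simplified stand-in for an alert generated by a rule (not an SDK type)."""
    source: str
    name: str
    repeat_count: int = 0
    last_event: Optional[str] = None  # what the Alert Context tab would show


def raise_rule_alert(open_alerts: Dict[Tuple[str, str], RuleAlert],
                     source: str, name: str, event: str) -> RuleAlert:
    """Create a new alert, or consolidate into the open one with the same key."""
    key = (source, name)  # assumed consolidation key for this sketch
    alert = open_alerts.get(key)
    if alert is None:
        alert = RuleAlert(source, name, last_event=event)
        open_alerts[key] = alert
    else:
        alert.repeat_count += 1   # consolidation: no second alert is created
        alert.last_event = event  # the context keeps only the most recent event
    return alert


open_alerts: Dict[Tuple[str, str], RuleAlert] = {}
raise_rule_alert(open_alerts, "server01", "Disk error", "event A")
raise_rule_alert(open_alerts, "server01", "Disk error", "event B")
latest = raise_rule_alert(open_alerts, "server01", "Disk error", "event C")
print(len(open_alerts), latest.repeat_count, latest.last_event)
```

Three occurrences of the same problem leave a single open alert whose repeat count has been incremented twice and whose context holds only the last event. Nothing in this model touches a health state, which matches the first point of the summary.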

Alert generated by Monitor

This type of alert is generated by an OpsMgr2007 Monitor which affects the health of the target object (Alert Source).

In the alert details window, a link to the monitor appears under Alert Monitor:

The Repeat Count field is never updated.

The alert serves as a notification when the monitor changes the health state of an object from Healthy to Critical (or Warning).

IMPORTANT

If the alert is closed manually, the monitor health state of the related object is not reset to Healthy. If the problem persists, the monitor never generates a new alert, because its state does not change again.

Therefore an alert generated by a monitor, unlike one generated by a rule, should not be closed manually. Manage it through the health of the target object instead: when the health returns to Healthy, the alert closes automatically.

An alert generated by a monitor is also closed by the OpsMgr2007 auto-resolve process if its resolution state is still New; however, the health state of the alert source is not updated.

The alert context tab contains the event which has been used to change the monitor health state and generate the alert:

Summary

Ø Never manually close an alert generated by a monitor

Ø Manage the problem by using Health Explorer

Ø The alert is closed automatically once the problem is solved and the monitor has received the configured healthy event

Ø An alert closed automatically by an OpsMgr connector does not reset the monitor health state

Ø The alert is also closed by the OpsMgr2007 auto-resolve process if the resolution state is still New; however, the health state of the alert source is not updated
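The reason a manually closed monitor alert never comes back can be shown with a toy state machine. The sketch below is a simplified Python model under stated assumptions, not the OpsMgr implementation: the `Monitor` class and its method names are hypothetical, and only the Healthy and Critical states are modeled.

```python
from typing import List


class Monitor:
    """Toy monitor: it raises an alert only when its state leaves Healthy."""

    def __init__(self) -> None:
        self.state = "Healthy"
        self.alerts: List[str] = []  # resolution state of each alert raised

    def on_unhealthy_event(self) -> None:
        if self.state == "Healthy":   # state change, so a new alert is raised
            self.state = "Critical"
            self.alerts.append("New")
        # Already Critical: no state change, therefore no new alert.

    def on_healthy_event(self) -> None:
        self.state = "Healthy"
        self.alerts = ["Closed" for _ in self.alerts]  # alert closes by itself

    def close_alert_manually(self) -> None:
        self.alerts = ["Closed" for _ in self.alerts]  # state is left untouched

    def reset(self) -> None:
        self.state = "Healthy"  # what resetting the monitor achieves


m = Monitor()
m.on_unhealthy_event()    # first alert is raised
m.close_alert_manually()  # operator closes it; the monitor stays Critical
m.on_unhealthy_event()    # the problem persists, but no new alert appears
alerts_without_reset = len(m.alerts)

m.reset()                 # monitor is forced back to Healthy
m.on_unhealthy_event()    # the next occurrence raises a second alert
alerts_after_reset = len(m.alerts)
print(alerts_without_reset, alerts_after_reset)
```

In this model the second occurrence of the problem is silent until the monitor has been reset, which is the situation the summary warns about.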

How to manage this behavior

An operator cannot be prevented from closing an alert generated by a monitor, so there is no way to guarantee that the monitor health state is reset to Healthy when the alert is closed. I have developed two tools to manage this behavior:

Ø ResetMonitorFromAllClosedAlerts.exe

Ø ResetMonitorfromAlertId.exe

ResetMonitorFromAllClosedAlerts

This tool scans all closed monitor alerts and checks the state of the related monitor. If the monitor is not healthy, its state is reset. If the issue is still occurring the next time the monitor evaluates its condition, a new alert is raised.

To make sure that the scanned alert corresponds to the most recent state change of the monitor, the tool compares the time the alert was added with the last health state change of the monitor. The difference must be less than 90 seconds, which is a reasonable indicator that the alert and the health state change are related.

(Alert.TimeAdded - (DateTime)monitoringObject.GetMonitoringStates(monitors)[0].LastTimeModified).TotalSeconds < 90
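The 90-second comparison can be tried out in isolation with the Python sketch below. It is an illustration rather than the tool's code: the function name `alert_matches_state_change` is hypothetical, and taking the absolute value of the difference is an assumption that the expression above does not show.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(seconds=90)  # threshold quoted in the text


def alert_matches_state_change(alert_time_added: datetime,
                               monitor_last_modified: datetime) -> bool:
    """True when the alert and the state change are less than 90 seconds apart."""
    # abs() is an assumption: it accepts either timestamp being the earlier one.
    return abs(alert_time_added - monitor_last_modified) < MAX_GAP


state_change = datetime(2009, 7, 13, 16, 14, 15)

# Alert added at the same moment as the state change: related.
print(alert_matches_state_change(datetime(2009, 7, 13, 16, 14, 15), state_change))

# Alert added a day earlier: it belongs to an older state change.
print(alert_matches_state_change(datetime(2009, 7, 12, 16, 14, 15), state_change))
```

The first call uses two identical timestamps, as in the sample output of the tool, and reports a match. The second call is rejected, so an old closed alert would not cause a reset of a monitor whose state changed again later.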

This tool can be launched from the command line on the RMS server.

Without any option, the tool does not reset any monitor; it only lists the monitors that should be reset.

==== Reset Monitor!!!

Monitoring ObjectPath: OM2007R2.dom02.com

Alert Name: SPEC - Monitor Object from Syslog Event (critical/information)

Alert ResolvedBy: DOM02\Administrator

Alert TimeAdded: 13.07.2009 16:14:15

Monitor DisplayName: SPEC - Monitor Object from Syslog Event (critical/information)

Monitor HealthState: Error

Monitor LastTimeModified: 13.07.2009 16:14:15

With the -r option, all detected monitors are reset.

ResetMonitorFromAllClosedAlerts.exe -r

ResetMonitorfromAlertId

Using the same principle, this tool takes an alert ID as its argument. If the alert was raised by a monitor and has since been closed, the tool checks the state of the related monitor and resets it if it is not healthy.
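The decision described above can be written as a short Python function. This is a sketch under stated assumptions and not the tool's source: `AlertInfo` and `should_reset_monitor` are hypothetical names, and the alert data is passed in directly instead of being read from OpsMgr. The value 255 is the resolution state OpsMgr uses for a closed alert.

```python
from dataclasses import dataclass

CLOSED = 255  # resolution state of a closed alert in OpsMgr (New is 0)


@dataclass
class AlertInfo:
    """Hypothetical snapshot of what the tool needs to know about one alert."""
    is_monitor_alert: bool
    resolution_state: int
    monitor_health: str  # for example "Healthy", "Warning" or "Error"


def should_reset_monitor(alert: AlertInfo) -> bool:
    """Apply the conditions from the text, in order."""
    if not alert.is_monitor_alert:
        return False  # rule alerts have no monitor to reset
    if alert.resolution_state != CLOSED:
        return False  # the alert is still open
    return alert.monitor_health != "Healthy"


print(should_reset_monitor(AlertInfo(True, CLOSED, "Error")))     # closed, unhealthy
print(should_reset_monitor(AlertInfo(True, CLOSED, "Healthy")))   # nothing to reset
print(should_reset_monitor(AlertInfo(False, CLOSED, "Error")))    # rule alert
```

Only the first case, a closed monitor alert whose monitor is still unhealthy, leads to a reset.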

This tool can be launched automatically by creating a notification channel.

The details of this implementation are explained in the sections below.

How to implement “ResetMonitorfromAlertId” tool.

OpsMgr2007 SP1 - Create a notification to launch “ResetMonitorfromAlertId” tool when alert is closed.

Create a new Command notification Channel

Create a new subscriber

Create a new subscription

OpsMgr2007 R2 - Create a notification to launch “ResetMonitorfromAlertId” tool when alert is closed.

Create a new Command notification Channel

Create a new subscriber

Create a new subscription

Monitoring

Events created

Log Name: Operations Manager

Source: OpsMgr2007 ResetMonitorFromAlertId

Date: 13.07.2009 16:12:46

Event ID: 1000

Task Category: None

Level: Information

Keywords: Classic

User: N/A

Computer: OM2007R2.dom02.com

Description:

Start ResetMonitorFromAlertId

Event Xml:

<Channel>Operations Manager</Channel>

</System>

<Data>Start ResetMonitorFromAlertId</Data>

</EventData>

</Event>

Log Name: Operations Manager

Source: OpsMgr2007 ResetMonitorFromAlertId

Date: 13.07.2009 16:12:49

Event ID: 1000

Task Category: None

Level: Information

Keywords: Classic

User: N/A

Computer: OM2007R2.dom02.com

Description:

Manage Alert with GUID: 5fd07143-a3ac-4eb2-8897-b73b6a80fa6e

Event Xml:

<Channel>Operations Manager</Channel>

</System>

<Data>Manage Alert with GUID: 5fd07143-a3ac-4eb2-8897-b73b6a80fa6e</Data>

</EventData>

</Event>

Log Name: Operations Manager

Source: OpsMgr2007 ResetMonitorFromAlertId

Date: 13.07.2009 16:12:55

Event ID: 1010

Task Category: None

Level: Information

Keywords: Classic

User: N/A

Computer: OM2007R2.dom02.com

Description:

Monitor resets by ResetMonitorfromAlertId

MonitorDisplayName: SPEC - Monitor Object from Syslog Event (critical/information)

AlertName: SPEC - Monitor Object from Syslog Event (critical/information)

MonitoringObjectPath: OM2007R2.dom02.com

Event Xml:

<Channel>Operations Manager</Channel>

</System>

<Data>Monitor resets by ResetMonitorfromAlertId

MonitorDisplayName: SPEC - Monitor Object from Syslog Event (critical/information)

AlertName: SPEC - Monitor Object from Syslog Event (critical/information)

MonitoringObjectPath: OM2007R2.dom02.com</Data>

</EventData>

</Event>

A rule can be created to collect the following event from RMS.

Target	Root Management Server
Event Log	Operations Manager
Source	OpsMgr2007 ResetMonitorFromAlertId
Event ID	1010

Maximum number of asynchronous responses configuration on RMS Server

As described in the following blog article, OpsMgr2007 SP1 has a hardcoded limit of 5 asynchronous responses.

https://blogs.technet.com/cliveeastwood/archive/2008/04/16/some-more-command-notification-tricks-and-tips.aspx

So if more than 5 alerts are closed at the same time, the following alert and event should appear in the Operations Manager event log on the RMS server.

Alert: “Script or Executable was Dropped”

“The process could not be created because the maximum number of asynchronous responses (5) has been reached, and it will be dropped. Command executed: ………”

The following event should be monitored:

Windows event log (event collected by OpsMgr2007):

Event Log: Operations Manager

Event ID: 21410

Source: Health Service Modules
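The effect of the limit can be estimated with a few lines of Python. The sketch below is a simplified model, not the behavior of the Health Service itself: the `launch_commands` function is hypothetical and assumes that every request beyond the free slots is dropped immediately, with one event 21410 per dropped command.

```python
from typing import Tuple


def launch_commands(requests: int, already_running: int = 0,
                    limit: int = 5) -> Tuple[int, int]:
    """Return (started, dropped) for commands requested at the same moment."""
    free_slots = max(limit - already_running, 0)
    started = min(requests, free_slots)
    return started, requests - started


# Eight alerts closed at once with the SP1 limit of 5.
started, dropped = launch_commands(8)
print(started, dropped)
```

With eight simultaneous closures, five commands start and three are dropped. Each dropped command means one monitor that is not reset by the notification, which is why the event is worth watching.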

This limit has been removed in OpsMgr2007 R2, but for performance reasons a limit can still be set as follows.

The limit can be modified through the following registry key:

“HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Modules\Global\Command Executer”

· Create the keys: Global\Command Executer

· Create a DWORD value called “AsyncProcessLimit” and set it between 1 and 100.

Outside of this range, the value defaults back to 5.

This modification can affect RMS performance, so it is important not to increase the value too much and to check performance after modifying it.

As a starting point, the value can be set to 20; Event ID 21410 can then be monitored to see whether this is enough or whether the value should be increased.
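The range rule for “AsyncProcessLimit” can be summarized in a small Python helper. It is a sketch of the rule as stated in this article, not code taken from OpsMgr: the function name `effective_async_limit` is hypothetical, and treating a missing value as 5 reflects the SP1 behavior only.

```python
from typing import Optional

DEFAULT_LIMIT = 5  # hardcoded SP1 default mentioned in the text


def effective_async_limit(registry_value: Optional[int]) -> int:
    """Return the limit that applies for a given AsyncProcessLimit value."""
    if registry_value is None:
        return DEFAULT_LIMIT       # value not created (assumption: SP1 behavior)
    if 1 <= registry_value <= 100:
        return registry_value      # accepted range
    return DEFAULT_LIMIT           # out of range: falls back to the default


for value in (None, 20, 100, 0, 150):
    print(value, effective_async_limit(value))
```

A value of 150 therefore does not raise the limit to 150; in this model it silently falls back to 5, so the value should be checked carefully after any change.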

ResetMonitor_Tools.zip

Comments

Anonymous
January 01, 2003
Hi Thierry, Much better now! ;-) Keep up the great work. Regards, Stefan Stranger
