Alerts Management

Potential issue in the alert management process

Operations teams frequently have to deal with problems related to the OpsMgr2007 alert views.

When an alert is raised, an operator acknowledges the alert, manages the issue, and closes the alert if not already done.

The following chapters explain why this can be an issue, and how to manage it.

Alert concept in OpsMgr2007

OpsMgr2007 manages two types of alert:

· Alerts generated by a rule

· Alerts generated by a monitor

Alerts generated by Rule

This type of alert is generated by an OpsMgr2007 rule, which does not affect the health of the target object (the alert source).

In the alert details window, a link to the alert rule appears:

clip_image002

 

An alert generated by a rule can be configured to consolidate repeated occurrences. In this case, when the problem is raised again, the Repeat Count field of the existing alert is updated:

clip_image004

 

The Alert Context tab contains the last event that was used to generate the alert:

clip_image006

The auto-resolve process closes an alert generated by a rule only if the alert is still in the New resolution state:

clip_image008

Rules are generally used to collect information such as events or performance counters. This information is used for troubleshooting, analysis, capacity planning, and reporting.

Rules are also sometimes used for proactive monitoring; in this case, the rule is configured to generate an alert.

Converted management packs can also contain rules that generate reactive alerts, because MOM2005 had no monitor concept.

This type of alert is resolved automatically only by the OpsMgr2007 auto-resolve process, either because the alert is still in the “New” resolution state or because the alert source is healthy.

clip_image010

 

Summary

· The health of the target object is not affected

· The alert context contains the last event

· Auto-resolved if the resolution state is still New, or if the alert source is healthy

· Not auto-resolved when the issue is solved

 

Alert generated by Monitor

This type of alert is generated by an OpsMgr2007 Monitor which affects the health of the target object (Alert Source).

In the alert details window, a link to the alert monitor appears:

clip_image012

The Repeat Count field is never updated.

The alert serves as a notification that the monitor has changed the health state of an object from Healthy to Critical (or Warning).

clip_image014

 

IMPORTANT

If the alert is closed manually, the monitor health state of the related object is not reset to Healthy. If the problem persists, the monitor will never generate a new alert.

Therefore, an alert generated by a monitor, unlike one generated by a rule, should not be closed manually. Instead, manage it through the health of the target object: when the health returns to Healthy, the alert closes automatically.

 

An alert generated by a monitor is also closed by the OpsMgr2007 auto-resolve process if its resolution state is still New; however, the health state of the alert source is not updated.

clip_image016 

 

The alert context tab contains the event which has been used to change the monitor health state and generate the alert:

clip_image018

 

Summary

· Never manually close an alert generated by a monitor

· Manage the problem by using Health Explorer

· The alert is closed automatically once the problem is solved and the monitor has received the configured healthy event

· An alert closed automatically by an OpsMgr connector does not reset the monitor health state

· The alert is also closed by the OpsMgr2007 auto-resolve process if the resolution state is still New; however, the health state of the alert source is not updated
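The behavior summarized here can be illustrated with a small toy model. This is only a sketch in Python, not OpsMgr code: the `ToyMonitor` class, its method names, and the simplification to two states are assumptions made for illustration. It shows why a manually closed alert leaves the monitor silent until its state is reset.

```python
from dataclasses import dataclass, field

HEALTHY = "Healthy"
CRITICAL = "Critical"


@dataclass
class ToyMonitor:
    """Hypothetical two-state monitor that alerts only on a state change."""
    state: str = HEALTHY
    open_alerts: list = field(default_factory=list)

    def observe(self, new_state: str) -> None:
        if new_state == self.state:
            return  # no state change, so no new alert is raised
        if new_state == HEALTHY:
            self.open_alerts.clear()  # the healthy event closes the alert
        else:
            self.open_alerts.append(f"{self.state} -> {new_state}")
        self.state = new_state

    def close_alert_manually(self) -> None:
        # The operator closes the alert; the health state is left untouched.
        self.open_alerts.clear()

    def reset(self) -> None:
        # In this model, a reset simply forces the state back to Healthy.
        self.state = HEALTHY


monitor = ToyMonitor()
monitor.observe(CRITICAL)        # state change: one alert is raised
monitor.close_alert_manually()   # alert gone, monitor still Critical
monitor.observe(CRITICAL)        # same state again: nothing is raised
silent = len(monitor.open_alerts)
monitor.reset()
monitor.observe(CRITICAL)        # state change again: a new alert is raised
print(silent, len(monitor.open_alerts))
```

In this model, the second `observe(CRITICAL)` call raises nothing because the state did not change, which mirrors the situation an operator creates by closing a monitor alert by hand.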

 

How to manage this behavior

Because it is not possible to prevent an operator from closing an alert generated by a monitor, it is also not possible to guarantee that the monitor health state has been reset to Healthy. I have developed two tools to manage this behavior:

· ResetMonitorFromAllClosedAlerts.exe

· ResetMonitorfromAlertId.exe

 

 

ResetMonitorFromAllClosedAlerts

This tool scans all closed monitor alerts and checks the state of the related monitor. If the monitor is not healthy, its state is reset. If the issue is still occurring, a new alert is raised the next time the monitor detects it.

To make sure that the scanned alert corresponds to the most recent state change of the monitor, the tool compares the time the alert was added with the last health state change of the monitor. The difference must be less than 90 seconds, which is a reasonable indicator that the alert and the health state change are related.

(Alert.TimeAdded - (DateTime)monitoringObject.GetMonitoringStates(monitors)[0].LastTimeModified).TotalSeconds < 90
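As a rough illustration of this check, the following Python sketch restates the decision the tool is described as making. The `ClosedAlert` and `MonitorState` types, their field names, and the state names treated as unhealthy are assumptions for this example; the real tool works against the OpsMgr SDK objects used in the expression above. Taking the absolute value of the time difference is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MAX_GAP = timedelta(seconds=90)
UNHEALTHY_STATES = {"Warning", "Error"}  # assumed names for non-healthy states


@dataclass
class ClosedAlert:
    """Hypothetical stand-in for a closed alert raised by a monitor."""
    name: str
    time_added: datetime


@dataclass
class MonitorState:
    """Hypothetical stand-in for the current state of the related monitor."""
    health_state: str
    last_time_modified: datetime


def needs_reset(alert: ClosedAlert, monitor: MonitorState) -> bool:
    """Return True if the monitor is still unhealthy and the closed alert
    matches the monitor's last state change (less than 90 seconds apart)."""
    if monitor.health_state not in UNHEALTHY_STATES:
        return False
    gap = abs(alert.time_added - monitor.last_time_modified)
    return gap < MAX_GAP


# Timestamp and names taken from the tool's sample output in this article.
stamp = datetime(2009, 7, 13, 16, 14, 15)
alert = ClosedAlert(
    "SPEC - Monitor Object from Syslog Event (critical/information)", stamp
)
print(needs_reset(alert, MonitorState("Error", stamp)))
print(needs_reset(alert, MonitorState("Error", stamp + timedelta(minutes=5))))
```

The second call models a monitor whose state changed again five minutes after the alert was added, so the closed alert is no longer considered related to the current state.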

   

This tool can be launched from the command line on the RMS server.

Without any option, the tool does not reset any monitor; it only lists the monitors that should be reset.

 

==== Reset Monitor!!!

Monitoring ObjectPath: OM2007R2.dom02.com

Alert Name: SPEC - Monitor Object from Syslog Event (critical/information)

Alert ResolvedBy: DOM02\Administrator

Alert TimeAdded: 13.07.2009 16:14:15

 

Monitor DisplayName: SPEC - Monitor Object from Syslog Event (critical/information)

Monitor HealthState: Error

Monitor LastTimeModified: 13.07.2009 16:14:15

 

clip_image020

With the -r option, all detected monitors are reset.

 ResetMonitorFromAllClosedAlerts.exe -r

clip_image022

 

clip_image024

 

ResetMonitorfromAlertId

Using the same principle, this tool takes an alert ID as its argument. If the alert was raised by a monitor and has subsequently been closed, the tool checks the state of the related monitor and, if the monitor is not healthy, resets it.

This tool can be launched automatically by creating a notification channel.

The details of this implementation are explained in the sections below.

 

How to implement the “ResetMonitorfromAlertId” tool

 

OpsMgr2007 SP1 - Create a notification to launch the “ResetMonitorfromAlertId” tool when an alert is closed

Create a new Command notification Channel

clip_image026 

clip_image028 

 

Create a new subscriber

 clip_image030

clip_image032 

clip_image034 

clip_image036 

clip_image038 

clip_image040 

 

Create a new subscription

clip_image042 

clip_image044 clip_image046

clip_image048 

clip_image050 

clip_image052 

clip_image054

clip_image056 

clip_image058 

 

OpsMgr2007 R2 - Create a notification to launch the “ResetMonitorfromAlertId” tool when an alert is closed

Create a new Command notification Channel

clip_image060 

clip_image062 

clip_image064 

 

Create a new subscriber

clip_image066 

clip_image068 

clip_image070 

clip_image072

clip_image074 

clip_image076 

clip_image078 

clip_image080 

clip_image082 

 

Create a new subscription

clip_image084 

clip_image086

clip_image088 

clip_image090 

clip_image092 

clip_image094 

clip_image096 

clip_image098 

clip_image100 

clip_image102 

clip_image104 

clip_image106 

clip_image108 

 

Monitoring

Events created

Log Name: Operations Manager

Source: OpsMgr2007 ResetMonitorFromAlertId

Date: 13.07.2009 16:12:46

Event ID: 1000

Task Category: None

Level: Information

Keywords: Classic

User: N/A

Computer: OM2007R2.dom02.com

Description:

Start ResetMonitorFromAlertId

Event Xml:

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">

  <System>

    <Provider Name="OpsMgr2007 ResetMonitorFromAlertId" />

    <EventID Qualifiers="0">1000</EventID>

    <Level>4</Level>

    <Task>0</Task>

    <Keywords>0x80000000000000</Keywords>

    <TimeCreated SystemTime="2009-07-13T14:12:46.000Z" />

    <EventRecordID>444968</EventRecordID>

    <Channel>Operations Manager</Channel>

    <Computer>OM2007R2.dom02.com</Computer>

    <Security />

  </System>

  <EventData>

    <Data>Start ResetMonitorFromAlertId</Data>

  </EventData>

</Event>

clip_image110 

Log Name: Operations Manager

Source: OpsMgr2007 ResetMonitorFromAlertId

Date: 13.07.2009 16:12:49

Event ID: 1000

Task Category: None

Level: Information

Keywords: Classic

User: N/A

Computer: OM2007R2.dom02.com

Description:

Manage Alert with GUID: 5fd07143-a3ac-4eb2-8897-b73b6a80fa6e

Event Xml:

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">

  <System>

    <Provider Name="OpsMgr2007 ResetMonitorFromAlertId" />

    <EventID Qualifiers="0">1000</EventID>

    <Level>4</Level>

    <Task>0</Task>

    <Keywords>0x80000000000000</Keywords>

    <TimeCreated SystemTime="2009-07-13T14:12:49.000Z" />

    <EventRecordID>444970</EventRecordID>

    <Channel>Operations Manager</Channel>

    <Computer>OM2007R2.dom02.com</Computer>

    <Security />

  </System>

  <EventData>

    <Data>Manage Alert with GUID: 5fd07143-a3ac-4eb2-8897-b73b6a80fa6e</Data>

  </EventData>

</Event>

clip_image112 

Log Name: Operations Manager

Source: OpsMgr2007 ResetMonitorFromAlertId

Date: 13.07.2009 16:12:55

Event ID: 1010

Task Category: None

Level: Information

Keywords: Classic

User: N/A

Computer: OM2007R2.dom02.com

Description:

Monitor resets by ResetMonitorfromAlertId

 MonitorDisplayName: SPEC - Monitor Object from Syslog Event (critical/information)

 AlertName: SPEC - Monitor Object from Syslog Event (critical/information)

 MonitoringObjectPath: OM2007R2.dom02.com

Event Xml:

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">

  <System>

    <Provider Name="OpsMgr2007 ResetMonitorFromAlertId" />

    <EventID Qualifiers="0">1010</EventID>

    <Level>4</Level>

    <Task>0</Task>

    <Keywords>0x80000000000000</Keywords>

    <TimeCreated SystemTime="2009-07-13T14:12:55.000Z" />

    <EventRecordID>444971</EventRecordID>

    <Channel>Operations Manager</Channel>

    <Computer>OM2007R2.dom02.com</Computer>

    <Security />

  </System>

  <EventData>

    <Data>Monitor resets by ResetMonitorfromAlertId

 MonitorDisplayName: SPEC - Monitor Object from Syslog Event (critical/information)

 AlertName: SPEC - Monitor Object from Syslog Event (critical/information)

 MonitoringObjectPath: OM2007R2.dom02.com</Data>

  </EventData>

</Event>

clip_image114 

A rule can be created to collect the following event from RMS.

Target: Root Management Server

Event Log: Operations Manager

Source: OpsMgr2007 ResetMonitorFromAlertId

Event ID: 1010
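Once event 1010 is collected, its description can be split into fields for reporting. The following Python sketch assumes the description always follows the layout of the sample event above (a title line followed by `Name: value` lines); the function name `parse_reset_event` is hypothetical.

```python
def parse_reset_event(description: str) -> dict:
    """Split an event 1010 description into a dict of its 'Name: value' lines.

    Lines without a ': ' separator, such as the title line, are ignored.
    """
    fields = {}
    for line in description.splitlines():
        key, separator, value = line.strip().partition(": ")
        if separator:
            fields[key] = value
    return fields


# Description text copied from the sample event 1010 in this article.
sample = """Monitor resets by ResetMonitorfromAlertId

 MonitorDisplayName: SPEC - Monitor Object from Syslog Event (critical/information)

 AlertName: SPEC - Monitor Object from Syslog Event (critical/information)

 MonitoringObjectPath: OM2007R2.dom02.com"""

parsed = parse_reset_event(sample)
for name, value in parsed.items():
    print(f"{name} = {value}")
```

Splitting on the first `": "` only keeps values that themselves contain punctuation, such as the monitor display name, intact.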

 

Maximum number of asynchronous responses configuration on RMS Server

As described in the following blog article, OpsMgr2007 SP1 has a hardcoded limit of 5 asynchronous responses.

https://blogs.technet.com/cliveeastwood/archive/2008/04/16/some-more-command-notification-tricks-and-tips.aspx

So if more than 5 alerts are closed at the same time, the following event should appear in the Operations Manager event log on the RMS server.

Alerts “Script or Executable was Dropped”

“The process could not be created because the maximum number of asynchronous responses (5) has been reached, and it will be dropped. Command executed: ………”

The following event should be monitored:

Windows event log (event collected by OpsMgr2007):

   Event Log: Operations Manager

   Event ID: 21410

   Source: Health Service Modules

   Source: Health Service Modules

 

This limit was removed in OpsMgr2007 R2, but for performance reasons a limit can still be set as follows.

The limit can be modified through the following registry key:

HKLM\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Modules\Global\Command Executer

· Create the keys Global\Command Executer

· Create a DWORD value called “AsyncProcessLimit” and set it between 1 and 100.

Outside of this range, the limit defaults back to 5.

 

This modification can affect RMS performance, so it is important not to increase the value too much and to check performance after changing it.

For example, the value can be set to 20; event ID 21410 can then be monitored to see whether this is enough or whether the value should be increased further.
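The rule for the effective limit, as described in this section, can be restated in a few lines of Python. This is only a sketch of the stated behavior (values from 1 to 100 are accepted, anything else falls back to 5); the function name is hypothetical, and the code does not read the registry.

```python
from typing import Optional

DEFAULT_ASYNC_LIMIT = 5


def effective_async_limit(configured: Optional[int]) -> int:
    """Return the limit that would apply for a given AsyncProcessLimit value.

    None models the registry value being absent.
    """
    if configured is not None and 1 <= configured <= 100:
        return configured
    return DEFAULT_ASYNC_LIMIT


for value in (None, 20, 100, 0, 500):
    print(value, "->", effective_async_limit(value))
```

A helper like this could be used in a pre-deployment check to catch an out-of-range value before it silently falls back to the default.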

ResetMonitor_Tools.zip