How to: Reset monitor when closing alert?
This is not something I recommend; I almost regret that we did not prohibit closing alerts generated by monitors (especially when the auto-resolve feature is used). Recently, however, I learned that some ticketing systems close alerts even when it is unclear whether the issue was corrected. I therefore see some need to automate resetting the monitor's health state, so that the monitor generates a new state change if the issue is still present after the ticket was closed.
Problem Description:
As said, there may be legitimate situations where a customer needs to reset a monitor's health once an alert generated by its state change has been resolved. Such scenarios include automated ticketing systems resolving alerts without providing enough evidence that the issue was indeed addressed, and operators resolving a batch of alerts without investigating their root cause (for example after a network outage) or by mistake.
Recently I saw this type of request from multiple independent sources, so I decided to provide what I believe may be the only way to achieve this functionality: an OpsMgr connector.
Analyzing proposal:
An OpsMgr connector is a nice feature that allows subscribing to alert changes for members of a specified group. It also allows reacting to such a change, in our case by locating the monitor associated with the alert and requesting its state reset through an SDK call.
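To make the reaction logic concrete, here is a minimal sketch in Python. It is not the attached source and it does not call the OpsMgr SDK: `Alert`, `FakeManagementGroup`, `reset_monitor` and `handle_alert_changes` are hypothetical names, and the value 255 for a closed alert is an assumption made for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

CLOSED = 255  # assumption: resolution state value treated as "closed"


@dataclass(frozen=True)
class Alert:
    """Hypothetical, simplified view of an alert delivered to the connector."""
    alert_id: str
    object_id: str          # monitored instance the alert was raised for
    monitor_id: str         # monitor (or rule) that raised the alert
    is_monitor_alert: bool  # False for alerts raised by rules
    resolution_state: int


@dataclass
class FakeManagementGroup:
    """In-memory stand-in for the SDK connection (hypothetical)."""
    health: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def reset_monitor(self, object_id: str, monitor_id: str) -> None:
        # A reset puts the monitor back to healthy; if the problem still
        # exists, the monitor is expected to change state and alert again.
        self.health[(object_id, monitor_id)] = "Healthy"


def handle_alert_changes(alerts: Iterable[Alert],
                         group: FakeManagementGroup) -> List[str]:
    """Reset the monitor behind every closed, monitor-generated alert."""
    handled = []
    for alert in alerts:
        if not alert.is_monitor_alert:
            continue  # rule alerts have no health state to reset
        if alert.resolution_state != CLOSED:
            continue  # only react once the alert has been closed
        group.reset_monitor(alert.object_id, alert.monitor_id)
        handled.append(alert.alert_id)
    return handled
```

In this sketch, alerts raised by rules and alerts that are still open are skipped, so only closed monitor alerts trigger a reset.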
Note:
I will not discuss connector internals (registration, used subscription …), but I provide the source code so you can reverse engineer my implementation.
Solution:
Attached you can find the source code for my solution as well as the binary, which you should copy into the product folder on your RMS. You need to initialize the connector when you start it. Initialization imports the MP with the group definition (if the MP was not imported already), creates the connector and its subscription (again, only if necessary), and starts a worker thread to receive monitor-raised alerts.
After successful initialization, you should see the connector in the Administration section of the Operations console.
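The "only if necessary" behaviour of initialization can be illustrated with a small Python sketch. Everything in it is hypothetical: `FakeEnvironment` is an in-memory stand-in for the management group, and the names in `MP_NAME`, `CONNECTOR_NAME` and `SUBSCRIPTION_NAME` are placeholders, not the names used by the attached tool.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Placeholder names, invented for this illustration only.
MP_NAME = "Example.ResetMonitor.MP"
CONNECTOR_NAME = "Example Reset Monitor Connector"
SUBSCRIPTION_NAME = "Example Reset Monitor Subscription"


@dataclass
class FakeEnvironment:
    """In-memory stand-in for the management group (hypothetical)."""
    management_packs: Set[str] = field(default_factory=set)
    connectors: Set[str] = field(default_factory=set)
    subscriptions: Set[str] = field(default_factory=set)


def initialize(env: FakeEnvironment) -> List[str]:
    """Create whatever is missing and report which steps were performed."""
    performed = []
    if MP_NAME not in env.management_packs:
        env.management_packs.add(MP_NAME)
        performed.append("import management pack")
    if CONNECTOR_NAME not in env.connectors:
        env.connectors.add(CONNECTOR_NAME)
        performed.append("create connector")
    if SUBSCRIPTION_NAME not in env.subscriptions:
        env.subscriptions.add(SUBSCRIPTION_NAME)
        performed.append("create subscription")
    return performed
```

Running `initialize` a second time against the same environment performs no steps, which is why an existing (possibly customized) MP is left untouched in this sketch.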
Currently the connector uses a group populated with computer instances. This can be adjusted (steps are described later), and you should be able to see all members in the Authoring section of the Operations console once the group calculation rule finishes.
Below is an example of an alert raised by a test event-based monitor. When this alert is resolved, the state of the monitor resets.
Customization:
As mentioned, this connector responds to all alerts generated by any monitor that belongs to an instance of Windows Computer. It is rather simple to customize the managed entity type you want to use, though.
First, export the connector management pack:
Then edit the management pack in an XML editor of your choice. You need to change the type used with the relationship as well as the group population rule:
After you have saved the changes and imported your management pack, restart the connector application. (The initialization button will not overwrite changes to the MP, but the remove button will delete the MP from OpsMgr when removing the connector from your environment.) You can see in the source code that the worker thread starts after 3 minutes (to give group calculation time to populate the group) and that the subscription uses a 1-minute polling interval to retrieve all alerts matching the subscription definition.
One more word of caution: a connector like this may not scale well in big environments, and additional work could be needed. This post can still serve as an example and a foundation for a more advanced application.
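The timing described above (a delayed start followed by periodic polling) can be sketched in Python as follows. This is an illustration under stated assumptions, not the attached implementation: `run_worker`, `poll` and `handle` are hypothetical names, and only the 3-minute and 1-minute values come from the text.

```python
import threading
from typing import Callable, List

START_DELAY_SECONDS = 180   # 3 minutes for group calculation, per the text
POLL_INTERVAL_SECONDS = 60  # 1-minute polling interval, per the text


def run_worker(poll: Callable[[], List[str]],
               handle: Callable[[List[str]], None],
               stop: threading.Event,
               start_delay: float = START_DELAY_SECONDS,
               poll_interval: float = POLL_INTERVAL_SECONDS) -> None:
    """Wait for the start delay, then poll until `stop` is set."""
    # Event.wait returns True as soon as the event is set, so the worker
    # can be cancelled during the initial delay as well.
    if stop.wait(start_delay):
        return
    while True:
        batch = poll()        # hypothetical: fetch alerts for the subscription
        if batch:
            handle(batch)     # hypothetical: reset the monitors behind them
        if stop.wait(poll_interval):
            return
```

A caller would typically run this on a background thread, for example `threading.Thread(target=run_worker, args=(poll, handle, stop), daemon=True).start()`, and call `stop.set()` to shut it down.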
DISCLAIMER :
Please evaluate in your test environment first! As expected, this solution is provided AS-IS, with no warranties and confers no rights. Use is subject to the terms specified at Microsoft. Future versions of this tool may be created based on time and requests.
https://msutara.members.winisp.net/Blog/Tools/ResetMonitorConnector/ResetMonitorConnector.zip
Comments
Anonymous
February 03, 2009
This tool is awesome. This was one of the most confusing parts of SCOM to explain to a new user. This should be built directly into R2. Nice work.
Anonymous
March 11, 2010
Why do you choose to use a group of "Windows.Computer" versus checking every alert from a monitor that is closed? I've written a connector to our help desk system and I'm adding this functionality into my connector (on SCOM 2007 R2) but wasn't sure if there was a key reason I'm missing to only apply this to a single class of monitors.