Windows Automatic Services Monitoring using SCOM
Monitoring services on Windows computers is available out of the box in SCOM through the Windows Service monitoring template. But in a large enterprise with thousands of Windows computers and hundreds of applications, it is difficult to list every service that needs to be monitored on each computer and create the monitoring with the template. Monitoring an average of 30 services on 1,000 computers would add 30,000 instances to the SCOM database. This creates numerous classes and discoveries and bloats the instance space, which makes SCOM less responsive.
Also, we cannot create one monitor per service and target it at all computers, because each service may be present on some computers and not on others. Targeting every computer would produce false alarms, and we would still need 30+ Windows service monitors targeted at all Windows computers, which adds overhead on the agents and therefore on the computers running them.
So, what is the solution?
The optimal solution is a single rule that monitors all automatic services on each computer and alerts on those that are not running. This can be accomplished using a PowerShell script with property bag output.
The rule runs on each computer at a fixed interval and creates a property bag for each service that is set to automatic but is not running. An alert is generated for each property bag.
One catch in this scenario is that we should not alert on services that are stopped only for a moment. To handle this, we use a consolidator condition, so that we alert only if the service has failed for ‘n’ consecutive samples.
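The "n consecutive samples" idea can be illustrated with a small PowerShell sketch. This is only a simplified model: the function name and its parameters are hypothetical and are not part of the management pack, and the real consolidator condition counts items inside a time window rather than inspecting a list of states.

```powershell
# Hypothetical helper (not part of the management pack): returns $true once
# a service has been seen stopped in $Count consecutive samples.
function Test-ConsecutiveFailure {
    param (
        [string[]] $Samples,   # service states, oldest first, one per poll
        [int] $Count = 2       # 'n': consecutive stopped samples required
    )
    $streak = 0
    foreach ($state in $Samples) {
        # Any sample that is not 'Stopped' resets the streak.
        if ($state -eq 'Stopped') { $streak++ } else { $streak = 0 }
        if ($streak -ge $Count) { return $true }
    }
    return $false
}

# A service that is stopped in only one poll does not alert.
Test-ConsecutiveFailure -Samples 'Running', 'Stopped', 'Running' -Count 2
# A service that is stopped in two polls in a row does.
Test-ConsecutiveFailure -Samples 'Running', 'Stopped', 'Stopped' -Count 2
```

With a count of 2, a service that is restarting during a single poll is ignored, at the cost of delaying a genuine alert by one polling interval.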
This solution, though optimal, poses another challenge: what if we do not want to monitor a service that is set to automatic on one or a few computers?
This can be handled using a centrally located file that lists the services and the computers to be excluded from monitoring.
We will see how to construct the management pack XML to accomplish this. You can also create the MP using Visual Studio, MP Studio or the Authoring Console.
Step 1:
Create the manifest and add references to the management pack.
<ManagementPack ContentReadable="true" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <Manifest>
    <Identity>
      <ID>GKLab.Windows.Automatic.Service.Monitoring</ID>
      <Version>1.0.0.0</Version>
    </Identity>
    <Name>GKLab Windows Automatic Service Monitoring</Name>
    <References>
      <Reference Alias="SC">
        <ID>Microsoft.SystemCenter.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="Windows">
        <ID>Microsoft.Windows.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="Health">
        <ID>System.Health.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="System">
        <ID>System.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
      <Reference Alias="Performance">
        <ID>System.Performance.Library</ID>
        <Version>6.1.7221.0</Version>
        <PublicKeyToken>31bf3856ad364e35</PublicKeyToken>
      </Reference>
    </References>
  </Manifest>
Step 2:
Now create a PowerShell property bag probe script. The script fetches the list of all services that are set to start automatically and checks their current status. For each service that is set to Automatic but is not running, a property bag is created.
To exclude some services from monitoring, a centrally located CSV file is used, and the path of the file is passed to the script as a parameter. The script reads the list of excluded services from the CSV file and compares it with the list of services on the target computer. No property bag is created for an excluded service.
param (
    [string] $excludeservicelist
)

# Load the exclusion list only if a path was supplied and is reachable.
if ($excludeservicelist -and (Test-Path $excludeservicelist)) {
    Write-EventLog -LogName "Operations Manager" -Source "Health Service Script" -EventID 776 -Message "WindowsAutomaticServiceMonitoring.ps1 - Accessing Exclusion List CSV" -EntryType Information
    $contents = Import-Csv $excludeservicelist
}

$TargetComputer = hostname
$api = New-Object -ComObject 'MOM.ScriptAPI'

# All services whose start mode is Automatic.
$auto_services = Get-WmiObject -Class Win32_Service -Filter "StartMode='Auto'"

foreach ($service in $auto_services)
{
    $isExcluded = 0
    $state = $service.State
    $name = $service.DisplayName

    # Both CSV columns are treated as regular expressions by -match.
    if ($contents) {
        $contents | ForEach-Object {
            $ExcludeServiceDisplayName = $_.ServiceToExclude
            $ExcludeComputerName = $_.ComputersToExclude
            if (($name -match $ExcludeServiceDisplayName) -and (($TargetComputer -match $ExcludeComputerName) -or ($ExcludeComputerName -match "ALL_COMPUTERS"))) {
                $isExcluded = 1
                #Write-EventLog -LogName "Operations Manager" -Source "Health Service Script" -EventID 777 -Message "WindowsAutomaticServiceMonitoring.ps1 - Excluded Service Name - $ExcludeServiceDisplayName, Excluded Computer Name - $ExcludeComputerName" -EntryType Information
            }
        }
    }

    # Emit one property bag per stopped automatic service that is not excluded.
    if (($isExcluded -eq 0) -and ($state -eq "Stopped")) {
        #Write-EventLog -LogName "Operations Manager" -Source "Health Service Script" -EventID 778 -Message "WindowsAutomaticServiceMonitoring.ps1 - Windows Service set to Automatic but Not Running - $name" -EntryType Information
        $bag = $api.CreatePropertyBag()
        $bag.AddValue("ServiceName", $name)
        $bag.AddValue("Status", $state)
        $bag
    }
}
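Because the script compares names with -match, every entry in the exclusion list behaves as a regular expression and can match a substring. The sketch below isolates that comparison so it can be tried without SCOM. The function name, the computer name SERVER01 and the inline sample rows are assumptions made for illustration; only the two column names and the ALL_COMPUTERS check come from the script above.

```powershell
# Hypothetical helper that mirrors the script's exclusion test.
function Test-ServiceExcluded {
    param (
        [string] $ServiceDisplayName,
        [string] $ComputerName,
        [object[]] $ExclusionRows   # rows with ServiceToExclude / ComputersToExclude
    )
    foreach ($row in $ExclusionRows) {
        # -match is a case-insensitive regular expression match.
        $serviceMatches  = $ServiceDisplayName -match $row.ServiceToExclude
        $computerMatches = ($ComputerName -match $row.ComputersToExclude) -or
                           ($row.ComputersToExclude -match 'ALL_COMPUTERS')
        if ($serviceMatches -and $computerMatches) { return $true }
    }
    return $false
}

# Inline sample data (hypothetical) instead of a CSV file on a share.
$rows = @'
ServiceToExclude,ComputersToExclude
Windows Audio,Win2k12-DC
Remote Registry,ALL_Computers
'@ | ConvertFrom-Csv

# Excluded on every computer.
Test-ServiceExcluded 'Remote Registry' 'SERVER01' $rows
# Not excluded on SERVER01, only on Win2k12-DC.
Test-ServiceExcluded 'Windows Audio' 'SERVER01' $rows
# Also excluded, because 'Windows Audio' matches as a substring.
Test-ServiceExcluded 'Windows Audio Endpoint Builder' 'Win2k12-DC' $rows
```

If substring matches like the last one are not wanted, the CSV entry can be anchored with regular expression syntax, for example ^Windows Audio$.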
Step 3:
Create a data source module that incorporates the PowerShell script written above. We use the consolidator condition, as discussed in the solution, to alert only on genuine service failures.
<TypeDefinitions>
  <ModuleTypes>
    <DataSourceModuleType ID="GKLab.Windows.Auto.Service.Monitoring.DataSource" Accessibility="Internal" Batching="false">
      <Configuration>
        <xsd:element minOccurs="1" name="ExcludeServiceList" type="xsd:string" />
        <xsd:element minOccurs="1" name="IntervalSeconds" type="xsd:integer" />
        <xsd:element minOccurs="1" name="ConsolidationInterval" type="xsd:integer" />
        <xsd:element minOccurs="1" name="Count" type="xsd:integer" />
      </Configuration>
      <OverrideableParameters>
        <OverrideableParameter ID="IntervalSeconds" Selector="$Config/IntervalSeconds$" ParameterType="int" />
        <OverrideableParameter ID="Count" Selector="$Config/Count$" ParameterType="int" />
        <OverrideableParameter ID="ConsolidationInterval" Selector="$Config/ConsolidationInterval$" ParameterType="int" />
      </OverrideableParameters>
      <ModuleImplementation Isolation="Any">
        <Composite>
          <MemberModules>
            <DataSource ID="Trigger" TypeID="System!System.SimpleScheduler">
              <IntervalSeconds>$Config/IntervalSeconds$</IntervalSeconds>
              <SyncTime>00:00</SyncTime>
            </DataSource>
            <ProbeAction ID="Probe" TypeID="Windows!Microsoft.Windows.PowerShellPropertyBagProbe">
              <ScriptName>WindowsAutomaticServicesMonitoring.ps1</ScriptName>
              <ScriptBody><![CDATA[
param (
    [string] $excludeservicelist
)

# Load the exclusion list only if a path was supplied and is reachable.
if ($excludeservicelist -and (Test-Path $excludeservicelist)) {
    Write-EventLog -LogName "Operations Manager" -Source "Health Service Script" -EventID 776 -Message "WindowsAutomaticServiceMonitoring.ps1 - Accessing Exclusion List CSV" -EntryType Information
    $contents = Import-Csv $excludeservicelist
}

$TargetComputer = hostname
$api = New-Object -ComObject 'MOM.ScriptAPI'

# All services whose start mode is Automatic.
$auto_services = Get-WmiObject -Class Win32_Service -Filter "StartMode='Auto'"

foreach ($service in $auto_services)
{
    $isExcluded = 0
    $state = $service.State
    $name = $service.DisplayName

    # Both CSV columns are treated as regular expressions by -match.
    if ($contents) {
        $contents | ForEach-Object {
            $ExcludeServiceDisplayName = $_.ServiceToExclude
            $ExcludeComputerName = $_.ComputersToExclude
            if (($name -match $ExcludeServiceDisplayName) -and (($TargetComputer -match $ExcludeComputerName) -or ($ExcludeComputerName -match "ALL_COMPUTERS"))) {
                $isExcluded = 1
                Write-EventLog -LogName "Operations Manager" -Source "Health Service Script" -EventID 777 -Message "WindowsAutomaticServiceMonitoring.ps1 - Excluded Service Name - $ExcludeServiceDisplayName, Excluded Computer Name - $ExcludeComputerName" -EntryType Information
            }
        }
    }

    # Emit one property bag per stopped automatic service that is not excluded.
    if (($isExcluded -eq 0) -and ($state -eq "Stopped")) {
        Write-EventLog -LogName "Operations Manager" -Source "Health Service Script" -EventID 778 -Message "WindowsAutomaticServiceMonitoring.ps1 - Windows Service set to Automatic but Not Running - $name" -EntryType Information
        $bag = $api.CreatePropertyBag()
        $bag.AddValue("ServiceName", $name)
        $bag.AddValue("Status", $state)
        $bag
    }
}
]]></ScriptBody>
              <Parameters>
                <Parameter>
                  <Name>ExcludeServiceList</Name>
                  <Value>$Config/ExcludeServiceList$</Value>
                </Parameter>
              </Parameters>
              <TimeoutSeconds>300</TimeoutSeconds>
            </ProbeAction>
            <ConditionDetection ID="Consolidator" TypeID="System!System.ConsolidatorCondition">
              <Consolidator>
                <ConsolidationProperties>
                  <PropertyXPathQuery>$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</PropertyXPathQuery>
                  <PropertyXPathQuery>Property[@Name='ServiceName']</PropertyXPathQuery>
                </ConsolidationProperties>
                <TimeControl>
                  <WithinTimeSchedule>
                    <Interval>$Config/ConsolidationInterval$</Interval>
                  </WithinTimeSchedule>
                </TimeControl>
                <CountingCondition>
                  <Count>$Config/Count$</Count>
                  <CountMode>OnNewItemTestOutputRestart_OnTimerSlideByOne</CountMode>
                </CountingCondition>
              </Consolidator>
            </ConditionDetection>
          </MemberModules>
          <Composition>
            <Node ID="Consolidator">
              <Node ID="Probe">
                <Node ID="Trigger" />
              </Node>
            </Node>
          </Composition>
        </Composite>
      </ModuleImplementation>
      <OutputType>System!System.ConsolidatorData</OutputType>
    </DataSourceModuleType>
  </ModuleTypes>
</TypeDefinitions>
Step 4:
Next, we create a rule that uses the data source. The configuration below needs to be customized for your environment.
ExcludeServiceList – the UNC path of the excluded services list file (in CSV format). A sample CSV is provided below. The CSV has two columns:
ServiceToExclude – the display name of the service.
ComputersToExclude – the NetBIOS name of the computer. Two or more computers can be given as individual entries or combined using regular expression syntax. To exclude the service on all computers, set the value to “ALL_Computers” (the match is case-insensitive).
ServiceToExclude,ComputersToExclude
Distributed Transaction Coordinator,SCOM2012R2
Windows Audio,Win2k12-DC
Remote Registry,ALL_Computers
Software Protection,SCOM2012R2|Win2k12-DC
IntervalSeconds – the polling interval in seconds.
Count – the number of polls in which the service must be found stopped before an alert is raised (minimum 2).
ConsolidationInterval – the time window, in seconds, within which the service must fail ‘n’ times to generate an alert (minimum value = (n-1) * IntervalSeconds, where n = Count).
<Monitoring>
  <Rules>
    <Rule ID="GKLab.Windows.AutomaticService.Monitoring.Rule" Enabled="true" Target="Windows!Microsoft.Windows.Computer" ConfirmDelivery="true" Remotable="true" Priority="Normal" DiscardLevel="100">
      <Category>Alert</Category>
      <DataSources>
        <DataSource ID="DS" TypeID="GKLab.Windows.Auto.Service.Monitoring.DataSource">
          <ExcludeServiceList>\\SCOM2012R2\Configs\WindowsAutomaticServiceMonitoringExclusionList.csv</ExcludeServiceList>
          <IntervalSeconds>300</IntervalSeconds>
          <ConsolidationInterval>600</ConsolidationInterval>
          <Count>2</Count>
        </DataSource>
      </DataSources>
      <WriteActions>
        <WriteAction ID="Alert" TypeID="Health!System.Health.GenerateAlert">
          <Priority>1</Priority>
          <Severity>2</Severity>
          <AlertMessageId>$MPElement[Name="GKLab.Windows.AutomaticService.Monitoring.Rule.AlertMessage"]$</AlertMessageId>
          <AlertParameters>
            <AlertParameter1>$Data/Context/DataItem/Property[@Name='ServiceName']$</AlertParameter1>
          </AlertParameters>
          <Suppression>
            <SuppressionValue>$Target/Property[Type="Windows!Microsoft.Windows.Computer"]/PrincipalName$</SuppressionValue>
            <SuppressionValue>$Data/Context/DataItem/Property[@Name='ServiceName']$</SuppressionValue>
          </Suppression>
        </WriteAction>
      </WriteActions>
    </Rule>
  </Rules>
</Monitoring>
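When overriding IntervalSeconds or Count, it helps to re-check the ConsolidationInterval against the minimum given in the parameter description, (n-1) * IntervalSeconds. A small sketch of that arithmetic follows. The function name is hypothetical and is not part of the management pack.

```powershell
# Hypothetical helper: smallest ConsolidationInterval (in seconds) that can
# contain $Count samples taken $IntervalSeconds apart.
function Get-MinimumConsolidationInterval {
    param (
        [int] $IntervalSeconds,
        [int] $Count
    )
    if ($Count -lt 2) { throw 'Count must be at least 2.' }
    return ($Count - 1) * $IntervalSeconds
}

# Values used in the rule: IntervalSeconds = 300, Count = 2, ConsolidationInterval = 600.
$minimum = Get-MinimumConsolidationInterval -IntervalSeconds 300 -Count 2
$minimum            # 300
600 -ge $minimum    # True: the configured window is large enough
```

Raising Count to 4 with the same 300-second polling interval would require a ConsolidationInterval of at least 900 seconds.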
Step 5:
The final step is to construct the XML for the presentation and language packs. Make sure to close the <ManagementPack> tag.
  <Presentation>
    <StringResources>
      <StringResource ID="GKLab.Windows.AutomaticService.Monitoring.Rule.AlertMessage" />
    </StringResources>
  </Presentation>
  <LanguagePacks>
    <LanguagePack ID="ENU" IsDefault="true">
      <DisplayStrings>
        <DisplayString ElementID="GKLab.Windows.Automatic.Service.Monitoring">
          <Name>GKLab Windows Automatic Service Monitoring</Name>
          <Description>GKLab Windows Automatic Service Monitoring Management Pack</Description>
        </DisplayString>
        <DisplayString ElementID="GKLab.Windows.Auto.Service.Monitoring.DataSource">
          <Name>GKLab Windows Automatic Service Monitoring Data Source</Name>
          <Description>GKLab Windows Automatic Service Monitoring Data Source</Description>
        </DisplayString>
        <DisplayString ElementID="GKLab.Windows.AutomaticService.Monitoring.Rule">
          <Name>Windows Automatic Services Monitoring Rule</Name>
          <Description>Windows Automatic Services Monitoring Rule</Description>
        </DisplayString>
        <DisplayString ElementID="GKLab.Windows.AutomaticService.Monitoring.Rule" SubElementID="Alert">
          <Name>Alert</Name>
        </DisplayString>
        <DisplayString ElementID="GKLab.Windows.AutomaticService.Monitoring.Rule" SubElementID="DS">
          <Name>GKLab Windows Automatic Service Monitoring Data Source</Name>
        </DisplayString>
        <DisplayString ElementID="GKLab.Windows.AutomaticService.Monitoring.Rule.AlertMessage">
          <Name>Windows Automatic Services Monitoring Alert</Name>
          <Description>Windows Service {0} is set to auto-start but is currently not running.</Description>
        </DisplayString>
      </DisplayStrings>
    </LanguagePack>
  </LanguagePacks>
</ManagementPack>
Step 6:
Deploy the MP in a lab environment and check for alerts.
I have attached a copy of the XML, which you can import into any authoring tool. Customize it as per your needs and have fun.
Happy SCOMing…