Reporting on SCOM Data: Piecing the data together (Part 2)

In my last blog post, Reporting on SCOM Data: Piecing the data together (Part 1), I discussed collecting host details from the SCOM DW db.  This provided a great foundation for correlating Health State and Alert data into the existing hierarchy that our customer wanted to see.  In this post, I'll focus on querying the Health State and Alert aggregates by host and updating our existing tables to enable reporting on that data.  This data is updated every 10 minutes in our reporting database.  Again, a huge shout out to Jimmy Harper for his help with the queries and imparting his extensive SCOM knowledge on me. :)

Here are the steps that we followed to populate the Health State and Alert counts for each host:

  1. Populate the Host_HealthLoad table with current Health State (from the SCOM OperationsManager db) for each host.
  2. Populate the Host_AlertCountLoad table with the current count of Warning and Critical Alerts (from the SCOM DW db) for each host.
  3. Update the Host_HealthLoad table with on Alert counts from the Host_AlertCountLoad table.
  4. Update the Host table with current Health State and Alert Counts from the Host_HealthLoad table.

First, we need our Host_HealthLoad table.  This is our loading/staging table for Health State & Alert Counts for each host.

 CREATE TABLE Host_HealthLoad (
    [HealthLoadId] [int] IDENTITY(1,1) NOT NULL,
    [DnsHostName] [nvarchar](255) NULL,
    [EntityHealth] [tinyint] NULL,
    [AvailabilityHealth] [tinyint] NULL,
    [ConfigurationHealth] [tinyint] NULL,
    [PerformanceHealth] [tinyint] NULL,
    [SecurityHealth] [tinyint] NULL,
    [WarningOpenAlertCount] [int] NULL,
    [CriticalOpenAlertCount] [int] NULL
)

To populate the Host_HealthLoad table with Health State data, we used the following query against the SCOM OperationsManager db in an SSIS package scheduled every 10 minutes (all other queries in this blog are tasks in the same SSIS package).  This query leverages the State table and MonitorView view to retrieve the current Health State of specific Monitor Categories.  In our case, we're looking for following monitor categories for each host: "Entity Health" (which is used to display Overall Health of the host), "Availability", "Configuration", "Performance", and "Security".  We're also setting WarningOpenAlertCount and CriticalOpenAlertCount equal to zero (0) in this query since we're populating them from the SCOM DW db (more on that in a bit).  It's also important to note that we're truncating the Host_HealthLoad table immediately before inserting this results of this query into it.

 SELECT
    DnsHostName = bme.DisplayName,
    EntityHealth = 
                    (SELECT TOP 1
                        s.HealthState
                    FROM
                        State s WITH (NOLOCK)
                    INNER JOIN
                        MonitorView mv WITH (NOLOCK)
                        ON mv.Id = s.MonitorId
                    WHERE
                        s.BaseManagedEntityId = bme.BaseManagedEntityId
                        AND mv.DisplayName = 'Entity Health'),
    AvailabilityHealth = 
                    (SELECT TOP 1
                        s.HealthState
                    FROM
                        State s WITH (NOLOCK)
                    INNER JOIN
                        MonitorView mv WITH (NOLOCK)
                        ON mv.Id = s.MonitorId
                    WHERE
                        s.BaseManagedEntityId = bme.BaseManagedEntityId
                        AND mv.DisplayName = 'Availability'),
    ConfigurationHealth = 
                    (SELECT TOP 1
                        s.HealthState
                    FROM
                        State s WITH (NOLOCK)
                    INNER JOIN
                        MonitorView mv WITH (NOLOCK)
                        ON mv.Id = s.MonitorId
                    WHERE
                        s.BaseManagedEntityId = bme.BaseManagedEntityId
                        AND mv.DisplayName = 'Configuration'),
    PerformanceHealth = 
                    (SELECT TOP 1
                        s.HealthState
                    FROM
                        State s WITH (NOLOCK)
                    INNER JOIN
                        MonitorView mv WITH (NOLOCK)
                        ON mv.Id = s.MonitorId
                    WHERE
                        s.BaseManagedEntityId = bme.BaseManagedEntityId
                        AND mv.DisplayName = 'Performance'),
    SecurityHealth = 
                    (SELECT TOP 1
                        s.HealthState
                    FROM
                        State s WITH (NOLOCK)
                    INNER JOIN
                        MonitorView mv WITH (NOLOCK)
                        ON mv.Id = s.MonitorId
                    WHERE
                        s.BaseManagedEntityId = bme.BaseManagedEntityId
                        AND mv.DisplayName = 'Security'),
    WarningOpenAlertCount = 0,
    CriticalOpenAlertCount = 0
FROM
    BaseManagedEntity bme WITH (NOLOCK)
INNER JOIN
    ManagedType mt WITH (NOLOCK)
    ON mt.ManagedTypeId = bme.BaseManagedTypeId
WHERE
    bme.IsDeleted = 0
    AND mt.TypeName = 'Microsoft.Windows.Computer'

Next, we'll need to populate our Host_AlertCountLoad table.  Here's the schema for that table:

 CREATE TABLE Host_AlertCountLoad (
    [AlertCountLoadId] [int] IDENTITY(1,1) NOT NULL,
    [DnsHostName] [nvarchar](255) NULL,
    [Severity] [tinyint] NULL,
    [AlertCount] [int] NULL
)

To improve efficiency of the query, we decided to UNION the results for Warning and Critical Alert counts that we're retrieving from the SCOM DW db.  This is also the reason that we update the Host_HealthLoad table with these counts after we've inserted them into the Host_AlertCountLoad table.  In addition, by breaking out the Health State data load from the Alert data load, we were able to support pulling these data sets from different data sources (e.g. - OperationsManager db & DW db).

This query leverages the vAlert, vAlertResolutionState, and vManagedEntity views.  The results are inserted into the Host_AlertCountLoad table.  Like the Host_HealthLoad table, it's important to note that we're truncating the Host_AlertCountLoad table immediately before inserting this results of this query into it.

 DECLARE @ParamDuration INT = 7 -- we only go back 7 days
SELECT
    DnsHostName = a.Path,
    Severity = 1,
    AlertCount = COUNT(a.AlertGuid)
FROM
    (SELECT
        me2.Path,
        a.AlertGuid,
        a.AlertName,
        a.Severity,
        a.DBCreatedDateTime
    FROM
        Alert.vAlert a WITH (NOLOCK)
    INNER JOIN
        (SELECT
            ars.AlertGuid,
            CurrentResolutionState = MAX(ars.ResolutionState)
        FROM
            Alert.vAlertResolutionState ars WITH (NOLOCK)
        GROUP BY
            ars.AlertGuid) crs
        ON crs.AlertGuid = a.AlertGuid
    INNER JOIN
        vManagedEntity me2 WITH (NOLOCK)
        ON me2.ManagedEntityRowId = a.ManagedEntityRowId
    WHERE
        a.Severity = 1
        AND a.DBCreatedDateTime > (GETUTCDATE() - @ParamDuration)
        AND crs.CurrentResolutionState = 0) a
GROUP BY
    a.Path
UNION
SELECT
    DnsHostName = a.Path,
    Severity = 2,
    AlertCount = COUNT(a.AlertGuid)
FROM
    (SELECT
        me2.Path,
        a.AlertGuid,
        a.AlertName,
        a.Severity,
        a.DBCreatedDateTime
    FROM
        Alert.vAlert a WITH (NOLOCK)
    INNER JOIN
        (SELECT
            ars.AlertGuid,
            CurrentResolutionState = MAX(ars.ResolutionState)
        FROM
            Alert.vAlertResolutionState ars WITH (NOLOCK)
        GROUP BY
            ars.AlertGuid) crs
        ON crs.AlertGuid = a.AlertGuid
    INNER JOIN
        vManagedEntity me2 WITH (NOLOCK)
        ON me2.ManagedEntityRowId = a.ManagedEntityRowId
    WHERE
        a.Severity = 2
        AND a.DBCreatedDateTime > (GETUTCDATE() - @ParamDuration)
        AND crs.CurrentResolutionState = 0) a
GROUP BY
    a.Path

Now that we have the Alert counts in the Host_AlertCountLoad table, we simply update the Host_HealthLoad table with those results.

 BEGIN TRAN
UPDATE
    hl
SET
    hl.WarningOpenAlertCount = 
        ISNULL((SELECT
            SUM(acl.AlertCount)
        FROM
            Host_AlertCountLoad acl
        WHERE
            acl.ServerName = hl.DnsHostName
            AND acl.Severity = 1), 0)
FROM
    Host_HealthLoad hl
    
UPDATE
    hl
SET
    hl.CriticalOpenAlertCount = 
        ISNULL((SELECT
            SUM(acl.AlertCount)
        FROM
            Host_AlertCountLoad acl
        WHERE
            acl.ServerName = hl.DnsHostName
            AND acl.Severity = 2), 0)
FROM
    Host_HealthLoad hl
COMMIT TRAN

With a loaded and updated Host_HealthLoad table, we simply update the Host table with our current values.

 UPDATE 
    h
SET
    h.EntityHealth = l.EntityHealth,
    h.AvailabilityHealth = l.AvailabilityHealth,
    h.ConfigurationHealth = l.ConfigurationHealth,
    h.PerformanceHealth = l.PerformanceHealth,
    h.SecurityHealth = l.SecurityHealth,
    h.WarningOpenAlertCount = l.WarningOpenAlertCount,
    h.CriticalOpenAlertCount = l.CriticalOpenAlertCount
FROM
    Host h WITH (NOLOCK)
INNER JOIN
    Host_HealthLoad l WITH (NOLOCK)
    ON l.DnsHostName = h.DnsHostName

We've now got the data to support a picture of SCOM Health State and Alert counts across the entire SCOM enterprise being updated every 10 minutes.  You can review my first blog post (Reporting on SCOM Data: A Single Pane of Glass) in this series for more details, but here are the types of visualizations that we created using this data.

Health State Map

Drill down for Health State by Monitor Category

Host-level details for Health State and Alert count, linking off to the SCOM Web Console for more in-depth Alert and Performance Monitoring details.

In summary, we've been able to provide a "Single Pane of Glass" for Health State and Alert data across our customer's enterprise.  These reports have been incredibly well received by all levels of the customer and has enabled us to increase our new work efforts with them.

I hope you've found these posts helpful and hope to hear from some of you that may have implemented some or all of the solution, along with any improvements that you've made.

Viel Spass!

Joseph