View your Agent Health in OMS

Summary: Learn how to view and monitor the health of your Windows and Linux agents in OMS.

Hi everyone, Nini here, and today I’m happy to announce that you are now able to view and monitor the health of your Windows and Linux agents in OMS. With agent heartbeat events, you can gain insights into the performance, availability and geo-location of your agents. If you have SCOM agents or direct agents connected to OMS, we will automatically start to generate and collect these agent heartbeat events. Best of all, they do not count towards your data volume.

First, go into Log Search and input Type=Heartbeat:

Screenshot of results when you type “Type=Heartbeat” in Log Search.

Now that you can see a list of all computers and servers with Windows and Linux agents installed, you can view the following fields:

Table of fields for agents.

Visualize Heartbeat Events

Using the View Designer, you can create an Agent Health solution to visualize these results. You can dive into the key counters that you care about, such as the distribution of your agents and agent heartbeats over time.

Be sure to enable the View Designer feature in “Preview Features” below, before following the next steps.

Screenshot of the Preview Features tab where you enable the View Designer feature.

Next, open the View Designer, and populate each blade by using the following queries. In this example, I’m populating my Overview tile to see the total number of agents in my environment and the Agent Count over Time blade to view a trend of my number of agents over time.

Agent Health solution tile.

For the Overview tile, I use these queries to get the total number of agents, and a count of unresponsive agents in the last 24 hours:

  • Type=Heartbeat | distinct Computer
  • Type=Heartbeat | measure max(TimeGenerated) as LastCall by Computer | where LastCall < NOW-24HOURS
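To make the logic of these two queries concrete, here is a minimal Python sketch of what they compute. It is only an illustration and does not call OMS: the sample records and the function names `distinct_computers` and `unresponsive_computers` are hypothetical, and only the `Computer` and `TimeGenerated` field names come from the queries above.

```python
from datetime import datetime, timedelta, timezone


def distinct_computers(records):
    """Mirrors: Type=Heartbeat | distinct Computer"""
    return sorted({r["Computer"] for r in records})


def unresponsive_computers(records, now, max_age=timedelta(hours=24)):
    """Mirrors: measure max(TimeGenerated) as LastCall by Computer
    | where LastCall < NOW-24HOURS"""
    last_call = {}
    for r in records:
        computer, seen_at = r["Computer"], r["TimeGenerated"]
        # Keep only the most recent heartbeat per computer.
        if computer not in last_call or seen_at > last_call[computer]:
            last_call[computer] = seen_at
    cutoff = now - max_age
    return {c: t for c, t in last_call.items() if t < cutoff}


# Hypothetical sample data, not real OMS output.
now = datetime(2016, 8, 18, 12, 0, tzinfo=timezone.utc)
heartbeats = [
    {"Computer": "web01", "TimeGenerated": now - timedelta(minutes=5)},
    {"Computer": "web01", "TimeGenerated": now - timedelta(hours=30)},
    {"Computer": "db01", "TimeGenerated": now - timedelta(hours=26)},
]

print(distinct_computers(heartbeats))                   # ['db01', 'web01']
print(sorted(unresponsive_computers(heartbeats, now)))  # ['db01']
```

Note that web01 is not reported even though it has an old heartbeat, because only its latest heartbeat is compared against the 24-hour cutoff.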

In the Agent Count over Time blade, I use this query to get a trend of my number of agents over time:

Type=Heartbeat | measure countdistinct(Computer) by OSType
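As a rough illustration of the grouping this query performs, the Python sketch below counts distinct computers per OS type. The sample records and the function name `count_distinct_by_os` are hypothetical; only the `Computer` and `OSType` field names are taken from the query.

```python
from collections import defaultdict


def count_distinct_by_os(records):
    """Mirrors: measure countdistinct(Computer) by OSType"""
    computers_by_os = defaultdict(set)
    for r in records:
        # A set ignores repeated heartbeats from the same computer.
        computers_by_os[r["OSType"]].add(r["Computer"])
    return {os_type: len(names) for os_type, names in computers_by_os.items()}


# Hypothetical sample data, not real OMS output.
sample = [
    {"Computer": "web01", "OSType": "Windows"},
    {"Computer": "web01", "OSType": "Windows"},
    {"Computer": "app01", "OSType": "Windows"},
    {"Computer": "db01", "OSType": "Linux"},
]

print(count_distinct_by_os(sample))  # {'Windows': 2, 'Linux': 1}
```

This sketch returns a single count per OS type and does no time bucketing of its own.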

I use this query to get a list of all my computers in the “All Agents” list right below the trend:

Type=Heartbeat | Distinct Computer

Agent Health Overview.

For the other blades in my Agent Health solution, I use the following queries:

Queries for other blades in my Agent Health solution.

Screenshots of the completed solution:

Agent health overview for the completed solution.

Agent health overview for the completed solution.

And voila! You can now get a quick sense of your agent health and setup at a glance from this dashboard.

Create Alerts to Track Agent Heartbeats

In addition to visualizing heartbeat events, you can also create alert rules so that you get notified when an agent has stopped sending heartbeats for 24 hours or longer. To do this, create a log search and then select the Alert button to create and configure the alert.

For the log search Type=Heartbeat | measure max(TimeGenerated) as LastCall by Computer | where LastCall < NOW-24HOURS, here’s what the alert rule should look like:

Add alert rule options.

Click Save, and your alert should start running immediately. To read more about how to set up alerts in Log Analytics, see Alerts in Log Analytics.
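Conceptually, an alert rule like this one re-runs the search on a schedule and fires when the search returns results. The Python sketch below illustrates that idea only. It is not the OMS alerting engine, and the function name `evaluate_alert`, the `threshold` parameter (fire when the number of stale agents is greater than this value) and the sample data are all assumptions, since the actual rule settings appear only in the screenshot.

```python
from datetime import datetime, timedelta, timezone


def evaluate_alert(last_calls, now, max_silence=timedelta(hours=24), threshold=0):
    """Return (fired, stale_computers) for one evaluation of the rule.

    last_calls maps each computer to its most recent heartbeat time,
    i.e. the LastCall value from the log search.
    """
    cutoff = now - max_silence
    stale = sorted(c for c, t in last_calls.items() if t < cutoff)
    # Assumed rule: fire when the number of results exceeds the threshold.
    return len(stale) > threshold, stale


# Hypothetical sample data, not real OMS output.
now = datetime(2016, 8, 18, 12, 0, tzinfo=timezone.utc)
last_calls = {
    "web01": now - timedelta(minutes=5),
    "db01": now - timedelta(hours=26),
}

fired, stale = evaluate_alert(last_calls, now)
print(fired, stale)  # True ['db01']
```

In this sketch, passing a shorter `max_silence`, such as `timedelta(minutes=15)`, corresponds to lowering the NOW-24HOURS cutoff in the search.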

What's next?

We are currently working on an Agent Health OMS solution so that you can have a more in-depth and insightful visualization in OMS.

That’s all I have for you today. If you have any feedback or questions, please comment below.

Priscilla Nini Ikhena
Microsoft Operations Management Team

Comments

  • Anonymous
    August 18, 2016
    Why is so much development effort going into OMS, which nobody uses, and almost none going into scom?
    • Anonymous
      September 06, 2016
      Hi Timothy, I don't agree with your comment. We need wide OMS functionality, and the OMS vs SCOM comparison should stop. If no effort goes into further developing OMS, no one will ever use it and it won't have a chance. In certain scenarios OMS is a better option than SCOM. It all depends on the situation (business). Don't get me wrong, I'll take SCOM any time, any place over competitors, but when it comes to cloud monitoring (SaaS to IaaS monitoring) OMS should be one of the basic tools to use. I agree that OMS misses certain basic tooling which is a commodity in SCOM. Once again, the two should not be directly compared. Did you look at the SCOM 2016 roadmap? It looks pretty decent and I believe that the community had enough input (as it should).
  • Anonymous
    September 15, 2016
    Am seeing incorrect values show up for "ComputerIP" field. Any thoughts?
  • Anonymous
    September 30, 2016
    What do you mean when you say "IsGatewayInstalled: true if a Gateway is installed"? Are you talking about SCOM Gateway or OMS Gateway? I have installed an OMS Gateway (OMS Agent + OMS Gateway) and this property is marked as "false". Is this behaviour normal? Apparently all is working well. One direct agent is reporting to OMS Gateway and OMS Gateway is reporting to OMS Portal. Moreover, the "Count of Gateways Installed" view in the dashboard is 0.
    • Anonymous
      October 06, 2016
      IsGatewayInstalled value indicates whether the OMS Gateway is installed on that particular machine. Are you seeing the value of this as "false" for heartbeats from that machine ?
  • Anonymous
    December 08, 2016
    Hi, thank you for the really helpful article, but unfortunately I cannot confirm it: the alert does not work. I shut down one of my Azure VMs and it is still listed with current data under the column timegenerated when I use: Type=Heartbeat Computer="ComputerName". I don't know what LastCall is, but in my mind this is the wrong parameter to search for, because the call runs against the VM every few minutes, although it's turned off. Is there something like LastAnswer or anything else? I adapted your query for testing to Type=Heartbeat | measure max(TimeGenerated) as LastCall by Computer | where LastCall < NOW-1MINUTES and never get an alert for the turned-off machine. (The machine is successfully connected to OMS via MMA.) Any hints? Thank you
    • Anonymous
      December 08, 2016
      I know LastCall = measure max(TimeGenerated).
  • Anonymous
    April 27, 2017
    Hello Priscilla, thank you for adding agent health into OMS; this will help identify computers that are unresponsive. However, I did have a question regarding generating alerts. If a device becomes unresponsive, an alert should trigger specifying that particular device. The current monitor will generate an alert for all devices if only a single computer becomes unresponsive. Is this the designed behavior? If so, then an alert will need to be created per device, unless I'm missing something. Kind Regards
  • Anonymous
    May 12, 2017
    Why is the default 24 hours? Wouldn't you want to know your server went down in like 5-15 minutes? Also, how do you change the default dashboard from 24HOURS to something reasonable?
  • Anonymous
    May 25, 2017
    Once a system loses its heartbeat, how do you restore heartbeat collection? The system never went down or disconnected, but OMS identified it as if it had. When you check the Overview tile every day, there are fewer and fewer agents accounted for and running (first day 88, 2nd day 87, 3rd day 84, 4th day 80 in the morning and 79 in the evening). Is this normal behavior for OMS? All servers are up and running, providing services to the clients.