App Insights: Metrics Outage (no-report) alerting rule

Robin Heller 1 Reputation point
2020-06-01T12:07:18.757+00:00

I use App Insights metrics to report the delay between an application event and its processing. The processing time is, by definition, >1s, since it uses a cron-based scheduler. I've written a bash script to report the time difference to Azure App Insights. This is working fine so far. Now I've configured two alerts:

  • avg(time difference) of the last 5 minutes > 120
  • avg(time difference) of the last 5 minutes <= 1

The first alert is pretty obvious: catch an instance where my application is not processing the event correctly at all.
The second alert might need some more explanation: I want to catch the case in which my bash script is not reporting any data at all (e.g. system downtime, complete application crash…). In theory, the average value would drop to 0 within 5 minutes after the application crash, thus triggering the alert and sending me an email.

This is not working at all: I can kill my bash script that transfers the data to the custom metrics API and not receive an alert at all (yes, I've waited the 5 minutes). If I manually (that is, from my bash script) report values of 0 for the time difference, the alert fires correctly. If I then change the script to report a value > 0, the alert is deactivated properly as well. I have also tested this with avg(td) < 0 (which is my preferred way of doing it), but that doesn't work either. Is this expected / documented behavior? It really doesn't make a whole lot of sense to me. Is there a better way to alert on this "non-reporting" of certain metrics?
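The behavior described above is consistent with an average that is undefined, rather than 0, when a window contains no samples. The following is a minimal sketch of that idea in Python, not Azure's actual evaluation logic: the function names, the 300-second window, and the staleness check are all hypothetical and chosen only for illustration. It shows why neither threshold would fire on silence, and how a separate "time since last sample" check could detect the outage.

```python
from statistics import mean
from typing import Callable, Optional, Sequence, Tuple

# (unix timestamp in seconds, reported time difference in seconds)
Sample = Tuple[float, float]


def window_average(samples: Sequence[Sample], now: float,
                   window_s: float = 300.0) -> Optional[float]:
    """Average of the values inside (now - window_s, now], or None if empty."""
    values = [v for t, v in samples if now - window_s < t <= now]
    # Assumption: an empty window yields "no value", not 0.
    return mean(values) if values else None


def threshold_alert_fires(avg: Optional[float],
                          predicate: Callable[[float], bool]) -> bool:
    """A threshold rule can only fire when there is a value to compare."""
    return avg is not None and predicate(avg)


def is_stale(samples: Sequence[Sample], now: float,
             max_silence_s: float = 300.0) -> bool:
    """Hypothetical heartbeat check: True if nothing arrived recently."""
    past = [t for t, _ in samples if t <= now]
    return not past or now - max(past) > max_silence_s


if __name__ == "__main__":
    # The script reports three values, then is killed after t=120.
    samples = [(0.0, 30.0), (60.0, 30.0), (120.0, 30.0)]
    avg = window_average(samples, now=600.0)
    print("average:", avg)
    print("avg > 120 fires:", threshold_alert_fires(avg, lambda a: a > 120))
    print("avg <= 1 fires:", threshold_alert_fires(avg, lambda a: a <= 1))
    print("stale:", is_stale(samples, now=600.0))
```

Under these assumptions, alerting on the absence of data needs a rule that looks at sample count or last-seen time rather than at the averaged value itself.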

Azure Monitor

1 answer

  1. kobulloc-MSFT 26,351 Reputation points Microsoft Employee
    2020-06-08T03:28:05.037+00:00

    There are a couple of features in Application Insights to be aware of if you are looking for a 1:1 mapping of activity and result instead of statistically relevant overviews. Sampling in Application Insights is one of the first things I would look at if you are not seeing specific events that you are expecting. You should also be aware of a 5-10 minute delay in the availability of data, although that may not be important in your scenario. I would also take a quick look at other similar services, like Stream Analytics, to see if they are more in line with your goals for this project.

