I use App Insights metrics to report the delay between an application event and its processing. The processing time is, by definition, > 1 s since it uses a cron-based scheduler. I've written a bash script that reports the time difference to Azure App Insights (a rough sketch of the reporting call is shown below the alert definitions). This is working fine so far. Now I've configured two alerts:
avg(time difference) of the last 5 minutes > 120
avg(time difference) of the last 5 minutes <= 1
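For context, the reporting script does roughly the following. This is a minimal sketch, not the exact script: the instrumentation key, metric name, and the way the delay is computed are placeholders, and it assumes the public `v2/track` ingestion endpoint with a `MetricData` envelope.

```bash
#!/usr/bin/env bash
# Sketch of the metric-reporting call (placeholder values, not the real script).

IKEY="00000000-0000-0000-0000-000000000000"   # App Insights instrumentation key (placeholder)
METRIC_NAME="eventProcessingDelay"            # hypothetical metric name
DELAY_SECONDS="$1"                            # time difference, computed elsewhere

# Send one custom metric sample to the App Insights track endpoint.
curl -s -X POST "https://dc.services.visualstudio.com/v2/track" \
  -H "Content-Type: application/json" \
  -d @- <<EOF
{
  "name": "Microsoft.ApplicationInsights.Metric",
  "time": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
  "iKey": "${IKEY}",
  "data": {
    "baseType": "MetricData",
    "baseData": {
      "metrics": [
        { "name": "${METRIC_NAME}", "value": ${DELAY_SECONDS}, "count": 1 }
      ]
    }
  }
}
EOF
```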
The first alert is pretty obvious: catch an instance where my application is not processing the event correctly at all.
The second alert might need some more explanation: I want to catch the case in which my bash script is not reporting any data at all (e.g. system downtime, a complete application crash…). In theory, the average value would drop to 0 within 5 minutes of the application crash, thus triggering the alert and sending me an email.
This is not working at all: I can kill the bash script that transfers the data to the custom metrics API and not receive an alert at all (yes, I've waited the 5 minutes). If I report values of 0 for the time difference from my bash script, the alert fires correctly. If I then change the script to report a value > 0, the alert is deactivated properly as well. I have also tested this with
avg(td) < 0 (which would be my preferred way of doing it), but that doesn't work either. Is this expected / documented behavior? It really doesn't make a whole lot of sense to me. Is there a better way to alert on this "non-reporting" of certain metrics?