Why did Action Group Notifications for SQL_Database_Alerts not fire an alert for Transient errors (transient faults)?

Question

Daniel-4204 65

For context: on Wednesday at 4:20am, one of our Azure SQL databases was unavailable for a few minutes. I found out while reviewing some Application Insights exceptions that stated: ('db-name-example' on server 'db-server-example' is not currently available. Please retry the connection later. If the problem persists, contact customer support, and provide them the session tracing ID of 'Example-ID')

Upon reviewing the portal Activity Log for this timeframe, I found a "Health Event" for the database (severity=informational):
"details": "Your database was moved to a different machine to ensure it has the resources required for its compute size. This is an occasional transient operation. Currently, Azure shows the downtime for your SQL database resource at a two-minute granularity. The actual downtime may be less than that. Please also note the outage window may be shifted by around 5 minutes.",

Earlier this year, we encountered a more critical situation. In response, I created an Action Group notification for SQL_Database_Alerts to send an SMS and an email, supposedly whenever any database in a particular resource group is unavailable for any reason, regardless of severity.

The elephant in the room is that enabling zone redundancy would avoid this issue, but the team would still like to receive the alert.

I have read up on Action Groups, Metric Alerts, and transient connection errors. While it is not explicitly stated, reading between the lines, transient connection errors do not seem to be considered abnormal, just very very..., and consequently will not trigger the Action Group SQL_Database_Alerts.

Are my suspicions correct in assuming the above? Or is there a way to get notices for these DB transient errors?

In the meantime, I am considering setting up a metric alert on failed connections to cover this situation. It will not tell me that the database was down because of a transient fault, but it should alert the team to the failed connections, and investigating those would reveal the cause in each case.
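As a rough sketch of the metric alert idea above, the snippet below builds an ARM-style definition for a `Microsoft.Insights/metricAlerts` rule and prints it as JSON. Everything specific in it is an assumption for illustration: the helper name `failed_connection_alert`, the placeholder resource IDs, the threshold and time windows, and the metric name `connection_failed`, which should be checked against the metrics actually listed for the database in the portal.

```python
import json


def failed_connection_alert(database_id: str, action_group_id: str,
                            threshold: int = 0) -> dict:
    """Build an ARM-style metric alert definition (hypothetical helper)."""
    return {
        "type": "Microsoft.Insights/metricAlerts",
        "location": "global",
        "properties": {
            "severity": 2,
            "enabled": True,
            "scopes": [database_id],
            # Evaluate every minute over a 5-minute window (assumed values).
            "evaluationFrequency": "PT1M",
            "windowSize": "PT5M",
            "criteria": {
                "odata.type": "Microsoft.Azure.Monitor."
                              "SingleResourceMultipleMetricCriteria",
                "allOf": [{
                    "name": "FailedConnections",
                    "criterionType": "StaticThresholdCriterion",
                    # Assumed metric name; verify it for your database.
                    "metricName": "connection_failed",
                    "operator": "GreaterThan",
                    "threshold": threshold,
                    "timeAggregation": "Total",
                }],
            },
            # Route the alert to the existing action group (SMS + email).
            "actions": [{"actionGroupId": action_group_id}],
        },
    }


# Placeholder IDs; substitute real resource IDs before deploying.
rule = failed_connection_alert(
    "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Sql"
    "/servers/db-server-example/databases/db-name-example",
    "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/microsoft.insights"
    "/actionGroups/SQL_Database_Alerts",
)
print(json.dumps(rule, indent=2))
```

The resulting JSON is only the resource body; it would still need to be wrapped in a template or passed to whatever deployment tooling the team uses.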

SSingh-MSFT 16,371 Reputation points Moderator

2023-12-08T05:26:30.4933333+00:00

Hi Daniel-4204,

Welcome to Microsoft Q&A forum.

Thanks for letting the forum know and we await your result.

Thanks

Answer 1

Daniel-4204 65

It appears that my Action Group is not associated with an Action Rule, per https://azure.microsoft.com/en-us/blog/get-notified-when-your-azure-resources-become-unavailable/

I am going to head over to Monitor and create a rule to use with the group. I believe this is the solution and will report back.
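For anyone following along, here is a minimal sketch of what such a rule might look like when expressed as an ARM-style `Microsoft.Insights/activityLogAlerts` resource that matches Resource Health events and points at the action group. The helper name `resource_health_alert` and the placeholder IDs are hypothetical, and the single `category` condition is an assumption; a real rule may need further conditions to narrow the events it matches.

```python
import json


def resource_health_alert(scope_id: str, action_group_id: str) -> dict:
    """Build an ARM-style activity log alert for Resource Health events
    (hypothetical helper, not an official API)."""
    return {
        "type": "Microsoft.Insights/activityLogAlerts",
        "location": "Global",
        "properties": {
            "enabled": True,
            # Scope the rule to the resource group that holds the databases.
            "scopes": [scope_id],
            "condition": {
                "allOf": [
                    # Match health events such as the one seen in the Activity Log.
                    {"field": "category", "equals": "ResourceHealth"},
                ]
            },
            # This is the association the action group was missing.
            "actions": {
                "actionGroups": [{"actionGroupId": action_group_id}]
            },
        },
    }


# Placeholder IDs; substitute real resource IDs before deploying.
rule = resource_health_alert(
    "/subscriptions/<sub-id>/resourceGroups/<rg>",
    "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/microsoft.insights"
    "/actionGroups/SQL_Database_Alerts",
)
print(json.dumps(rule, indent=2))
```

Without a rule like this, an action group only defines who gets notified; nothing tells Azure Monitor when to notify them.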
