Why did Action Group Notifications for SQL_Database_Alerts not fire an alert for Transient errors (transient faults)?

Daniel-4204 65 Reputation points
2023-12-07T17:00:01.0533333+00:00

For context: On Wednesday at 4:20 AM, one of our Azure SQL databases was unavailable for a few minutes. I found out while reviewing Application Insights exceptions that stated: "'db-name-example' on server 'db-server-example' is not currently available. Please retry the connection later. If the problem persists, contact customer support, and provide them the session tracing ID of 'Example-ID'."

Upon reviewing the portal Activity Log for this timeframe, I found a "Health Event" for the database (severity=informational):
"details": "Your database was moved to a different machine to ensure it has the resources required for its compute size. This is an occasional transient operation.  Currently, Azure shows the downtime for your SQL database resource at a two-minute granularity. The actual downtime may be less than that. Please also note the outage window may be shifted by around 5 minutes.",

Earlier this year, we encountered a more critical situation. In response, I created an Action Group Notification for SQL_Database_Alerts so that we would get an SMS and an email whenever any database in a particular resource group is unavailable for any reason, regardless of severity.

The obvious fix is to enable zone redundancy to avoid this issue, but the team would still like to receive the alert.

I have read up on Action Groups, Metric Alerts, and transient connection errors. While it is not explicitly stated, I am reading between the lines that transient connection errors are not considered abnormal, just occasional, and consequently will not trigger the SQL_Database_Alerts Action Group.

Is my suspicion correct? Or is there a way to get notifications for these transient database errors?

In the meantime, I am considering setting up a metric alert for failed connections. It will not tell me that the database was down because of a transient fault, but it should alert the team to failed connections, and investigating those would reveal the cause in each case.
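
To make the idea concrete, here is a minimal Python sketch of the kind of check such a metric alert would perform: sum failed connections per fixed time window and flag any window whose total exceeds a threshold. This is only an illustration, not how Azure Monitor is implemented. `MetricSample`, `windows_breaching`, the 5-minute window, and the sample data are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MetricSample:
    minute: int              # minutes since the start of the observation (hypothetical)
    failed_connections: int  # failed connections counted in that minute


def windows_breaching(samples: List[MetricSample], window: int, threshold: int) -> List[int]:
    """Return the start minute of each fixed window whose total exceeds the threshold."""
    totals: Dict[int, int] = {}
    for s in samples:
        # Bucket each sample into the fixed window it belongs to.
        start = (s.minute // window) * window
        totals[start] = totals.get(start, 0) + s.failed_connections
    return sorted(start for start, total in totals.items() if total > threshold)


# Made-up data: a quiet period, a short failover-like burst, then quiet again.
samples = [
    MetricSample(0, 0), MetricSample(1, 0), MetricSample(2, 0),
    MetricSample(5, 3), MetricSample(6, 4),
    MetricSample(10, 0),
]

breaches = windows_breaching(samples, window=5, threshold=0)
print(breaches)
```

With a threshold of 0, even a brief burst like the one I saw would be flagged, which is the behavior the team wants. A higher threshold would cut noise but could hide short transient outages.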

Azure SQL Database

1 answer

  1. Daniel-4204 65 Reputation points
    2023-12-07T17:32:41.0133333+00:00

    It appears that my Action Group is not associated with an Action Rule, per https://azure.microsoft.com/en-us/blog/get-notified-when-your-azure-resources-become-unavailable/

    I am going to head over to Monitor and create a rule to use with the group. I believe this is the solution and will report back.
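
To illustrate why an action group on its own stays silent, here is a toy Python model of the relationship as I understand it: an event only reaches an action group through a rule that references it. It is a simplified sketch and not Azure's actual implementation. `ActionGroup`, `AlertRule`, `dispatch`, and the event fields are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ActionGroup:
    name: str
    sent: List[str] = field(default_factory=list)  # stands in for SMS/email deliveries

    def notify(self, message: str) -> None:
        self.sent.append(message)


@dataclass
class AlertRule:
    condition: Callable[[Dict[str, str]], bool]  # decides whether an event matches
    action_groups: List[ActionGroup]             # groups to notify on a match


def dispatch(event: Dict[str, str], rules: List[AlertRule]) -> int:
    """Send the event to every action group of every matching rule; return the count."""
    notified = 0
    for rule in rules:
        if rule.condition(event):
            for group in rule.action_groups:
                group.notify(f"{event['category']} event on {event['resource']}")
                notified += 1
    return notified


group = ActionGroup("SQL_Database_Alerts")
event = {
    "category": "ResourceHealth",
    "severity": "Informational",
    "resource": "db-name-example",
}

# No rule references the group, so nothing is delivered.
without_rule = dispatch(event, rules=[])

# A rule that matches health events regardless of severity delivers one notification.
rule = AlertRule(lambda e: e["category"] == "ResourceHealth", [group])
with_rule = dispatch(event, rules=[rule])
print(without_rule, with_rule, group.sent)
```

In this model the rule's condition ignores severity, so an informational health event like the one in my Activity Log would still reach the group.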
