Best practices for Azure Monitor alerts
This article provides architectural best practices for Azure Monitor alerts, alert processing rules, and action groups. The guidance is based on the five pillars of architecture excellence described in Azure Well-Architected Framework.
In the cloud, we acknowledge that failures happen. Instead of trying to prevent failures altogether, the goal is to minimize the effects of a single failing component. Use the following information to minimize failure of your Azure Monitor alert rule components.
Azure Monitor alerts offer a high degree of reliability without any design decisions. Conditions where a temporary loss of alert data may occur are often mitigated by features of other Azure Monitor components.
- Configure service health alert rules.
- Configure resource health alert rules.
- Avoid service limits for alert rules that produce large scale notifications.
|Configure service health alert rules.||Service health alerts send you notifications for outages, service disruptions, planned maintenance and security advisories. See Create or edit an alert rule.|
|Configure resource health alert rules.||Resource health alerts can notify you in near real time when your resources have a change in their health status. See Create or edit an alert rule.|
|Avoid service limits for alert rules that produce large scale notifications.||If you have alert rules that would send a large number of notifications, you may reach your service limits for the service you use to send email or SMS notifications. Configure programmatic actions or choose an alternate notification method or provider to handle large scale notifications. See Service limits for notifications.|
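As a rough planning aid for the service limits recommendation above, the following Python sketch checks whether a projected number of fired alerts would exceed a per-recipient notification limit. The limit values, the `ChannelLimit` type, and the `exceeds_limit` helper are illustrative assumptions and are not taken from this article; confirm the real values in Service limits for notifications. The check also assumes alerts are spread evenly over the period, which bursts would violate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelLimit:
    """Maximum notifications allowed per recipient within one time window."""
    max_notifications: int
    window_minutes: int


# Placeholder limits for illustration only (assumption, not from this article).
# Replace them with the values documented in "Service limits for notifications".
ASSUMED_LIMITS = {
    "email": ChannelLimit(max_notifications=100, window_minutes=60),
    "sms": ChannelLimit(max_notifications=1, window_minutes=5),
}


def exceeds_limit(channel: str, fired_alerts: int, period_minutes: int) -> bool:
    """Return True if fired_alerts in period_minutes would exceed the assumed
    per-recipient limit for the channel, assuming an even spread of alerts."""
    limit = ASSUMED_LIMITS[channel]
    # Number of limit windows that fit in the period, rounded up, at least one.
    windows = max(1, -(-period_minutes // limit.window_minutes))
    return fired_alerts > limit.max_notifications * windows
```

If a check like this returns True for a rule, that rule is a candidate for a programmatic action or an alternate notification provider instead of email or SMS.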
Security is one of the most important aspects of any architecture. Azure Monitor provides features to employ both the principle of least privilege and defense-in-depth. Use the following information to maximize the security of Azure Monitor alerts.
- Use customer managed keys if you need your own encryption key to protect data and saved queries in your workspaces.
- Use managed identities to increase security by controlling permissions.
- Assign the monitoring reader role for all users who don’t need configuration privileges.
- Use secure webhook actions.
- When using action groups that use private links, use Event hub actions.
|Use customer managed keys if you need your own encryption key to protect data and saved queries in your workspaces.||Azure Monitor ensures that all data and saved queries are encrypted at rest using Microsoft-managed keys (MMK). If you require your own encryption key and collect enough data for a dedicated cluster, use customer-managed keys for greater flexibility and key lifecycle control. If you use Microsoft Sentinel, then make sure that you're familiar with the considerations at Set up Microsoft Sentinel customer-managed key.|
|To control permissions for log alert rules, use managed identities for your log alert rules.||A common challenge for developers is the management of secrets, credentials, certificates, and keys used to secure communication between services. Managed identities eliminate the need for developers to manage these credentials. Setting a managed identity for your log alert rules gives you control and visibility into the exact permissions of your alert rule. At any time, you can view your rule’s query permissions and add or remove permissions directly from its managed identity. In addition, using a managed identity is required if your rule’s query is accessing Azure Data Explorer (ADX) or Azure Resource Graph (ARG). See Managed identities.|
|Assign the monitoring reader role for all users who don’t need configuration privileges.||Enhance security by giving users the least amount of privileges required for their role. See Roles, permissions, and security in Azure Monitor.|
|Where possible, use secure webhook actions.||If your alert rule contains an action group that uses webhook actions, prefer secure webhook actions for additional authentication. See Configure authentication for Secure webhook.|
|When using action groups that use private links, use Event hub actions.||When using private links in Azure, use Event hub actions for alerts. Because of the increased security of private links, Event hub actions are the only actions that private links support.|
Cost optimization refers to ways to reduce unnecessary expenses and improve operational efficiencies. You can significantly reduce your cost for Azure Monitor by understanding your different configuration options and opportunities to reduce the amount of data that it collects. See Azure Monitor cost and usage to understand the different ways that Azure Monitor charges and how to view your monthly bill.
See Optimize costs in Azure Monitor for cost optimization recommendations across all features of Azure Monitor.
- Activity log alerts, service health alerts, and resource health alerts are free of charge.
- When using log alerts, minimize log alert frequency.
- When using metric alerts, minimize the number of resources being monitored.
|Keep in mind that activity log alerts, service health alerts, and resource health alerts are free of charge.||Azure Monitor activity log alerts, service health alerts, and resource health alerts are free. If what you want to monitor can be achieved with these alert types, use them.|
|When using log alerts, minimize log alert frequency.||When configuring log alerts, keep in mind that the more frequent the rule evaluation, the higher the cost. Configure your rules accordingly.|
|When using metric alerts, minimize the number of resources being monitored.||Some resource types support metric alert rules that can monitor multiple resources of the same type. For these resource types, keep in mind that the rule can become expensive if it monitors many resources. To reduce costs, you can either reduce the scope of the metric alert rule or use log alert rules, which are less expensive for monitoring a large number of resources.|
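To make the effect of log alert frequency concrete, the following Python sketch counts how many times a rule is evaluated in a 30-day month and totals a monthly cost for a set of rules. The price table is entirely hypothetical (arbitrary currency units, keyed by evaluation frequency in minutes) and the helper names are illustrative; see Azure Monitor cost and usage for the actual charges.

```python
# Hypothetical monthly price per log alert rule, keyed by evaluation
# frequency in minutes. These numbers are placeholders, not Azure prices.
HYPOTHETICAL_PRICE = {1: 3.0, 5: 1.5, 10: 1.0, 15: 0.5}


def evaluations_per_month(frequency_minutes: int, days: int = 30) -> int:
    """Number of rule evaluations in a month of the given length."""
    return days * 24 * 60 // frequency_minutes


def monthly_cost(rule_frequencies: list[int],
                 price_by_frequency: dict[int, float]) -> float:
    """Sum the hypothetical monthly price of each rule by its frequency."""
    return sum(price_by_frequency[f] for f in rule_frequencies)


# A rule evaluated every minute runs 15 times as often as a 15-minute rule.
print(evaluations_per_month(1))   # 43200
print(evaluations_per_month(15))  # 2880

# Relaxing two 1-minute rules to 15 minutes lowers the hypothetical total.
print(monthly_cost([1, 1, 5], HYPOTHETICAL_PRICE))    # 7.5
print(monthly_cost([15, 15, 5], HYPOTHETICAL_PRICE))  # 2.5
```

The same comparison can be repeated with real prices to decide which rules genuinely need the shortest evaluation interval.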
Operational excellence refers to the operations processes required to keep a service running reliably in production. Use the following information to minimize the operational requirements for supporting Azure Monitor alerts.
- Use dynamic thresholds in metric alert rules where appropriate.
- Whenever possible, use one alert rule to monitor multiple resources.
- To control behavior at scale, use alert processing rules.
- Use custom properties to enhance diagnostics.
- Use Logic Apps to customize, enrich, and integrate with a variety of systems.
|Use dynamic thresholds in metric alert rules where appropriate.||Dynamic thresholds use machine learning to determine the correct threshold, so you don't need to know the correct threshold to configure. Dynamic thresholds are also useful for rules that monitor multiple resources, where a single threshold can't be configured for all of the resources. See Dynamic thresholds in metric alerts.|
|Whenever possible, use one alert rule to monitor multiple resources.||Using alert rules that monitor multiple resources reduces management overhead by allowing you to manage one rule to monitor a large number of resources.|
|To control behavior at scale, use alert processing rules.||Alert processing rules can be used to reduce the number of alert rules you need to create and manage.|
|Use custom properties to enhance diagnostics.||If the alert rule uses action groups, you can add your own properties to include in the alert notification payload. You can use these properties in the actions called by the action group, such as webhook, Azure function or logic app actions.|
|Use Logic Apps to customize the notification workflow and integrate with various systems.||You can use Azure Logic Apps to build and customize workflows for integration. Use Logic Apps to customize your alert notifications. You can:
- Customize the alerts email by using your own email subject and body format.
- Customize the alert metadata by looking up tags for affected resources or fetching a log query search result.
- Integrate with external services by using existing connectors like Outlook, Microsoft Teams, Slack, and PagerDuty. You can also configure the logic app for your own services.
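For the custom properties recommendation above, a webhook or Azure function action might read those properties from the notification payload as in the following Python sketch. It assumes the payload follows the common alert schema with custom properties under `data.customProperties`; that layout is an assumption here, and the `extract_custom_properties` helper and the sample property names (`team`, `runbook`) are hypothetical.

```python
import json
from typing import Any


def extract_custom_properties(payload: str) -> dict[str, Any]:
    """Return the custom properties from an alert notification payload.

    Assumes the properties live under data.customProperties; returns an
    empty dict when the section is missing or null.
    """
    body = json.loads(payload)
    data = body.get("data") or {}
    return data.get("customProperties") or {}


# Hypothetical payload, trimmed to the fields this sketch reads.
sample_payload = json.dumps({
    "schemaId": "azureMonitorCommonAlertSchema",
    "data": {
        "essentials": {"alertRule": "example-rule"},
        "customProperties": {"team": "payments", "runbook": "RB-104"},
    },
})

properties = extract_custom_properties(sample_payload)
print(properties["team"])     # payments
print(properties["runbook"])  # RB-104
```

Returning an empty dict for a missing section lets the same handler serve alert rules that don't define any custom properties.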
Performance efficiency is the ability of your workload to scale to meet the demands placed on it by users in an efficient manner. Alerts offer a high degree of performance efficiency without any design decisions.