Traffic Manager metrics and alerts
Traffic Manager provides you with DNS-based load balancing that includes multiple routing methods and endpoint monitoring options. This article describes the metrics and associated alerts that are available to customers.
Metrics available in Traffic Manager
Traffic Manager provides the following metrics on a per profile basis that customers can use to understand their usage of Traffic manager and the status of their endpoints under that profile.
Queries by endpoint returned
Use this metric to view the number of queries that a Traffic Manager profile processes over a specified period. You can also view the same information at an endpoint level granularity that helps you understand how many times an endpoint was returned in the query responses from Traffic Manager.
In the following example, Figure 1 displays all the query responses that the Traffic Manager profile returns.
Figure 1: Aggregate view with all queries
Figure 2 displays the same information, however, it is split by endpoints. As a result, you can see the volume of query responses in which a specific endpoint was returned.
Figure 2: Split view with query volume shown per endpoint returned
Endpoint status by endpoint
Use this metric to understand the health status of the endpoints in the profile. It takes two values:
- use 1 if the endpoint is up.
- use 0 if the endpoint is down.
This metric can be shown either as an aggregate value representing the status of all the metrics (Figure 3), or, it can be split (see Figure 4) to show the status of specific endpoints. If the former, if the aggregation level is selected as Avg, the value of this metric is the arithmetic average of the status of all endpoints. For example, if a profile has two endpoints and only one is healthy, then this metric has a value of 0.50 as shown in Figure 3.
Figure 3: Composite view of endpoint status metric – “Avg” aggregation selected
Figure 4: Split view of endpoint status metrics
Alerts on Traffic Manager metrics
In addition to processing and displaying metrics from Traffic Manager, Azure Monitor enables customers to configure and receive alerts associated with these metrics. You can choose what conditions need to be met in these metrics for an alert to occur, how often those conditions need to be monitored, and how the alerts should be sent to you. For more information, see Azure Monitor alerts documentation.
Alert monitoring is important to ensure the system notifies when probes are down. Overly sensitive monitoring can be a distraction. Traffic Manager deploys multiple probes to increase resiliency. The threshold for probe statuses should be less than 0.5. If the average for up status falls below 0.5 (meaning less than 50% of probes are up) there should be an alert for an endpoint failure.
Multiple probes are deployed to increase resiliency. If one probe is down out of the many that are being sent, it doesn't necessarily reflect that the endpoint is down. The endpoint is only classified as down if the majority of probes returned are down.
The following configuration is an example of an alert setup.
For more information about probes and monitoring, see Traffic Manager endpoint monitoring.