Downtime, SLA, and outages workbook

This article introduces a simple way to calculate and report service-level agreement (SLA) compliance for web tests through a single pane of glass across your Application Insights resources and Azure subscriptions. The downtime and outage report provides prebuilt queries and data visualizations that help you understand your customers' connectivity, typical application response time, and experienced downtime.

The SLA workbook template is accessible through the workbook gallery in your Application Insights resource. Alternatively, in the left pane, select Availability, and then select SLA Report at the top of the screen.

Screenshot that shows the Availability tab with SLA Report highlighted.

Screenshot of the workbook gallery with the Downtime & Outages workbook highlighted.

Parameter flexibility

The parameters set in the workbook influence the rest of your report.

 Screenshot that shows parameters.

  • Subscriptions, App Insights Resources, and Web Test: These parameters determine your high-level resource options. They're based on Log Analytics queries and are used in every report query.
  • Failure Threshold and Outage Window: You can use these parameters to define your own criteria for a service outage. For example, you can mirror the criteria of an Application Insights availability alert, which is based on the number of failed locations over a chosen period. The typical threshold is three failed locations over a five-minute window.
  • Maintenance Period: You can use this parameter to select your typical maintenance frequency. Maintenance Window is a datetime selector for a sample maintenance period. All data that falls within the identified period is excluded from your results.
  • Availability Target %: This parameter specifies your target objective and takes custom values.
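
The Failure Threshold and Outage Window parameters can be read as a rule applied to each time window. The following Python sketch shows one way to express that rule, assuming the threshold counts distinct failed locations per window. `AvailabilityResult` and `is_outage_window` are hypothetical names, not part of the workbook or any Azure SDK.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AvailabilityResult:
    """Hypothetical record for one web test run from one location."""
    timestamp: datetime
    location: str
    success: bool


def is_outage_window(results, window_end, window=timedelta(minutes=5), failure_threshold=3):
    """Return True if at least `failure_threshold` distinct locations reported
    a failure in the interval (window_end - window, window_end].

    Assumption: the threshold counts distinct failed locations, mirroring the
    typical "three locations over five minutes" availability alert.
    """
    window_start = window_end - window
    failed_locations = {
        r.location
        for r in results
        if window_start < r.timestamp <= window_end and not r.success
    }
    return len(failed_locations) >= failure_threshold


# Usage: three of four locations fail within the same five-minute window.
t0 = datetime(2024, 1, 1, 8, 0)
results = [
    AvailabilityResult(t0 + timedelta(minutes=1), "West Europe", False),
    AvailabilityResult(t0 + timedelta(minutes=2), "East US", False),
    AvailabilityResult(t0 + timedelta(minutes=3), "Japan East", False),
    AvailabilityResult(t0 + timedelta(minutes=4), "Brazil South", True),
]
print(is_outage_window(results, t0 + timedelta(minutes=5)))  # True
```

Raising the threshold or shrinking the window makes the rule less sensitive; for example, a window ending at 8:02 AM sees only two failed locations and isn't treated as an outage.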

Overview page

The overview page contains high-level information about your:

  • Total SLA (excluding maintenance periods, if defined).
  • End-to-end outage instances.
  • Application downtime.
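
To make the Total SLA figure concrete, here's a minimal Python sketch that computes availability as the share of non-maintenance time not covered by an outage, and then compares it with an availability target. The formula is an assumption for illustration; the workbook's own queries may compute SLA differently (for example, from individual test results rather than elapsed time). All function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

Interval = tuple[datetime, datetime]  # (start, end)


def clip(a: Interval, b: Interval) -> Optional[Interval]:
    """Intersection of two intervals, or None if they don't overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


def length(i: Optional[Interval]) -> timedelta:
    return i[1] - i[0] if i else timedelta(0)


def sla_percent(report: Interval, outages: list[Interval], maintenance: list[Interval]) -> float:
    """Percentage of non-maintenance time in `report` that isn't covered by an outage.

    Assumes outages don't overlap each other and maintenance windows don't overlap each other.
    """
    maint_windows = [w for w in (clip(report, m) for m in maintenance) if w]
    measured = length(report) - sum((length(w) for w in maint_windows), timedelta(0))
    downtime = timedelta(0)
    for outage in outages:
        o = clip(report, outage)
        if o is None:
            continue
        # Count only the part of the outage that falls outside maintenance.
        downtime += length(o) - sum((length(clip(o, w)) for w in maint_windows), timedelta(0))
    if measured <= timedelta(0):
        return 100.0
    return 100.0 * (1 - downtime / measured)


# Usage: a one-day report with a 1-hour maintenance window and a 2-hour outage.
day = datetime(2024, 1, 1)
report = (day, day + timedelta(days=1))
maintenance = [(day.replace(hour=2), day.replace(hour=3))]
outages = [(day.replace(hour=8), day.replace(hour=10))]

sla = sla_percent(report, outages, maintenance)
print(round(sla, 2))  # 91.3 (1,260 of 1,380 measured minutes were up)
print(sla >= 99.9)    # False: a 99.9% availability target is missed
```

Excluding maintenance from both the measured time and the downtime means an outage that falls entirely inside a maintenance window doesn't lower the SLA.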

An outage instance lasts from the time a test starts to fail, based on your outage parameters, until it succeeds again. If a test starts failing at 8:00 AM and succeeds again at 10:00 AM, that entire period is counted as a single outage.
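
The grouping of consecutive failing windows into one outage instance can be sketched in Python as follows. This assumes time-ordered evaluations of the outage criteria at a fixed interval; `merge_outages` is a hypothetical helper, not the workbook's query.

```python
from datetime import datetime, timedelta


def merge_outages(evaluations):
    """Group time-ordered (timestamp, in_outage) evaluations into outage instances.

    An instance opens at the first evaluation that meets the outage criteria and
    closes at the next evaluation that doesn't, so consecutive failing windows
    become one outage.
    """
    outages = []
    start = None
    for ts, in_outage in evaluations:
        if in_outage and start is None:
            start = ts                   # outage begins
        elif not in_outage and start is not None:
            outages.append((start, ts))  # first success ends the outage
            start = None
    if start is not None:
        # Assumption: an outage still open at the end of the data is closed
        # at the last evaluation timestamp.
        outages.append((start, evaluations[-1][0]))
    return outages


# Usage: evaluations every 5 minutes; the test fails from 8:00 AM until 10:00 AM.
day = datetime(2024, 1, 1)
evaluations = []
t = day.replace(hour=7, minute=50)
while t <= day.replace(hour=10, minute=5):
    evaluations.append((t, day.replace(hour=8) <= t < day.replace(hour=10)))
    t += timedelta(minutes=5)

print(merge_outages(evaluations))  # one outage from 8:00 AM to 10:00 AM
```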

 Screenshot that shows an overview page showing the Overview Table by Test.

You can also investigate the longest outage that occurred over your reporting period.

You can link some tests back to their Application Insights resource for further investigation, but only for workspace-based Application Insights resources.

Downtime, outages, and failures

The Outages & Downtime tab shows the total number of outage instances and the total downtime, broken down by test. The Failures by Location tab has a geo-map of failed test locations to help identify areas with potential connectivity problems.
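
The per-test totals on the Outages & Downtime tab, and the longest outage from the overview page, reduce to simple aggregations over outage instances. This Python sketch assumes outages are already available as (start, end) pairs per test; the function names and test names are hypothetical.

```python
from datetime import datetime, timedelta


def summarize_by_test(outages_by_test):
    """Return {test: (outage_count, total_downtime)} from {test: [(start, end), ...]}."""
    return {
        test: (len(spans), sum((end - start for start, end in spans), timedelta(0)))
        for test, spans in outages_by_test.items()
    }


def longest_outage(outages_by_test):
    """Return (test, start, end) for the longest single outage, or None if there are none."""
    return max(
        ((test, start, end) for test, spans in outages_by_test.items() for start, end in spans),
        key=lambda o: o[2] - o[1],
        default=None,
    )


# Usage with two hypothetical tests.
day = datetime(2024, 1, 1)
outages_by_test = {
    "home-page": [(day.replace(hour=8), day.replace(hour=10)),
                  (day.replace(hour=14), day.replace(hour=14, minute=15))],
    "login": [(day.replace(hour=9), day.replace(hour=9, minute=30))],
}
summary = summarize_by_test(outages_by_test)
print(summary["home-page"])                # (2, datetime.timedelta(seconds=8100))
print(longest_outage(outages_by_test)[0])  # home-page
```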

 Screenshot that shows the Outages & Downtime tab and the Failure by Location tab in the downtime and outages workbook.

Edit the report

You can edit the report like any other Azure Monitor workbook. You can customize the queries or visualizations based on your team's needs.

 Screenshot that shows selecting the Edit button to change the visualization to a pie chart.

Log Analytics

You can run all the queries in Log Analytics and use them in other reports or dashboards. Remove the parameter restrictions to reuse the core query.

 Screenshot that shows a log query.

Access and sharing

You can share the report with your teams and leadership or pin it to a dashboard for further use. Users need read access to the Application Insights resource where the workbook is stored.

 Screenshot that shows the Share Template pane.

Next steps