Downtime, SLA, and outages workbook
This article introduces a simple way to calculate and report service-level agreement (SLA) for web tests through a single pane of glass across your Application Insights resources and Azure subscriptions. The downtime and outage report provides powerful prebuilt queries and data visualizations to enhance your understanding of your customer's connectivity, typical application response time, and experienced downtime.
The SLA workbook template is accessible through the workbook gallery in your Application Insights resource. Alternatively, in the left pane, select Availability and then select SLA Report at the top of the screen.
The parameters set in the workbook influence the rest of your report.
App Insights Resources and Web Test: These parameters determine your high-level resource options. They're based on Log Analytics queries and are used in every report query.
Outage Window: You can use these parameters to determine your own criteria for a service outage. An example is the criteria for an App Insights Availability alert based on a failed location counter over a chosen period. The typical threshold is three locations over a five-minute window.
Maintenance Period: You can use this parameter to select your typical maintenance frequency.
Maintenance Window: This parameter is a datetime selector for an example maintenance period. All data that occurs during the identified period is ignored in your results.
Availability Target %: This parameter specifies your target objective and takes custom values.
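As a rough illustration of the outage criteria described above (a count of failed locations over a chosen period), the following Python sketch checks whether at least three distinct locations failed within a five-minute window. The function name, the tuple layout, and the sample locations are hypothetical and are not taken from the workbook, which evaluates its criteria with Log Analytics queries.

```python
from datetime import datetime, timedelta


def meets_outage_criteria(failures, window_end,
                          window=timedelta(minutes=5), threshold=3):
    """Return True if at least `threshold` distinct locations failed
    in the period (window_end - window, window_end].

    `failures` is an iterable of (timestamp, location) tuples.
    This layout is an assumption made for the sketch.
    """
    window_start = window_end - window
    failed_locations = {
        location
        for timestamp, location in failures
        if window_start < timestamp <= window_end
    }
    return len(failed_locations) >= threshold


# Hypothetical sample data: failed test runs and the location that reported them.
failures = [
    (datetime(2024, 1, 1, 8, 1), "West US"),
    (datetime(2024, 1, 1, 8, 2), "East US"),
    (datetime(2024, 1, 1, 8, 2), "West US"),
    (datetime(2024, 1, 1, 8, 4), "North Europe"),
]

is_outage = meets_outage_criteria(failures, datetime(2024, 1, 1, 8, 5))
print(is_outage)
```

Repeated failures from the same location count once, because the check uses a set of distinct locations rather than a raw failure count.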
The overview page contains high-level information about your:
- Total SLA (excluding maintenance periods, if defined).
- End-to-end outage instances.
- Application downtime.
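To make the idea of a total SLA that excludes maintenance periods more concrete, here is a minimal Python sketch. It assumes the SLA is the percentage of successful test runs, which is an assumption for illustration only, because this article doesn't show the workbook's actual formula. The function name, data layout, and sample values are hypothetical.

```python
from datetime import datetime


def sla_percent(results, maintenance_windows=()):
    """Percentage of successful test runs, ignoring runs that fall
    inside any maintenance window.

    `results` is an iterable of (timestamp, success) tuples and
    `maintenance_windows` an iterable of (start, end) datetimes.
    Both layouts are assumptions made for this sketch.
    Returns None when no runs remain after the exclusion.
    """
    def in_maintenance(timestamp):
        return any(start <= timestamp < end
                   for start, end in maintenance_windows)

    counted = [success for timestamp, success in results
               if not in_maintenance(timestamp)]
    if not counted:
        return None
    return 100.0 * sum(counted) / len(counted)


# Hypothetical data: one test run per hour, with failures at 02:00, 03:00, and 07:00.
failed_hours = {2, 3, 7}
results = [(datetime(2024, 1, 1, hour), hour not in failed_hours)
           for hour in range(10)]

# Hypothetical maintenance window from 02:00 (inclusive) to 04:00 (exclusive).
maintenance = [(datetime(2024, 1, 1, 2), datetime(2024, 1, 1, 4))]

availability_target = 99.0  # stands in for the Availability Target % parameter
sla = sla_percent(results, maintenance)
print(f"SLA: {sla:.1f}% (target met: {sla >= availability_target})")
```

With this sample data, the two failures inside the maintenance window are ignored, so only the 07:00 failure counts against the result.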
Outage instances are defined by when a test starts to fail until it's successful, based on your outage parameters. If a test starts failing at 8:00 AM and succeeds again at 10:00 AM, that entire period of data is considered the same outage.
You can also investigate the longest outage that occurred over your reporting period.
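The outage definition above can be sketched in Python as follows. This simplified version treats every failed run as the potential start of an outage and ignores the outage parameters, so it illustrates the grouping idea rather than reproducing the workbook's query. The function name and sample data are hypothetical.

```python
from datetime import datetime, timedelta


def find_outages(results):
    """Group consecutive failures into outage instances.

    `results` is a list of (timestamp, success) tuples for one test,
    sorted by timestamp (an assumed layout). Each outage runs from the
    first failed run to the next successful run. An outage that has not
    recovered by the end of the data is not reported in this sketch.
    """
    outages = []
    outage_start = None
    for timestamp, success in results:
        if not success and outage_start is None:
            outage_start = timestamp           # the test starts to fail
        elif success and outage_start is not None:
            outages.append((outage_start, timestamp))  # the test recovers
            outage_start = None
    return outages


# Hypothetical results for a single web test.
day = datetime(2024, 1, 1)
results = [
    (day.replace(hour=7, minute=30), True),
    (day.replace(hour=8), False),
    (day.replace(hour=9), False),
    (day.replace(hour=10), True),
    (day.replace(hour=11), False),
    (day.replace(hour=11, minute=30), True),
]

outages = find_outages(results)
longest = max(outages, key=lambda outage: outage[1] - outage[0])
total_downtime = sum((end - start for start, end in outages), timedelta())
print(len(outages), longest[1] - longest[0], total_downtime)
```

In this sample, the failures at 8:00 AM and 9:00 AM belong to the same outage because the test doesn't succeed again until 10:00 AM.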
Some tests link back to their Application Insights resource for further investigation. This is only possible for workspace-based Application Insights resources.
Downtime, outages, and failures
The Outages & Downtime tab has information on total outage instances and total downtime broken down by test. The Failures by Location tab has a geo-map of failed testing locations to help identify potential problem connection areas.
Edit the report
You can edit the report like any other Azure Monitor workbook. You can customize the queries or visualizations based on your team's needs.
All of the queries can be run in Log Analytics and used in other reports or dashboards. To do so, remove the parameter restriction and reuse the core query.
Access and sharing
The report can be shared with your teams and leadership or pinned to a dashboard for further use. Users need read access to the Application Insights resource where the workbook is stored.