Azure Monitor issues and investigations (preview) are new AIOps capabilities that automate the troubleshooting process for Azure Monitor alerts.
This article explains what Azure Monitor issues and investigations (preview) are and how you use them to triage and mitigate problems with an Azure resource.
Note
During the preview, only Application Insights resource alerts are supported.
What is an issue?
An issue is a holistic view of service-related problems that provides a structured framework for managing incidents. It uses AI to automate analysis and diagnostics, drawing on all observability-related data to deliver high-quality insights for fast, accurate troubleshooting of service health degradations.
An issue presents an overview, the investigation, details about the alerts, and the resources involved.
You can set the severity, status, and impact time of an issue.
What is an investigation?
An investigation is an analysis of a set of findings within the context of an issue. The analysis uses AI-based, iterative triage and diagnostic processes. The investigation minimizes manual effort to enable faster and more accurate troubleshooting.
Only the latest investigation is displayed. Users can edit the scope and impact time and run a new investigation. An investigation scans up to two hours of telemetry from the issue impact time.
Findings
Findings identify anomalous behavior that could explain a problem with a service resource. Each finding summarizes the analysis of multiple anomalies (for example, 'VM performance is low due to a possible memory leak') based on relevant signals, such as metrics and logs, and might suggest further investigation steps and potential mitigations.
A finding contains a summary that can include:
- What happened. A description of the finding with the resources included in the investigation.
- A possible explanation. A description of what might be causing problems for the specific finding and related evidence.
- Next steps. Suggestions for continuing the investigation or mitigating the problems.
- Evidence. Evidence is the data justifying the finding, such as anomalies, diagnostics insights, health data, resource changes, related resources, and related alerts.
Note
Up to five findings are displayed; all other anomalies are grouped under Additional data.
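The display rule in the note above can be sketched as follows. This is a minimal illustration, not the service's implementation: the `Finding` and `InvestigationView` types, the `relevance` score, and the `build_view` function are hypothetical, and the sketch assumes candidates are ranked by a single score, which the documentation doesn't specify.

```python
from dataclasses import dataclass, field

MAX_DISPLAYED_FINDINGS = 5  # documented display limit


@dataclass
class Finding:
    title: str
    relevance: float  # hypothetical ranking score; the real criterion isn't documented


@dataclass
class InvestigationView:
    findings: list  # shown individually on the issue page
    additional_data: list = field(default_factory=list)  # everything else


def build_view(candidates: list[Finding]) -> InvestigationView:
    """Keep the five highest-scoring candidates; group the rest as additional data."""
    ranked = sorted(candidates, key=lambda f: f.relevance, reverse=True)
    return InvestigationView(
        findings=ranked[:MAX_DISPLAYED_FINDINGS],
        additional_data=ranked[MAX_DISPLAYED_FINDINGS:],
    )
```

With seven candidates, the five highest-scoring ones become findings and the remaining two land in additional data.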
Evidence types
Metric anomaly explanations
In addition to detecting anomalies, the investigation creates explanations based on the metric dimensions, for example, the specific region or error code of the anomaly.
Application logs analysis
The investigation scans the application logs for anomalies. The top three failure events (for dependencies, requests, and exceptions) are analyzed. For each event:
- Explanation: An explanation of what happened is generated for the failure.
- Transaction Examples: A list of example transactions in which the specific failure event occurs. Selecting an example displays the end-to-end transaction in Application Insights.
- Exceptions: If specific exception problem IDs correlate with the failure, they're displayed with the number of times they appear in the logs. Each problem ID is explained in natural language, and an example is provided.
- Transaction Pattern: If a specific transaction pattern is associated with the failure, it's displayed. This information can help explain the issue and point to the root cause. If there are multiple transaction patterns, no pattern is displayed.
- Trace Message Patterns: If specific trace message patterns correlate with the failure, they're displayed with the number of times they appear in the logs. Each pattern is explained in natural language, and an example is provided.
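Two of the selection rules above, picking the top three failure events and showing a transaction pattern only when exactly one applies, can be sketched like this. It's a minimal illustration that assumes "top" means most frequent; the documentation doesn't state the ranking criterion. The function names and the `(kind, name)` event shape are hypothetical and don't correspond to an Azure API.

```python
from collections import Counter
from typing import Optional

TOP_N_FAILURE_EVENTS = 3  # documented: the top three failure events are analyzed


def top_failure_events(failures: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return the most frequent failure events.

    Each failure is a hypothetical (kind, name) tuple, where kind is
    'dependency', 'request', or 'exception'. Ties keep first-seen order.
    """
    counts = Counter(failures)
    return [event for event, _ in counts.most_common(TOP_N_FAILURE_EVENTS)]


def transaction_pattern_to_display(patterns: list[str]) -> Optional[str]:
    """Return the pattern only if exactly one distinct pattern matches the failure."""
    distinct = set(patterns)
    return next(iter(distinct)) if len(distinct) == 1 else None
```

Returning `None` for multiple patterns mirrors the documented behavior: an ambiguous pattern isn't shown rather than guessing which one explains the failure.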
Diagnostic insights
Provides actionable solutions and diagnostics for abnormal telemetry, based on Azure support best practices, to help resolve issues more efficiently.
Related alerts
Contains data from related high-severity alerts on the issue's scoped resource that fired in the last 15 minutes. These alerts are synced back to the issue and appear on the Alerts tab.
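The related-alerts rule can be sketched as a simple filter. Two details are assumptions not stated in the documentation and are marked in the code: "high severity" is taken to mean Sev0 and Sev1, and the 15-minute window is measured back from a caller-supplied reference time. The `Alert` type and `related_alerts` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    resource_id: str
    severity: int  # Azure Monitor alert severity: 0 (Critical) through 4 (Verbose)
    fired_at: datetime


RELATED_ALERT_WINDOW = timedelta(minutes=15)  # documented lookback
HIGH_SEVERITY_MAX = 1  # assumption: Sev0 and Sev1 count as "high severity"


def related_alerts(alerts: list[Alert], scoped_resource_id: str,
                   reference_time: datetime) -> list[Alert]:
    """Keep high-severity alerts on the scoped resource within the lookback window.

    Assumption: the window ends at reference_time (for example, when the
    investigation runs).
    """
    window_start = reference_time - RELATED_ALERT_WINDOW
    return [
        a for a in alerts
        if a.resource_id == scoped_resource_id
        and a.severity <= HIGH_SEVERITY_MAX
        and window_start <= a.fired_at <= reference_time
    ]
```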
Resource Health
Provides event data from Azure Resource Health about resource health degradation during the investigated period.
Capabilities
Configurable scope
Azure Monitor investigations suggest which resources to analyze based on the scope of the investigation. By default, an investigation's scope includes all metrics of the resource. You can change the scope to include up to five resources. For details, see Scope the investigation in Use issue and investigation.
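A hypothetical sketch of the five-resource scope limit follows. It assumes the alerted resource counts toward the limit and that duplicate resources are ignored; neither detail is confirmed by the documentation, and `build_scope` is not an Azure API.

```python
MAX_SCOPE_RESOURCES = 5  # documented limit on resources in an investigation scope


def build_scope(alerted_resource: str, additional_resources: list[str]) -> list[str]:
    """Return the deduplicated scope, raising if it exceeds the limit.

    Assumption: the alerted resource counts toward the five-resource limit.
    """
    scope = [alerted_resource]
    for resource in additional_resources:
        if resource not in scope:  # skip duplicates, preserving order
            scope.append(resource)
    if len(scope) > MAX_SCOPE_RESOURCES:
        raise ValueError(
            f"Scope has {len(scope)} resources; the limit is {MAX_SCOPE_RESOURCES}."
        )
    return scope
```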
Smart scoping
An investigation also offers smart scoping for Application Insights resources. With smart scoping, the investigation automatically identifies possibly related resources by examining the service's dependencies and the infrastructure it runs on, and then includes them in the analysis. This happens during the investigation, and the results are synced to the issue.
Issue and investigation initial workflow example
- You receive an alert email from Azure Monitor.
- Selecting the Investigate button in the email creates an issue and starts an investigation. The issue page opens in the Azure portal in your browser.
- On the issue page, you're presented with:
  - The issue overview, where the findings of the latest investigation are presented with summarized evidence.
  - The findings, each containing the AI analysis summary, suggested actions, and the evidence used for the analysis.
  - More details on the potential cause of each finding, along with next steps to choose from.
Regions
Issues and investigations are supported in the following Azure regions:
Public preview region availability |
---|
australiaeast |
centralus |
eastasia |
eastus |
eastus2euap |
uksouth |
westeurope |
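To check a region name against this table in a script, a small lookup like the following is enough. The set is a snapshot of the preview table above and may change; the helper name and the normalization of display names such as "East US" to "eastus" are assumptions for illustration.

```python
# Snapshot of the public preview region table; the list may change.
SUPPORTED_PREVIEW_REGIONS = frozenset({
    "australiaeast",
    "centralus",
    "eastasia",
    "eastus",
    "eastus2euap",
    "uksouth",
    "westeurope",
})


def is_region_supported(region: str) -> bool:
    """Accept a programmatic name ('eastus') or a display name ('East US')."""
    normalized = region.replace(" ", "").lower()
    return normalized in SUPPORTED_PREVIEW_REGIONS
```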