Detect and mitigate potential issues using AIOps and machine learning in Azure Monitor

Artificial Intelligence for IT Operations (AIOps) offers powerful ways to improve service quality and reliability by using machine learning to process and automatically act on data you collect from applications, services, and IT resources into Azure Monitor.

Azure Monitor's built-in AIOps capabilities provide insights and help you troubleshoot issues and automate data-driven tasks, such as predicting capacity usage and autoscaling, identifying and analyzing application performance issues, and detecting anomalous behaviors in virtual machines, containers, and other resources. These features boost your IT monitoring and operations without requiring machine learning knowledge or further investment.

Azure Monitor also provides tools that let you create your own machine learning pipeline to introduce new analysis and response capabilities and act on data in Azure Monitor Logs.

This article describes Azure Monitor's built-in AIOps capabilities and explains how you can create and run customized machine learning models and build an automated machine learning pipeline on data in Azure Monitor Logs.

Built-in Azure Monitor AIOps and machine learning capabilities

| Monitoring scenario | Capability | Description |
|---|---|---|
| Root cause analysis of incidents | Azure Monitor Investigator (preview) | Automates analysis to simplify the identification of anomalies across Azure resources and provide next steps to mitigate issues. |
| Log monitoring | Log Analytics Workspace Insights | Provides a unified view of your Log Analytics workspaces and uses machine learning to detect ingestion anomalies. |
| Log monitoring | Kusto Query Language (KQL) time series analysis and machine learning functions | Easy-to-use tools for generating time series data, detecting anomalies, forecasting, and performing root cause analysis directly in Azure Monitor Logs without requiring in-depth knowledge of data science and programming languages. |
| Log monitoring | Microsoft Copilot for Azure | Helps you use Log Analytics to analyze data and troubleshoot issues. Generates example KQL queries based on prompts, such as "Are there any errors in container logs?". |
| Application performance monitoring | Application Map Intelligent view | Maps dependencies between services and helps you spot performance bottlenecks or failure hotspots across all components of your distributed application. |
| Application performance monitoring | Smart detection | Analyzes the telemetry your application sends to Application Insights, alerts on performance problems and failure anomalies, and identifies potential root causes of application performance issues. |
| Metric alerts | Dynamic thresholds for metric alerting | Learns metrics patterns, automatically sets alert thresholds based on historical data, and identifies anomalies that might indicate service issues. |
| Virtual machine scale sets | Predictive autoscale | Forecasts the overall CPU requirements of a virtual machine scale set, based on historical CPU usage patterns, and automatically scales out to meet these needs. |

Machine learning in Azure Monitor Logs

Use the Kusto Query Language's built-in time series analysis and machine learning functions, operators, and plug-ins to gain insights about service health, usage, capacity and other trends, and to generate forecasts and detect anomalies in Azure Monitor Logs.
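
As a rough sketch of what such a query can look like, the Python snippet below assembles a KQL string that builds a regular time series with make-series and flags outliers with series_decompose_anomalies. The helper name build_anomaly_query, the default Usage table and Quantity column, and the threshold of 1.5 are assumptions chosen for illustration; substitute a table and column that exist in your workspace.

```python
def build_anomaly_query(table="Usage", value_column="Quantity",
                        lookback="21d", step="1h", threshold=1.5):
    """Return a KQL query that flags anomalous points in a time series.

    Hypothetical helper: the table, column, and parameter defaults are
    illustrative assumptions, not values prescribed by Azure Monitor.
    """
    return "\n".join([
        table,
        f"| where TimeGenerated > ago({lookback})",
        # make-series turns rows into one regular, gap-filled time series.
        f"| make-series Total = sum({value_column}) default = 0 "
        f"on TimeGenerated from ago({lookback}) to now() step {step}",
        # series_decompose_anomalies yields anomaly flags, scores, and a baseline.
        f"| extend (Anomalies, Score, Baseline) = "
        f"series_decompose_anomalies(Total, {threshold}, -1, 'linefit')",
    ])


query = build_anomaly_query()
print(query)
```

You could paste the printed query into Log Analytics or send it through the Query API. A lower threshold flags more points, so tune it against data you already understand.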

To gain greater flexibility and expand your ability to analyze and act on data, you can also implement your own machine learning pipeline on data in Azure Monitor Logs.

This table compares the advantages and limitations of using KQL's built-in machine learning capabilities and creating your own machine learning pipeline, and links to tutorials that demonstrate how you can implement each:

| | Built-in KQL machine learning capabilities | Create your own machine learning pipeline |
|---|---|---|
| Scenario | ✅ Anomaly detection, root cause, and time series analysis | ✅ Anomaly detection, root cause, and time series analysis ✅ Advanced analysis and AIOps scenarios |
| Advantages | 🔹 Gets you started very quickly. 🔹 No data science knowledge or programming skills required. 🔹 Optimal performance and cost savings. | 🔹 Supports larger scales. 🔹 Enables advanced, more complex scenarios. 🔹 Flexibility in choosing libraries, models, parameters. |
| Service limits and data volumes | Azure portal or Query API log query limits, depending on whether you're working in the portal or using the API, for example, from a notebook. | 🔹 Query API log query limits if you query data in Azure Monitor Logs as part of your machine learning pipeline. Otherwise, no Azure service limits. 🔹 Can support larger data volumes. |
| Integration | None required. Run using Log Analytics in the Azure portal or from an integrated Jupyter Notebook. | Requires integration with a tool, such as Jupyter Notebook. Typically, you'd also integrate with other Azure services, like Azure Synapse Analytics. |
| Performance | Optimal performance, using the Azure Data Explorer platform, running at high scales in a distributed manner. | Introduces a small amount of latency when querying or exporting data, depending on how you implement your machine learning pipeline. |
| Model type | Linear regression model and other models supported by KQL time series functions with a set of configurable parameters. | Completely customizable machine learning model or anomaly detection method. |
| Cost | No extra cost. | Depending on how you implement your machine learning pipeline, you might incur charges for exporting data, ingesting scored data into Azure Monitor Logs, and the use of other Azure services. |
| Tutorial | Detect and analyze anomalies using KQL machine learning capabilities in Azure Monitor | Analyze data in Azure Monitor Logs using a notebook |
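
To make the "linear regression model" entry in the Model type row more concrete, the following dependency-free sketch fits a least-squares line to evenly spaced values and extrapolates it. It illustrates the idea behind KQL functions such as series_fit_line, but it is not their implementation. The function names and sample numbers are assumptions for illustration only.

```python
def fit_line(values):
    """Least-squares fit of y = slope * x + intercept over x = 0, 1, ..., n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def forecast(values, steps):
    """Extrapolate the fitted line for the next `steps` positions."""
    slope, intercept = fit_line(values)
    n = len(values)
    return [slope * (n + i) + intercept for i in range(steps)]


daily_gb = [10.0, 12.0, 14.0, 16.0]  # made-up daily ingestion volumes
print(forecast(daily_gb, 2))
```

A custom pipeline lets you replace this single straight line with any model you choose, which is the trade-off the table describes.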

Create your own machine learning pipeline on data in Azure Monitor Logs

Build your own machine learning pipeline on data in Azure Monitor Logs to introduce new AIOps capabilities and support advanced scenarios, such as:

  • Hunting for security attacks with more sophisticated models than those provided by KQL.
  • Detecting performance issues and troubleshooting errors in a web application.
  • Creating multi-step flows, running code in each step based on the results of the previous step.
  • Automating the analysis of Azure Monitor Log data and providing insights into multiple areas, including infrastructure health and customer behavior.
  • Correlating data in Azure Monitor Logs with data from other sources.

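There are two approaches to making data in Azure Monitor Logs available to your machine learning pipeline:

  • Query data in Azure Monitor Logs: Run queries against your Log Analytics workspace, for example, from a notebook using the Query API.
  • Export data: Export data out of your Log Analytics workspace to external storage and analyze it in the service you choose.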

This table compares the advantages and limitations of the approaches to retrieving data for your machine learning pipeline:

| | Query data in Azure Monitor Logs | Export data |
|---|---|---|
| Advantages | 🔹 Gets you started quickly. 🔹 Requires only basic data science and programming skills. 🔹 Minimal latency and cost savings. | 🔹 Supports larger scales. 🔹 No query limitations. |
| Data exported? | No | Yes |
| Service limits | Query API log query limits and user query throttling. You can overcome Query API limits, to a certain degree, by splitting larger queries into chunks. | None from Azure Monitor. |
| Data volumes | Analyze several GBs of data, or a few million records per hour. | Supports large volumes of data. |
| Machine learning library | For small to medium-sized datasets, you'd typically use single-node machine learning libraries, like Scikit Learn. | For large datasets, you'd typically use big data machine learning libraries, like SynapseML. |
| Latency | Minimal. | Introduces a small amount of latency in exporting data. |
| Cost | No extra charges in Azure Monitor. Cost of Azure Synapse Analytics, Azure Machine Learning, or other service, if used. | Cost of data export and external storage. Cost of Azure Synapse Analytics, Azure Machine Learning, or other service, if used. |
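
The table mentions splitting larger queries into chunks to stay within Query API limits. One way to do this, sketched below, is to split the query's time range into fixed-size windows and run one query per window. The names split_time_range and query_in_chunks, the six-hour default window, and the run_query callable are all hypothetical. In a real pipeline, run_query would wrap a call to the Query API.

```python
from datetime import datetime, timedelta, timezone


def split_time_range(start, end, chunk):
    """Yield consecutive (chunk_start, chunk_end) windows covering [start, end)."""
    if chunk <= timedelta(0):
        raise ValueError("chunk must be a positive duration")
    cursor = start
    while cursor < end:
        chunk_end = min(cursor + chunk, end)  # the last window may be shorter
        yield cursor, chunk_end
        cursor = chunk_end


def query_in_chunks(run_query, start, end, chunk=timedelta(hours=6)):
    """Run one query per window and concatenate the returned rows.

    run_query is a hypothetical callable (window_start, window_end) -> list
    of rows; it stands in for whatever performs the actual log query.
    """
    rows = []
    for window_start, window_end in split_time_range(start, end, chunk):
        rows.extend(run_query(window_start, window_end))
    return rows


# Stand-in query function that returns one row per window.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = start + timedelta(days=1)
rows = query_in_chunks(lambda s, e: [(s, e)], start, end)
print(len(rows))
```

Smaller windows reduce the chance of hitting per-query limits but increase the number of requests, so user query throttling still applies.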

Tip

To benefit from the best of both implementation approaches, create a hybrid pipeline. A common hybrid approach is to export data for model training, which involves large volumes of data, and to use the query data in Azure Monitor Logs approach to explore data and score new data to reduce latency and costs.

Implement the steps of the machine learning lifecycle in Azure Monitor Logs

Setting up a machine learning pipeline typically involves all or some of the steps described below.

There are various Azure and open source machine learning libraries you can use to implement your machine learning pipeline, including Scikit Learn, PyTorch, TensorFlow, Spark MLlib, and SynapseML.

This table describes each step and provides high-level guidance and some examples of how to implement these steps based on the implementation approaches described in Create your own machine learning pipeline on data in Azure Monitor Logs:

| Step | Description | Query data in Azure Monitor Logs | Export data |
|---|---|---|---|
| Explore data | Examine and understand the data you've collected. | The simplest way to explore your data is using Log Analytics, which provides a rich set of tools for exploring and visualizing data in the Azure portal. You can also analyze data in Azure Monitor Logs using a notebook. | To analyze logs outside of Azure Monitor, export data out of your Log Analytics workspace and set up the environment in the service you choose. For an example of how to explore logs outside of Azure Monitor, see Analyze data exported from Log Analytics using Synapse. |
| Build and train a machine learning model | Model training is an iterative process. Researchers or data scientists develop a model by fetching and cleaning the training data, engineering features, trying various models and tuning parameters, and repeating this cycle until the model is accurate and robust. | For small to medium-sized datasets, you typically use single-node machine learning libraries, like Scikit Learn. For an example of how to train a machine learning model on data in Azure Monitor Logs using the Scikit Learn library, see this sample notebook: Detect anomalies in Azure Monitor Logs using machine learning techniques. | For large datasets, you typically use big data machine learning libraries, like SynapseML. |
| Deploy and score a model | Scoring is the process of applying a machine learning model on new data to get predictions. Scoring usually needs to be done at scale with minimal latency. | To query new data in Azure Monitor Logs, use Azure Monitor Query client library. For an example of how to score data using open source tools, see this sample notebook: Detect anomalies in Azure Monitor Logs using machine learning techniques. | |
| Run your pipeline on schedule | Automate your pipeline to retrain your model regularly on current data. | Schedule your machine learning pipeline with Azure Synapse Analytics or Azure Machine Learning. | See the examples in the Query data in Azure Monitor Logs column. |
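
The build and score steps can be illustrated with a deliberately small, dependency-free sketch. It fits a z-score baseline on historical values and flags new values that deviate from it. This is a stand-in for a library model (for example, one from Scikit Learn), not the method used in the sample notebook. The class name ZScoreDetector, the threshold of 3.0, and the sample values are illustrative assumptions.

```python
from statistics import mean, pstdev


class ZScoreDetector:
    """Toy anomaly detector: flags values far from the training mean."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold
        self.mean_ = None
        self.std_ = None

    def fit(self, values):
        """Train: learn the mean and standard deviation of historical values."""
        if len(values) < 2:
            raise ValueError("need at least two training values")
        self.mean_ = mean(values)
        self.std_ = pstdev(values)
        return self

    def score(self, values):
        """Score: return (value, z_score, is_anomaly) for each new value."""
        if self.mean_ is None:
            raise RuntimeError("call fit() before score()")
        results = []
        for value in values:
            if self.std_ == 0:
                # Constant training data: any different value is maximally unusual.
                z = 0.0 if value == self.mean_ else float("inf")
            else:
                z = (value - self.mean_) / self.std_
            results.append((value, z, abs(z) > self.threshold))
        return results


history = [100, 102, 98, 101, 99, 100, 103, 97]  # made-up hourly volumes
detector = ZScoreDetector(threshold=3.0).fit(history)
for value, z, is_anomaly in detector.score([101, 150]):
    print(value, round(z, 2), is_anomaly)
```

Retraining on a schedule amounts to calling fit again on a fresh window of history, while scoring runs more often on newly queried data.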

Ingesting scored results to a Log Analytics workspace lets you use the data to get advanced insights, and to create alerts and dashboards. For an example of how to ingest scored results using Azure Monitor Ingestion client library, see Ingest anomalies into a custom table in your Log Analytics workspace.
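
A minimal sketch of the ingestion step follows. It shapes scored results into JSON-ready records and passes them to the upload method of LogsIngestionClient from the Azure Monitor Ingestion client library. The helper names and the column names other than TimeGenerated are hypothetical and must match your own custom table and data collection rule. The endpoint, rule ID, and stream name are values you supply. The Azure packages are imported lazily, so the formatting helper runs without them.

```python
from datetime import datetime, timezone


def to_log_records(scored):
    """Convert (timestamp, value, z_score, is_anomaly) tuples to JSON-ready dicts.

    Column names other than TimeGenerated are hypothetical; align them with
    the schema of your custom table.
    """
    return [
        {
            "TimeGenerated": ts.astimezone(timezone.utc).isoformat(),
            "ObservedValue": value,
            "AnomalyScore": z,
            "IsAnomaly": bool(flag),
        }
        for ts, value, z, flag in scored
    ]


def upload_records(records, endpoint, rule_id, stream_name):
    """Send records to a Log Analytics workspace through the Logs Ingestion API.

    Assumes the azure-identity and azure-monitor-ingestion packages are
    installed and that the caller has permission on the data collection rule.
    """
    from azure.identity import DefaultAzureCredential
    from azure.monitor.ingestion import LogsIngestionClient

    client = LogsIngestionClient(endpoint=endpoint, credential=DefaultAzureCredential())
    client.upload(rule_id=rule_id, stream_name=stream_name, logs=records)


scored = [(datetime(2024, 1, 1, 12, tzinfo=timezone.utc), 150, 26.7, True)]
records = to_log_records(scored)
print(records[0]["TimeGenerated"])
```

Once the records are in a custom table, you can query them with KQL like any other log data and use them in alerts and dashboards.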

Next steps

Learn more about: