Detect and mitigate potential issues using AIOps and machine learning in Azure Monitor

Article
02/14/2024

Artificial Intelligence for IT Operations (AIOps) offers powerful ways to improve service quality and reliability by using machine learning to process and automatically act on data you collect from applications, services, and IT resources into Azure Monitor.

Azure Monitor's built-in AIOps capabilities provide insights and help you troubleshoot issues and automate data-driven tasks, such as predicting capacity usage and autoscaling, identifying and analyzing application performance issues, and detecting anomalous behaviors in virtual machines, containers, and other resources. These features boost your IT monitoring and operations without requiring machine learning knowledge or further investment.

Azure Monitor also provides tools that let you create your own machine learning pipeline to introduce new analysis and response capabilities and act on data in Azure Monitor Logs.

This article describes Azure Monitor's built-in AIOps capabilities and explains how you can create and run customized machine learning models and build an automated machine learning pipeline on data in Azure Monitor Logs.

Built-in Azure Monitor AIOps and machine learning capabilities

Monitoring scenario	Capability	Description
Log monitoring	Log Analytics Workspace Insights	Provides a unified view of your Log Analytics workspaces and uses machine learning to detect ingestion anomalies.
	Kusto Query Language (KQL) time series analysis and machine learning functions	Easy-to-use tools for generating time series data, detecting anomalies, forecasting, and performing root cause analysis directly in Azure Monitor Logs without requiring in-depth knowledge of data science and programming languages.
	Microsoft Copilot in Azure	Helps you use Log Analytics to analyze data and troubleshoot issues. Generates example KQL queries based on prompts, such as "Are there any errors in container logs?".
Application performance monitoring	Application Map Intelligent view	Maps dependencies between services and helps you spot performance bottlenecks or failure hotspots across all components of your distributed application.
	Smart detection	Analyzes the telemetry your application sends to Application Insights, alerts on performance problems and failure anomalies, and identifies potential root causes of application performance issues.
Metric alerts	Dynamic thresholds for metric alerting	Learns metrics patterns, automatically sets alert thresholds based on historical data, and identifies anomalies that might indicate service issues.
Virtual machine scale sets	Predictive autoscale	Forecasts the overall CPU requirements of a virtual machine scale set, based on historical CPU usage patterns, and automatically scales out to meet these needs.

Machine learning in Azure Monitor Logs

Use the Kusto Query Language's built-in time series analysis and machine learning functions, operators, and plug-ins to gain insights about service health, usage, capacity and other trends, and to generate forecasts and detect anomalies in Azure Monitor Logs.

To gain greater flexibility and expand your ability to analyze and act on data, you can also implement your own machine learning pipeline on data in Azure Monitor Logs.

This table compares the advantages and limitations of using KQL's built-in machine learning capabilities and creating your own machine learning pipeline, and links to tutorials that demonstrate how you can implement each:

	Built-in KQL machine learning capabilities	Create your own machine learning pipeline
Scenario	✅ Anomaly detection, root cause, and time series analysis	✅ Anomaly detection, root cause, and time series analysis ✅ Advanced analysis and AIOps scenarios
Advantages	🔹Gets you started very quickly. 🔹No data science knowledge or programming skills required. 🔹Optimal performance and cost savings.	🔹Supports larger scales. 🔹Enables advanced, more complex scenarios. 🔹Flexibility in choosing libraries, models, parameters.
Service limits and data volumes	Azure portal or Query API log query limits depending on whether you're working in the portal or using the API, for example, from a notebook.	🔹Query API log query limits if you query data in Azure Monitor Logs as part of your machine learning pipeline. Otherwise, no Azure service limits. 🔹Can support larger data volumes.
Integration	None required. Run using Log Analytics in the Azure portal or from an integrated Jupyter Notebook.	Requires integration with a tool, such as Jupyter Notebook. Typically, you'd also integrate with other Azure services, like Azure Synapse Analytics.
Performance	Optimal performance, using the Azure Data Explorer platform, running at high scales in a distributed manner.	Introduces a small amount of latency when querying or exporting data, depending on how you implement your machine learning pipeline.
Model type	Linear regression model and other models supported by KQL time series functions with a set of configurable parameters.	Completely customizable machine learning model or anomaly detection method.
Cost	No extra cost.	Depending on how you implement your machine learning pipeline, you might incur charges for exporting data, ingesting scored data into Azure Monitor Logs, and the use of other Azure services.
Tutorial	Detect and analyze anomalies using KQL machine learning capabilities in Azure Monitor	Analyze data in Azure Monitor Logs using a notebook
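
To make the "Model type" row more concrete, here is a minimal sketch of the kind of custom anomaly detection method you might write in your own pipeline: fit a straight line to a series and flag points whose residual is unusually large. This is an illustration only, not the algorithm that KQL's time series functions use. The function names and the threshold of two standard deviations are assumptions chosen for the example.

```python
from statistics import mean, pstdev


def fit_line(values):
    """Ordinary least squares fit of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    xs = range(n)
    x_mean = mean(xs)
    y_mean = mean(values)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def flag_anomalies(values, threshold=2.0):
    """Return indexes whose residual from the fitted line exceeds
    threshold times the standard deviation of all residuals."""
    slope, intercept = fit_line(values)
    residuals = [y - (slope * x + intercept) for x, y in enumerate(values)]
    spread = pstdev(residuals)
    if spread == 0:
        # Perfectly linear series: nothing deviates from the trend.
        return []
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * spread]
```

In a real pipeline, the input series would come from a log query (for example, hourly request counts), and you would likely replace this simple model with one from a library of your choice.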

Create your own machine learning pipeline on data in Azure Monitor Logs

Build your own machine learning pipeline on data in Azure Monitor Logs to introduce new AIOps capabilities and support advanced scenarios, such as:

Hunting for security attacks with more sophisticated models than those supported by KQL.
Detecting performance issues and troubleshooting errors in a web application.
Creating multi-step flows, running code in each step based on the results of the previous step.
Automating the analysis of Azure Monitor Logs data and providing insights into multiple areas, including infrastructure health and customer behavior.
Correlating data in Azure Monitor Logs with data from other sources.

There are two approaches to making data in Azure Monitor Logs available to your machine learning pipeline:

Query data in Azure Monitor Logs - Integrate a notebook with Azure Monitor Logs, or run a script or application on log data, using libraries like the Azure Monitor Query client library or MSTICPy to retrieve data from Azure Monitor Logs in tabular form, for example, into a Pandas DataFrame. The data you query is retrieved into an in-memory object on your server, without exporting the data out of your Log Analytics workspace.

Note

You might need to convert data formats as part of your pipeline. For example, to use libraries built on top of Apache Spark, like SynapseML, you might need to convert a Pandas DataFrame to a PySpark DataFrame.

Export data out of Azure Monitor Logs - Export data out of your Log Analytics workspace, usually to a blob storage account, and implement your machine learning pipeline using a machine learning library.
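
The query approach described above can be sketched roughly as follows, assuming the `azure-monitor-query` and `azure-identity` packages are installed and that the signed-in identity can read the workspace. The helper names `table_to_records` and `query_workspace_records`, and the choice to return a list of dictionaries rather than a Pandas DataFrame, are assumptions made to keep the example dependency-light. The SDK call is shown as it is commonly used and has not been run against a live workspace here.

```python
from datetime import timedelta


def table_to_records(columns, rows):
    """Turn a tabular result (column names plus rows) into a list of dicts."""
    return [dict(zip(columns, row)) for row in rows]


def query_workspace_records(workspace_id, kql, lookback=timedelta(days=1)):
    """Run a KQL query and return the first result table as in-memory records."""
    # Imported lazily so table_to_records stays usable without the Azure SDK.
    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import LogsQueryClient, LogsQueryStatus

    client = LogsQueryClient(DefaultAzureCredential())
    response = client.query_workspace(workspace_id, kql, timespan=lookback)
    if response.status != LogsQueryStatus.SUCCESS:
        # Partial results carry partial_data and partial_error instead of tables.
        raise RuntimeError(f"query did not fully succeed: {response.partial_error}")
    table = response.tables[0]
    return table_to_records(table.columns, table.rows)
```

The records stay in memory on the machine that runs the code. Nothing is exported from the workspace, which matches the first approach above.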

This table compares the advantages and limitations of the approaches to retrieving data for your machine learning pipeline:

	Query data in Azure Monitor Logs	Export data
Advantages	🔹Gets you started quickly. 🔹Requires only basic data science and programming skills. 🔹Minimal latency and cost savings.	🔹Supports larger scales. 🔹No query limitations.
Data exported?	No	Yes
Service limits	Query API log query limits and user query throttling. You can overcome Query API limits, to a certain degree, by splitting larger queries into chunks.	None from Azure Monitor.
Data volumes	Analyze several GBs of data, or a few million records per hour.	Supports large volumes of data.
Machine learning library	For small to medium-sized datasets, you'd typically use single-node machine learning libraries, like Scikit Learn.	For large datasets, you'd typically use big data machine learning libraries, like SynapseML.
Latency	Minimal.	Introduces a small amount of latency in exporting data.
Cost	No extra charges in Azure Monitor. Cost of Azure Synapse Analytics, Azure Machine Learning, or other service, if used.	Cost of data export and external storage. Cost of Azure Synapse Analytics, Azure Machine Learning, or other service, if used.
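
The "splitting larger queries into chunks" idea from the service limits row can be illustrated with a small helper that divides a time range into consecutive windows, so that each window is queried separately and the results are combined afterward. This is a minimal sketch. The function name `split_time_range` and the window size are assumptions, and the right chunk size depends on your data volume.

```python
from datetime import datetime, timedelta, timezone


def split_time_range(start, end, step):
    """Yield consecutive (chunk_start, chunk_end) windows covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        # The last window is shortened so it never runs past the end.
        chunk_end = min(cursor + step, end)
        yield cursor, chunk_end
        cursor = chunk_end


if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    for window in split_time_range(start, start + timedelta(hours=5), timedelta(hours=2)):
        print(window)
```

Each `(chunk_start, chunk_end)` pair can then be used as the time span of one query, keeping every individual request below the query limits.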

Tip

To benefit from the best of both implementation approaches, create a hybrid pipeline. A common hybrid approach is to export data for model training, which involves large volumes of data, and to use the query data in Azure Monitor Logs approach to explore data and score new data to reduce latency and costs.

Implement the steps of the machine learning lifecycle in Azure Monitor Logs

Setting up a machine learning pipeline typically involves all or some of the steps described below.

There are various Azure and open source machine learning libraries you can use to implement your machine learning pipeline, including Scikit Learn, PyTorch, TensorFlow, Spark MLlib, and SynapseML.

This table describes each step and provides high-level guidance and some examples of how to implement these steps based on the implementation approaches described in Create your own machine learning pipeline on data in Azure Monitor Logs:

Step	Description	Query data in Azure Monitor Logs	Export data
Explore data	Examine and understand the data you've collected.	The simplest way to explore your data is using Log Analytics, which provides a rich set of tools for exploring and visualizing data in the Azure portal. You can also analyze data in Azure Monitor Logs using a notebook.	To analyze logs outside of Azure Monitor, export data out of your Log Analytics workspace and set up the environment in the service you choose. For an example of how to explore logs outside of Azure Monitor, see Analyze data exported from Log Analytics using Synapse.
Build and train a machine learning model	Model training is an iterative process. Researchers or data scientists develop a model by fetching and cleaning the training data, engineering features, trying various models and tuning parameters, and repeating this cycle until the model is accurate and robust.	For small to medium-sized datasets, you typically use single-node machine learning libraries, like Scikit Learn. For an example of how to train a machine learning model on data in Azure Monitor Logs using the Scikit Learn library, see this sample notebook: Detect anomalies in Azure Monitor Logs using machine learning techniques.	For large datasets, you typically use big data machine learning libraries, like SynapseML.
Deploy and score a model	Scoring is the process of applying a machine learning model on new data to get predictions. Scoring usually needs to be done at scale with minimal latency.	To query new data in Azure Monitor Logs, use the Azure Monitor Query client library. For an example of how to score data using open source tools, see this sample notebook: Detect anomalies in Azure Monitor Logs using machine learning techniques.
Run your pipeline on schedule	Automate your pipeline to retrain your model regularly on current data.	Schedule your machine learning pipeline with Azure Synapse Analytics or Azure Machine Learning.	See the examples in the Query data in Azure Monitor Logs column.
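
The split between training and scoring in the table above can be sketched with a deliberately simple stand-in model: a per-hour baseline (mean and standard deviation) learned from historical values, used to compute a z-score for each new observation. This is not the method used in the sample notebook, and a real pipeline would typically use a library such as Scikit Learn or SynapseML. The function names and the `(hour_of_day, value)` input shape are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def train_baseline(history):
    """Train step: history is an iterable of (hour_of_day, value) pairs.
    Returns a model of the form {hour_of_day: (mean, std)}."""
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)
    return {hour: (mean(vals), pstdev(vals)) for hour, vals in by_hour.items()}


def score(model, hour, value, min_std=1e-9):
    """Score step: return the z-score of a new observation,
    or None if the model has never seen this hour."""
    if hour not in model:
        return None
    mu, sigma = model[hour]
    # Guard against division by zero when all training values were identical.
    return (value - mu) / max(sigma, min_std)
```

Retraining on a schedule then amounts to calling `train_baseline` again on fresh history and replacing the stored model, while `score` runs on each batch of newly queried data.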

Ingesting scored results to a Log Analytics workspace lets you use the data to get advanced insights, and to create alerts and dashboards. For an example of how to ingest scored results using the Azure Monitor Ingestion client library, see Ingest anomalies into a custom table in your Log Analytics workspace.
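
As a rough sketch of the ingestion step, the code below shapes scored results into JSON-serializable records and uploads them with the Azure Monitor Ingestion client library. It assumes the `azure-monitor-ingestion` and `azure-identity` packages are installed and that a data collection endpoint, a data collection rule, and a custom table already exist. The column names `ResourceName` and `AnomalyScore` and both helper names are hypothetical, and the columns must match the stream schema defined in your own data collection rule.

```python
from datetime import datetime, timezone


def to_log_records(scored, now=None):
    """scored: iterable of (resource_name, score) pairs.
    Returns a list of dicts ready to serialize as JSON log entries."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    return [
        {"TimeGenerated": stamp, "ResourceName": name, "AnomalyScore": float(value)}
        for name, value in scored
    ]


def upload_records(endpoint, rule_id, stream_name, records):
    """Send records to a Log Analytics workspace through a data collection rule."""
    # Imported lazily so to_log_records stays usable without the Azure SDK.
    from azure.identity import DefaultAzureCredential
    from azure.monitor.ingestion import LogsIngestionClient

    client = LogsIngestionClient(endpoint=endpoint, credential=DefaultAzureCredential())
    client.upload(rule_id=rule_id, stream_name=stream_name, logs=records)
```

Once the records land in the custom table, you can query them with KQL like any other log data and use them for alerts and dashboards.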

Next steps

Learn more about:
