Monitoring Azure OpenAI Service
When you have critical applications and business processes that rely on Azure resources, you want to monitor those resources for their availability, performance, and operation.
This article describes the monitoring data generated by Azure OpenAI Service. Azure OpenAI is part of Azure AI services, which uses Azure Monitor. If you're unfamiliar with the features of Azure Monitor that are common to all Azure services that use it, see Monitoring Azure resources with Azure Monitor.
Azure OpenAI provides out-of-the-box dashboards for each of your Azure OpenAI resources. To access the monitoring dashboards, sign in to https://portal.azure.com and select the overview pane for one of your Azure OpenAI resources.
The dashboards are grouped into four categories: HTTP Requests, Tokens-Based Usage, PTU Utilization, and Fine-tuning.
Data collection and routing in Azure Monitor
Azure OpenAI collects the same kinds of monitoring data as other Azure resources. You can configure Azure Monitor to generate data in activity logs, resource logs, virtual machine logs, and platform metrics. For more information, see Monitoring data from Azure resources.
Platform metrics and the Azure Monitor activity log are collected and stored automatically. This data can be routed to other locations by using a diagnostic setting. Azure Monitor resource logs aren't collected and stored until you create a diagnostic setting and then route the logs to one or more locations.
When you create a diagnostic setting, you specify which categories of logs to collect. For more information about creating a diagnostic setting by using the Azure portal, the Azure CLI, or PowerShell, see Create diagnostic setting to collect platform logs and metrics in Azure.
Keep in mind that using diagnostic settings and sending data to Azure Monitor Logs has other costs associated with it. For more information, see Azure Monitor Logs cost calculations and options.
The metrics and logs that you can collect are described in the following sections.
You can analyze metrics for your Azure OpenAI Service resources with Azure Monitor tools in the Azure portal. From the Overview page for your Azure OpenAI resource, select Metrics under Monitoring in the left pane. For more information, see Get started with Azure Monitor metrics explorer.
Azure OpenAI shares some platform metrics with a subset of other Azure AI services. For a list of all platform metrics collected for Azure OpenAI and similar Azure AI services by Azure Monitor, see Supported metrics for Microsoft.CognitiveServices/accounts.
Cognitive Services Metrics
These are legacy metrics that are common to all Azure AI Services resources. We no longer recommend that you use these metrics with Azure OpenAI.
Azure OpenAI Metrics
The following table summarizes the current subset of metrics available in Azure OpenAI.
||Category||Aggregation||Description||
||HTTP||Count||Total number of calls made to the Azure OpenAI API over a period of time. Applies to PayGo, PTU, and PTU-managed SKUs.||
||Usage||Sum||Number of generated tokens (output) from an OpenAI model. Applies to PayGo, PTU, and PTU-managed SKUs.||
||Usage||Sum||Number of training hours processed on an OpenAI fine-tuned model.||
||Usage||Sum||Number of inference tokens processed by an OpenAI model. Calculated as prompt tokens (input) + generated tokens. Applies to PayGo, PTU, and PTU-managed SKUs.||
||Usage||Sum||Total number of prompt tokens (input) processed on an OpenAI model. Applies to PayGo, PTU, and PTU-managed SKUs.||
||Usage||Average||Provisioned-managed utilization is the utilization percentage for a given provisioned-managed deployment. Calculated as (PTUs consumed/PTUs deployed)*100. When utilization is at or above 100%, calls are throttled and return a 429 error code.||
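The two calculations named in the table can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are hypothetical and are not part of any Azure SDK, and the real metric values are computed by the service.

```python
def inference_tokens(prompt_tokens: int, generated_tokens: int) -> int:
    """Inference tokens = prompt tokens (input) + generated tokens (output)."""
    return prompt_tokens + generated_tokens


def utilization_percent(ptus_consumed: float, ptus_deployed: float) -> float:
    """Utilization = (PTUs consumed / PTUs deployed) * 100, as stated in the table."""
    if ptus_deployed <= 0:
        raise ValueError("ptus_deployed must be positive")
    return ptus_consumed / ptus_deployed * 100


def is_throttled(utilization: float) -> bool:
    """At or above 100% utilization, calls are throttled and return HTTP 429."""
    return utilization >= 100.0


if __name__ == "__main__":
    print(inference_tokens(120, 30))
    print(utilization_percent(75, 100))
    print(is_throttled(utilization_percent(100, 100)))
```

A deployment that consumes 75 of 100 deployed PTUs is at 75% utilization and isn't throttled; one that consumes all 100 is at the throttling boundary.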
Configure diagnostic settings
All of the metrics are exportable with diagnostic settings in Azure Monitor. To analyze logs and metrics data with Azure Monitor Log Analytics queries, you need to configure diagnostic settings for your Azure OpenAI resource and your Log Analytics workspace.
From your Azure OpenAI resource page, under Monitoring, select Diagnostic settings on the left pane. On the Diagnostic settings page, select Add diagnostic setting.
On the Diagnostic settings page, configure the following fields:
- Select Send to Log Analytics workspace.
- Choose your Azure account subscription.
- Choose your Log Analytics workspace.
- Under Logs, select allLogs.
- Under Metrics, select AllMetrics.
Enter a Diagnostic setting name to save the configuration.
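The portal selections above can also be written down as a request body for a diagnostic setting. The following is a minimal sketch, assuming the property names used by the Azure Monitor diagnostic settings resource (`workspaceId`, `logs` with `categoryGroup`, `metrics` with `category`); verify them against the current API reference before use. The function name and the placeholder workspace ID are hypothetical, and the sketch only builds the body without sending it.

```python
import json


def build_diagnostic_setting(workspace_resource_id: str) -> dict:
    """Build a diagnostic setting body that mirrors the portal choices.

    Assumed shape: all resource logs (allLogs) and all platform metrics
    (AllMetrics) are routed to a single Log Analytics workspace.
    """
    return {
        "properties": {
            "workspaceId": workspace_resource_id,
            "logs": [{"categoryGroup": "allLogs", "enabled": True}],
            "metrics": [{"category": "AllMetrics", "enabled": True}],
        }
    }


if __name__ == "__main__":
    # Placeholder resource ID; replace the bracketed parts with your own values.
    workspace_id = (
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.OperationalInsights/workspaces/<workspace-name>"
    )
    print(json.dumps(build_diagnostic_setting(workspace_id), indent=2))
```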
After you configure the diagnostic settings, you can work with metrics and log data for your Azure OpenAI resource in your Log Analytics workspace.
Data in Azure Monitor Logs is stored in tables where each table has its own set of unique properties.
All resource logs in Azure Monitor have the same fields followed by service-specific fields. For information about the common schema, see Common and service-specific schemas for Azure resource logs.
The activity log is a type of platform log in Azure that provides insight into subscription-level events. You can view this log independently or route it to Azure Monitor Logs. In the Azure portal, you can use the activity log in Azure Monitor Logs to run complex queries with Log Analytics.
For a list of the types of resource logs available for Azure OpenAI and similar Azure AI services, see Microsoft.CognitiveServices Azure resource provider operations.
Use Kusto queries
After you deploy an Azure OpenAI model, you can send some completions calls by using the playground environment in Azure AI Studio.
Any text that you enter in the Completions playground or the Chat completions playground generates metrics and log data for your Azure OpenAI resource. In the Log Analytics workspace for your resource, you can query the monitoring data by using the Kusto query language.
The Open query option on the Azure OpenAI resource page browses to Azure Resource Graph, which isn't described in this article. The following queries use the query environment for Log Analytics. Be sure to follow the steps in Configure diagnostic settings to prepare your Log Analytics workspace.
From your Azure OpenAI resource page, under Monitoring on the left pane, select Logs.
Select the Log Analytics workspace that you configured with diagnostics for your Azure OpenAI resource.
From the Log Analytics workspace page, under Overview on the left pane, select Logs.
The Azure portal displays a Queries window with sample queries and suggestions by default. You can close this window.
For the following examples, enter the Kusto query into the edit region at the top of the Query window, and then select Run. The query results display below the query text.
The following Kusto query is useful for an initial analysis of Azure Diagnostics (AzureDiagnostics) data about your resource:
AzureDiagnostics
| take 100
| project TimeGenerated, _ResourceId, Category, OperationName, DurationMs, ResultSignature, properties_s
This query returns a sample of 100 entries and displays a subset of the available columns of data in the logs. In the query results, you can select the arrow next to the table name to view all available columns and associated data types.
To see all available columns of data, you can remove the scoping parameters line | project ... from the query:
AzureDiagnostics
| take 100
To examine the Azure Metrics (AzureMetrics) data for your resource, run the following query:
AzureMetrics
| take 100
| project TimeGenerated, MetricName, Total, Count, Maximum, Minimum, Average, TimeGrain, UnitName
The query returns a sample of 100 entries and displays a subset of the available columns of Azure Metrics data.
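If you export the AzureMetrics results for offline analysis, a small script can roll the rows up by metric. The sketch below assumes each row is a dictionary keyed by the `MetricName` and `Total` columns from the query above. The function name and the sample metric names are hypothetical placeholders, not real Azure OpenAI metric names.

```python
from collections import defaultdict


def total_by_metric(rows: list[dict]) -> dict[str, float]:
    """Sum the Total column for each distinct MetricName."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row["MetricName"]] += row["Total"]
    return dict(totals)


if __name__ == "__main__":
    # Made-up rows for illustration only.
    sample_rows = [
        {"MetricName": "ExampleMetricA", "Total": 10.0},
        {"MetricName": "ExampleMetricB", "Total": 4.0},
        {"MetricName": "ExampleMetricA", "Total": 5.0},
    ]
    print(total_by_metric(sample_rows))
```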
When you select Monitoring > Logs in the Azure OpenAI menu for your resource, Log Analytics opens with the query scope set to the current resource. The visible log queries include data from that specific resource only. To run a query that includes data from other resources or data from other Azure services, select Logs from the Azure Monitor menu in the Azure portal. For more information, see Log query scope and time range in Azure Monitor Log Analytics.
Set up alerts
Azure Monitor alerts proactively notify you when important conditions are found in your monitoring data. They allow you to identify and address issues in your system before your users notice them. You can set alerts on metrics, logs, and the activity log. Different types of alerts have different benefits and drawbacks.
Every organization's alerting needs vary and can change over time. Generally, all alerts should be actionable and have a specific intended response if the alert occurs. If an alert doesn't require an immediate response, the condition can be captured in a report rather than an alert. Some use cases might require alerting anytime certain error conditions exist. In other cases, you might need alerts for errors that exceed a certain threshold for a designated time period.
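The threshold-over-a-time-period case can be illustrated with a short Python sketch. It is not how Azure Monitor evaluates alert rules; it only shows the logic of counting errors inside a window and comparing the count to a threshold. The function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def should_alert(
    error_times: list[datetime],
    now: datetime,
    window: timedelta,
    threshold: int,
) -> bool:
    """Return True when more than `threshold` errors fall in the window ending at `now`."""
    window_start = now - window
    recent = [t for t in error_times if window_start < t <= now]
    return len(recent) > threshold


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    errors = [
        datetime(2024, 1, 1, 10, 0),   # outside the 15-minute window
        datetime(2024, 1, 1, 11, 50),
        datetime(2024, 1, 1, 11, 55),
        datetime(2024, 1, 1, 11, 58),
    ]
    print(should_alert(errors, now, timedelta(minutes=15), threshold=2))
```

Setting `threshold` to 0 turns the same check into the "alert anytime an error condition exists" case.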
Errors below certain thresholds can often be evaluated through regular analysis of data in Azure Monitor Logs. As you analyze your log data over time, you might discover that a certain condition doesn't occur for an expected period of time. You can track this condition by using alerts. Sometimes the absence of an event in a log is just as important a signal as an error.
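Detecting the absence of an event comes down to comparing the most recent event time with the longest gap you expect. The following Python sketch shows that check in isolation; the function and parameter names are hypothetical, and it doesn't represent an Azure Monitor API.

```python
from datetime import datetime, timedelta


def event_is_missing(
    event_times: list[datetime],
    now: datetime,
    max_expected_gap: timedelta,
) -> bool:
    """Return True when no event was logged within `max_expected_gap` before `now`."""
    if not event_times:
        return True  # nothing was ever logged, so the event is missing
    return now - max(event_times) > max_expected_gap


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    events = [datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 10, 30)]
    # The last event was 90 minutes ago, which exceeds a one-hour expected gap.
    print(event_is_missing(events, now, timedelta(hours=1)))
```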
Depending on the type of application you're developing with Azure OpenAI, Azure Monitor Application Insights might offer more monitoring benefits at the application layer.