Monitoring Azure OpenAI Service

When you have critical applications and business processes that rely on Azure resources, you want to monitor those resources for their availability, performance, and operation.

This article describes the monitoring data generated by Azure OpenAI Service. Azure OpenAI is part of Azure AI services, which uses Azure Monitor. If you're unfamiliar with the features of Azure Monitor that are common to all Azure services that use it, see Monitoring Azure resources with Azure Monitor.

Dashboards

Azure OpenAI provides out-of-box dashboards for each of your Azure OpenAI resources. To access the monitoring dashboards, sign in to https://portal.azure.com and select the Overview pane for one of your Azure OpenAI resources.

Screenshot that shows out-of-box dashboards for an Azure OpenAI resource in the Azure portal.

The dashboards are grouped into four categories: HTTP Requests, Tokens-Based Usage, PTU Utilization, and Fine-tuning.

Data collection and routing in Azure Monitor

Azure OpenAI collects the same kinds of monitoring data as other Azure resources: platform metrics, activity logs, and resource logs. For more information, see Monitoring data from Azure resources.

Platform metrics and the Azure Monitor activity log are collected and stored automatically. This data can be routed to other locations by using a diagnostic setting. Azure Monitor resource logs aren't collected and stored until you create a diagnostic setting and then route the logs to one or more locations.

When you create a diagnostic setting, you specify which categories of logs to collect. For more information about creating a diagnostic setting by using the Azure portal, the Azure CLI, or PowerShell, see Create diagnostic setting to collect platform logs and metrics in Azure.
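After logs are flowing to a Log Analytics workspace, a query like the following sketch shows which log categories your diagnostic setting is actually collecting. It assumes your resource logs land in the AzureDiagnostics table, which is the table the rest of this article queries.

// Count recent resource log records by category (run in your Log Analytics workspace)
AzureDiagnostics
| where TimeGenerated > ago(24h)
| summarize RecordCount = count() by Category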

Keep in mind that using diagnostic settings and sending data to Azure Monitor Logs has other costs associated with it. For more information, see Azure Monitor Logs cost calculations and options.

The metrics and logs that you can collect are described in the following sections.

Analyze metrics

You can analyze metrics for your Azure OpenAI Service resources with Azure Monitor tools in the Azure portal. From the Overview page for your Azure OpenAI resource, select Metrics under Monitoring in the left pane. For more information, see Get started with Azure Monitor metrics explorer.

Azure OpenAI shares a common set of platform metrics with a subset of Azure AI services. For a list of all platform metrics that Azure Monitor collects for Azure OpenAI and similar Azure AI services, see Supported metrics for Microsoft.CognitiveServices/accounts.

Cognitive Services Metrics

These are legacy metrics that are common to all Azure AI Services resources. We no longer recommend that you use these metrics with Azure OpenAI.

Azure OpenAI Metrics

Note

The Provisioned-managed Utilization metric is now deprecated and is no longer recommended. This metric has been replaced by the Provisioned-managed Utilization V2 metric.

The following table summarizes the current subset of metrics available in Azure OpenAI.

| Metric | Category | Aggregation | Description | Dimensions |
|--------|----------|-------------|-------------|------------|
| Azure OpenAI Requests | HTTP | Count | Total number of calls made to the Azure OpenAI API over a period of time. Applies to PayGo, PTU, and PTU-managed SKUs. | ApiName, ModelDeploymentName, ModelName, ModelVersion, OperationName, Region, StatusCode, StreamType |
| Generated Completion Tokens | Usage | Sum | Number of generated tokens (output) from an Azure OpenAI model. Applies to PayGo, PTU, and PTU-managed SKUs. | ApiName, ModelDeploymentName, ModelName, Region |
| Processed FineTuned Training Hours | Usage | Sum | Number of training hours processed on an Azure OpenAI fine-tuned model. | ApiName, ModelDeploymentName, ModelName, Region |
| Processed Inference Tokens | Usage | Sum | Number of inference tokens processed by an Azure OpenAI model. Calculated as prompt tokens (input) + generated tokens. Applies to PayGo, PTU, and PTU-managed SKUs. | ApiName, ModelDeploymentName, ModelName, Region |
| Processed Prompt Tokens | Usage | Sum | Total number of prompt tokens (input) processed on an Azure OpenAI model. Applies to PayGo, PTU, and PTU-managed SKUs. | ApiName, ModelDeploymentName, ModelName, Region |
| Provisioned-managed Utilization V2 | HTTP | Average | Utilization percentage for a given provisioned-managed deployment, calculated as (PTUs consumed / PTUs deployed) * 100. When utilization is at or above 100%, calls are throttled and return a 429 error code. | ModelDeploymentName, ModelName, ModelVersion, Region, StreamType |
| Prompt Token Cache Match Rate | HTTP | Average | Provisioned-managed only. The prompt token cache hit ratio expressed as a percentage. | ModelDeploymentName, ModelVersion, ModelName, Region |
| Time to Response | HTTP | Average | Recommended latency (responsiveness) measure for streaming requests. Applies to PTU and PTU-managed deployments; this metric doesn't apply to standard pay-as-you-go deployments. Calculated as the time taken for the first response to appear after a user sends a prompt, as measured by the API gateway. This number increases as prompt size increases and/or cache hit rate decreases. Note: This metric is an approximation, because measured latency is heavily dependent on multiple factors, including concurrent calls and overall workload pattern. It also doesn't account for any client-side latency between your client and the API endpoint. Refer to your own logging for optimal latency tracking. | ModelDeploymentName, ModelName, ModelVersion |
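As one illustration, after the AllMetrics setting described later in this article routes platform metrics to a Log Analytics workspace, a query along these lines can chart daily token usage. The MetricName values used here are assumptions about the internal metric names behind the display names in the preceding table; confirm them against the MetricName column in your own AzureMetrics data.

// Daily prompt and completion token totals from exported platform metrics
// MetricName values are assumptions; verify against your AzureMetrics data
AzureMetrics
| where TimeGenerated > ago(7d)
| where MetricName in ("ProcessedPromptTokens", "GeneratedTokens")
| summarize TotalTokens = sum(Total) by MetricName, bin(TimeGenerated, 1d)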

Configure diagnostic settings

All of the metrics are exportable with diagnostic settings in Azure Monitor. To analyze logs and metrics data with Azure Monitor Log Analytics queries, you need to configure diagnostic settings for your Azure OpenAI resource and your Log Analytics workspace.

  1. From your Azure OpenAI resource page, under Monitoring, select Diagnostic settings on the left pane. On the Diagnostic settings page, select Add diagnostic setting.

    Screenshot that shows how to open the Diagnostic setting page for an Azure OpenAI resource in the Azure portal.

  2. On the Diagnostic settings page, configure the following fields:

    1. Select Send to Log Analytics workspace.
    2. Choose your Azure account subscription.
    3. Choose your Log Analytics workspace.
    4. Under Logs, select allLogs.
    5. Under Metrics, select AllMetrics.

    Screenshot that shows how to configure diagnostic settings for an Azure OpenAI resource in the Azure portal.

  3. Enter a Diagnostic setting name to save the configuration.

  4. Select Save.

After you configure the diagnostic settings, you can work with metrics and log data for your Azure OpenAI resource in your Log Analytics workspace.
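To confirm that both logs and metrics are arriving, you can run a quick check like the following sketch in the workspace. It counts recent records per table, assuming the allLogs and AllMetrics selections from the previous steps.

// Quick check that the diagnostic setting is delivering data to the workspace
union AzureDiagnostics, AzureMetrics
| where TimeGenerated > ago(1h)
| summarize Records = count() by Type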

Analyze logs

Data in Azure Monitor Logs is stored in tables where each table has its own set of unique properties.

All resource logs in Azure Monitor have the same fields followed by service-specific fields. For information about the common schema, see Common and service-specific schemas for Azure resource logs.

The activity log is a type of platform log in Azure that provides insight into subscription-level events. You can view this log independently or route it to Azure Monitor Logs. In the Azure portal, you can use the activity log in Azure Monitor Logs to run complex queries with Log Analytics.

For a list of the types of resource logs available for Azure OpenAI and similar Azure AI services, see Microsoft.CognitiveServices Azure resource provider operations.

Use Kusto queries

After you deploy an Azure OpenAI model, you can send some completions calls by using the playground environment in Azure AI Studio.

Screenshot that shows how to generate completions for an Azure OpenAI resource in the Azure OpenAI Studio playground.

Any text that you enter in the Completions playground or the Chat completions playground generates metrics and log data for your Azure OpenAI resource. In the Log Analytics workspace for your resource, you can query the monitoring data by using the Kusto query language.

Important

The Open query option on the Azure OpenAI resource page browses to Azure Resource Graph, which isn't described in this article. The following queries use the query environment for Log Analytics. Be sure to follow the steps in Configure diagnostic settings to prepare your Log Analytics workspace.

  1. From your Azure OpenAI resource page, under Monitoring on the left pane, select Logs.

  2. Select the Log Analytics workspace that you configured with diagnostics for your Azure OpenAI resource.

  3. From the Log Analytics workspace page, under Overview on the left pane, select Logs.

    The Azure portal displays a Queries window with sample queries and suggestions by default. You can close this window.

For the following examples, enter the Kusto query into the edit region at the top of the Query window, and then select Run. The query results display below the query text.

The following Kusto query is useful for an initial analysis of Azure Diagnostics (AzureDiagnostics) data about your resource:

AzureDiagnostics
| take 100
| project TimeGenerated, _ResourceId, Category, OperationName, DurationMs, ResultSignature, properties_s

This query returns a sample of 100 entries and displays a subset of the available columns of data in the logs. In the query results, you can select the arrow next to the table name to view all available columns and associated data types.

Screenshot that shows the Log Analytics query results for Azure Diagnostics data about the Azure OpenAI resource.

To see all available columns of data, remove the | project ... line from the query:

AzureDiagnostics
| take 100
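From this starting point, you can narrow the results to what you care about. For example, the following sketch counts failed calls by HTTP status code; it assumes the ResultSignature column holds the HTTP status returned for each request, as shown in the earlier sample output.

// Count non-200 responses by status code and operation over the past day
AzureDiagnostics
| where TimeGenerated > ago(24h)
| where ResultSignature != "200"
| summarize FailedCalls = count() by ResultSignature, OperationName, bin(TimeGenerated, 1h)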

To examine the Azure Metrics (AzureMetrics) data for your resource, run the following query:

AzureMetrics
| take 100
| project TimeGenerated, MetricName, Total, Count, Maximum, Minimum, Average, TimeGrain, UnitName

The query returns a sample of 100 entries and displays a subset of the available columns of Azure Metrics data:

Screenshot that shows the Log Analytics query results for Azure Metrics data about the Azure OpenAI resource.

Note

When you select Monitoring > Logs in the Azure OpenAI menu for your resource, Log Analytics opens with the query scope set to the current resource. The visible log queries include data from that specific resource only. To run a query that includes data from other resources or data from other Azure services, select Logs from the Azure Monitor menu in the Azure portal. For more information, see Log query scope and time range in Azure Monitor Log Analytics.
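When you query at workspace scope, you can filter results back down to a single resource by using the _ResourceId column, as in the following sketch. The resource name shown is a hypothetical placeholder; substitute your own.

// Filter workspace-scoped results to one resource (resource name is a placeholder)
AzureDiagnostics
| where _ResourceId has "my-openai-resource"
| take 100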

Set up alerts

Azure Monitor alerts proactively notify you when important conditions are found in your monitoring data. They allow you to identify and address issues in your system before your users notice them. You can set alerts on metrics, logs, and the activity log. Different types of alerts have different benefits and drawbacks.

Every organization's alerting needs vary and can change over time. Generally, all alerts should be actionable and have a specific intended response if the alert occurs. If an alert doesn't require an immediate response, the condition can be captured in a report rather than an alert. Some use cases might require alerting anytime certain error conditions exist. In other cases, you might need alerts for errors that exceed a certain threshold for a designated time period.

Errors below certain thresholds can often be evaluated through regular analysis of data in Azure Monitor Logs. As you analyze your log data over time, you might discover that a certain condition doesn't occur for an expected period of time. You can track this condition by using alerts. Sometimes the absence of an event in a log is just as important a signal as an error.
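As an illustration, the following sketch could serve as the basis for a log search alert that fires when throttled calls exceed a threshold in a 15-minute window. It assumes resource logs are routed to your workspace and that the ResultSignature column carries the HTTP status code.

// Possible basis for a log search alert on throttled calls (HTTP 429)
AzureDiagnostics
| where TimeGenerated > ago(15m)
| where ResultSignature == "429"
| summarize ThrottledCalls = count()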

Depending on the type of application you develop with Azure OpenAI, Azure Monitor Application Insights might offer more monitoring benefits at the application layer.

Next steps