How can we get usage information when calling Azure OpenAI with streaming?

Mundra, Ashish

We are calling Azure OpenAI from a C# API using code like the snippet below. How can we get usage information, such as GeneratedTokens, PromptTokens, and TotalTokens, with this mechanism?

await foreach (StreamingChatChoice choice in response.Value.GetChoicesStreaming(cancellationToken))
    await foreach (ChatMessage chatMessage in choice.GetMessageStreaming(cancellationToken))
        Console.Write(chatMessage.Content); // print each streamed text fragment as it arrives
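As a stopgap, a caller could approximate usage on the client while iterating the stream. The sketch below is a hypothetical helper (`StreamingUsageEstimator` is not part of any Azure SDK) that accumulates each streamed fragment and applies OpenAI's published rule of thumb of roughly 4 characters per token for English text. The numbers are rough estimates, not the billed counts: non-English text and code tokenize differently, and exact figures need a tokenizer that matches the model's encoding.

```csharp
#nullable enable
using System;
using System.Text;

// HYPOTHETICAL helper: rough client-side token estimate for a streamed chat completion.
// ASSUMPTION: ~4 characters per token (OpenAI's rule of thumb for English text).
// This is NOT the billed usage; use a model-specific tokenizer for accurate counts.
public sealed class StreamingUsageEstimator
{
    private const double CharsPerToken = 4.0;
    private readonly StringBuilder _completion = new StringBuilder();

    public StreamingUsageEstimator(string? promptText)
    {
        EstimatedPromptTokens = Estimate(promptText);
    }

    public int EstimatedPromptTokens { get; }

    public int EstimatedCompletionTokens => Estimate(_completion.ToString());

    public int EstimatedTotalTokens => EstimatedPromptTokens + EstimatedCompletionTokens;

    // Call once per streamed fragment (e.g. chatMessage.Content); null or empty fragments are ignored.
    public void Append(string? fragment)
    {
        if (!string.IsNullOrEmpty(fragment)) _completion.Append(fragment);
    }

    // Rounds up so that any non-empty text counts as at least one token.
    private static int Estimate(string? text) =>
        string.IsNullOrEmpty(text) ? 0 : (int)Math.Ceiling(text.Length / CharsPerToken);
}

public static class Demo
{
    public static void Main()
    {
        var usage = new StreamingUsageEstimator("Tell me a joke.");

        // Stand-in for the streamed fragments; a real caller would append chatMessage.Content.
        foreach (var fragment in new[] { "Why did ", "the chicken ", null, "cross the road?" })
            usage.Append(fragment);

        Console.WriteLine(
            $"prompt ~{usage.EstimatedPromptTokens}, completion ~{usage.EstimatedCompletionTokens}, total ~{usage.EstimatedTotalTokens}");
    }
}
```

In the question's loop, creating one estimator per request and calling `usage.Append(chatMessage.Content)` inside the inner `await foreach` would keep a running estimate that is complete once the stream ends.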

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

  1. AshokPeddakotla-MSFT

    Mundra, Ashish, apologies for the delayed response.

    Can we include a custom header in our requests to the OpenAI API, say application-name? We would then enable request logs to be sent to a Log Analytics Workspace (LAW), and our Azure OpenAI instance could send its request logs to the same LAW. Would there then be a way to create a report that correlates this request header (or user) with the number of prompts used by that request, so that we can group by application-name and get the tokens consumed by each application every month?

    AFAIK, currently these are not available.

    However, as mentioned in the "Azure OpenAI Service Announces New Models and Multimodal Advancements at Microsoft Ignite 2023" post:

    We are launching improved monitoring and observability capabilities for Azure OpenAI Service on the Azure Portal. With Azure OpenAI’s out-of-the-box metrics dashboard, customers can get a bird’s eye view of the most important usage and performance metrics for Azure OpenAI Service within seconds. You can understand the usage trends for your resource by breaking down API requests by model name, model version, deployment, operation and stream type (streaming or non-streaming), view error rates, latency, and utilization, among others. To get started, select your Azure OpenAI resource on Azure Portal, click on ‘Overview’ on the left pane and select ‘Monitoring’ on the right pane.

    In addition, we are launching an enhanced metrics stack on Azure Monitor for Provisioned Throughput customers. As token costs vary significantly from model to model, we are launching ‘Active Tokens’ as an enhanced measure of your token utilization for provisioned deployments. Provisioned-managed customers can also understand the utilization % for their deployment and act when their calls are throttled when utilization exceeds 100%. Starting 11/27, customers will have a way to understand the latency of their streaming requests through the ‘Time to Response’ metric, which measures the time taken for the first response to appear after you have sent a request, excluding any client-side latency. Apart from the out-of-the-box dashboard, we have enhanced our Diagnostics Logs experience by adding deployment, model and response/request related schema to give you more granular visibility into your service usage.
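    Because correlating a custom request header with token usage is not currently available on the service side, one workaround is for each calling application to record its own usage and aggregate it. The sketch below is a minimal, hypothetical example: `UsageRecord`, `MonthlyUsage`, and `UsageReport` are illustrative names, not part of any Azure SDK, and it assumes the caller logs one record per call (token counts taken from the `usage` figures of non-streaming responses, or from a client-side estimate for streamed calls). It groups records by application name and UTC calendar month and sums prompt plus completion tokens. The sample values in `Main` are made up for illustration.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // HYPOTHETICAL record shape: one entry per Azure OpenAI call, written by the caller itself.
    public sealed record UsageRecord(string ApplicationName, DateTime TimestampUtc, int PromptTokens, int CompletionTokens);

    public sealed record MonthlyUsage(string ApplicationName, int Year, int Month, long TotalTokens);

    public static class UsageReport
    {
        // Groups records by application name and calendar month (UTC) and sums prompt + completion tokens.
        public static List<MonthlyUsage> MonthlyTotals(IEnumerable<UsageRecord> records) =>
            records
                .GroupBy(r => (r.ApplicationName, r.TimestampUtc.Year, r.TimestampUtc.Month))
                .Select(g => new MonthlyUsage(
                    g.Key.ApplicationName, g.Key.Year, g.Key.Month,
                    g.Sum(r => (long)r.PromptTokens + r.CompletionTokens)))
                .OrderBy(m => m.ApplicationName, StringComparer.Ordinal)
                .ThenBy(m => m.Year)
                .ThenBy(m => m.Month)
                .ToList();

        public static void Main()
        {
            // Made-up sample records for illustration only.
            var records = new[]
            {
                new UsageRecord("app-a", new DateTime(2023, 11, 3, 0, 0, 0, DateTimeKind.Utc), 120, 80),
                new UsageRecord("app-a", new DateTime(2023, 11, 20, 0, 0, 0, DateTimeKind.Utc), 50, 30),
                new UsageRecord("app-b", new DateTime(2023, 11, 5, 0, 0, 0, DateTimeKind.Utc), 10, 5),
                new UsageRecord("app-a", new DateTime(2023, 12, 1, 0, 0, 0, DateTimeKind.Utc), 40, 60),
            };

            foreach (var m in MonthlyTotals(records))
                Console.WriteLine($"{m.ApplicationName} {m.Year}-{m.Month:D2}: {m.TotalTokens} tokens");
        }
    }
    ```

    Keeping the aggregation in the caller avoids depending on service-side log schemas, at the cost of trusting each application to record its calls consistently.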

    Hope this helps. Do let us know if you have any further queries.
