Retrieving token usage in Azure OpenAI response when streaming is enabled

chaymr 186 Reputation points
2024-02-29T00:41:38.2966667+00:00

I have an Azure OpenAI deployment shared by multiple internal users, and I charge them back based on the token usage reported in the "usage" field of the API response. However, users who stream the response with "stream=True" do not receive the "usage" field in the Azure OpenAI response. Is there any way to retrieve the token count even with "stream=True"?

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

2 answers

  1. Ramr-msft 17,826 Reputation points
    2024-02-29T12:37:58.8533333+00:00

    Thanks for the question. Here is a sample that counts tokens when streaming is enabled: Jupyter notebooks that calculate token usage with Tiktoken, for scenarios with and without token streaming. https://github.com/LazaUK/AOAI-Streaming-TokenUsage/tree/main
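
The general idea of client-side counting can be sketched as follows. This is a minimal illustration, not the code from the linked notebooks: `count_streamed_tokens` and `whitespace_encode` are hypothetical names, and the whitespace encoder is only a stand-in so the snippet runs without extra packages.

```python
from typing import Callable, Iterable, List, Optional


def count_streamed_tokens(
    deltas: Iterable[Optional[str]],
    encode: Callable[[str], List[int]],
) -> int:
    """Join streamed text deltas and count the tokens of the full completion."""
    # Tokenize the joined text rather than each delta separately: a token can
    # span two deltas, so per-delta counts may not add up to the true total.
    text = "".join(d for d in deltas if d)
    return len(encode(text))


def whitespace_encode(text: str) -> List[int]:
    """Stand-in encoder (assumption for the demo): one 'token' per word."""
    return list(range(len(text.split())))


# Deltas as they might arrive from a stream; None mimics an empty chunk.
completion_tokens = count_streamed_tokens(
    ["Visit the ", "Lou", "vre", None, " first."],
    whitespace_encode,
)
print(completion_tokens)
```

In practice the encoder would come from Tiktoken, for example `tiktoken.encoding_for_model("gpt-4").encode`, assuming the model name matches your deployment. Counts obtained this way are estimates; prompt tokens in particular include per-message overhead that a plain text count does not capture.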

    1 person found this answer helpful.

  2. Tom Villani 0 Reputation points
    2025-03-31T15:48:33.9366667+00:00

    Posting the answer here in case it helps others.

    The option stream_options: {"include_usage": True} must be passed through the model_extras keyword argument when calling complete() on the Azure client:

    
    import os

    from azure.ai.inference import ChatCompletionsClient
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.inference.models import SystemMessage, UserMessage

    # Read both the endpoint and the key from environment variables.
    client = ChatCompletionsClient(
        endpoint=os.getenv("<YOUR ENDPOINT ENV VAR>"),
        credential=AzureKeyCredential(os.getenv("<YOUR AZURE KEY ENV VAR>"))
    )

    for chunk in client.complete(
        messages=[
            SystemMessage(content="You are a helpful assistant."),
            UserMessage(content="I am going to Paris, what should I see?")
        ],
        stream=True,
        # Extra request fields are forwarded to the service as-is.
        model_extras={"stream_options": {"include_usage": True}}
    ):
        # Only the chunk that carries token counts has a non-empty usage field.
        if hasattr(chunk, "usage") and chunk.usage is not None:
            print(chunk.usage)
    
    
    

    For the last chunk received, this prints:

    {'completion_tokens': 561, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens': 28, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}, 'total_tokens': 589}
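
For chargeback, it can help to collect the streamed text and the usage record in one pass. The sketch below is an assumption-laden illustration: `collect_stream` is a hypothetical helper, and it assumes each chunk exposes `choices[i].delta.content` and an optional `usage` attribute, as in the loop above. Fake chunks stand in for a real stream so the snippet runs offline.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional, Tuple


def collect_stream(chunks: Iterable[Any]) -> Tuple[str, Optional[Any]]:
    """Return the concatenated text and the last non-empty usage record."""
    parts = []
    usage = None
    for chunk in chunks:
        # The usage-bearing chunk may have an empty choices list.
        for choice in getattr(chunk, "choices", None) or []:
            delta = getattr(choice, "delta", None)
            content = getattr(delta, "content", None)
            if content:
                parts.append(content)
        chunk_usage = getattr(chunk, "usage", None)
        if chunk_usage is not None:
            usage = chunk_usage
    return "".join(parts), usage


# Fake chunks (assumed shape) standing in for client.complete(..., stream=True).
fake_stream = [
    SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content="Bonjour"))],
        usage=None,
    ),
    SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=", Paris"))],
        usage=None,
    ),
    SimpleNamespace(
        choices=[],
        usage={"prompt_tokens": 28, "completion_tokens": 561, "total_tokens": 589},
    ),
]

text, usage = collect_stream(fake_stream)
print(text)
print(usage["total_tokens"])
```

With a real stream, the returned usage record could then be logged against the calling user. If the stream is interrupted before the final chunk arrives, usage stays None, so a fallback estimate may be needed in that case.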
    
