Azure OpenAI streaming token usage

김세형 135 Reputation points
2024-07-09T13:12:25.7466667+00:00

Situation

We run multiple services that use a GPT model, and they all use streaming chat completions.

Token usage monitoring is required for each service, so we need to retrieve token usage from the streamed response.

Problem

However, the streamed response does not include token usage.

Azure OpenAI vs OpenAI

OpenAI added a token usage option for streamed responses about two months ago.

Is there any plan to release this feature in Azure OpenAI?
stream_options={"include_usage": True}, # retrieving token usage for stream response

OpenAI feature release https://github.com/openai/openai-python/releases/tag/v1.26.0

OpenAI cookbook https://cookbook.openai.com/examples/how_to_stream_completions#4-how-to-get-token-usage-data-for-streamed-chat-completion-response

Azure API release https://github.com/Azure/azure-rest-api-specs/blob/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/stable/2024-06-01/inference.yaml

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

3 answers

  1. Maverick Li (Medalsoft) 5 Reputation points
    2024-09-26T03:37:04.2533333+00:00
    from langchain_openai import AzureChatOpenAI
    import asyncio
    from langchain_core.messages import HumanMessage
    
    llm = AzureChatOpenAI(
        api_key="xxxx",
        azure_endpoint="https://xxxxxx.openai.azure.com/",
        api_version="2024-08-01-preview",
        openai_api_type="azure",
        azure_deployment="gpt-4o",
        model_name="gpt-4o",
        temperature=0,
        stream=True,
        stream_options={"include_usage": True},
        # model_kwargs={"stream_options": {"include_usage": True}}
    )
    
    req = [HumanMessage(
        content=[{'type': 'text', 'text': "what's the pic describe"},
                 {'type': 'image_url', 'image_url': {
                     "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/7/70/Snow_Man.jpg/500px-Snow_Man.jpg"}}])]
    
    req_2 = [HumanMessage(content=[{'type': 'text', 'text': 'tell me a joke'}])]
    
    
    async def stream_request(messages):
        async for event in llm.astream_events(messages, version="v2"):
            if event["event"] == "on_chat_model_end":
                # the final event carries the accumulated token usage
                print(f'Token usage: {event["data"]["output"].usage_metadata}\n')
            elif event["event"] == "on_chat_model_stream":
                print(event["data"]["chunk"])
    
    
    asyncio.run(stream_request(req))      # image request (raised an error, see below)
    # asyncio.run(stream_request(req_2))  # text-only request works
    
    
    

    When chatting with an image, an error occurred, but a text-only request works fine.

    1 person found this answer helpful.

  2. VasaviLankipalle-MSFT 18,676 Reputation points Moderator
    2024-07-09T21:11:09.23+00:00

    Hello @김세형 , Thanks for using Microsoft Q&A Platform.

    Unfortunately, we don't have any ETA to share with you at this moment. I hope you understand.

    You can provide product feedback here: https://feedback.azure.com/d365community/forum/79b1327d-d925-ec11-b6e6-000d3a4f06a4

    I hope this helps.

    Regards,

    Vasavi

    -Please accept the answer and vote 'yes' if you found it helpful, to support the community. Thanks.


  3. Tom Villani 0 Reputation points
    2025-03-31T16:00:20.24+00:00

    Posting the answer here in case it helps others who were as frustrated by the documentation and non-answers from Microsoft.

    The stream_options: {"include_usage": True} option must be set using the model_extras keyword argument in the Azure client:

    
    import os
    
    from azure.ai.inference import ChatCompletionsClient
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.inference.models import SystemMessage, UserMessage
    
    
    client = ChatCompletionsClient(
        endpoint=os.getenv("<YOUR ENDPOINT ENV VAR>"),
        credential=AzureKeyCredential(os.getenv("<YOUR AZURE KEY ENV VAR>"))
    )
    
    for chunk in client.complete(
        messages=[
            SystemMessage(content="You are a helpful assistant."),
            UserMessage(content="I am going to Paris, what should I see?")
        ],
        stream=True,
        model_extras={"stream_options": {"include_usage": True}}
    ):
        # the final streamed chunk is the one carrying usage data
        if hasattr(chunk, "usage") and chunk.usage is not None:
            print(chunk.usage)
    
    
    
    {'completion_tokens': 561, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens': 28, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}, 'total_tokens': 589}
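    Since the original requirement was per-service monitoring, the usage dict from the final chunk can be tallied per service. A minimal sketch of that bookkeeping (the `UsageTracker` class and its method names are illustrative, not part of any Azure SDK; only the dict keys mirror the output shown above):

```python
from collections import defaultdict

class UsageTracker:
    """Illustrative helper: accumulate token usage per service name."""

    def __init__(self):
        self._totals = defaultdict(lambda: {"prompt_tokens": 0,
                                            "completion_tokens": 0,
                                            "total_tokens": 0})

    def record(self, service, usage):
        # `usage` mirrors the dict shape of the final streamed chunk
        for key in self._totals[service]:
            self._totals[service][key] += usage.get(key, 0)

    def totals(self, service):
        return dict(self._totals[service])
```

    Each service would call `record(...)` once per completed stream, and the monitoring job reads `totals(...)`.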
    


