Azure OpenAI streaming token usage

김세형 105 Reputation points
2024-07-09T13:12:25.7466667+00:00

Situation

We have multiple services that use GPT models, and they all use streaming chat completions.

Token usage monitoring is required per service, so we need to retrieve token usage from the streamed response.

Problem

However, the Azure OpenAI streaming response does not include token usage.

Azure OpenAI vs OpenAI

OpenAI already offers a token usage option for streaming responses; the feature shipped about two months ago:

stream_options={"include_usage": True}  # retrieve token usage for a streamed response

So, is there any plan to release this feature in Azure?

OpenAI feature release https://github.com/openai/openai-python/releases/tag/v1.26.0

OpenAI cookbook https://cookbook.openai.com/examples/how_to_stream_completions#4-how-to-get-token-usage-data-for-streamed-chat-completion-response

Azure API release https://github.com/Azure/azure-rest-api-specs/blob/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/stable/2024-06-01/inference.yaml
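For reference, the OpenAI cookbook pattern works like this: with stream_options={"include_usage": True}, the stream ends with one extra chunk whose choices list is empty and whose usage field is populated; every other chunk has usage set to None. The sketch below uses hypothetical stand-in Chunk/Usage classes so it runs without an API key; with the real SDK you would iterate over the stream returned by client.chat.completions.create(..., stream=True, stream_options={"include_usage": True}) in the same way.

```python
# Sketch of consuming a usage-bearing stream (assumption: OpenAI-style
# chunks where only the final chunk carries `usage`). Chunk and Usage
# are stand-in classes, not the real SDK types.
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

@dataclass
class Usage:
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

@dataclass
class Chunk:
    content: str = ""              # delta text (empty on the final chunk)
    usage: Optional[Usage] = None  # populated only on the final chunk

def consume_stream(chunks: Iterable[Chunk]) -> Tuple[str, Optional[Usage]]:
    """Collect the streamed text, then grab usage from the final chunk."""
    parts, usage = [], None
    for c in chunks:
        if c.usage is not None:    # final chunk: no content, usage set
            usage = c.usage
        else:
            parts.append(c.content)
    return "".join(parts), usage

# Simulated stream: two content chunks, then the trailing usage chunk.
stream = [Chunk("Hel"), Chunk("lo"), Chunk(usage=Usage(5, 2, 7))]
text, usage = consume_stream(stream)
print(text)                # Hello
print(usage.total_tokens)  # 7
```

The same loop shape applies unchanged to the real SDK stream once Azure exposes the option, since only the final chunk's usage check matters.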

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

2 answers

  1. Maverick Li (Medalsoft) 5 Reputation points
    2024-09-26T03:37:04.2533333+00:00
    from langchain_openai import AzureChatOpenAI
    import asyncio
    from langchain_core.messages import HumanMessage
    
    llm = AzureChatOpenAI(
        api_key="xxxx",
        azure_endpoint="https://xxxxxx.openai.azure.com/",
        api_version="2024-08-01-preview",
        openai_api_type="azure",
        azure_deployment="gpt-4o",
        model_name="gpt-4o",
        temperature=0,
        stream=True,
        stream_options={"include_usage": True},
        # model_kwargs={"stream_options": {"include_usage": True}}
    )
    
    req = [HumanMessage(
        content=[{'type': 'text', 'text': "what's the pic describe"},
                 {'type': 'image_url', 'image_url': {
                     "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/7/70/Snow_Man.jpg/500px-Snow_Man.jpg"}}])]
    
    req_2 = [HumanMessage(content=[{'type': 'text', 'text': 'tell me a joke'}])]
    
    
    async def fetch_joke():
        async for event in llm.astream_events(req, version="v2"):
            if event["event"] == "on_chat_model_end":
                # usage_metadata carries the prompt/completion token counts
                print(f'Token usage: {event["data"]["output"].usage_metadata}\n')
            elif event["event"] == "on_chat_model_stream":
                chunk = event["data"]["chunk"]
                print(chunk)


    asyncio.run(fetch_joke())

    When chatting with an image input, an error occurred (error screenshot attached), but a text-only request works fine.

  2. VasaviLankipalle-MSFT 17,396 Reputation points
    2024-07-09T21:11:09.23+00:00

    Hello @김세형 , Thanks for using Microsoft Q&A Platform.

    Unfortunately, we don't have any ETA to share with you at this moment. I hope you understand.

    You can provide product Feedback here: https://feedback.azure.com/d365community/forum/79b1327d-d925-ec11-b6e6-000d3a4f06a4

    I hope this helps.

    Regards,

    Vasavi

    -Please kindly accept the answer and vote 'Yes' if you found it helpful, to support the community. Thanks.

