Azure OpenAI streaming token usage

김세형 135 Reputation points
2024-07-09T13:12:25.7466667+00:00

Situation

We run multiple services that use a GPT model, and they all use streaming chat completions.

Token usage monitoring is required for each service, so we need to retrieve token usage from the streamed response.

Problem

However, the streamed response does not include token usage.

Azure OpenAI vs OpenAI

OpenAI added a token usage option for streamed responses about two months ago.

Is there any plan to release this feature in Azure OpenAI?
stream_options={"include_usage": True}, # retrieving token usage for stream response

OpenAI feature release https://github.com/openai/openai-python/releases/tag/v1.26.0

OpenAI cookbook https://cookbook.openai.com/examples/how_to_stream_completions#4-how-to-get-token-usage-data-for-streamed-chat-completion-response

Azure API release https://github.com/Azure/azure-rest-api-specs/blob/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference/stable/2024-06-01/inference.yaml

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

3 answers

  1. Maverick Li (Medalsoft) 5 Reputation points
    2024-09-26T03:37:04.2533333+00:00
    from langchain_openai import AzureChatOpenAI
    import asyncio
    from langchain_core.messages import HumanMessage
    
    llm = AzureChatOpenAI(
        api_key="xxxx",
        azure_endpoint="https://xxxxxx.openai.azure.com/",
        api_version="2024-08-01-preview",
        openai_api_type="azure",
        azure_deployment="gpt-4o",
        model_name="gpt-4o",
        temperature=0,
        stream=True,
        stream_options={"include_usage": True},
        # model_kwargs={"stream_options": {"include_usage": True}}
    )
    
    req = [HumanMessage(
        content=[{'type': 'text', 'text': "what's the pic describe"},
                 {'type': 'image_url', 'image_url': {
                     "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/7/70/Snow_Man.jpg/500px-Snow_Man.jpg"}}])]
    
    req_2 = [HumanMessage(content=[{'type': 'text', 'text': 'tell me a joke'}])]
    
    
    async def stream_request(messages):
        async for event in llm.astream_events(messages, version="v2"):
            if event["event"] == "on_chat_model_end":
                # the final event carries the accumulated token usage
                print(f'Token usage: {event["data"]["output"].usage_metadata}\n')
            elif event["event"] == "on_chat_model_stream":
                print(event["data"]["chunk"])
    
    
    asyncio.run(stream_request(req))      # image request (raised an error, see below)
    # asyncio.run(stream_request(req_2))  # text-only request works
    
    
    

    When chatting with an image, an error occurred, but a text-only request works fine.

    1 person found this answer helpful.

  2. VasaviLankipalle-MSFT 18,676 Reputation points Moderator
    2024-07-09T21:11:09.23+00:00

    Hello @김세형 , Thanks for using Microsoft Q&A Platform.

    Unfortunately, we don't have any ETA to share with you at this moment. I hope you understand.

    You can provide product feedback here: https://feedback.azure.com/d365community/forum/79b1327d-d925-ec11-b6e6-000d3a4f06a4

    I hope this helps.

    Regards,

    Vasavi

    -Please accept the answer and vote 'yes' if you found it helpful, to support the community. Thanks.


  3. Tom Villani 0 Reputation points
    2025-03-31T16:00:20.24+00:00

    Posting the answer here in case it helps others who were as frustrated by the documentation and non-answers from Microsoft.

    The stream_options: {"include_usage": True} option must be set using the model_extras keyword argument in the Azure client:

    
    import os
    
    from azure.ai.inference import ChatCompletionsClient
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.inference.models import SystemMessage, UserMessage
    
    
    client = ChatCompletionsClient(
        endpoint=os.getenv("<YOUR ENDPOINT ENV VAR>"),
        credential=AzureKeyCredential(os.getenv("<YOUR AZURE KEY ENV VAR>"))
    )
    
    for chunk in client.complete(
        messages=[
            SystemMessage(content="You are a helpful assistant."),
            UserMessage(content="I am going to Paris, what should I see?")
        ],
        stream=True,
        model_extras={"stream_options": {"include_usage": True}}
    ):
        # the final streamed chunk is the one carrying usage data
        if hasattr(chunk, "usage") and chunk.usage is not None:
            print(chunk.usage)
    
    
    
    {'completion_tokens': 561, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens': 28, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}, 'total_tokens': 589}
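    Since the original requirement was per-service monitoring, the usage dict from the final chunk can be tallied per service. A minimal sketch of that bookkeeping (the `UsageTracker` class and its method names are illustrative, not part of any Azure SDK; only the dict keys mirror the output shown above):

```python
from collections import defaultdict

class UsageTracker:
    """Illustrative helper: accumulate token usage per service name."""

    def __init__(self):
        self._totals = defaultdict(lambda: {"prompt_tokens": 0,
                                            "completion_tokens": 0,
                                            "total_tokens": 0})

    def record(self, service, usage):
        # `usage` mirrors the dict shape of the final streamed chunk
        for key in self._totals[service]:
            self._totals[service][key] += usage.get(key, 0)

    def totals(self, service):
        return dict(self._totals[service])
```

    Each service would call `record(...)` once per completed stream, and the monitoring job reads `totals(...)`.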
    


