Count the number of prompt caching tokens for the Azure OpenAI Service

youyang
2024-11-22T03:34:42.1933333+00:00

Hi Azure team, I deployed the gpt-4o-mini-2024-07-18 model on the Azure OpenAI Service and call it using the AzureOpenAI client:

        from openai import AzureOpenAI

        client = AzureOpenAI(
            api_key="<api key>",
            azure_endpoint="https://xxxx.openai.azure.com/",
            api_version="2024-10-01-preview",
        )

and send messages using:

    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=False,
        temperature=0.4,
    )

However, when I print completion.usage, it outputs:

usage=CompletionUsage(completion_tokens=212, prompt_tokens=12554, total_tokens=12766)

I can't find fields like "prompt_tokens_details" or "cached_tokens" as shown in https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching. Is there any workaround if I want to count the number of cached tokens in the prompt?
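For reference, this is the defensive probe I use (a sketch based on the attribute names from the prompt caching doc above; on older API versions they may simply be absent):

    usage = completion.usage
    # prompt_tokens_details may be missing entirely, so probe defensively.
    details = getattr(usage, "prompt_tokens_details", None)
    cached = getattr(details, "cached_tokens", None) if details else None
    print(f"cached_tokens: {cached}")  # None when the API doesn't return the field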

Thanks


1 answer

  1. navba-MSFT, Microsoft Employee Moderator
    2024-11-22T05:51:20.6066667+00:00

    @youyang Welcome to the Microsoft Q&A forum, and thank you for posting your query here!


    You can use the Azure OpenAI metrics in Azure Monitor, as shown below, to gather inference and cached token usage details:

    [Screenshot: Azure Monitor metrics for the Azure OpenAI resource, showing inference and cached token usage]
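    If you prefer to pull the same numbers programmatically, a minimal sketch with the azure-monitor-query package could look like the following (the metric name ProcessedPromptTokens is an assumption on my side; please verify the exact metric names in the portal's metric picker):

        from datetime import timedelta

        from azure.identity import DefaultAzureCredential
        from azure.monitor.query import MetricAggregationType, MetricsQueryClient

        credential = DefaultAzureCredential()
        metrics_client = MetricsQueryClient(credential)

        # Full ARM resource ID of the Azure OpenAI resource (placeholder values).
        resource_id = (
            "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
            "/providers/Microsoft.CognitiveServices/accounts/<account-name>"
        )

        response = metrics_client.query_resource(
            resource_id,
            metric_names=["ProcessedPromptTokens"],  # verify the exact name in the portal
            timespan=timedelta(hours=1),
            granularity=timedelta(minutes=5),
            aggregations=[MetricAggregationType.TOTAL],
        )

        for metric in response.metrics:
            for series in metric.timeseries:
                for point in series.data:
                    print(point.timestamp, point.total)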

    You can also apply splitting on the model deployment name dimension:

    [Screenshot: the same metrics chart split by model deployment name]
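    Programmatically, the same splitting can be expressed with the filter parameter of query_resource (the dimension name ModelDeploymentName is my assumption; confirm it under the metric's dimensions in the portal):

        response = metrics_client.query_resource(
            resource_id,
            metric_names=["ProcessedPromptTokens"],  # verify the exact name in the portal
            timespan=timedelta(hours=1),
            aggregations=[MetricAggregationType.TOTAL],
            # '*' returns one time series per deployment name.
            filter="ModelDeploymentName eq '*'",
        )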

    Hope this helps.

