Count the number of prompt caching tokens for the Azure OpenAI Service

youyang
2024-11-22T03:34:42.1933333+00:00

Hi Azure team, I deployed the gpt-4o-mini-2024-07-18 model on the Azure OpenAI Service and call it using the AzureOpenAI client:

        from openai import AzureOpenAI

        client = AzureOpenAI(
            api_key="<api key>",
            azure_endpoint="https://xxxx.openai.azure.com/",
            api_version="2024-10-01-preview",
        )

and send messages using:

    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        stream=False,
        temperature=0.4,
    )

However, when I print completion.usage, it outputs:

usage=CompletionUsage(completion_tokens=212, prompt_tokens=12554, total_tokens=12766)

I can't find fields like "prompt_tokens_details" or "cached_tokens" as shown in https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching. Is there any workaround if I want to count the number of cached tokens in the prompt?
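For reference, this is the defensive probe I use (a sketch based on the attribute names from the prompt caching doc above; on older API versions they may simply be absent):

    usage = completion.usage
    # prompt_tokens_details may be missing entirely, so probe defensively.
    details = getattr(usage, "prompt_tokens_details", None)
    cached = getattr(details, "cached_tokens", None) if details else None
    print(f"cached_tokens: {cached}")  # None when the API doesn't return the field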

Thanks


1 answer

  1. navba-MSFT, Microsoft Employee Moderator
    2024-11-22T05:51:20.6066667+00:00

    @youyang Welcome to the Microsoft Q&A forum, and thank you for posting your query here!


    You can use the Azure OpenAI metrics in Azure Monitor, as shown below, to gather inference and cached token usage details:

    [Screenshot: Azure Monitor metrics for the Azure OpenAI resource, showing inference and cached token usage]
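    If you prefer to pull the same numbers programmatically, a minimal sketch with the azure-monitor-query package could look like the following (the metric name ProcessedPromptTokens is an assumption on my side; please verify the exact metric names in the portal's metric picker):

        from datetime import timedelta

        from azure.identity import DefaultAzureCredential
        from azure.monitor.query import MetricAggregationType, MetricsQueryClient

        credential = DefaultAzureCredential()
        metrics_client = MetricsQueryClient(credential)

        # Full ARM resource ID of the Azure OpenAI resource (placeholder values).
        resource_id = (
            "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
            "/providers/Microsoft.CognitiveServices/accounts/<account-name>"
        )

        response = metrics_client.query_resource(
            resource_id,
            metric_names=["ProcessedPromptTokens"],  # verify the exact name in the portal
            timespan=timedelta(hours=1),
            granularity=timedelta(minutes=5),
            aggregations=[MetricAggregationType.TOTAL],
        )

        for metric in response.metrics:
            for series in metric.timeseries:
                for point in series.data:
                    print(point.timestamp, point.total)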

    You can also apply splitting on the model deployment name dimension:

    [Screenshot: the same metrics chart split by model deployment name]
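    Programmatically, the same splitting can be expressed with the filter parameter of query_resource (the dimension name ModelDeploymentName is my assumption; confirm it under the metric's dimensions in the portal):

        response = metrics_client.query_resource(
            resource_id,
            metric_names=["ProcessedPromptTokens"],  # verify the exact name in the portal
            timespan=timedelta(hours=1),
            aggregations=[MetricAggregationType.TOTAL],
            # '*' returns one time series per deployment name.
            filter="ModelDeploymentName eq '*'",
        )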

    Hope this helps.

