Prompt caching works for OpenAI model o1-preview but not gpt-4o

Glenn Wright 40 Reputation points
2024-11-13T15:46:58.93+00:00

According to this documentation:

https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching

...support for prompt caching was added in the 2024-10-01-preview API version for several models, including gpt-4o-2024-08-06 and o1-preview-2024-09-12. I tested sending long, repeated prompts to both of these models, and the response had the expected "prompt_tokens_details" property for the o1-preview model but not for the gpt-4o model. Is there an ETA for when this feature will be available for gpt-4o?
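A minimal sketch of this kind of test (not the exact script from the post; the deployment name, environment variables, and the 1,024-token padding are assumptions on my part):

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-01-preview",  # the version that documents prompt caching
)

# Prompt caching only applies to prompts of at least 1,024 tokens, so repeat
# an identical prefix to give the cache something to hit on the second call.
long_prefix = "You are a meticulous assistant. Follow the policy below carefully. " * 200

for attempt in range(2):
    response = client.chat.completions.create(
        model="o1-preview",  # deployment name; replace with your own
        messages=[{"role": "user", "content": long_prefix + "Say hello."}],
    )
    usage = response.usage
    # prompt_tokens_details (with its cached_tokens field) is only present when
    # the model/API version surfaces caching, hence the defensive getattr.
    details = getattr(usage, "prompt_tokens_details", None)
    cached = getattr(details, "cached_tokens", None) if details is not None else None
    print(f"attempt {attempt}: prompt_tokens={usage.prompt_tokens}, cached_tokens={cached}")

On the second call, a model that surfaces caching should report a nonzero cached_tokens; in the behavior described above, gpt-4o simply omits the details object under this API version.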

Azure OpenAI Service

Accepted answer
  1. VasaviLankipalle-MSFT 18,676 Reputation points Moderator
    2024-11-14T21:35:07.9666667+00:00

Hello @Glenn Wright, I agree with the answer provided by Daniel Fang.

    "Does this mean that the gpt-4o-2024-08-06 model is doing prompt caching under the hood" -> yes "but that it's not indicating it in a way that's visible to the user because the API response parameter is not supported?" -> correct, because that version of API does not have cached_tokens

    However, as mentioned earlier,

Official support for prompt caching was first added in API version 2024-10-01-preview. At this time, only the o1-preview-2024-09-12 and o1-mini-2024-09-12 models support the cached_tokens API response parameter.

We don't have any ETA for GPT-4o support of the cached_tokens API response parameter.

    Please check the what's new page and the respective documentation for the latest updates.

    I hope this helps.

    Regards,

    Vasavi


Please kindly accept the answer and vote 'Yes' if you found it helpful, to support the community. Thanks.

    1 person found this answer helpful.

1 additional answer

Sort by: Most helpful
  1. Daniel Fang 1,060 Reputation points MVP
    2024-11-14T11:05:21.4366667+00:00

    Hey Glenn

In your link, the last paragraph of the Getting started section says: "Prompt caching is enabled by default with no additional configuration needed for supported models." It also says: "Prompt caching is enabled by default. There is no opt-out option."

    https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching

    So I think the answer to your question is:
    "Does this mean that the gpt-4o-2024-08-06 model is doing prompt caching under the hood" -> yes
    "but that it's not indicating it in a way that's visible to the user because the API response parameter is not supported?" -> correct, because that version of API does not have cached_tokens

    2 people found this answer helpful.
