Prompt caching in Azure OpenAI?

Christian 185 Reputation points
2024-10-02T11:39:05.5933333+00:00

OpenAI recently announced prompt caching in the API. Apparently it caches input tokens when the prompt is larger than 1024 tokens.
https://openai.com/index/api-prompt-caching/

Is this feature enabled in Azure OpenAI? If it's not, is there an ETA?

Azure OpenAI Service

Accepted answer
  1. Abel Wenning 81 Reputation points
    2024-10-22T18:20:14.3933333+00:00

    -- UPDATED --

    As of October 23, 2024, only the following models support prompt caching with Azure OpenAI:

    • o1-preview-2024-09-12 *
    • o1-mini-2024-09-12 *
    • gpt-4o-2024-05-13 **
    • gpt-4o-2024-08-06 **
    • gpt-4o-mini-2024-07-18 **

    * Only these two models support the cached_tokens API response parameter.
    ** "For gpt-4o and gpt-4o-mini models, prompt caching is supported for:" Messages, Images, Tool use, and Structured outputs (see the reference link for details).

    "Prompt caching is enabled by default. There is no opt-out option."
    "The o1-series models are text only ..."

    Reference: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching

    (Thank you @saravananpalanivel-9941 for pointing out the updated list in the reference)
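    If you want to check whether caching is actually kicking in, here is a minimal sketch, assuming the openai Python SDK (a recent 1.x release that exposes usage.prompt_tokens_details) against an Azure OpenAI deployment of one of the models above; the endpoint, key, API version, and deployment name below are placeholders, not values from this thread:

    ```python
    # Minimal sketch: send a long prompt and read cached_tokens from the usage details.
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
        api_key="YOUR-API-KEY",
        api_version="2024-10-01-preview",  # placeholder; use a version your resource supports
    )

    # Caching only applies once the prompt reaches the 1024-token minimum, so the
    # shared prefix (e.g. a long system message) has to be padded past that size.
    long_system_prompt = "You are a helpful assistant. Follow the policy below.\n" + ("Policy line.\n" * 300)

    response = client.chat.completions.create(
        model="gpt-4o-mini-2024-07-18",  # placeholder deployment name of a caching-capable model
        messages=[
            {"role": "system", "content": long_system_prompt},
            {"role": "user", "content": "Summarize the policy above in one sentence."},
        ],
    )

    usage = response.usage
    cached = usage.prompt_tokens_details.cached_tokens if usage.prompt_tokens_details else 0
    print(f"prompt tokens: {usage.prompt_tokens}, cached tokens: {cached}")
    # The first call with a given prefix will normally report cached_tokens = 0;
    # a cache hit typically shows up when the same prefix is sent again shortly afterwards.
    ```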

    -- ORIGINAL --

    Yes, as of yesterday, per Microsoft:
    Currently only the following models support prompt caching with Azure OpenAI:

    • o1-preview-2024-09-12
    • o1-mini-2024-09-12

    Reference: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching

    Cc: @YutongTie-MSFT, @Christian-7033, @TienHoang-5634, @Dexter Awoyemi, @koritadenadakddp3-4938 (sorry, I don't know how to effectively tag the other 3 [or 8] people)


2 additional answers

  1. Saravanan Palanivel 10 Reputation points
    2024-10-24T08:43:30.15+00:00

    As of October 23rd, prompt caching has been extended to the following models as well:

    • gpt-4o-2024-05-13
    • gpt-4o-2024-08-06
    • gpt-4o-mini-2024-07-18

  2. Laxman R Iyer 0 Reputation points
    2025-02-25T08:02:47.1066667+00:00

    Still, when I am using "model": "gpt-4o-mini-2024-07-18", cached_tokens appears as 0. How do I enable it? I am using an input of 1024 tokens.

