-- UPDATED --
As of 10/23/2024, only the following models support prompt caching with Azure OpenAI:

- o1-preview-2024-09-12 (*)
- o1-mini-2024-09-12 (*)
- gpt-4o-2024-05-13 (**)
- gpt-4o-2024-08-06 (**)
- gpt-4o-mini-2024-07-18 (**)

(*) Only these two models support the `cached_tokens` API response parameter.

From the documentation:

"Prompt caching is enabled by default. There is no opt-out option."

"The o1-series models are text only ..."

(**) "For gpt-4o and gpt-4o-mini models, prompt caching is supported for:" Messages, Images, Tool use, Structured outputs (see the reference link for details).
Reference: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching
(Thank you @saravananpalanivel-9941 for pointing out the updated list in the reference)
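If it helps, here is a minimal sketch of how you could read the `cached_tokens` field from the `usage` block of a chat completions response. The field names follow the linked documentation (`usage.prompt_tokens_details.cached_tokens`); the payload below is a fabricated example, not real API output:

```python
import json

# Example usage payload in the shape returned by the chat completions API
# when prompt caching is active (sample numbers are made up for illustration).
usage_json = """
{
  "prompt_tokens": 2006,
  "completion_tokens": 300,
  "total_tokens": 2306,
  "prompt_tokens_details": {"cached_tokens": 1920}
}
"""

usage = json.loads(usage_json)

# cached_tokens reports how many prompt tokens were served from the cache;
# default to 0 for models/responses that do not include the field.
cached = usage.get("prompt_tokens_details", {}).get("cached_tokens", 0)
uncached = usage["prompt_tokens"] - cached

print(f"cached prompt tokens: {cached}")
print(f"uncached prompt tokens: {uncached}")
```

Defaulting to 0 when `prompt_tokens_details` is absent keeps the same code working for models that don't return the parameter.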
-- ORIGINAL --
Yes, as of yesterday, per Microsoft:
Currently only the following models support prompt caching with Azure OpenAI:
- o1-preview-2024-09-12
- o1-mini-2024-09-12
Reference: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching
Cc: @YutongTie-MSFT, @Christian-7033, @TienHoang-5634, @Dexter Awoyemi, @koritadenadakddp3-4938 (sorry, I don't know how to effectively tag the other 3 [or 8] people)