Prompt caching in Azure OpenAI?

Christian 185 Reputation points
2024-10-02T11:39:05.5933333+00:00

OpenAI recently announced prompt caching in the API. Apparently it caches input tokens when the prompt is larger than 1024 tokens.
https://openai.com/index/api-prompt-caching/

Is this feature enabled in Azure OpenAI? If it's not, is there an ETA?

Azure OpenAI Service

Accepted answer
  1. Abel Wenning 81 Reputation points
    2024-10-22T18:20:14.3933333+00:00

    -- UPDATED --

    As of October 23, 2024, only the following models support prompt caching with Azure OpenAI:

    • o1-preview-2024-09-12 *
    • o1-mini-2024-09-12 *
    • gpt-4o-2024-05-13 **
    • gpt-4o-2024-08-06 **
    • gpt-4o-mini-2024-07-18 **

    * Only these two models support the cached_tokens API response parameter.
    ** "For gpt-4o and gpt-4o-mini models, prompt caching is supported for:" Messages, Images, Tool use, and Structured outputs (see the reference link for details).

    "Prompt caching is enabled by default. There is no opt-out option."
    "The o1-series models are text only ..."

    Reference: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching

    (Thank you @saravananpalanivel-9941 for pointing out the updated list in the reference)
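    If you want to check whether caching is actually kicking in, here is a minimal sketch, assuming the openai Python SDK (a recent 1.x release that exposes usage.prompt_tokens_details) against an Azure OpenAI deployment of one of the models above; the endpoint, key, API version, and deployment name below are placeholders, not values from this thread:

    ```python
    # Minimal sketch: send a long prompt and read cached_tokens from the usage details.
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
        api_key="YOUR-API-KEY",
        api_version="2024-10-01-preview",  # placeholder; use a version your resource supports
    )

    # Caching only applies once the prompt reaches the 1024-token minimum, so the
    # shared prefix (e.g. a long system message) has to be padded past that size.
    long_system_prompt = "You are a helpful assistant. Follow the policy below.\n" + ("Policy line.\n" * 300)

    response = client.chat.completions.create(
        model="gpt-4o-mini-2024-07-18",  # placeholder deployment name of a caching-capable model
        messages=[
            {"role": "system", "content": long_system_prompt},
            {"role": "user", "content": "Summarize the policy above in one sentence."},
        ],
    )

    usage = response.usage
    cached = usage.prompt_tokens_details.cached_tokens if usage.prompt_tokens_details else 0
    print(f"prompt tokens: {usage.prompt_tokens}, cached tokens: {cached}")
    # The first call with a given prefix will normally report cached_tokens = 0;
    # a cache hit typically shows up when the same prefix is sent again shortly afterwards.
    ```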

    -- ORIGINAL --

    Yes, as of yesterday, per Microsoft:
    Currently only the following models support prompt caching with Azure OpenAI:

    • o1-preview-2024-09-12
    • o1-mini-2024-09-12

    Reference: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching

    Cc: @YutongTie-MSFT, @Christian-7033, @TienHoang-5634, @Dexter Awoyemi, @koritadenadakddp3-4938 (sorry, I don't know how to effectively tag the other 3 [or 8] people)


2 additional answers

  1. Saravanan Palanivel 10 Reputation points
    2024-10-24T08:43:30.15+00:00

    As of October 23rd, prompt caching has been extended to the following models as well:

    • gpt-4o-2024-05-13
    • gpt-4o-2024-08-06
    • gpt-4o-mini-2024-07-18

  2. Laxman R Iyer 0 Reputation points
    2025-02-25T08:02:47.1066667+00:00

    Still, when I am using "model": "gpt-4o-mini-2024-07-18", cached_tokens appears as 0. How do I enable it? I am using an input of 1024 tokens.

