Prompt caching works for OpenAI model o1-preview but not gpt-4o

Glenn Wright 40 Reputation points
2024-11-13T15:46:58.93+00:00

According to this documentation:

https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching

...support for prompt caching was added in the 2024-10-01-preview API version for several models, including gpt-4o-2024-08-06 and o1-preview-2024-09-12. I tested sending long, repeated prompts to both of these models, and the response had the expected "prompt_tokens_details" property for the o1-preview model but not for the gpt-4o model. Is there an ETA for when this feature will be available for gpt-4o?
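A minimal sketch of this kind of test (not the exact script from the post; the deployment name, environment variables, and the 1,024-token padding are assumptions on my part):

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-01-preview",  # the version that documents prompt caching
)

# Prompt caching only applies to prompts of at least 1,024 tokens, so repeat
# an identical prefix to give the cache something to hit on the second call.
long_prefix = "You are a meticulous assistant. Follow the policy below carefully. " * 200

for attempt in range(2):
    response = client.chat.completions.create(
        model="o1-preview",  # deployment name; replace with your own
        messages=[{"role": "user", "content": long_prefix + "Say hello."}],
    )
    usage = response.usage
    # prompt_tokens_details (with its cached_tokens field) is only present when
    # the model/API version surfaces caching, hence the defensive getattr.
    details = getattr(usage, "prompt_tokens_details", None)
    cached = getattr(details, "cached_tokens", None) if details is not None else None
    print(f"attempt {attempt}: prompt_tokens={usage.prompt_tokens}, cached_tokens={cached}")

On the second call, a model that surfaces caching should report a nonzero cached_tokens; in the behavior described above, gpt-4o simply omits the details object under this API version.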

Azure OpenAI Service

Accepted answer
  1. VasaviLankipalle-MSFT 18,676 Reputation points Moderator
    2024-11-14T21:35:07.9666667+00:00

Hello @Glenn Wright, I agree with the answer provided by Daniel Fang.

    "Does this mean that the gpt-4o-2024-08-06 model is doing prompt caching under the hood" -> yes "but that it's not indicating it in a way that's visible to the user because the API response parameter is not supported?" -> correct, because that version of API does not have cached_tokens

    However, as mentioned earlier,

Official support for prompt caching was first added in API version 2024-10-01-preview. At this time, only the o1-preview-2024-09-12 and o1-mini-2024-09-12 models support the cached_tokens API response parameter.

We don't have any ETA for GPT-4o support of the cached_tokens API response parameter.

    Please check the what's new page and the respective documentation for the latest updates.

    I hope this helps.

    Regards,

    Vasavi


Please kindly accept the answer and vote 'Yes' if you found it helpful, to support the community. Thanks.

    1 person found this answer helpful.

1 additional answer

Sort by: Most helpful
  1. Daniel Fang 1,060 Reputation points MVP
    2024-11-14T11:05:21.4366667+00:00

    Hey Glenn

In your link, the last paragraph of the Getting started section says: "Prompt caching is enabled by default with no additional configuration needed for supported models." It also says: "Prompt caching is enabled by default. There is no opt-out option."

    https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching

    So I think the answer to your question is:
    "Does this mean that the gpt-4o-2024-08-06 model is doing prompt caching under the hood" -> yes
    "but that it's not indicating it in a way that's visible to the user because the API response parameter is not supported?" -> correct, because that version of API does not have cached_tokens

    2 people found this answer helpful.
