Hello @Glenn Wright, I agree with the answer provided by Daniel FANG.
"Does this mean that the gpt-4o-2024-08-06 model is doing prompt caching under the hood" -> yes "but that it's not indicating it in a way that's visible to the user because the API response parameter is not supported?" -> correct, because that version of API does not have
cached_tokens
However, as mentioned earlier, official support for prompt caching was first added in API version 2024-10-01-preview. At this time, only the o1-preview-2024-09-12 and o1-mini-2024-09-12 models support the cached_tokens API response parameter.
We don't have any ETA details regarding GPT-4o model support for the cached_tokens API response parameter.
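If you want to verify this yourself, here is a minimal sketch of how you might check for cached_tokens in the response usage details. It assumes the openai Python package (v1.x); the endpoint, key, and deployment name ("o1-mini") are placeholders you would replace with your own values.

```python
# Minimal sketch: check whether the API reports cached_tokens.
# Assumes openai>=1.0 and an Azure OpenAI o1-mini deployment (placeholder names).
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",  # placeholder
    api_key="YOUR-API-KEY",                                   # placeholder
    api_version="2024-10-01-preview",  # first version exposing cached_tokens
)

response = client.chat.completions.create(
    model="o1-mini",  # your deployment name
    messages=[{"role": "user", "content": "Hello"}],
)

# prompt_tokens_details.cached_tokens reports how many prompt tokens were
# served from the cache; it may be None/absent on older API versions or
# on models that do not yet support it (e.g. gpt-4o-2024-08-06 today).
details = getattr(response.usage, "prompt_tokens_details", None)
cached = getattr(details, "cached_tokens", None) if details else None
if cached is not None:
    print("cached_tokens:", cached)
else:
    print("cached_tokens not reported for this API version/model")
```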
Please check the What's new page and the respective documentation for the latest updates.
I hope this helps.
Regards,
Vasavi
Please kindly accept the answer and vote 'Yes' if you found it helpful, to support the community. Thanks.