Hi Vitcu, Marvin, greetings and welcome to the Microsoft Q&A forum!
I understand your concern. Prompt caching is designed to reduce overall request latency by letting the service reuse computation for a prompt prefix it has recently processed, rather than reprocessing the full input on every call.
For these tests, I used a range of input sizes, from 10k to 100k tokens per request. However, repeating the same request multiple times (in order to trigger prompt caching) did not lead to faster response times.
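For reference, here is a minimal sketch of the kind of timing loop involved. The endpoint, key, and deployment names are placeholders, and the prompt is a stand-in for the 10k-100k token inputs:

```python
import time
from openai import AzureOpenAI  # assumes the openai>=1.x Python SDK

# Placeholder credentials and names, for illustration only.
client = AzureOpenAI(
    api_key="YOUR_API_KEY",
    api_version="2024-10-21",
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
)

# Stand-in for a long input; the real tests used 10k-100k token prompts.
long_prompt = "Summarize the following document.\n" + ("lorem ipsum " * 5000)

# Send the exact same request repeatedly so the prefix is cache-eligible,
# and time each call end to end.
for i in range(5):
    start = time.perf_counter()
    response = client.chat.completions.create(
        model="YOUR-DEPLOYMENT-NAME",
        messages=[{"role": "user", "content": long_prompt}],
    )
    elapsed = time.perf_counter() - start
    print(f"run {i}: {elapsed:.2f}s, prompt_tokens={response.usage.prompt_tokens}")
```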
As per my understanding, prompts that haven't been used recently are automatically removed from the cache. To minimize evictions, maintain a consistent stream of requests with the same prompt prefix.
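For illustration, one way to keep the prefix consistent is to put all static content (instructions, reference documents) at the start of every request and append only the variable part at the end. A rough sketch, with hypothetical names:

```python
# Keep the large, unchanging part of the prompt byte-identical across
# requests; only the tail varies. Identical prefixes are what the cache
# can match and reuse.
STATIC_INSTRUCTIONS = "You are a support assistant. <long, fixed instructions>"
STATIC_CONTEXT = "<large reference document, identical for every request>"

def build_messages(user_question: str) -> list[dict]:
    return [
        {"role": "system", "content": STATIC_INSTRUCTIONS},
        # Variable content goes last so the shared prefix stays intact.
        {"role": "user", "content": f"{STATIC_CONTEXT}\n\nQuestion: {user_question}"},
    ]
```

Note that any change to the static part, even whitespace, produces a different prefix and therefore a cache miss.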
Also, see the Performance and latency documentation and follow its best practices to improve latency.
Across all the models I tested, I saw no improvement in latency: the time it took to generate an answer stayed the same no matter how many times I repeated the same request.
Unfortunately, I haven't found a way to verify whether the cache is actually being hit.
Do you have any samples that show the difference? Did you try with different prompts?