A catalog of AI models in Microsoft Foundry that you can discover, compare, and deploy using Azure’s built‑in tools for evaluation, fine‑tuning, and inference
Hello @LOSTMSU
Thank you for reaching out.
Based on the behavior you observed, what you are seeing is currently expected for DeepSeek V4 Pro deployments in Azure AI Foundry.
At this time, prefix caching (also referred to as prompt caching) is not currently supported for DeepSeek V4 Pro or most non-Azure OpenAI Foundry models. Because of this, the service will return:
"cached_tokens": 0
even when a large portion of the prompt remains identical across requests.
This applies to both:
-
/completions -
/responses
APIs.
Currently, Azure prompt caching support is primarily available for supported Azure OpenAI GPT-series models. DeepSeek V4 Pro does not yet expose server-side token reuse/prefix caching capabilities through Azure AI Foundry.
Because of this:
- There is currently no API parameter, deployment setting, or portal configuration that enables prefix caching for DeepSeek V4 Pro.
- Reusing the same long prompt prefix will still result in full prompt token processing on each request.
-
cached_tokenswill continue to report0.
For Azure OpenAI GPT models that support prompt caching, the behavior is different. Those models can:
- reuse previously processed prompt prefixes,
- report cached token counts,
- support
prompt_cache_key, - and reduce repeated prompt-processing cost/latency for agentic or multi-turn workloads.
Regarding your scenario specifically “Without prefix caching agentic workloads and multiturn chats are unfeasible on Azure.”
We understand the concern. For long-context agentic workflows using DeepSeek models, current recommended approaches are typically application-side optimizations such as:
- maintaining conversation memory externally,
- sending only incremental/delta context,
- summarizing older turns,
- retrieval-based context injection (RAG),
- or client-side caching/orchestration.
If server-side prompt caching is a hard requirement, you may currently need to evaluate supported Azure OpenAI GPT-series models instead, where prompt caching support is available.
At this time, there is no public ETA for prompt caching support on DeepSeek V4 Pro within Azure AI Foundry.
Please refer to the following documentation for additional details:
Prompt caching overview (Azure OpenAI) https://learn.microsoft.com/azure/ai-services/openai/how-to/prompt-caching
I Hope this helps. Do let me know if you have any further queries.
If this answers your query, please do click Accept Answer and Yes for was this answer helpful.
Thank you!