Hi sivasankar,
Greetings and welcome to the Microsoft Q&A forum! Thanks for posting your query!
Setting max_tokens to a lower value, like 800, caps the length of the completion, so the model stops generating sooner and total processing time drops. This improves response speed while keeping the output useful and relevant.
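As a rough sketch, here is how you might pass max_tokens with the openai Python SDK (v1.x) against Azure OpenAI; the endpoint, key, API version, and deployment name below are placeholders for your own resource values:

```python
from openai import AzureOpenAI

# Placeholder credentials and endpoint -- replace with your own resource values.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-api-key>",
    api_version="2024-02-01",  # example API version
)

response = client.chat.completions.create(
    model="<your-deployment-name>",  # placeholder deployment name
    messages=[{"role": "user", "content": "Summarize the benefits of caching."}],
    max_tokens=800,  # cap the completion length to cut generation time
)
print(response.choices[0].message.content)
```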
Enabling response streaming (stream=True) sends tokens back as soon as they are generated instead of waiting for the whole completion to finish. Total generation time is unchanged, but the interaction feels much faster because users can start reading the answer immediately.
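A minimal streaming sketch, reusing the client and placeholder deployment name from the snippet above:

```python
stream = client.chat.completions.create(
    model="<your-deployment-name>",  # placeholder deployment name
    messages=[{"role": "user", "content": "Explain response streaming."}],
    stream=True,  # receive tokens as they are generated
)

for chunk in stream:
    # Some chunks (e.g., Azure's initial content-filter chunk) carry no
    # choices, so guard before reading the delta.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```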
Using a shorter input prompt gives the model fewer tokens to process, which speeds up the response. Keeping prompts clear and concise also helps the model focus on what's important, as the comparison below shows.
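To see the difference concretely, you can count the tokens in a verbose versus a concise prompt; this sketch assumes the tiktoken library is installed (cl100k_base is the encoding used by the GPT-3.5/GPT-4 model families):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

verbose = ("Could you please, if at all possible, provide me with a detailed "
           "explanation of what response streaming is and how it works?")
concise = "Explain response streaming briefly."

print(len(enc.encode(verbose)))  # more input tokens -> more processing time
print(len(enc.encode(concise)))  # fewer input tokens -> faster response
```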
I hope this information helps.