To diagnose and resolve significant delays in Azure OpenAI chat completion responses, here are some steps:
- Verify whether any current outages or issues are affecting Azure services. You can check the Azure status page for active events.
- Adjust the `max_tokens` parameter and consider using stop sequences to limit the response size. This can help reduce the time taken to generate responses.
- Ensure that there are no network connectivity issues between your application and the Azure services that could be causing delays.
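As a minimal sketch of the point above, bounding the response with `max_tokens` and a stop sequence: the request shape assumes the openai v1 Python SDK, and `my-gpt35-deployment` is a placeholder for your own Azure deployment name.

```python
# Sketch: cap generation length so the model spends less time producing output.
# "my-gpt35-deployment" is a placeholder deployment name, not a real resource.

def build_completion_kwargs(prompt: str, max_tokens: int = 256) -> dict:
    """Assemble chat-completion parameters that bound response size."""
    return {
        "model": "my-gpt35-deployment",  # placeholder: your Azure deployment name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,        # hard cap on generated tokens
        "stop": ["\n\n"],                # generation halts at the first blank line
    }

kwargs = build_completion_kwargs("Summarize the incident report.", max_tokens=128)
# then pass to a client, e.g. client.chat.completions.create(**kwargs)
```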
- If applicable, enable streaming in your requests. This allows tokens to be returned as they are generated, potentially improving the perceived response time.
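To illustrate the streaming step, here is a hedged sketch of consuming streamed chunks. The chunk shape mirrors the openai v1 SDK (`choices[0].delta.content`); the chunks here are simulated dicts so the accumulation logic stands on its own without credentials.

```python
# Sketch: consume a streamed chat completion so tokens are shown as they
# arrive, reducing perceived latency. Chunks are simulated stand-ins for
# the objects yielded when stream=True is set on the request.

def collect_stream(chunks) -> str:
    """Accumulate streamed delta content into the full reply text."""
    parts = []
    for chunk in chunks:
        delta = chunk.get("choices", [{}])[0].get("delta", {})
        text = delta.get("content")
        if text:
            parts.append(text)
            print(text, end="", flush=True)  # display each token as it arrives
    return "".join(parts)

fake_chunks = [
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo"}}]},
    {"choices": [{"delta": {}}]},  # a final chunk may carry no content
]
reply = collect_stream(fake_chunks)
```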
- Enable Application Insights to trace where the bottleneck is: in the API call, the network, or downstream processing.
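Before wiring up full Application Insights telemetry, a quick way to isolate the bottleneck is to time the call yourself. This is a generic sketch (the lambda is a stand-in workload, not a real API call); in production you would export these durations to Application Insights.

```python
# Sketch: measure how long a single operation takes so you can compare
# API-call time against network and downstream processing time.
import time
from typing import Any, Callable


def timed(label: str, fn: Callable[[], Any]) -> tuple[Any, float]:
    """Run fn, print its duration in milliseconds, return (result, seconds)."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed * 1000:.1f} ms")
    return result, elapsed


# Stand-in workload; replace with e.g. the chat-completion call.
result, secs = timed("chat_completion", lambda: sum(range(1000)))
```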
- Ensure that your Azure VM has adequate resources allocated. If your VM is under heavy load or if there are insufficient resources, it could lead to increased response times.
- If your bot or application is performing background tasks that could interfere with response times, review and optimize these processes.
- Some model versions (like `gpt-3.5-turbo-1106`) can be slower than others. Try switching to a different version (e.g., `gpt-3.5-turbo-0613`) to see if performance improves.
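If you do compare model versions, measure more than one request, since individual latencies vary. A hypothetical sketch: `call_model` below is a stand-in for the actual completion call against each deployment.

```python
# Sketch: estimate a deployment's typical latency from repeated calls,
# using the median to damp outliers. call_model is a placeholder for
# a real request such as client.chat.completions.create(...).
import statistics
import time
from typing import Callable


def median_latency(call_model: Callable[[], object], runs: int = 5) -> float:
    """Time `runs` invocations and return the median duration in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call_model()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Stand-in workload; repeat per deployment and compare the medians.
latency = median_latency(lambda: None, runs=3)
```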
Kindly refer to the links below:
- troubleshoot-latency
- chat-completion-api-extremely-slow-and-hanging
Thank You.