Network Slowness with Azure OpenAI Chat Completion

Baghel, Praveen
2025-06-16

Azure OpenAI Chat completion is experiencing significant delays in returning responses. In a production environment hosted on an Azure VM, response times are typically within a few seconds; however, today it is taking over 8 minutes to receive a response. What steps can be taken to diagnose and resolve this issue?

Azure OpenAI Service

1 answer

  1. Saideep Anchuri, Microsoft External Staff Moderator
    2025-06-18

    Hi Baghel, Praveen,

    Here are some steps to diagnose and resolve the delays in Azure OpenAI chat completion responses:

    1. Verify whether there are any current outages or advisories affecting Azure services. The Azure status page lists active service events.
    2. Adjust the max_tokens parameter and consider using stop sequences to limit the response size; shorter completions take less time to generate (see the first sketch after this list).
    3. Ensure that there are no network connectivity issues between your application and the Azure endpoint that could be adding latency.
    4. If applicable, enable streaming in your requests so that tokens are returned as they are generated, which improves the perceived response time (shown in the same sketch below).
    5. Enable Application Insights to trace where the bottleneck is, whether in the API call, the network, or downstream processing; the second sketch after this list shows a quick manual version of that check.
    6. Ensure that your Azure VM has adequate resources allocated. If the VM is under heavy load or under-provisioned, it can add to the overall response time.
    7. If your bot or application performs background tasks that could interfere with request handling, review and optimize those processes.
    8. Some model versions (like gpt-3.5-turbo-1106) are slower than others. Try switching to a different version (e.g., gpt-3.5-turbo-0613) to see whether performance improves.
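
    For steps 2 and 4, here is a minimal sketch of how a request might cap the completion length and stream tokens as they are generated, using the openai Python SDK against an Azure OpenAI deployment. The environment variables, API version, and deployment name are placeholders; substitute your own values.

    ```python
    import os

    from openai import AzureOpenAI  # pip install "openai>=1.0"

    # Placeholder configuration: endpoint and key come from environment variables you define.
    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-02-01",  # assumption: use a version your resource supports
    )

    # Cap the completion size and stream tokens back as they are generated.
    stream = client.chat.completions.create(
        model="my-gpt-35-turbo-deployment",  # hypothetical deployment name
        messages=[{"role": "user", "content": "Summarize the incident in two sentences."}],
        max_tokens=256,      # smaller completions finish sooner
        stop=["\n\n\n"],     # optional stop sequence to cut generation early
        stream=True,         # return tokens incrementally instead of waiting for the full answer
    )

    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()
    ```

    Streaming does not make the model generate faster, but the first tokens arrive early, so a user-facing application feels responsive even when the full completion takes a while.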
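
    For steps 3 and 5, before (or alongside) setting up Application Insights, a quick manual check is to time the first token separately from the whole request. A long time to first token points at network, queueing, or service-side issues (steps 1, 3, 5); a short time to first token with a long total time points at response length or model speed (steps 2, 4, 8). This sketch uses the same placeholder configuration as above.

    ```python
    import os
    import time

    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # placeholder configuration
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-02-01",
    )

    start = time.perf_counter()
    first_token_at = None

    stream = client.chat.completions.create(
        model="my-gpt-35-turbo-deployment",  # hypothetical deployment name
        messages=[{"role": "user", "content": "Reply with the single word: pong"}],
        max_tokens=16,
        stream=True,
    )

    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content and first_token_at is None:
            first_token_at = time.perf_counter()  # record when the first content token arrives

    total = time.perf_counter() - start
    if first_token_at is not None:
        print(f"time to first token: {first_token_at - start:.2f} s")
    print(f"total request time:  {total:.2f} s")
    ```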

    Kindly refer to the links below:

    troubleshoot-latency

    chat-completion-api-extremely-slow-and-hanging

    Thank You.

