To diagnose and resolve significant delays in Azure OpenAI chat completion responses, here are some steps:
- Verify whether any current outages or issues are affecting Azure services. You can check the Azure status page for active events.
- Adjust the `max_tokens` parameter and consider using stop sequences to limit the response size. This can help reduce the time taken to generate responses.
- Ensure that there are no network connectivity issues between your application and the Azure services that could be causing delays.
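As a minimal sketch of the point above, bounding the response with `max_tokens` and a stop sequence: the request shape assumes the openai v1 Python SDK, and `my-gpt35-deployment` is a placeholder for your own Azure deployment name.

```python
# Sketch: cap generation length so the model spends less time producing output.
# "my-gpt35-deployment" is a placeholder deployment name, not a real resource.

def build_completion_kwargs(prompt: str, max_tokens: int = 256) -> dict:
    """Assemble chat-completion parameters that bound response size."""
    return {
        "model": "my-gpt35-deployment",  # placeholder: your Azure deployment name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,        # hard cap on generated tokens
        "stop": ["\n\n"],                # generation halts at the first blank line
    }

kwargs = build_completion_kwargs("Summarize the incident report.", max_tokens=128)
# then pass to a client, e.g. client.chat.completions.create(**kwargs)
```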
- If applicable, enable streaming in your requests. This allows tokens to be returned as they are generated, potentially improving the perceived response time.
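To illustrate the streaming step, here is a hedged sketch of consuming streamed chunks. The chunk shape mirrors the openai v1 SDK (`choices[0].delta.content`); the chunks here are simulated dicts so the accumulation logic stands on its own without credentials.

```python
# Sketch: consume a streamed chat completion so tokens are shown as they
# arrive, reducing perceived latency. Chunks are simulated stand-ins for
# the objects yielded when stream=True is set on the request.

def collect_stream(chunks) -> str:
    """Accumulate streamed delta content into the full reply text."""
    parts = []
    for chunk in chunks:
        delta = chunk.get("choices", [{}])[0].get("delta", {})
        text = delta.get("content")
        if text:
            parts.append(text)
            print(text, end="", flush=True)  # display each token as it arrives
    return "".join(parts)

fake_chunks = [
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo"}}]},
    {"choices": [{"delta": {}}]},  # a final chunk may carry no content
]
reply = collect_stream(fake_chunks)
```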
- Enable Application Insights to trace where the bottleneck is: in the API call, the network, or downstream processing.
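Before wiring up full Application Insights telemetry, a quick way to isolate the bottleneck is to time the call yourself. This is a generic sketch (the lambda is a stand-in workload, not a real API call); in production you would export these durations to Application Insights.

```python
# Sketch: measure how long a single operation takes so you can compare
# API-call time against network and downstream processing time.
import time
from typing import Any, Callable


def timed(label: str, fn: Callable[[], Any]) -> tuple[Any, float]:
    """Run fn, print its duration in milliseconds, return (result, seconds)."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed * 1000:.1f} ms")
    return result, elapsed


# Stand-in workload; replace with e.g. the chat-completion call.
result, secs = timed("chat_completion", lambda: sum(range(1000)))
```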
- Ensure that your Azure VM has adequate resources allocated. If your VM is under heavy load or if there are insufficient resources, it could lead to increased response times.
- If your bot or application is performing background tasks that could interfere with response times, review and optimize these processes.
- Some model versions (like `gpt-3.5-turbo-1106`) can be slower than others. Try switching to a different version (e.g., `gpt-3.5-turbo-0613`) to see if performance improves.
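If you do compare model versions, measure more than one request, since individual latencies vary. A hypothetical sketch: `call_model` below is a stand-in for the actual completion call against each deployment.

```python
# Sketch: estimate a deployment's typical latency from repeated calls,
# using the median to damp outliers. call_model is a placeholder for
# a real request such as client.chat.completions.create(...).
import statistics
import time
from typing import Callable


def median_latency(call_model: Callable[[], object], runs: int = 5) -> float:
    """Time `runs` invocations and return the median duration in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call_model()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Stand-in workload; repeat per deployment and compare the medians.
latency = median_latency(lambda: None, runs=3)
```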
Kindly refer to the links below:
- troubleshoot-latency
- chat-completion-api-extremely-slow-and-hanging
Thank You.