Azure OpenAI service responding very slow and the performance is not consistent

Nayak, Rajesh 65 Reputation points
2023-05-03T18:04:25.16+00:00

Hi ,

We are experimenting with Azure OpenAI services. We are using GPT-3(text-davinci-003) and chatGPT for our use case but the performance of the endpoint is not consistent. For a prompt with few-shot examples the GPT-3 endpoint sometimes taking 10 secs to respond and sometime the same call takes 2.5 sec. Would you be able to guide how to enhance the performance of the model endpoints ?

Not Monitored
Not Monitored
Tag not monitored by Microsoft.
37,174 questions
{count} votes

2 answers

Sort by: Most helpful
  1. Fabien Snauwaert 0 Reputation points
    2023-07-19T15:52:32.04+00:00

    Switched to Azure OpenAI from OpenAI's OpenAI. Using region francecentral. The measured response time went up, from an average of 2407 ms with OpenAI to 3032 ms with Azure, or an extra latency of +643 ms.

    This is with model gpt-35-turbo version 0613 for both.

    Has anyone tested this across region and models?

    I'm wondering if:

    • Some regions wouldn't be faster?
    • Would 0301 be faster than 0613?
    0 comments No comments

  2. Deaglan Mullan 0 Reputation points
    2024-01-21T00:26:47.74+00:00

    Still seems to be an issue. When performing well it is much faster than OpenAI. But it is very inconsistent. Sometimes it takes 1s. Sometimes it takes up to 30s . (West Europe). EDIT: My bad. I realize it is reaching the quota limit of 1k tokens per minute which is causing it to hang.

    0 comments No comments