API for gpt-4-1106-preview extremely slow

Marijn Otte 70 Reputation points
2024-01-15T11:31:21.34+00:00

When we call the API for the gpt-4-1106-preview model, the average response time is around 60 seconds. When we use the chat GUI in Azure AI Studio with the same model and the same parameters, the response takes 10-20 seconds. What can we do to speed up the API? We have already tried tuning the temperature, max_tokens and top_p parameters and minimizing the content filters, but none of these makes a significant difference.

Example API call:

time curl -X POST -H "Content-Type: application/json" -H "api-key: XXX" -d '{
  "messages": [
    {
      "role": "user",
      "content": "What does a cow eat?"
    }
  ],
  "model": "gpt-4-1106-preview",
  "stream": true,
  "temperature": 0.7,
  "frequency_penalty": 0,
  "presence_penalty": 0
}' "https://XXX-sweden.openai.azure.com/openai/deployments/gpt-4-1106-preview/chat/completions?api-version=2023-09-01-preview"


.....

data: [DONE]


real	1m7,174s
user	0m0,079s
sys	0m0,024s
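
The `time` figure above covers the whole request, so it does not show whether the delay falls before the first byte arrives or while the response is streaming. Below is a minimal sketch of one way to separate the two with curl's `-w` timing variables (`time_starttransfer` and `time_total`). The names `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_KEY`, `request.json`, `measure_stream` and `split_timing` are hypothetical placeholders, not part of the original call. Note that `time_starttransfer` marks the first byte of the response, which may be the HTTP headers rather than the first token, so treat it as an approximation.

```shell
# Sketch only: AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_KEY and request.json are
# hypothetical placeholders for the URL, api-key and JSON body used above.

# Print curl's time-to-first-byte and total time, separated by a space.
# -N disables output buffering; -o /dev/null discards the streamed body.
measure_stream() {
  curl -sS -N -o /dev/null \
    -w '%{time_starttransfer} %{time_total}\n' \
    -X POST \
    -H "Content-Type: application/json" \
    -H "api-key: ${AZURE_OPENAI_KEY}" \
    -d @request.json \
    "${AZURE_OPENAI_ENDPOINT}"
}

# Split the two numbers into first-byte latency and remaining streaming time.
split_timing() {
  awk -v ttfb="$1" -v total="$2" \
    'BEGIN { printf "first_byte=%.3fs streaming=%.3fs\n", ttfb, total - ttfb }'
}

# Usage, once the placeholders are set:
#   split_timing $(measure_stream)
```

If `first_byte` accounts for most of the total, the wait is probably happening before generation starts; if `streaming` dominates, the time is probably going into producing the tokens themselves.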
Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

14 answers

  1. Shackles 5 Reputation points
    2024-03-11T12:13:09.53+00:00

    Same here, it's extremely slow (tested in Sweden). Even streaming takes forever.

    1 person found this answer helpful.

  2. Srinu Tamada 5 Reputation points
    2024-03-12T05:42:12.48+00:00

    When we tested before the demo, it took 50 seconds; during the demo it took more than 2 minutes and errored out a few times. GPT-4 gives quality responses, but because it takes so long to respond, users are not interested in using it.

    1 person found this answer helpful.

  3. Dev Intentface 5 Reputation points
    2024-03-18T15:06:13.8733333+00:00

    Same experience: extremely slow, though I guess it is due to the content filters. Anyway, content filters or not, this is not usable at all.

    1 person found this answer helpful.

  4. Brendan Kehoe (Ops) 5 Reputation points
    2024-03-19T08:08:57.16+00:00

    Same in UK South: the 1106 preview is at least 3 times slower than the 0613 version in my testing.

    1 person found this answer helpful.

  5. Jan H 5 Reputation points
    2024-04-03T07:50:22.5133333+00:00

    We are experiencing the same performance issues. The 3.5 Turbo model is not good enough in its responses, and GPT-4 is so slow that users won't work with it.

    MS, you have a potentially great service, but until GPT-4 performance is up to speed, it can't be used in any real user scenario.

    Please find a solution. This has so much potential.

    1 person found this answer helpful.