API for gpt-4-1106-preview extremely slow
When we make API calls to the gpt-4-1106-preview model, the average response time is around 60 seconds. When we use the chat GUI in Azure AI Studio with the same model and the same parameters, the response takes 10-20 seconds. What can we do to speed up the API? We have already tried tuning the temperature, max_tokens, and top_p parameters and minimizing the content filters, but none of that makes a significant difference.
Example API call:
time curl -X POST -H "Content-Type: application/json" -H "api-key: XXX" -d '{
"messages": [
{
"role": "user",
"content": "What does a cow eat?"
}
],
"model": "gpt-4-1106-preview",
"stream": true,
"temperature": 0.7,
"frequency_penalty": 0,
"presence_penalty": 0
}' "https://XXX-sweden.openai.azure.com/openai/deployments/gpt-4-1106-preview/chat/completions?api-version=2023-09-01-preview"
.....
data: [DONE]
real 1m7,174s
user 0m0,079s
sys 0m0,024s
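Since the request above sets "stream": true, the wall-clock figure from `time` lumps together the wait for the first token and the whole generation; the Studio chat likely feels faster mainly because it renders tokens as they arrive. A minimal sketch for separating time-to-first-token from total time, assuming you wrap whatever streaming iterator your client returns (the fake_stream generator below is just a stand-in for real SSE chunks, not part of any API):

```python
import time
from typing import Iterable, Iterator, Tuple

def measure_stream(chunks: Iterable[str]) -> Tuple[float, float, int]:
    """Consume a token stream; return (time_to_first_token, total_time, n_chunks)."""
    start = time.perf_counter()
    first = None
    n = 0
    for _chunk in chunks:
        if first is None:
            # Latency until the very first token arrives.
            first = time.perf_counter() - start
        n += 1
    total = time.perf_counter() - start
    return (first if first is not None else total), total, n

# Stand-in for a real streamed response: 5 tokens, 10 ms apart.
def fake_stream() -> Iterator[str]:
    for tok in ["A", "cow", "eats", "grass", "."]:
        time.sleep(0.01)
        yield tok

ttft, total, n = measure_stream(fake_stream())
print(f"first token after {ttft:.3f}s, {n} chunks in {total:.3f}s")
```

If time-to-first-token is already tens of seconds, the delay is on the service side before generation even starts, which tuning temperature/top_p cannot fix.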
14 answers
-
Srinu Tamada 5 Reputation points
2024-03-12T05:42:12.48+00:00 When we were testing before the demo it took 50 seconds, and during the demo it took more than 2 minutes and errored out a few times. GPT-4 gives quality responses, but because it takes so long to respond, users are not interested in using it.
-
Dev Intentface 5 Reputation points
2024-03-18T15:06:13.8733333+00:00 Same experience, extremely slow, but I guess it is due to content filters. Anyway, content filters or not, this is not usable at all.
-
Brendan Kehoe (Ops) 5 Reputation points
2024-03-19T08:08:57.16+00:00 Same in UK South; in my testing the 1106 preview is at least 3 times slower than the 0613 version.
-
Jan H 5 Reputation points
2024-04-03T07:50:22.5133333+00:00 We are experiencing the same performance issues. Model 3.5 Turbo is not good enough in its responses, and GPT-4 is so slow that users won't work with it.
MS, you have a great service in potential, but until performance for GPT-4 is up to speed, this can't be used in any real user scenario.
Please find a solution. This has so much potential.