Same for me. The RAG app I'm using in my project has become unusable. GPT-3.5 is not an option because its quality is much worse. It used to be fine last year, but some time in January performance started to degrade. If this isn't fixed, I will need to explore other LLM options, which isn't easy since my employer has strict compliance requirements. Setting max_tokens to a low value helps, but my RAG app does not allow me to set this parameter.
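For clients that do expose it, capping the completion length is just one extra field in the request body. A minimal sketch of such a payload (the values and the commented-out endpoint, deployment name, and key are placeholders, not taken from this thread):

```shell
# Same chat/completions payload shape as in the question, but with
# max_tokens added to cap generation length (and thus generation time).
PAYLOAD='{
  "messages": [{"role": "user", "content": "What does a cow eat?"}],
  "max_tokens": 256,
  "stream": true,
  "temperature": 0.7
}'

# Placeholder call -- substitute your own resource, deployment, and key:
# curl -X POST -H "Content-Type: application/json" -H "api-key: $API_KEY" \
#   -d "$PAYLOAD" \
#   "https://RESOURCE.openai.azure.com/openai/deployments/DEPLOYMENT/chat/completions?api-version=2023-09-01-preview"

echo "$PAYLOAD"
```

This only bounds the generated output; it does not change per-token latency, so it helps most when responses are long.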
API for gpt-4-1106-preview extremely slow
When we do API calls for the gpt-4-1106-preview model, the average response time is around 60 seconds. When we use the chat GUI in the Azure AI Studio on the same model, with the same parameters, the response takes only 10-20 seconds. What can we do to speed up the API? We have already tried tuning the temperature, max_tokens, and top_p parameters and minimizing the content filters, but none of these makes a significant difference.
Example API call:
time curl -X POST -H "Content-Type: application/json" -H "api-key: XXX" -d '{
  "messages": [
    {
      "role": "user",
      "content": "What does a cow eat?"
    }
  ],
  "model": "gpt-4-1106-preview",
  "stream": true,
  "temperature": 0.7,
  "frequency_penalty": 0,
  "presence_penalty": 0
}' "https://XXX-sweden.openai.azure.com/openai/deployments/gpt-4-1106-preview/chat/completions?api-version=2023-09-01-preview"
.....
data: [DONE]
real 1m7,174s
user 0m0,079s
sys 0m0,024s
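Since the request above uses "stream": true, it may help to separate time-to-first-token from total generation time: the GUI streams, so it feels fast even when total time is similar. curl's write-out variables can report both; a sketch (the URL is a placeholder, and the real call is left commented out):

```shell
# With streaming, %{time_starttransfer} approximates time to first
# token, while %{time_total} is the full generation time.
URL="https://RESOURCE.openai.azure.com/openai/deployments/DEPLOYMENT/chat/completions?api-version=2023-09-01-preview"
FORMAT='first byte: %{time_starttransfer}s  total: %{time_total}s\n'

# Placeholder request -- substitute your own key and payload:
# curl -sN -o /dev/null -w "$FORMAT" -X POST \
#   -H "Content-Type: application/json" -H "api-key: $API_KEY" \
#   -d "$PAYLOAD" "$URL"

# Offline demonstration of the -w format against a local file:
curl -s -o /dev/null -w "$FORMAT" "file:///dev/null"
```

If first byte is already near 60 seconds, the delay is queueing/compute on the service side rather than generation length, which is useful information when filing a support ticket.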
15 answers
-
Sebastian Scott 10 Reputation points
2024-03-18T13:19:10.4566667+00:00 Same for us. We are considering moving to a different model provider because the long latency is straining usage.
-
Jack 10 Reputation points
2024-02-26T19:45:09.4+00:00 Same here, please address the issue.
-
oh john 5 Reputation points
2024-03-07T14:43:31.98+00:00 Same here. Extremely slow and unusable.
-
Martijn Muurman 5 Reputation points
2024-03-08T07:32:17.2+00:00 I can confirm this as well. The same prompt via OpenAI directly takes a few seconds; on Azure I get timeouts exceeding 100 seconds. This happens with both GPT-4 preview versions.