API for gpt-4-1106-preview extremely slow
When we make API calls to the gpt-4-1106-preview model, the average response time is around 60 seconds. When we use the chat GUI in Azure AI Studio with the same model and the same parameters, the response takes 10-20 seconds. What can we do to speed up the API? We have already tried tuning the temperature, max_tokens, and top_p parameters and minimizing the content filters, but none of that makes a significant difference.
Example API call:
time curl -X POST -H "Content-Type: application/json" -H "api-key: XXX" -d '{
"messages": [
{
"role": "user",
"content": "What does a cow eat?"
}
],
"model": "gpt-4-1106-preview",
"stream": true,
"temperature": 0.7,
"frequency_penalty": 0,
"presence_penalty": 0
}' "https://XXX-sweden.openai.azure.com/openai/deployments/gpt-4-1106-preview/chat/completions?api-version=2023-09-01-preview"
.....
data: [DONE]
real 1m7,174s
user 0m0,079s
sys 0m0,024s
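Since the request above sets "stream": true, the wall-clock figure from `time` lumps together the wait for the first token and the whole generation; the Studio chat likely feels faster mainly because it renders tokens as they arrive. A minimal sketch for separating time-to-first-token from total time, assuming you wrap whatever streaming iterator your client returns (the fake_stream generator below is just a stand-in for real SSE chunks, not part of any API):

```python
import time
from typing import Iterable, Iterator, Tuple

def measure_stream(chunks: Iterable[str]) -> Tuple[float, float, int]:
    """Consume a token stream; return (time_to_first_token, total_time, n_chunks)."""
    start = time.perf_counter()
    first = None
    n = 0
    for _chunk in chunks:
        if first is None:
            # Latency until the very first token arrives.
            first = time.perf_counter() - start
        n += 1
    total = time.perf_counter() - start
    return (first if first is not None else total), total, n

# Stand-in for a real streamed response: 5 tokens, 10 ms apart.
def fake_stream() -> Iterator[str]:
    for tok in ["A", "cow", "eats", "grass", "."]:
        time.sleep(0.01)
        yield tok

ttft, total, n = measure_stream(fake_stream())
print(f"first token after {ttft:.3f}s, {n} chunks in {total:.3f}s")
```

If time-to-first-token is already tens of seconds, the delay is on the service side before generation even starts, which tuning temperature/top_p cannot fix.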
14 answers
-
Srinu Tamada 5 Reputation points
2024-03-12T05:42:12.48+00:00 When we were testing before the demo it took 50 seconds, and during the demo it took more than 2 minutes and errored out a few times. GPT-4 gives quality responses, but because it takes so long to respond, users are not interested in using it.
-
Dev Intentface 5 Reputation points
2024-03-18T15:06:13.8733333+00:00 Same experience, extremely slow, but I guess it is due to content filters. Anyway, content filters or not, this is not usable at all.
-
Brendan Kehoe (Ops) 5 Reputation points
2024-03-19T08:08:57.16+00:00 Same in UK South; in my testing the 1106 preview is at least 3 times slower than the 0613 version.
-
Jan H 5 Reputation points
2024-04-03T07:50:22.5133333+00:00 We are experiencing the same performance issues. Model 3.5 Turbo is not good enough in its responses, and GPT-4 is so slow that users won't work with it.
MS, you have a great service in potential, but until performance for GPT-4 is up to speed, this can't be used in any real user scenario.
Please find a solution. This has so much potential.