Same for me. The RAG app I'm using in my project has become unusable. GPT-3.5 is not an option because its quality is much worse. It used to be fine last year, but some time in January performance started to degrade. If this isn't fixed, I will need to explore other LLM options, which isn't easy since my employer has strict compliance requirements. Setting max_tokens to a low value helps, but my RAG app does not allow me to set this parameter.
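For clients that do expose it, capping the completion length is just one extra field in the request body. A minimal sketch of such a payload (the values and the commented-out endpoint, deployment name, and key are placeholders, not taken from this thread):

```shell
# Same chat/completions payload shape as in the question, but with
# max_tokens added to cap generation length (and thus generation time).
PAYLOAD='{
  "messages": [{"role": "user", "content": "What does a cow eat?"}],
  "max_tokens": 256,
  "stream": true,
  "temperature": 0.7
}'

# Placeholder call -- substitute your own resource, deployment, and key:
# curl -X POST -H "Content-Type: application/json" -H "api-key: $API_KEY" \
#   -d "$PAYLOAD" \
#   "https://RESOURCE.openai.azure.com/openai/deployments/DEPLOYMENT/chat/completions?api-version=2023-09-01-preview"

echo "$PAYLOAD"
```

This only bounds the generated output; it does not change per-token latency, so it helps most when responses are long.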
API for gpt-4-1106-preview extremely slow
When we do API calls for the gpt-4-1106-preview model, the average response time is around 60 seconds. When we use the chat GUI in the Azure AI Studio on the same model, with the same parameters, the response takes only 10-20 seconds. What can we do to speed up the API? We have already tried tuning the temperature, max_tokens, and top_p parameters and minimizing the content filters, but none of these makes a significant difference.
Example API call:
time curl -X POST -H "Content-Type: application/json" -H "api-key: XXX" -d '{
  "messages": [
    {
      "role": "user",
      "content": "What does a cow eat?"
    }
  ],
  "model": "gpt-4-1106-preview",
  "stream": true,
  "temperature": 0.7,
  "frequency_penalty": 0,
  "presence_penalty": 0
}' "https://XXX-sweden.openai.azure.com/openai/deployments/gpt-4-1106-preview/chat/completions?api-version=2023-09-01-preview"
.....
data: [DONE]
real 1m7,174s
user 0m0,079s
sys 0m0,024s
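Since the request above uses "stream": true, it may help to separate time-to-first-token from total generation time: the GUI streams, so it feels fast even when total time is similar. curl's write-out variables can report both; a sketch (the URL is a placeholder, and the real call is left commented out):

```shell
# With streaming, %{time_starttransfer} approximates time to first
# token, while %{time_total} is the full generation time.
URL="https://RESOURCE.openai.azure.com/openai/deployments/DEPLOYMENT/chat/completions?api-version=2023-09-01-preview"
FORMAT='first byte: %{time_starttransfer}s  total: %{time_total}s\n'

# Placeholder request -- substitute your own key and payload:
# curl -sN -o /dev/null -w "$FORMAT" -X POST \
#   -H "Content-Type: application/json" -H "api-key: $API_KEY" \
#   -d "$PAYLOAD" "$URL"

# Offline demonstration of the -w format against a local file:
curl -s -o /dev/null -w "$FORMAT" "file:///dev/null"
```

If first byte is already near 60 seconds, the delay is queueing/compute on the service side rather than generation length, which is useful information when filing a support ticket.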
15 answers
-
Sebastian Scott 10 Reputation points
2024-03-18T13:19:10.4566667+00:00 Same for us. We are considering moving to a different model provider because the long latency is straining usage.
-
Jack 10 Reputation points
2024-02-26T19:45:09.4+00:00 Same here, please address the issue.
-
oh john 5 Reputation points
2024-03-07T14:43:31.98+00:00 Same here. Extremely slow and unusable.
-
Martijn Muurman 5 Reputation points
2024-03-08T07:32:17.2+00:00 I can confirm this as well. The same prompt via OpenAI directly takes a few seconds; on Azure I get timeouts exceeding 100 seconds. This happens with both GPT-4 preview versions.