Dramatic latency increase after Oct 10 for OpenAI GPT-4 32k
We originally shifted from OpenAI's API to the Azure-hosted OpenAI models because of their significantly lower latency, which gave us almost real-time usability (a few hundred milliseconds up to about 2 s per request). But starting around Oct 10, average latency increased almost six-fold, which makes some of our features unusable. We essentially need to rethink the product for those features to avoid running into UX issues. Our instance is hosted in Switzerland.
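For anyone wanting to reproduce or track this, a minimal latency-logging sketch is below. The endpoint URL pattern is the standard Azure OpenAI REST shape, but the resource name, deployment name, and API version are placeholders, not values from our setup; the timing helpers themselves are generic:

```python
import time
import statistics

def timed(fn, *args, **kwargs):
    """Run fn and return (elapsed_seconds, result)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return time.perf_counter() - start, result

def latency_stats(samples):
    """Summarize a list of request latencies (in seconds)."""
    return {
        "p50": statistics.median(samples),
        "mean": statistics.fmean(samples),
        "max": max(samples),
    }

# In practice, fn would be the chat-completions request, e.g. an HTTP POST to
#   https://<resource>.openai.azure.com/openai/deployments/<deployment>/chat/completions?api-version=<version>
# (<resource>, <deployment>, <version> are placeholders). Here we time a
# local stand-in so the snippet runs without credentials:
elapsed, _ = timed(lambda: sum(range(100_000)))
print(latency_stats([elapsed, elapsed, elapsed]))
```

Logging these stats per request over a day or two would show whether the jump is a step change (suggesting a routing or capacity-allocation change) or gradual drift (suggesting demand).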
Any idea why this is happening? Did anything change, or are there settings we can adjust to speed it up again? Or is it solely due to increasing demand for the service (which seems unlikely given the step-like jump)?