GPT-4o via Azure OpenAI timing out constantly

Question

Hi,

GPT-4o via Azure API seems almost trivially broken in a way that really surprises me. Basically when streaming is enabled, once the number of input tokens exceeds around 15K, it'll time out before producing a single token. The timeout cannot be increased.

Basically means GPT-4o, a model that should support 128k only supports around 10k. Maybe not even that!

I'm happy to share my full code used for testing this but it's not very fancy at all and does only what you'd expect it to. This is when testing US WEST and US EAST 2. Both exhibit the same issue.

Has anyone else seen this?

I'd go as far as to suggest that GPT-4o is completely broken and mostly unusable to be honest - am I missing something or is this actually genuinely the case?

Answer

Hello @Matthew Hertz (London) , Thanks for using Microsoft Q&A Platform.

Sorry for the inconveniences that has caused. We have seen a similar issue related to Latency or Timeout issues for GPT-4o model.

If you are facing similar issue, then please note that the product team is already aware of this and working on the fix for all the regions. Currently the ETA is June 12th.

I hope this helps.

Regards,

Vasavi

-Please kindly accept the answer and vote 'yes' if you feel helpful to support the community, thanks.

Answer

Hi @VasaviLankipalle-MSFT

It's June 18th - US West, East and East 2 are still borderline broken for the flagship GenAI model.

Answer

Hi @VasaviLankipalle-MSFT

I've an update on this - we were previously using a custom content filter to reduce the false-positive rate of safe inputs being rejected (e.g., asking about financial futures contracts was considered "violent"!)

Using a custom content filter seemingly massively increases this latency. Reverting to "default" has reduced the latency back to normal.

Is there a way to use a custom content filter without incurring such a large latency cost?

Answer

US East2 has improved, but Sweden-central (EU's only location for this model) is unbearably slow (>100s to respond) vs 5-10s for the same request for the same model in USEAST2

Share via

GPT-4o via Azure OpenAI timing out constantly

4 answers