GPT-4o via Azure OpenAI timing out constantly

Matthew Hertz (London) 10 Reputation points


GPT-4o via the Azure OpenAI API seems almost trivially broken in a way that really surprises me. When streaming is enabled, once the number of input tokens exceeds around 15K, it times out before producing a single token, and the timeout cannot be increased.

This effectively means that GPT-4o, a model that should support a 128K context window, only supports around 10K. Maybe not even that!

I'm happy to share the full code I used for testing this, but it's not fancy at all and does only what you'd expect. This is when testing US West and US East 2; both regions exhibit the same issue.
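For anyone who wants to try reproducing this, here's a rough sketch of the kind of test I mean (not my exact code; it assumes the official `openai` Python package's `AzureOpenAI` client, and the deployment name, API version, and environment variable names are placeholders for your own setup):

```python
import time


def build_prompt(approx_tokens: int) -> str:
    """Build filler text of very roughly `approx_tokens` tokens
    (one short repeated word is about one token)."""
    return " ".join(["hello"] * approx_tokens)


def time_to_first_token(client, deployment: str, prompt: str) -> float:
    """Stream a chat completion and return the seconds elapsed
    until the first content token arrives."""
    start = time.monotonic()
    stream = client.chat.completions.create(
        model=deployment,  # your GPT-4o deployment name
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Some chunks (e.g. content-filter results) carry no choices.
        if chunk.choices and chunk.choices[0].delta.content:
            break
    return time.monotonic() - start


# Usage (requires the `openai` package and real credentials):
# import os
# from openai import AzureOpenAI
# client = AzureOpenAI(
#     azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
#     api_key=os.environ["AZURE_OPENAI_API_KEY"],
#     api_version="2024-06-01",
# )
# print(time_to_first_token(client, "gpt-4o", build_prompt(15_000)))
```

With a ~15K-token prompt, `time_to_first_token` is where I see the request hang until it times out.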

Has anyone else seen this?

I'd go as far as to suggest that GPT-4o is completely broken and mostly unusable, to be honest. Am I missing something, or is this genuinely the case?

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

4 answers

  1. VasaviLankipalle-MSFT 15,426 Reputation points

    Hello @Matthew Hertz (London), thanks for using the Microsoft Q&A platform.

    Sorry for the inconvenience this has caused. We have seen a similar issue with latency and timeouts for the GPT-4o model.

    If you are facing a similar issue, please note that the product team is already aware of it and is working on a fix for all regions. The current ETA is June 12th.

    I hope this helps.



    -Please accept the answer and vote 'yes' if you found it helpful, to support the community. Thanks.

    1 person found this answer helpful.

  2. Matthew Hertz (London) 10 Reputation points

    Hi @VasaviLankipalle-MSFT

    It's June 18th, and US West, East, and East 2 are still borderline broken for the flagship GenAI model.


  3. Matthew Hertz (London) 10 Reputation points

    Hi @VasaviLankipalle-MSFT

    I've an update on this: we were previously using a custom content filter to reduce the false-positive rate of safe inputs being rejected (e.g., asking about financial futures contracts was flagged as "violent"!).

    Using a custom content filter seemingly increases this latency massively. Reverting to the default filter has brought latency back to normal.

    Is there a way to use a custom content filter without incurring such a large latency cost?


  4. Gleb Krivosheev 0 Reputation points

    US East 2 has improved, but Sweden Central (the EU's only region for this model) is unbearably slow, taking >100 s to respond versus 5–10 s for the same request to the same model in US East 2.
