GPT-4o Deployment in West Europe - Severe Latency Issues

Guillaume Lameyse 25 Reputation points
2025-02-21T16:45:05.8533333+00:00

Hi,

We rely heavily on Azure’s GPT-4o deployment as a key component of our application. Our model deployment (August 2024 version) is hosted in the West Europe region. Typically, our API calls take:

  • 5 seconds for small requests
  • 45–90 seconds for larger requests

However, today we experienced severe performance degradation:

  • Small calls took 50 seconds instead of 5 (10x increase).
  • Large calls never returned a response at all—there were no errors, just no response.

We tested the same model using OpenAI's API, and it performed as expected, so the issue appears specific to our Azure deployment.

Additional Details:

  • We were well below our tokens per minute (TPM) and requests per minute (RPM) limits.
  • No recent changes were made to our application’s logic or request patterns.
  • This is a critical component for us, and we plan to significantly scale our usage in the coming months, so reliability is a major concern.

Questions:

  1. Is there any way to check the real-time status of our deployment?
  2. Are there known issues or regional limitations affecting West Europe?
  3. What steps can we take to ensure more stable performance and avoid similar incidents?
Azure AI services

Accepted answer
  1. Sina Salam 22,031 Reputation points Volunteer Moderator
    2025-02-23T17:24:38.4+00:00

    Hello Guillaume Lameyse,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that your GPT-4o deployment in West Europe is experiencing severe latency.

    I'm sorry to hear about the performance issues with your Azure GPT-4o deployment; reports like this have become more common recently. The steps below can help you diagnose the problem and improve the reliability of your deployment.

    First, check the Azure Status page (https://status.azure.com/en-us/status) to monitor real-time service health. While there are no widely reported issues specific to the West Europe region, performance can fluctuate with demand and infrastructure updates. You can stay informed by referring to the Azure OpenAI Service documentation.

    To improve performance, consider monitoring and analyzing usage with Azure Monitor to detect anomalies. Optimizing requests, for example by reducing the number of tokens generated or by enabling streaming, can significantly improve response times, as noted in this similar case: https://learn.microsoft.com/en-us/answers/questions/1696807/gpt-4o-slow-to-complete-after-repeated-runs
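
    As a rough illustration of capping generated tokens and enabling streaming, here is a minimal Python sketch using only the standard library. The function names, the default `max_tokens` value, the timeout, and the `API_VERSION` string are assumptions for illustration, not values from the original post; adjust them to your own resource. The socket timeout is there so that a stalled call fails instead of hanging with no response.

```python
import json
import urllib.request

# Assumed API version for illustration; use one your resource supports.
API_VERSION = "2024-06-01"


def build_chat_request(endpoint, deployment, api_key, messages,
                       max_tokens=256, stream=True):
    """Build (but do not send) a chat completions request.

    max_tokens caps the generated output; stream=True asks the service
    to send tokens as they are produced instead of one final payload.
    """
    url = (f"{endpoint.rstrip('/')}/openai/deployments/{deployment}"
           f"/chat/completions?api-version={API_VERSION}")
    body = {"messages": messages, "max_tokens": max_tokens, "stream": stream}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


def iter_stream_content(lines):
    """Yield text fragments from server-sent event lines (bytes)."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


def stream_chat(request, timeout_s=30.0):
    """Send the request and yield fragments as they arrive.

    timeout_s applies to each socket operation (connect, each read),
    not to the whole call, so a stalled stream raises instead of hanging.
    """
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        yield from iter_stream_content(resp)
```

    With streaming, the first tokens usually arrive well before the full completion would, so a client can tell a slow response apart from one that never starts.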

    Also, deploying in multiple regions helps distribute the workload and provides redundancy. Keeping your deployment up to date with the latest model versions, as detailed at https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models , can also improve performance. If the issues persist, the best option is to contact Azure Support through the Azure Portal.
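
    To make the multi-region idea concrete, here is a small client-side failover sketch in Python. It assumes you have one callable per regional deployment, each enforcing its own timeout and raising on failure; `call_with_failover` and the region names in the usage note are hypothetical, not part of any Azure SDK.

```python
import time


def call_with_failover(callers, request, clock=time.monotonic):
    """Try each (region, call) pair in order.

    Returns (region, result, elapsed_seconds) from the first call that
    succeeds. Each `call` must enforce its own timeout and raise on
    failure; any exception moves on to the next region.
    """
    errors = []
    for region, call in callers:
        started = clock()
        try:
            result = call(request)
        except Exception as exc:  # broad on purpose: any failure triggers failover
            errors.append((region, repr(exc)))
            continue
        return region, result, clock() - started
    raise RuntimeError(f"all regions failed: {errors}")
```

    For example, passing `[("westeurope", call_primary), ("swedencentral", call_secondary)]` would try the primary first and fall back to the secondary only when the primary raises, such as on a timeout. Logging the returned elapsed time per region also gives a simple latency record to compare against your usual baseline.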

    I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.


    Please don't forget to close the thread by upvoting this answer and accepting it if it was helpful.

    1 person found this answer helpful.
