Latency and Timeout Errors for Azure OpenAI o3-mini API Requests

Nicolas Narozniak 20 Reputation points
2025-05-19T19:35:23.55+00:00

The Azure OpenAI o3-mini API is no longer working. Requests result in timeout errors.

Region: francecentral

Azure OpenAI Service

Accepted answer
  1. SriLakshmi C 6,010 Reputation points Microsoft External Staff Moderator
    2025-05-26T17:35:38.68+00:00

    Hi @Nicolas Narozniak,

    The latency issue affecting the o3-mini model in the EU region has now been mitigated. Could you please check on your end and confirm if it's resolved? Let me know if you have any further questions or need additional assistance.

    Thank you!

    1 person found this answer helpful.

1 additional answer

  1. Jerald Felix 1,475 Reputation points
    2025-05-20T04:03:10.2133333+00:00

    Hello Nicolas Narozniak,

    The latency and timeout issues you're experiencing with the Azure OpenAI o3-mini model in the France Central region are part of a broader pattern observed across multiple regions, including East US 2 and Sweden Central. These challenges have been linked to capacity constraints and architectural limitations, particularly affecting the o1 and o3-mini models.

    Known Issues and Root Causes

    • Extended Response Times: Users have reported response times exceeding 10 minutes, regardless of the reasoning_effort parameter settings.
    • Service Health Dashboard: Despite these issues, the Azure Service Health dashboard may not always reflect ongoing problems, as was the case in France Central.
    • Capacity Limitations: In regions like East US 2, similar latency problems were attributed to capacity limits. Microsoft's product team addressed this by implementing dynamic routing to alleviate timeouts.
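
    While capacity issues persist, a client-side retry with exponential backoff and jitter can absorb transient timeouts. This is a generic sketch, not Azure-specific code: `call` and `TimeoutError` stand in for your actual SDK call and whatever timeout exception it raises.

```python
import random
import time

def call_with_retries(call, attempts=4, base_delay=1.0,
                      timeout_errors=(TimeoutError,)):
    """Retry a model call on timeout, with exponential backoff and jitter.

    `call` is any zero-argument function that performs the API request
    (e.g. a wrapper around your chat-completion call with a client timeout).
    """
    for attempt in range(attempts):
        try:
            return call()
        except timeout_errors:
            if attempt == attempts - 1:
                raise  # Out of attempts; surface the timeout to the caller.
            # Full jitter: sleep anywhere in [0, base_delay * 2^attempt).
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))
```

    The full jitter (random delay rather than a fixed doubling) helps avoid many clients retrying in lockstep against an already-overloaded region.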

    Recommended Actions

    1. Monitor Service Health: Regularly check the Azure Service Health dashboard for updates on your region.
    2. Optimize Requests:
       • Reduce Prompt Complexity: Simplify prompts to decrease processing time.
       • Limit Token Usage: Lower the max_completion_tokens parameter (the o-series replacement for max_tokens) to cap response size.
    3. Implement Streaming: Use streaming responses to receive data incrementally rather than waiting for the full completion.
    4. Manage Request Rates: Even within quota limits, bursts of concurrent requests can trigger throttling. Implement client-side rate limiting to spread requests evenly over time.
    5. Consider Alternative Regions: If feasible, deploy your application in a region with better performance for this model.
    6. Explore Provisioned Throughput: For latency-sensitive applications, consider Provisioned Throughput Units (PTUs) to ensure consistent performance.
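
    The rate-limiting advice above can be sketched with a small, stdlib-only helper that enforces a minimum spacing between requests (hypothetical; no Azure SDK calls involved):

```python
import threading
import time

class RequestSpacer:
    """Allow at most one request every 1/requests_per_second seconds,
    so calls are spread evenly instead of arriving in bursts."""

    def __init__(self, requests_per_second):
        self.interval = 1.0 / requests_per_second
        self._lock = threading.Lock()
        self._next_slot = 0.0

    def wait(self):
        # Reserve the next available time slot under the lock, then
        # sleep outside the lock until that slot arrives.
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self.interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)
```

    Calling `spacer.wait()` before each API request smooths out bursts from multiple threads, which is what matters for throttling, even when the total request count stays within quota.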

    Engage with Support: If issues persist, contact Azure Support to report the problem and receive assistance.
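
    The streaming recommendation can be illustrated with a minimal, self-contained sketch. Here `fake_stream` is a stand-in for a real streaming API response; the point is the consumption pattern, where each chunk is usable as soon as it arrives:

```python
def fake_stream(text):
    """Hypothetical stand-in for a streaming response: yields the
    completion in small chunks as they become available."""
    for word in text.split():
        yield word + " "

def consume_stream(chunks, on_chunk=print):
    """Hand each incremental chunk to a callback while accumulating
    the full text, so the first tokens are visible to the user long
    before the complete response finishes."""
    parts = []
    for chunk in chunks:
        parts.append(chunk)
        on_chunk(chunk)
    return "".join(parts)
```

    With streaming, a slow response degrades into slowly arriving text rather than a hard client-side timeout on the full completion.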

    By implementing these strategies, you can mitigate latency and timeout issues while Microsoft continues to enhance the Azure OpenAI service infrastructure.

    Best Regards,

    Jerald Felix
