Rate Limit on GPT-4o-0806 is broken

Markus von Staden 25 Reputation points
2025-02-20T12:49:14.19+00:00

We have GPT-4o-0513 deployed in production and we want to test version 0806. Even with the same rate limit as our production deployment, I immediately receive a "rate limit exceeded" error. I am the only one testing and using this deployment, with a rate limit of 100 TPM and 600 RPM. Everything works fine if I switch the deployment to version 0513, which leads me to believe that the rate limit in Azure is broken.

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

Accepted answer
  1. Saideep Anchuri 4,115 Reputation points Microsoft External Staff
    2025-02-23T11:04:37.27+00:00

    Hi Markus von Staden,

    I understand that you are encountering an issue with the rate limits on GPT-4o-0806.

    Here are some steps to try:

    1. Verify that the rate limits for GPT-4o-0806 are set up correctly in your Azure portal.
    2. Review your quota allocations in Azure OpenAI Studio. Quotas are distributed per region and per model, and the new version might have different quota settings.
    3. Check whether GPT-4o is available in other regions, which may also give you lower latency.
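
To make steps 1 and 2 easier to check, here is a minimal Python sketch, not an official tool. It assumes two things that you should verify against the quota documentation for your model: that RPM is allocated at roughly 6 RPM per 1,000 TPM, and that a throttled (HTTP 429) response carries `Retry-After`, `x-ratelimit-remaining-requests`, and `x-ratelimit-remaining-tokens` headers. The function names `expected_rpm` and `summarize_rate_limit` are hypothetical helpers made up for this illustration.

```python
from typing import Mapping, Optional

# ASSUMPTION: RPM is derived from TPM at 6 RPM per 1,000 TPM.
# Confirm this ratio for your model in the quota documentation.
RPM_PER_1000_TPM = 6


def expected_rpm(tpm: int) -> float:
    """Return the RPM you would expect for a given TPM under the assumed ratio."""
    return tpm * RPM_PER_1000_TPM / 1000


def summarize_rate_limit(status: int, headers: Mapping[str, str]) -> dict:
    """Pull the rate-limit hints out of an HTTP response (hypothetical helper).

    `status` and `headers` are whatever your HTTP client returned;
    header names are matched case-insensitively.
    """
    lower = {key.lower(): value for key, value in headers.items()}

    def as_int(name: str) -> Optional[int]:
        value = lower.get(name)
        return int(value) if value is not None and value.isdigit() else None

    return {
        "throttled": status == 429,
        # ASSUMPTION: these header names are present on your responses.
        "retry_after_s": as_int("retry-after"),
        "remaining_requests": as_int("x-ratelimit-remaining-requests"),
        "remaining_tokens": as_int("x-ratelimit-remaining-tokens"),
    }


if __name__ == "__main__":
    # Compare the configured TPM against the RPM shown on the deployment.
    print(expected_rpm(100_000))
    # Inspect a throttled response; the header values here are made-up sample data.
    print(summarize_rate_limit(429, {"Retry-After": "7",
                                     "x-ratelimit-remaining-tokens": "0"}))
```

If the value from `expected_rpm` for your configured TPM is far from the RPM shown on the deployment, it is worth double-checking the units of the TPM setting (for example, 100 versus 100K). If `remaining_tokens` is already at zero on the first request, the token estimate for that single request may exceed the deployment's TPM.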

    Reference thread: rate_limit_exceeded

    Please also refer to the following documentation: Service quotas and limits

    Thank you.

