How can I get higher rate limits on my GPT-4o Azure AI Foundry deployment?

Jamie Voynow 0 Reputation points
2025-03-30T16:28:26.4933333+00:00

On my personal account I have a rate limit of millions of tokens / min, but on our Diesl.AI account (backed by Microsoft for Startups) our rate limit is 600k tokens / min. In the past, OpenAI itself has been very helpful with increasing my rate limits (for free). How do I get this sort of white-glove support here?
Thanks.

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

1 answer

  1. SriLakshmi C 6,250 Reputation points Microsoft External Staff Moderator
    2025-03-31T10:33:34.71+00:00

    Hello Jamie Voynow,

    To request higher rate limits on your GPT-4o Azure AI Foundry deployment, you can submit a quota increase request in Azure AI Foundry.

    Companies must be registered businesses and pass domain verification to be eligible for quota limit increases. If your usage is consuming the existing quota allocation due to high demand, your request for an increase may be prioritized.

    As a Microsoft for Startups member, you have access to priority support. Sign in to the Microsoft for Startups portal and submit a request via the Guidance tab. A Startup Advisor will arrange a call to discuss your needs and assist you in the process.

    For details, see the Technical Guidance & Support Overview.

    Additionally, you can enable the autoscale feature for your Azure AI services, which automatically adjusts rate limits based on real-time usage and capacity metrics to optimize performance.

    Please refer to Autoscale Azure AI limits.
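
    While a quota increase is pending, requests that exceed the deployment's rate limit are rejected with HTTP 429, so it can help for the client to retry with backoff. The following is only a minimal sketch under stated assumptions: `RateLimited`, `backoff_delay`, and `call_with_retry` are hypothetical names for illustration and are not part of any Azure SDK, and the wrapper assumes your own client code raises `RateLimited` on a 429 response, passing along the `Retry-After` value when the service sends one.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical exception raised by your own client wrapper on HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        # Seconds taken from a Retry-After header, or None if it was absent.
        self.retry_after = retry_after


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Return how many seconds to wait before retrying (attempt is 0-based)."""
    if retry_after is not None:
        # Prefer the server's hint when one is available.
        return float(retry_after)
    # Otherwise use capped exponential backoff with full jitter.
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retry(fn, max_attempts=5, sleep=time.sleep):
    """Call fn(), retrying on RateLimited up to max_attempts times in total."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            sleep(backoff_delay(attempt, err.retry_after))
```

    Here `fn` would be a zero-argument function that sends one request to the deployment. The `sleep` parameter is injectable so the retry logic can be exercised without actually waiting.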

    I hope this helps. Do let me know if you have any further queries.

    If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful?".

    Thank you!
