Solutions Chimera, greetings and welcome to the Microsoft Q&A forum!
We are going to launch our MVP, which runs on an Azure OpenAI GPT-4o deployment, to 500 test users.
The deployment has a maximum rate limit of 450K tokens per minute, which will not be enough during busier hours: we will have 500 users testing, and because of our use case a single call uses around 15K tokens.
That works out to roughly 30 calls per minute. We would like to increase it to 500 calls per minute, which would require a limit of 7.5M tokens per minute.
We need this for at least a month, from January 17 to February 17, as that will be our UAT period.
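The capacity arithmetic in the question can be checked with a few lines of Python. This is a minimal sketch that assumes every call consumes exactly 15K tokens; real token counts vary from call to call, so treat the results as rough estimates. The helper names are hypothetical.

```python
def max_calls_per_minute(tpm_limit: int, tokens_per_call: int) -> int:
    """Whole calls per minute that fit under a tokens-per-minute (TPM) limit."""
    return tpm_limit // tokens_per_call


def required_tpm(calls_per_minute: int, tokens_per_call: int) -> int:
    """TPM limit needed to sustain a given call rate."""
    return calls_per_minute * tokens_per_call


# Figures from the question (assumes a fixed 15K tokens per call).
TOKENS_PER_CALL = 15_000
CURRENT_TPM = 450_000
TARGET_CALLS_PER_MINUTE = 500

print(max_calls_per_minute(CURRENT_TPM, TOKENS_PER_CALL))      # 30
print(required_tpm(TARGET_CALLS_PER_MINUTE, TOKENS_PER_CALL))  # 7500000
```

So the current 450K TPM supports about 30 such calls per minute, and 500 calls per minute would need roughly 7.5M TPM, matching the figures above.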
I understand that you are looking for details on increasing the rate limit quotas.
In addition to what Anthony mentioned, quota increase requests can be submitted from the Quotas page in the Azure AI Foundry portal. Due to high demand, quota increase requests are being accepted and will be filled in the order they're received. Priority is given to customers who generate traffic that consumes the existing quota allocation, and your request might be denied if this condition isn't met.
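Because priority goes to customers whose traffic actually consumes their existing allocation, it may help to check how close your busiest minute gets to the current quota before filing a request. A minimal sketch, assuming you already have per-minute token counts from whatever monitoring you use; the `peak_utilization` helper and the sample values are hypothetical.

```python
def peak_utilization(tokens_per_minute: list[int], tpm_quota: int) -> float:
    """Fraction of the TPM quota used in the busiest observed minute."""
    if tpm_quota <= 0:
        raise ValueError("tpm_quota must be positive")
    if not tokens_per_minute:
        return 0.0  # no traffic observed yet
    return max(tokens_per_minute) / tpm_quota


# Hypothetical per-minute token counts; replace with your own measurements.
observed = [120_000, 310_000, 445_000, 390_000]
print(f"{peak_utilization(observed, 450_000):.1%}")  # 98.9%
```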
For other rate limits, as mentioned by Anthony, submit a service request.
Do let me know if that helps or if you have any other queries.
If the response helped, please do click "Accept Answer" and "Yes" for "Was this answer helpful?".
Doing so would help other community members with a similar issue identify the solution. I highly appreciate your contribution to the community.