Hi @Muhammed Kashif,
Thank you for reaching out to the Microsoft Q&A forum.
The 30K tokens-per-minute (TPM) limit you're encountering for GPT-4o-mini in East US is most likely the rate limit that applies to your current Azure OpenAI pricing tier. Limits like this are a standard measure to prevent abuse and ensure fair usage. If you're on the free tier, upgrading to a higher tier (such as Standard) can help increase your quota.
Azure OpenAI Service applies rate limits, measured in tokens per minute (TPM), based on your region and model. You can raise the limit on an existing deployment by selecting the Edit option on that deployment and adjusting the Tokens per Minute Rate Limit setting, up to the quota available to your subscription.
For further adjustments, visit the Quota section under Shared Resources in Azure OpenAI Studio, where you can request a quota increase. Make sure to check the Azure OpenAI quota management documentation for more details on rate limits.
If requests aren't distributed evenly over the minute, you might encounter a 429 (Too Many Requests) error even when your total usage for that minute is within the limit, because a short burst can exceed the allowed rate on its own. Upgrading your subscription and adjusting your quota should increase your token limits and make these errors less likely.
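Until a higher quota is in place, retrying with a growing delay is a common way to ride out 429 responses. The following is only a minimal sketch of that idea, not an official Azure sample: `RateLimitError`, `call_with_backoff`, and `send_request` are hypothetical names, and you would replace the exception with whatever your own client raises on HTTP 429.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error your client raises on HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("429 Too Many Requests")
        # Wait time in seconds suggested by the server, if it sent one.
        self.retry_after = retry_after


def call_with_backoff(send_request, max_retries=5, base_delay=1.0,
                      max_delay=60.0, sleep=time.sleep):
    """Call send_request(), retrying on RateLimitError with exponential backoff.

    send_request is any zero-argument callable that performs one API call.
    sleep is injectable so the function can be exercised without real waiting.
    """
    for attempt in range(max_retries + 1):
        try:
            return send_request()
        except RateLimitError as err:
            if attempt == max_retries:
                raise  # Out of retries: surface the error to the caller.
            if err.retry_after is not None:
                # Prefer the wait time suggested by the server.
                delay = err.retry_after
            else:
                # Otherwise wait 1s, 2s, 4s, ... capped at max_delay.
                delay = min(base_delay * (2 ** attempt), max_delay)
            sleep(delay)
```

You would wrap each call, for example `call_with_backoff(lambda: my_client_call(prompt))`, where `my_client_call` is your own function. Spacing requests out on the client side helps as well, since fewer bursts means fewer 429 responses to begin with.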
Hope this helps. Do let us know if you have any further queries.
If this answers your query, please click Accept Answer and Yes for "Was this answer helpful".