Hello @Emmanuella Onosemuode
Thank you for reaching out and for providing the detailed information regarding your production workload requirements for the gpt-image-2 model.
We understand that you are currently encountering the default RPM/concurrency limitation for gpt-image-2 in the East US 2 region and would like to increase the limit from 10 RPM to approximately 100 RPM to support high-volume commercial image generation and image editing workloads.
Based on your request:
• Model: gpt-image-2
• Region: East US 2
• Current limit: 10 RPM
• Requested limit: 100 RPM
• Use case: Production-scale image generation and image editing via API integration
Please note that RPM and concurrency limits for Azure OpenAI models are managed through Azure OpenAI quota allocation and regional backend capacity. Increases beyond the default allocation require review and approval by the Azure OpenAI engineering/quota management team.
Recommended next steps:
- Review your current usage metrics. We recommend collecting usage evidence from:
  • Azure Monitor metrics
  • application logs
  • HTTP 429/rate-limit responses
  • latency/concurrency trends
  Showing that your workload is consistently approaching or exceeding the current 10 RPM limit can help support the quota increase request.
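As a starting point for gathering that evidence, here is a minimal, illustrative Python sketch (the class name and thresholds are hypothetical, not part of any Azure SDK) that counts requests and HTTP 429 responses over a sliding one-minute window, so you can log how often you hit the 10 RPM ceiling:

```python
# Hypothetical client-side telemetry sketch: record each request's status
# code and report requests and 429s seen in the last minute. Names are
# illustrative; wire tracker.record(...) into your own API call path.
from collections import deque
import time

class RateLimitTracker:
    """Tracks request timestamps and HTTP 429s over a sliding window."""

    def __init__(self, window_seconds=60):
        self.window = window_seconds
        self.requests = deque()   # timestamps of all requests
        self.throttled = deque()  # timestamps of 429 responses

    def record(self, status_code, now=None):
        now = time.monotonic() if now is None else now
        self.requests.append(now)
        if status_code == 429:
            self.throttled.append(now)
        self._trim(now)

    def _trim(self, now):
        # Drop entries older than the window so counts stay per-minute.
        for q in (self.requests, self.throttled):
            while q and now - q[0] > self.window:
                q.popleft()

    def snapshot(self):
        return {"rpm": len(self.requests),
                "throttled_per_min": len(self.throttled)}

# Example: simulate 12 requests over one minute, the last 2 throttled.
tracker = RateLimitTracker()
for i in range(12):
    tracker.record(429 if i >= 10 else 200, now=i * 5.0)
print(tracker.snapshot())  # {'rpm': 12, 'throttled_per_min': 2}
```

Periodically logging `snapshot()` alongside your application logs gives concrete numbers ("N requests/minute, M throttled") to cite in the quota request.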
- Gather deployment details. Please ensure you have:
  • Azure OpenAI resource name/resource ID
  • deployment name for gpt-image-2
  • target region (East US 2)
  • estimated production RPM/TPM requirements
- Submit a quota increase request. You can submit the request from:
Azure AI Foundry Portal → Management → Quota
Portal link: Azure AI Foundry Portal
Within the Quota blade:
• Filter by:
  - Subscription
  - Model = gpt-image-2
  - Region = eastus2
• Select the quota row and choose “Request quota”
In the request form, include:
• Current RPM limit: 10 RPM
• Requested RPM limit: 100 RPM
• Requested concurrency increase
• Business justification: “High-volume commercial image generation and image editing workloads for production API integration”
You may also use the direct quota request form: Azure OpenAI Quota Request Form
Official quota documentation: Azure OpenAI Quotas and Limits Documentation
Approval depends on:
• regional GPU/model capacity availability
• subscription history
• production usage patterns
• responsible AI/compliance review
If East US 2 is under temporary capacity pressure, the engineering team may recommend:
• phased quota increases
• alternative regions
• additional deployments for workload distribution
Interim optimization recommendations: While the quota request is under review, you may also consider:
• batching smaller image requests when possible
• implementing client-side throttling with exponential backoff and jitter
• distributing workload across multiple deployments or regions if supported
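The exponential-backoff-with-jitter pattern above can be sketched as follows. This is an illustrative stand-alone example, not Azure SDK code: `call_model` is a hypothetical placeholder for your actual image-generation API call, and `fake_call` below simulates an endpoint that throttles the first two requests.

```python
# Illustrative retry wrapper: on HTTP 429, wait base_delay * 2**attempt
# plus random "full jitter", capped at max_delay, before retrying.
import random
import time

def with_backoff(call_model, max_retries=5, base_delay=1.0, max_delay=30.0):
    """Retry call_model() while it returns status 429, with jittered
    exponential backoff; give up after max_retries attempts."""
    for attempt in range(max_retries):
        status, body = call_model()
        if status != 429:
            return status, body
        delay = min(max_delay, base_delay * (2 ** attempt))
        time.sleep(delay + random.uniform(0, delay))  # full jitter
    return status, body

# Example: a fake endpoint that throttles the first two calls.
calls = {"n": 0}
def fake_call():
    calls["n"] += 1
    return (429, None) if calls["n"] <= 2 else (200, "image-bytes")

status, body = with_backoff(fake_call, base_delay=0.01)
print(status)  # 200, succeeded on the third attempt
```

Jitter spreads retries from concurrent clients apart in time, which avoids synchronized retry bursts that would otherwise re-trigger the rate limit the moment it clears.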
Please note that quota increase reviews typically take several business days depending on regional capacity and request volume. You can monitor the request status directly from the Quota blade in the portal after submission.
Increase Token/RPM Limits in Azure OpenAI Service: https://learn.microsoft.com/azure/foundry/openai/quotas-limits#can-i-request-more-quota
Common throttling solutions & best practices: https://learn.microsoft.com/azure/ai-services/openai/how-to/quota
I hope this helps. Do let me know if you have any further queries.
If this answers your query, please click Accept Answer and select Yes for “Was this answer helpful?”.
Thank you!