Azure OpenAI in Microsoft Foundry Models quotas and limits in Azure Government

This article contains a quick reference and a detailed description of the quotas and limits for Azure OpenAI in Azure Government.

Scope of quota

Quotas and limits aren't enforced at the tenant level. Instead, the highest level of quota restrictions is scoped at the Azure subscription level.

Regional quota allocation

Tokens per minute (TPM) and requests per minute (RPM) limits are defined per region, per subscription, and per model or deployment type.

For example, if the gpt-4.1 DataZone Standard model is listed with a quota of 5 million TPM and 5,000 RPM, then each region where that model or deployment type is available has its own dedicated quota pool of that amount for each of your Azure subscriptions. Within a single Azure subscription, it's possible to use a larger quantity of total TPM and RPM quota for a given model and deployment type, as long as you have resources and model deployments spread across multiple regions.
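The arithmetic behind this example can be sketched as follows. This is a minimal illustration only: the function name and the region labels region-a and region-b are hypothetical, and the 5 million TPM and 5,000 RPM figures are the example values from the paragraph above, not quotas taken from the tables in this article.

```python
def total_quota_across_regions(per_region_tpm: int, per_region_rpm: int,
                               regions: list[str]) -> dict[str, int]:
    """Sum the independent per-region quota pools for one subscription,
    one model, and one deployment type."""
    # Each distinct region contributes its own pool; duplicates count once.
    region_count = len(set(regions))
    return {
        "tpm": per_region_tpm * region_count,
        "rpm": per_region_rpm * region_count,
    }


# Hypothetical region labels; the quota values are the example figures above.
combined = total_quota_across_regions(5_000_000, 5_000, ["region-a", "region-b"])
print(combined)  # {'tpm': 10000000, 'rpm': 10000}
```

The combined figure is only reachable when deployments exist in each region, because quota from one region's pool can't be spent in another.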

Quota tiers

Azure Government doesn't support quota tiers or automatic quota adjustments. Instead, there are two quota levels: a Default level, and an Enterprise level for customers with an Enterprise Agreement.

Can I request more quota?

Yes. You can request more quota at any time by using the Azure Gov Quota Request Form. If the request is approved, your current level stays the same, but more quota is assigned to it.

DataZone Standard Tokens Per Minute (TPM)

Model Name	Default TPM	Enterprise TPM
gpt-5.1	300,000	1,000,000
gpt-4.1	300,000	2,000,000
gpt-4.1-mini	300,000	2,000,000
gpt-4o	300,000	10,000,000
o3-mini	200,000	200,000

Standard Tokens Per Minute (TPM)

Model Name	Default TPM	Enterprise TPM
gpt-4.1	300,000	2,000,000
gpt-4.1-mini	300,000	2,000,000
gpt-4o	80,000	80,000
text-embedding-3-large	350,000	350,000
text-embedding-3-small	350,000	350,000
text-embedding-ada-002	350,000	350,000

General best practices to remain within rate limits

To minimize issues related to rate limits, it's a good idea to use the following techniques:

Implement retry logic in your application.
Avoid sharp changes in the workload. Increase the workload gradually.
Test different load increase patterns.
Increase the quota assigned to your deployment. Move quota from another deployment, if necessary.
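The first technique, retry logic, might look like the following minimal sketch. It assumes the client surfaces a rate-limit response (typically HTTP 429) as an exception. RateLimitedError, call_with_retry, and all parameter names and defaults here are hypothetical and aren't part of any Azure SDK; adapt the exception type to whatever your client library raises.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical exception standing in for a rate-limit (HTTP 429) response."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited")
        # Seconds the service asked the caller to wait, if it said so.
        self.retry_after = retry_after


def call_with_retry(request: Callable[[], T],
                    max_attempts: int = 5,
                    base_delay: float = 1.0,
                    max_delay: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call request(), retrying on RateLimitedError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitedError as err:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            # The backoff ceiling doubles on each attempt, capped at max_delay.
            backoff = min(max_delay, base_delay * 2 ** attempt)
            if err.retry_after is not None:
                delay = err.retry_after  # Prefer the server's own hint.
            else:
                delay = random.uniform(0, backoff)  # Jitter spreads out retries.
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

Randomizing the delay keeps many clients that were throttled at the same moment from retrying in lockstep, which would itself be a sharp change in workload.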

Regional quota capacity limits

You can view quota availability by region for your subscription in the Foundry portal.

To view quota capacity by region for a specific model or version, you can query the capacity API for your subscription. Provide a subscriptionId, model_name, and model_version, and the API returns the available capacity for that model across all regions and deployment types for your subscription.

Note

Currently, both the Foundry portal and the capacity API return quota/capacity information for models that are retired and no longer available.
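As a rough sketch, a capacity query URL could be assembled as shown here. Everything specific in it is an assumption that this article doesn't confirm: the Azure Government management endpoint, the Microsoft.CognitiveServices/modelCapacities resource path, the query parameter names, and the api-version value. Check each against the capacity API reference before relying on it. The subscription ID, model name, and model version in the usage lines are placeholders.

```python
from urllib.parse import urlencode

# Assumption: Azure Resource Manager endpoint for Azure Government.
ARM_ENDPOINT = "https://management.usgovcloudapi.net"
# Assumption: the API version may differ; confirm in the API reference.
API_VERSION = "2024-04-01-preview"


def build_capacity_url(subscription_id: str, model_name: str,
                       model_version: str, model_format: str = "OpenAI") -> str:
    """Build a GET URL for a model-capacity query (path and parameters assumed)."""
    query = urlencode({
        "api-version": API_VERSION,
        "modelFormat": model_format,
        "modelName": model_name,
        "modelVersion": model_version,
    })
    return (
        f"{ARM_ENDPOINT}/subscriptions/{subscription_id}"
        f"/providers/Microsoft.CognitiveServices/modelCapacities?{query}"
    )


# Placeholder values for illustration only.
url = build_capacity_url("00000000-0000-0000-0000-000000000000",
                         "gpt-4o", "2024-11-20")
print(url)
```

The request itself would be an authenticated GET with a bearer token for the management endpoint. Because retired models can still appear in the results, as the note above says, filter the response against the models currently listed as available.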

Explore how to manage quota for your Azure OpenAI deployments.
Learn more about the underlying models that power Azure OpenAI.

Last updated on 2026-04-20