This article contains a quick reference and a detailed description of the quotas and limits for Azure OpenAI in Azure AI services.
The following sections provide you with a quick guide to the default quotas and limits that apply to Azure OpenAI:
Limit Name | Limit Value |
---|---|
Azure OpenAI resources per region per Azure subscription | 30 |
Default DALL-E 2 quota limits | 2 concurrent requests |
Default DALL-E 3 quota limits | 2 capacity units (6 requests per minute) |
Default Whisper quota limits | 3 requests per minute |
Maximum prompt tokens per request | Varies per model. For more information, see Azure OpenAI Service models |
Max Standard deployments per resource | 32 |
Max fine-tuned model deployments | 5 |
Total number of training jobs per resource | 100 |
Max simultaneous running training jobs per resource | 1 |
Max training jobs queued | 20 |
Max files per resource (fine-tuning) | 50 |
Total size of all files per resource (fine-tuning) | 1 GB |
Max training job time (job will fail if exceeded) | 720 hours |
Max training job size (tokens in training file) x (# of epochs) | 2 Billion |
Max size of all files per upload (Azure OpenAI on your data) | 16 MB |
Max number of inputs in array with /embeddings | 2,048 |
Max number of /chat/completions messages | 2,048 |
Max number of /chat/completions functions | 128 |
Max number of /chat/completions tools | 128 |
Maximum number of Provisioned throughput units per deployment | 100,000 |
Max files per Assistant/thread | 10,000 when using the API or Azure AI Foundry portal. In Azure OpenAI Studio the limit was 20. |
Max file size for Assistants & fine-tuning | 512 MB (200 MB via the Azure AI Foundry portal) |
Max size of all uploaded files for Assistants | 100 GB |
Assistants token limit | 2,000,000 tokens |
GPT-4o max images per request (# of images in the messages array/conversation history) | 50 |
GPT-4 vision-preview & GPT-4 turbo-2024-04-09 default max tokens | 16. Increase the max_tokens parameter value to avoid truncated responses. GPT-4o max tokens defaults to 4096. |
Max number of custom headers in API requests 1 | 10 |
Message character limit | 1,048,576 |
Message size for audio files | 20 MB |
1 Our current APIs allow up to 10 custom headers, which are passed through the pipeline and returned. Some customers now exceed this header count, resulting in HTTP 431 errors. There's no solution for this error other than to reduce header volume. In future API versions we will no longer pass through custom headers. We recommend that customers not depend on custom headers in future system architectures.
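Some of these limits shape client code directly. For example, the 2,048-item cap on the /embeddings input array means large datasets have to be embedded in batches. The following is a minimal sketch, assuming the openai Python SDK against an Azure endpoint; the endpoint, key, and the deployment name text-embedding-3-large are illustrative placeholders, and the chunk size is the documented array limit.

```python
from openai import AzureOpenAI  # pip install openai

# Hypothetical endpoint, key, and API version -- replace with your own.
client = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",
    api_key="<your-api-key>",
    api_version="2024-06-01",
)

MAX_INPUTS_PER_CALL = 2048  # /embeddings input array limit from the table above


def embed_all(texts: list[str], deployment: str = "text-embedding-3-large") -> list[list[float]]:
    """Embed an arbitrarily long list of strings in chunks of at most 2,048 inputs."""
    vectors: list[list[float]] = []
    for start in range(0, len(texts), MAX_INPUTS_PER_CALL):
        chunk = texts[start : start + MAX_INPUTS_PER_CALL]
        response = client.embeddings.create(model=deployment, input=chunk)
        # The response preserves input order; collect the vectors in sequence.
        vectors.extend(item.embedding for item in response.data)
    return vectors
```

Chunking this way keeps each call under the array limit; the per-request token limit for the embedding model you deploy still applies separately.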
Region | o1-mini | o1 | GPT-4 | GPT-4-32K | GPT-4-Turbo | GPT-4-Turbo-V | gpt-4o | gpt-4o-mini | GPT-35-Turbo | GPT-35-Turbo-Instruct | o1-mini - GlobalStandard | o1 - GlobalStandard | gpt-4o - GlobalStandard | gpt-4o-mini - GlobalStandard | GPT-4-Turbo - GlobalStandard | GPT-4o - Global-Batch | GPT-4o-mini - Global-Batch | GPT-4 - Global-Batch | GPT-4-Turbo - Global-Batch | gpt-35-turbo - Global-Batch | Text-Embedding-Ada-002 | text-embedding-3-small | text-embedding-3-large | GPT-4o - finetune | GPT-4o-mini - finetune | GPT-4 - finetune | Babbage-002 | Babbage-002 - finetune | Davinci-002 | Davinci-002 - finetune | GPT-35-Turbo - finetune | GPT-35-Turbo-1106 - finetune | GPT-35-Turbo-0125 - finetune |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
australiaeast | - | - | 40 K | 80 K | 80 K | 30 K | - | - | 300 K | - | - | - | 30 M | 50 M | 2 M | - | - | - | - | - | 350 K | - | - | - | - | - | - | - | - | - | - | - | - |
brazilsouth | - | - | - | - | - | - | - | - | - | - | - | - | 30 M | 50 M | 2 M | - | - | - | - | - | 350 K | - | - | - | - | - | - | - | - | - | - | - | - |
canadaeast | - | - | 40 K | 80 K | 80 K | - | - | - | 300 K | - | - | - | 30 M | 50 M | 2 M | - | - | - | - | - | 350 K | 350 K | 350 K | - | - | - | - | - | - | - | - | - | - |
eastus | 1 M | 600 K | - | - | 80 K | - | 1 M | 2 M | 240 K | 240 K | 50 M | 30 M | 30 M | 50 M | 2 M | 5 B | 15 B | 150 M | 300 M | 10 B | 240 K | 350 K | 350 K | - | - | - | - | - | - | - | - | - | - |
eastus2 | 1 M | 600 K | - | - | 80 K | - | 1 M | 2 M | 300 K | - | 50 M | 30 M | 30 M | 50 M | 2 M | - | - | - | - | - | 350 K | 350 K | 350 K | 250 K | - | - | - | - | - | - | 250 K | 250 K | 250 K |
francecentral | - | - | 20 K | 60 K | 80 K | - | - | - | 240 K | - | - | - | 30 M | 50 M | 2 M | - | - | - | - | - | 240 K | - | 350 K | - | - | - | - | - | - | - | - | - | - |
germanywestcentral | - | - | - | - | - | - | - | - | - | - | - | - | 30 M | 50 M | 2 M | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
japaneast | - | - | - | - | - | 30 K | - | - | 300 K | - | - | - | 30 M | 50 M | 2 M | - | - | - | - | - | 350 K | 350 K | 350 K | - | - | - | - | - | - | - | - | - | - |
koreacentral | - | - | - | - | - | - | - | - | - | - | - | - | 30 M | 50 M | 2 M | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
northcentralus | 1 M | 600 K | - | - | 80 K | - | 1 M | 2 M | 300 K | - | 50 M | 30 M | 30 M | 50 M | 2 M | - | - | - | - | - | 350 K | - | - | 250 K | 500 K | 100 K | 240 K | 250 K | 240 K | 250 K | 250 K | 250 K | 250 K |
norwayeast | - | - | - | - | 150 K | - | - | - | - | - | - | - | 30 M | 50 M | 2 M | - | - | - | - | - | 350 K | - | 350 K | - | - | - | - | - | - | - | - | - | - |
polandcentral | - | - | - | - | - | - | - | - | - | - | - | - | 30 M | 50 M | 2 M | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
southafricanorth | - | - | - | - | - | - | - | - | - | - | - | - | 30 M | 50 M | 2 M | - | - | - | - | - | 350 K | - | - | - | - | - | - | - | - | - | - | - | - |
southcentralus | 1 M | 600 K | - | - | 80 K | - | 1 M | 2 M | 240 K | - | 50 M | 30 M | 30 M | 50 M | 2 M | - | - | - | - | - | 240 K | - | - | - | - | - | - | - | - | - | - | - | - |
southindia | - | - | - | - | 150 K | - | - | - | 300 K | - | - | - | 30 M | 50 M | 2 M | - | - | - | - | - | 350 K | - | 350 K | - | - | - | - | - | - | - | - | - | - |
spaincentral | - | - | - | - | - | - | - | - | - | - | - | - | 30 M | 50 M | 2 M | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
swedencentral | 1 M | 600 K | 40 K | 80 K | 150 K | 30 K | 1 M | 2 M | 300 K | 240 K | 50 M | 30 M | 30 M | 50 M | 2 M | 5 B | 15 B | 150 M | 300 M | 10 B | 350 K | - | 350 K | 250 K | 500 K | 100 K | 240 K | 250 K | 240 K | 250 K | 250 K | 250 K | 250 K |
switzerlandnorth | - | - | 40 K | 80 K | - | 30 K | - | - | 300 K | - | - | - | 30 M | 50 M | 2 M | - | - | - | - | - | 350 K | - | - | - | - | - | - | - | - | - | - | - | - |
switzerlandwest | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | 250 K | - | 250 K | 250 K | 250 K | 250 K |
uksouth | - | - | - | - | 80 K | - | - | - | 240 K | - | - | - | 30 M | 50 M | 2 M | - | - | - | - | - | 350 K | - | 350 K | - | - | - | - | - | - | - | - | - | - |
westeurope | - | - | - | - | - | - | - | - | 240 K | - | - | - | 30 M | 50 M | 2 M | - | - | - | - | - | 240 K | - | - | - | - | - | - | - | - | - | - | - | - |
westus | 1 M | 600 K | - | - | 80 K | 30 K | 1 M | 2 M | 300 K | - | 50 M | 30 M | 30 M | 50 M | 2 M | 5 B | 15 B | 150 M | 300 M | 10 B | 350 K | - | - | - | - | - | - | - | - | - | - | - | - |
westus3 | 1 M | 600 K | - | - | 80 K | - | 1 M | 2 M | 300 K | - | 50 M | 30 M | 30 M | 50 M | 2 M | - | - | - | - | - | 350 K | - | 350 K | - | - | - | - | - | - | - | - | - | - |
Limit Name | Limit Value |
---|---|
Max files per resource | 500 |
Max input file size | 200 MB |
Max requests per file | 100,000 |
The following table shows the batch quota limit. Quota values for global batch are represented in terms of enqueued tokens. When you submit a file for batch processing, the number of tokens present in the file is counted. Until the batch job reaches a terminal state, those tokens count against your total enqueued token limit. (A sketch for estimating this token count follows the table.)
Model | Enterprise agreement | Default | Monthly credit card based subscriptions | MSDN subscriptions | Azure for Students, Free Trials |
---|---|---|---|---|---|
gpt-4o | 5 B | 200 M | 50 M | 90 K | N/A |
gpt-4o-mini | 15 B | 1 B | 50 M | 90 K | N/A |
gpt-4-turbo | 300 M | 80 M | 40 M | 90 K | N/A |
gpt-4 | 150 M | 30 M | 5 M | 100 K | N/A |
gpt-35-turbo | 10 B | 1 B | 100 M | 2 M | 50 K |
o3-mini | 15 B | 1 B | 50 M | 90 K | N/A |
B = billion | M = million | K = thousand
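Because global batch quota is enforced on enqueued tokens, it can help to estimate a batch input file's token count before submitting it. The sketch below is one way to do that, assuming a JSONL batch file in the chat-completions request format and the tiktoken package; the encoding name and file name are illustrative assumptions, and the count covers only message text, so treat it as a rough lower bound rather than the exact figure the service computes.

```python
import json

import tiktoken  # pip install tiktoken

# cl100k_base is an assumption; pick the encoding that matches your model.
encoding = tiktoken.get_encoding("cl100k_base")


def estimate_enqueued_tokens(batch_file: str) -> int:
    """Rough token estimate for a global batch JSONL file (message text only)."""
    total = 0
    with open(batch_file, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            request = json.loads(line)
            messages = request.get("body", {}).get("messages", [])
            for message in messages:
                content = message.get("content", "")
                if isinstance(content, str):
                    total += len(encoding.encode(content))
    return total


print(estimate_enqueued_tokens("batch_input.jsonl"))  # hypothetical file name
```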
Model | Enterprise agreement | Default | Monthly credit card based subscriptions | MSDN subscriptions | Azure for Students, Free Trials |
---|---|---|---|---|---|
gpt-4o | 500 M | 30 M | 30 M | 90 K | N/A |
gpt-4o-mini | 1.5 B | 100 M | 50 M | 90 K | N/A |
o3-mini | 1.5 B | 100 M | 50 M | 90 K | N/A |
Model | Tier | Quota Limit in tokens per minute (TPM) | Requests per minute |
---|---|---|---|
gpt-4.5 | Enterprise Tier | 50 K | 50 |
gpt-4.5 | Default | 50 K | 50 |
Important
The RPM/TPM ratio for quota with o1-series models works differently than it does for older chat completions models.
This is particularly important for programmatic model deployment, because the change in the RPM/TPM ratio can result in accidental under-allocation of quota if you still assume the 1:1000 ratio followed by older chat completions models.
There is a known issue with the quota/usages API where it assumes the old ratio applies to the new o1-series models. The API returns the correct base capacity number, but doesn't apply the correct ratio for an accurate calculation of TPM.
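For programmatic deployments, the quota you allocate is the SKU capacity on the deployment, and the effective TPM and RPM follow from the model's capacity ratio. The sketch below uses the azure-mgmt-cognitiveservices Python SDK to create a deployment with an explicit capacity; the subscription, resource group, account, and deployment names are hypothetical, and the "1,000 TPM per capacity unit" figure in the comment is the convention for older chat models, so verify the ratio for o1-series models rather than assuming it.

```python
from azure.identity import DefaultAzureCredential  # pip install azure-identity
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient  # pip install azure-mgmt-cognitiveservices
from azure.mgmt.cognitiveservices.models import (
    Deployment,
    DeploymentModel,
    DeploymentProperties,
    Sku,
)

# Hypothetical subscription, resource group, account, and deployment names.
client = CognitiveServicesManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

# Capacity is the quota allocated to this deployment. For most chat models one
# capacity unit corresponds to about 1,000 TPM; the RPM granted per unit differs
# for o1-series models, so don't hard-code the older ratio when sizing deployments.
deployment = Deployment(
    sku=Sku(name="Standard", capacity=30),  # e.g. 30 units ~ 30,000 TPM for gpt-4o
    properties=DeploymentProperties(
        model=DeploymentModel(format="OpenAI", name="gpt-4o", version="2024-08-06"),
    ),
)

poller = client.deployments.begin_create_or_update(
    resource_group_name="<resource-group>",
    account_name="<azure-openai-resource>",
    deployment_name="gpt-4o-deployment",
    deployment=deployment,
)
print(poller.result().sku.capacity)
```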
Model | Tier | Quota Limit in tokens per minute (TPM) | Requests per minute |
---|---|---|---|
o3-mini | Enterprise agreement | 50 M | 5 K |
o1 & o1-preview | Enterprise agreement | 30 M | 5 K |
o1-mini | Enterprise agreement | 50 M | 5 K |
o3-mini | Default | 5 M | 500 |
o1 & o1-preview | Default | 3 M | 500 |
o1-mini | Default | 5 M | 500 |
Model | Tier | Quota Limit in tokens per minute (TPM) | Requests per minute |
---|---|---|---|
o1-preview | Enterprise agreement | 600 K | 100 |
o1-mini | Enterprise agreement | 1 M | 100 |
o1-preview | Default | 300 K | 50 |
o1-mini | Default | 500 K | 50 |
gpt-4o, gpt-4o-mini, and gpt-4 (turbo-2024-04-09) have rate limit tiers with higher limits for certain customer types.
Model | Tier | Quota Limit in tokens per minute (TPM) | Requests per minute |
---|---|---|---|
gpt-4o | Enterprise agreement | 30 M | 180 K |
gpt-4o-mini | Enterprise agreement | 50 M | 300 K |
gpt-4 (turbo-2024-04-09) | Enterprise agreement | 2 M | 12 K |
gpt-4o | Default | 450 K | 2.7 K |
gpt-4o-mini | Default | 2 M | 12 K |
gpt-4 (turbo-2024-04-09) | Default | 450 K | 2.7 K |
M = million | K = thousand
Model | Tier | Quota Limit in tokens per minute (TPM) | Requests per minute |
---|---|---|---|
gpt-4o | Enterprise agreement | 10 M | 60 K |
gpt-4o-mini | Enterprise agreement | 20 M | 120 K |
gpt-4o | Default | 300 K | 1.8 K |
gpt-4o-mini | Default | 1 M | 6 K |
M = million | K = thousand
Model | Tier | Quota Limit in tokens per minute (TPM) | Requests per minute |
---|---|---|---|
gpt-4o | Enterprise agreement | 1 M | 6 K |
gpt-4o-mini | Enterprise agreement | 2 M | 12 K |
gpt-4o | Default | 150 K | 900 |
gpt-4o-mini | Default | 450 K | 2.7 K |
M = million | K = thousand
The rate limits for each gpt-4o audio model deployment are 100K TPM and 1K RPM. During the preview, the Azure AI Foundry portal and APIs might inaccurately show different rate limits. Even if you try to set a different rate limit, the actual rate limit will be 100K TPM and 1K RPM.
Model | Tier | Quota Limit in tokens per minute (TPM) | Requests per minute |
---|---|---|---|
gpt-4o-audio-preview | Default | 450 K | 1 K |
gpt-4o-realtime-preview | Default | 800 K | 1 K |
gpt-4o-mini-audio-preview | Default | 2 M | 1 K |
gpt-4o-mini-realtime-preview | Default | 800 K | 1 K |
M = million | K = thousand
Global standard deployments use Azure's global infrastructure, dynamically routing customer traffic to the data center with the best availability for the customer's inference requests. Similarly, data zone standard deployments let you leverage Azure's global infrastructure to dynamically route traffic to the data center within the Microsoft-defined data zone that has the best availability for each request. This enables more consistent latency for customers with low to medium levels of traffic. Customers with high sustained levels of usage might see greater variability in response latency.
The Usage Limit determines the level of usage above which customers might see larger variability in response latency. A customer’s usage is defined per model and is the total tokens consumed across all deployments in all subscriptions in all regions for a given tenant.
Note
Usage tiers only apply to standard, data zone standard, and global standard deployment types. Usage tiers don't apply to global batch and provisioned throughput deployments.
Model | Usage Tiers per month |
---|---|
gpt-4o | 12 Billion tokens |
gpt-4o-mini | 85 Billion tokens |
Model | Usage Tiers per month |
---|---|
gpt-4 + gpt-4-32k (all versions) | 6 Billion |
If your Azure subscription is linked to certain offer types, your maximum quota values are lower than the values indicated in the preceding tables.
Tier | Quota Limit in tokens per minute (TPM) |
---|---|
Azure for Students, Free Trials | 1 K (all models); exception: o-series & GPT 4.5 Preview: 0 |
MSDN subscriptions | GPT 3.5 Turbo Series: 30 K; GPT-4 series: 8 K; o-series: 0; GPT 4.5 Preview: 0 |
Monthly credit card based subscriptions 1 | GPT 3.5 Turbo Series: 30 K; GPT-4 series: 8 K; o-series: 0; GPT 4.5 Preview: 0 |
1 This currently applies to offer type 0003P.
In the Azure portal, you can view which offer type is associated with your subscription by navigating to your subscription and checking the subscription overview pane. The offer type corresponds to the plan field in the subscription overview.
To minimize issues related to rate limits, it's a good idea to use techniques such as implementing retry logic in your application, increasing your workload gradually rather than in sharp bursts, and setting max_tokens to the minimum value that serves your scenario. A minimal retry sketch follows.
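Retry logic for rate-limited requests typically catches HTTP 429 responses, honors the Retry-After header when present, and otherwise backs off exponentially. This is a minimal sketch, assuming the openai Python SDK against an Azure endpoint; the endpoint, key, and gpt-4o deployment name are hypothetical placeholders.

```python
import random
import time

from openai import AzureOpenAI, RateLimitError  # pip install openai

# Hypothetical endpoint, key, and API version -- replace with your own.
client = AzureOpenAI(
    azure_endpoint="https://my-resource.openai.azure.com",
    api_key="<your-api-key>",
    api_version="2024-06-01",
)


def chat_with_retry(messages, deployment="gpt-4o", max_attempts=5):
    """Call chat completions, backing off when the deployment is rate limited."""
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(model=deployment, messages=messages)
        except RateLimitError as err:
            if attempt == max_attempts - 1:
                raise
            # Prefer the service-provided Retry-After header; otherwise use
            # exponential backoff with jitter.
            retry_after = err.response.headers.get("retry-after")
            delay = float(retry_after) if retry_after else (2 ** attempt) + random.random()
            time.sleep(delay)


response = chat_with_retry([{"role": "user", "content": "Hello!"}])
print(response.choices[0].message.content)
```

Recent versions of the SDK also retry some failed requests automatically; the explicit loop above simply makes the backoff policy visible in your own code.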
Quota increase requests can be submitted via the quota increase request form. Due to high demand, quota increase requests are being accepted and will be filled in the order they're received. Priority is given to customers who generate traffic that consumes the existing quota allocation, and your request might be denied if this condition isn't met.
For other rate limits, submit a service request.
Explore how to manage quota for your Azure OpenAI deployments. Learn more about the underlying models that power Azure OpenAI.