Azure OpenAI service quotas and limits
This article contains a quick reference and a detailed description of the quotas and limits for the Azure OpenAI service in Azure Cognitive Services.
Quotas and limits reference
The following sections provide a quick guide to the quotas and limits that apply to the Azure OpenAI service.
|Limit Name|Limit Value|
|--|--|
|OpenAI resources per region|2|
|Requests per second per deployment|20 requests per second for: text-davinci-003, text-davinci-002, text-davinci-fine-tune-002, code-cushman-002, code-davinci-002, code-davinci-fine-tune-002<br>50 requests per second for all other text models.|
|Max fine-tuned model deployments|2|
|Ability to deploy same model to multiple deployments|Not allowed|
|Total number of training jobs per resource|100|
|Max simultaneous running training jobs per resource|1|
|Max training jobs queued|20|
|Max files per resource|50|
|Total size of all files per resource|1 GB|
|Max training job time (job will fail if exceeded)|120 hours|
|Max training job size (tokens in training file × # of epochs)|Ada: 40 million tokens<br>Babbage: 40 million tokens<br>Curie: 40 million tokens<br>Cushman: 40 million tokens|
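The training job size limit above is the product of the token count in your training file and the number of epochs. As a minimal sketch (the helper names here are illustrative, not part of any Azure SDK), you can check whether a planned fine-tune job fits under the limit before submitting it:

```python
# Illustrative helpers (not an Azure API): compute the training job size
# (tokens in training file * number of epochs) and compare it to a limit,
# e.g. 40 million tokens for Ada, Babbage, Curie, and Cushman.

def training_job_tokens(tokens_in_file: int, epochs: int) -> int:
    """Total tokens processed by the job: file tokens times epochs."""
    return tokens_in_file * epochs

def fits_limit(tokens_in_file: int, epochs: int, limit: int = 40_000_000) -> bool:
    """True if the job stays within the max training job size."""
    return training_job_tokens(tokens_in_file, epochs) <= limit
```

For example, a 9-million-token file trained for 4 epochs totals 36 million tokens and fits under a 40-million-token limit, while an 11-million-token file at 4 epochs (44 million tokens) does not.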
General best practices to mitigate throttling during autoscaling
To minimize issues related to throttling, it's a good idea to use the following techniques:
- Implement retry logic in your application.
- Avoid sharp changes in the workload. Increase the workload gradually.
- Test different load increase patterns.
- Create additional Azure OpenAI resources in the same or different regions, and distribute the workload among them.
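The first technique above, retry logic, is commonly implemented as exponential backoff with jitter. The sketch below is a minimal, client-agnostic example: `ThrottledError` is a placeholder for whatever exception your client library raises when the service returns HTTP 429, and the delay values are illustrative assumptions, not service recommendations.

```python
import random
import time

class ThrottledError(Exception):
    """Placeholder for the throttling (HTTP 429) error your client raises."""

def with_retries(fn, max_attempts=5, base_delay=1.0, max_delay=30.0):
    """Call fn(), retrying on throttling with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            # Exponential backoff: base, 2x base, 4x base, ... capped at
            # max_delay, plus random jitter to avoid synchronized retries.
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay + random.uniform(0, delay / 2))
```

Wrapping each service call in `with_retries` smooths over transient throttling; combining it with a gradual workload ramp-up addresses the second technique in the list as well.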
The next sections describe specific cases of adjusting quotas.
Request an increase to a limit on transactions-per-second or number of fine-tuned models deployed
The limit of concurrent requests defines how high the service can scale before it starts to throttle your requests.
Have the required information ready
- OpenAI Resource ID
- Deployment Name
How to get this information:
- Go to the Azure portal.
- Select the Azure OpenAI resource for which you would like to increase the request limit.
- From the Resource Management group, select Properties.
- Copy and save the values of the following fields:
- Resource ID
- Location (your endpoint region)
- From the Resource Management group, select Deployments.
- Copy and save the name of the deployment for which you're requesting a limit increase.
Create and submit a support request
Initiate the increase of the limit for concurrent requests for your resource, or if necessary check the current limit, by submitting a support request. Here's how:
- Ensure you have the required information listed in the previous section.
- Go to the Azure portal.
- Select the Azure OpenAI resource for which you would like to increase (or check) the concurrent request limit.
- In the Support + troubleshooting group, select New support request. A new window will appear, with auto-populated information about your Azure subscription and Azure resource.
- In Summary, describe what you want (for example, "Increase OpenAI request limit").
- In Problem type, select Quota or Subscription issues.
- In Problem subtype, select Increasing limits or access to specific functionality.
- Select Next: Solutions. Proceed further with the request creation.
- On the Details tab, in the Description field, enter the following:
  - The limit you're requesting an increase for.
  - The Azure resource information you collected previously.
  - Any other required information.
- On the Review + create tab, select Create.
- Note the support request number in Azure portal notifications. You'll be contacted shortly about your request.
Learn more about the underlying models that power Azure OpenAI.