Azure OpenAI service quotas and limits

This article contains a quick reference and a detailed description of the quotas and limits for the Azure OpenAI service in Azure Cognitive Services.

Quotas and limits reference

The following sections provide a quick guide to the quotas and limits that apply to the Azure OpenAI service.

  • OpenAI resources per region: 2
  • Requests per second per deployment: 20 requests per second for text-davinci-003, text-davinci-002, text-davinci-fine-tune-002, code-cushman-002, code-davinci-002, and code-davinci-fine-tune-002; 50 requests per second for all other text models
  • Max fine-tuned model deployments: 2
  • Ability to deploy the same model to multiple deployments: Not allowed
  • Total number of training jobs per resource: 100
  • Max simultaneous running training jobs per resource: 1
  • Max training jobs queued: 20
  • Max files per resource: 50
  • Total size of all files per resource: 1 GB
  • Max training job time (job fails if exceeded): 120 hours
  • Max training job size (tokens in training file × number of epochs): 40 million tokens for Ada, Babbage, Curie, and Cushman; 10 million tokens for Davinci
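As a quick sanity check before submitting a fine-tuning job, you can verify that your training file fits the per-model token budget listed above. A minimal sketch (the limit values are hard-coded from this article; the helper name is illustrative, not part of any SDK):

```python
# Per-model caps on (tokens in training file * number of epochs),
# taken from the limits listed above. These values can change; check
# the current service documentation before relying on them.
MAX_TRAINING_TOKENS = {
    "ada": 40_000_000,
    "babbage": 40_000_000,
    "curie": 40_000_000,
    "cushman": 40_000_000,
    "davinci": 10_000_000,
}

def within_training_limit(model: str, file_tokens: int, epochs: int) -> bool:
    """Return True if file_tokens * epochs fits the model's training budget."""
    return file_tokens * epochs <= MAX_TRAINING_TOKENS[model.lower()]

# Example: a 3M-token file trained for 4 epochs (12M total tokens)
# fits Curie's 40M budget but exceeds Davinci's 10M budget.
print(within_training_limit("curie", 3_000_000, 4))    # True
print(within_training_limit("davinci", 3_000_000, 4))  # False
```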

General best practices to mitigate throttling during autoscaling

To minimize issues related to throttling, it's a good idea to use the following techniques:

  • Implement retry logic in your application.
  • Avoid sharp changes in the workload. Increase the workload gradually.
  • Test different load increase patterns.
  • Create another OpenAI service resource in the same region or in different regions, and distribute the workload among them.
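The retry advice above is usually implemented as exponential backoff with jitter on throttled (HTTP 429) responses. A minimal, transport-agnostic sketch, assuming your client library raises some exception on throttling (`ThrottledError` and the `send` callable here are stand-ins, not real library names):

```python
import random
import time

class ThrottledError(Exception):
    """Stand-in for whatever your client raises on an HTTP 429 response."""

def call_with_backoff(send, max_retries=5, base_delay=1.0):
    """Call send(); on throttling, wait with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return send()
        except ThrottledError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            # Delays grow 1s, 2s, 4s, ...; jitter avoids synchronized retries.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.25))

# Usage: a stub request that is throttled twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ThrottledError
    return "ok"

print(call_with_backoff(flaky, base_delay=0.01))  # ok
```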

The next sections describe specific cases of adjusting quotas.

Request an increase to a limit on transactions-per-second or number of fine-tuned models deployed

The limit of concurrent requests defines how high the service can scale before it starts to throttle your requests.
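While waiting for a limit increase, one general client-side mitigation is to gate outgoing requests through a semaphore sized to your current concurrency limit. A sketch with threads (the limit of 4 and the `send_request` body are placeholders for your actual quota and API call):

```python
import threading

MAX_CONCURRENT = 4  # placeholder: size this to your deployment's actual limit
gate = threading.Semaphore(MAX_CONCURRENT)

in_flight = 0
peak = 0
lock = threading.Lock()

def send_request(i):
    """Placeholder worker; track concurrency instead of calling a real API."""
    global in_flight, peak
    with gate:  # at most MAX_CONCURRENT callers run this block at once
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        # ... issue the actual API call here ...
        with lock:
            in_flight -= 1

threads = [threading.Thread(target=send_request, args=(i,)) for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(peak <= MAX_CONCURRENT)  # True: the gate kept concurrency under the cap
```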

Have the required information ready

  • OpenAI Resource ID
  • Region
  • Deployment Name

How to get this information:

  1. Go to the Azure portal.
  2. Select the Azure OpenAI resource for which you would like to increase the request limit.
  3. From the Resource Management group, select Properties.
  4. Copy and save the values of the following fields:
    • Resource ID
    • Location (your endpoint region)
  5. From the Resource Management group, select Deployments.
    • Copy and save the name of the deployment for which you're requesting a limit increase.

Create and submit a support request

To request an increase to the limit for concurrent requests on your resource, or to check the current limit, submit a support request. Here's how:

  1. Ensure you have the required information listed in the previous section.
  2. Go to the Azure portal.
  3. Select the OpenAI service resource for which you would like to increase (or to check) the concurrency request limit.
  4. In the Support + troubleshooting group, select New support request. A new window will appear, with auto-populated information about your Azure subscription and Azure resource.
  5. In Summary, describe what you want (for example, "Increase OpenAI request limit").
  6. In Problem type, select Quota or Subscription issues.
  7. In Problem subtype, select Increasing limits or access to specific functionality.
  8. Select Next: Solutions. Proceed further with the request creation.
  9. On the Details tab, in the Description field, enter the following:
    • Details on which limit you're requesting an increase for.
    • The Azure resource information you collected previously.
    • Any other required details.
  10. On the Review + create tab, select Create.
  11. Note the support request number in Azure portal notifications. You'll be contacted shortly about your request.

Next steps

Learn more about the underlying models that power Azure OpenAI.