This article provides guidance for how customers can access higher rate limits on certain Azure AI services resources.
Each Azure AI services resource has a preconfigured static call rate (transactions per second, or TPS) that limits the number of concurrent calls that customers can make to the backend service in a given time frame. The autoscale feature automatically increases or decreases a resource's rate limits based on near-real-time resource usage metrics and backend service capacity metrics.
This feature is disabled by default for every new resource. If your resource supports autoscale, follow these instructions to enable it:
Go to your resource's page in the Azure portal, and select the Overview tab on the left pane. Under the Essentials section, find the Autoscale line and select the link to view the Autoscale Settings pane and enable the feature.
The autoscale feature is available in the paid subscription tier of the following services:
No, the autoscale feature isn't available to free tier subscriptions.
No, you may still get 429 errors when you exceed the rate limit. If your application triggers a spike and your resource reports a 429 response, autoscale checks the available capacity projection section to see whether the current capacity can accommodate a rate limit increase, and responds within five minutes.
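Because 429 responses can still occur while autoscale is catching up, a client usually needs its own retry logic. The following is a minimal sketch of exponential backoff on 429, not an official client pattern: `call_with_backoff`, the `send` callable, and all delay values are hypothetical names and assumptions chosen for illustration.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,      # ASSUMPTION: illustrative value
    base_delay: float = 1.0,    # ASSUMPTION: illustrative value, in seconds
    max_delay: float = 30.0,    # ASSUMPTION: illustrative value, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() and retry with exponential backoff while it returns 429.

    send is a hypothetical callable that performs one request and
    returns (status_code, body).
    """
    status, body = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        # Double the wait on each retry, cap it, and add a little jitter
        # so that many clients do not retry at the same instant.
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(delay + random.uniform(0, delay / 10))
        status, body = send()
    return status, body
```

The function returns the last response it saw, so the caller still has to handle a final 429 if every attempt was throttled.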
If the available capacity is enough for an increase, autoscale gradually increases the rate limit cap of your resource. If you continue to call your resource at a high rate that results in more 429 throttling, your TPS rate will continue to increase over time. If this action continues for one hour or more, you should reach the maximum rate (up to 1000 TPS) currently available at that time for that resource.
If the available capacity isn't enough for an increase, the autoscale feature waits five minutes and checks again.
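The behavior described above can be pictured with a small simulation. This is only a sketch of the described loop (check capacity, raise the cap if possible, otherwise wait five minutes), not the service's actual algorithm: `simulate_ramp` is a hypothetical function, and the `growth_factor` step size is an assumption, since the article only says the increase is gradual.

```python
from typing import Callable, List

DEFAULT_TPS = 10         # default rate limit stated in this article
MAX_TPS = 1000           # upper bound stated in this article
CHECK_INTERVAL_MIN = 5   # capacity is re-checked every five minutes


def simulate_ramp(
    throttled: Callable[[int], bool],
    capacity_available: Callable[[int], bool],
    minutes: int,
    growth_factor: float = 1.5,  # ASSUMPTION: the real step size is not documented here
) -> List[int]:
    """Return the simulated rate limit after each five-minute check.

    throttled(minute) says whether the client was still receiving 429s;
    capacity_available(minute) says whether the backend can absorb an increase.
    """
    limit = DEFAULT_TPS
    history = [limit]
    for minute in range(CHECK_INTERVAL_MIN, minutes + 1, CHECK_INTERVAL_MIN):
        if throttled(minute) and capacity_available(minute):
            # Raise the cap gradually, never beyond the documented maximum.
            limit = min(MAX_TPS, int(limit * growth_factor))
        # Otherwise the limit is unchanged until the next check.
        history.append(limit)
    return history
```

For example, `simulate_ramp(lambda m: True, lambda m: True, 60)` models a client that is throttled continuously for an hour while capacity is always available. Any numbers it produces reflect the assumed growth factor, not documented service behavior.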
Azure AI services resources have a default rate limit of 10 TPS. If you need a higher default TPS, submit a ticket by following the New Support Request link on your resource's page in the Azure portal. Remember to include a business justification in the request.
Azure AI services pricing hasn't changed and is listed on the Azure AI services pricing page. We bill only for successful calls made to Azure AI services APIs. However, increased call rate limits mean more transactions are completed, and you may receive a higher bill.
Be aware of potential errors and their consequences. If a bug in your client application causes it to call the service hundreds of times per second, that would likely lead to a much higher bill, whereas the cost would be much more limited under a fixed rate limit. Errors of this kind are your responsibility. We highly recommend that you perform development and client update tests against a resource with a fixed rate limit prior to using the autoscale feature.
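One possible safeguard against a runaway client, in addition to testing against a fixed-rate resource, is a cap enforced in the client itself. The sketch below is a simple fixed-window counter; `ClientRateCap` and its parameters are hypothetical names introduced here for illustration and are not part of any Azure SDK.

```python
import time
from typing import Callable


class ClientRateCap:
    """Allow at most max_per_second calls per fixed one-second window."""

    def __init__(
        self,
        max_per_second: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_per_second = max_per_second
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def allow(self) -> bool:
        """Return True if another call may be sent in the current window."""
        now = self.clock()
        if now - self.window_start >= 1.0:
            # A new one-second window begins; reset the counter.
            self.window_start = now
            self.count = 0
        if self.count < self.max_per_second:
            self.count += 1
            return True
        return False
```

A client would check `cap.allow()` before each request and skip or delay the request when it returns `False`, so a bug that loops on the service call cannot exceed the budget you chose even when autoscale has raised the server-side limit.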
Yes, you can disable the autoscale feature through the Azure portal or the Azure CLI and return to your default call rate limit setting. If your resource was previously approved for a higher default TPS, it goes back to that rate. It can take up to five minutes for the changes to go into effect.