Compute throttling limits

How do the throttling policies work?

How does Microsoft Compute determine throttling limits?

Throttling limits for Virtual Machines

Throttling limits for Virtual Machine Scale Sets

Throttling limits for Virtual Machine Scale Set Virtual Machines

Troubleshooting guidelines

FAQs

Is there any action required from users?

What benefits do the throttling policies provide?

Does the customer get an alert when they're about to reach their throttling limits?

2024-08-22

Applies to: ✔️ Linux VMs ✔️ Windows VMs ✔️ Flexible scale sets ✔️ Uniform scale sets

Microsoft Compute implements a throttling mechanism to protect the overall performance of the service and to give customers a consistent experience. API requests that exceed the maximum allowed limits are throttled, and users receive an HTTP 429 error. All Compute throttling policies are implemented on a per-region basis.

How do the throttling policies work?

Microsoft Compute implements throttling policies that limit the number of API requests made per resource and per subscription, per region, per minute. If the number of API requests exceeds these limits, the requests are throttled. Here's how these limits work:

Per Resource Limit – Each resource, such as a virtual machine (VM), has a specific limit for API requests. For instance, assume that a user creates 10 VMs in a subscription. The user can invoke up to 12 update requests for each VM in one minute. If the user exceeds the limit for a VM, further API requests for that VM are throttled. This limit ensures that a few resources don’t consume the subscription-level limits and cause other resources to be throttled.
Subscription Limit – In addition to resource limits, there's an overarching limit on the number of API requests across all resources within a subscription. Any API requests beyond this limit are throttled, regardless of whether the limit for an individual resource has been reached. For instance, assume that a user has 200 VMs in a subscription. Even though the user is entitled to initiate up to 12 Update VM requests for each VM, the aggregate limit for Update VM API requests is capped at 1,500 per minute. Any Update VM API requests for the subscription beyond 1,500 are throttled.
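
The interaction between the two limits in the 200-VM example can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `allowed_update_requests` is hypothetical, and it treats both limits as simple one-minute ceilings, ignoring token refill.

```python
def allowed_update_requests(vm_count: int,
                            per_vm_limit: int = 12,
                            subscription_limit: int = 1500) -> int:
    """Upper bound on Update VM requests accepted in one minute.

    Each VM is capped at per_vm_limit, and the total across all VMs
    is capped at subscription_limit, so the smaller bound applies.
    """
    return min(vm_count * per_vm_limit, subscription_limit)


print(allowed_update_requests(10))   # 10 VMs: per-VM limits bind (120)
print(allowed_update_requests(200))  # 200 VMs: subscription limit binds (1500)
```

With 200 VMs, the per-VM limits would add up to 2,400 requests, so the 1,500 subscription cap is reached first.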

How does Microsoft Compute determine throttling limits?

To determine the limits for each resource and subscription, Microsoft Compute uses the Token Bucket Algorithm. This algorithm creates a bucket for each limit and holds a specific number of tokens in each bucket. The number of tokens in a bucket represents the throttling limit at any given minute.

At the start of the throttling window, when the resource is created, the bucket is filled to its Maximum Capacity. Each API request initiated by the user consumes one token. When the token count drops to zero, subsequent API requests are throttled. The bucket is replenished with new tokens every minute at a consistent rate, called the Bucket Refill Rate, for a resource and for a subscription.
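
The bucket behavior described above can be sketched as a small class. This is a minimal illustration, not Microsoft Compute's implementation: the class and method names are hypothetical, and it assumes the refill happens once per minute as a single step and never pushes the bucket above its maximum capacity.

```python
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    capacity: int      # Maximum Bucket Capacity
    refill_rate: int   # tokens added per minute (Bucket Refill Rate)
    tokens: int = field(init=False)

    def __post_init__(self) -> None:
        # The bucket starts full when the throttling window begins.
        self.tokens = self.capacity

    def try_consume(self) -> bool:
        """Consume one token for a request; False means throttled."""
        if self.tokens == 0:
            return False
        self.tokens -= 1
        return True

    def refill(self) -> None:
        """Intended to run once per minute; capped at capacity (assumption)."""
        self.tokens = min(self.tokens + self.refill_rate, self.capacity)


bucket = TokenBucket(capacity=12, refill_rate=4)
results = [bucket.try_consume() for _ in range(13)]  # 13 requests in one minute
print(results.count(True), results.count(False))     # 12 accepted, 1 throttled
bucket.refill()
print(bucket.tokens)                                 # 4 tokens after one refill
```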

For instance, consider the throttling policy for the VM Update API, which stipulates a Bucket Refill Rate of four tokens per minute and a Maximum Bucket Capacity of 12 tokens. The user invokes the Update VM API for a virtual machine (VM) as per the example table. Initially, the bucket is filled with 12 tokens at the start of the throttling window. By the fourth minute, the user has used all 12 tokens, leaving the bucket empty. In the fifth minute, the bucket is replenished with four new tokens in accordance with the Bucket Refill Rate. So, if the user sends five requests in the fifth minute, four of them succeed and Microsoft Compute throttles the remaining one due to insufficient tokens.
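
The minute-by-minute bookkeeping in this example can be reproduced with a short simulation. The sketch below is an approximation under stated assumptions: the function name `simulate` is hypothetical, the refill is applied at the start of every minute after the first and capped at the maximum capacity, throttled requests are computed as max(B - A, 0), and the request counts are made up for illustration rather than taken from the article's table.

```python
def simulate(requests_per_minute, capacity=12, refill_rate=4):
    """Return one (A, B, C, D) row per minute.

    A: tokens at the start of the minute
    B: requests made during the minute
    C: throttled requests, max(B - A, 0)
    D: tokens left at the end of the minute, max(A - B, 0)
    """
    rows = []
    tokens = capacity  # full bucket at the start of the throttling window
    for minute, requests in enumerate(requests_per_minute):
        if minute > 0:
            tokens = min(tokens + refill_rate, capacity)  # per-minute refill
        throttled = max(requests - tokens, 0)
        remaining = max(tokens - requests, 0)
        rows.append((tokens, requests, throttled, remaining))
        tokens = remaining
    return rows


# Hypothetical request counts, not the values from the article's table.
for minute, row in enumerate(simulate([3, 6, 7, 7, 5]), start=1):
    print(minute, row)
```

With these made-up counts, the bucket is empty at the end of the fourth minute, and the fifth minute starts with four tokens, so five requests yield one throttled request.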

A similar process is followed to determine the throttling limits at the subscription level. The following sections detail the bucket refill rate and maximum bucket capacity that are used to determine throttling limits for Virtual Machines, Virtual Machine Scale Sets, and Virtual Machine Scale Set VMs.

Throttling limits for Virtual Machines

API requests for Virtual Machines are categorized into seven distinct policies. Each policy has its own limits, depending on how resource-intensive the API requests under that policy are. The following table lists these policies, the corresponding REST APIs, and their respective throttling limits:

Policy category	REST APIs	Resource level bucket refill rate (per min)	Resource level maximum bucket capacity (per min)	Subscription level bucket refill rate (per min)	Subscription level maximum bucket capacity (per min)
Put VM (Create new VMs)	Create	4	12	500	1,500
Update VM (Update existing VMs)	Update Reapply Restart Power Off Start Generalize Convert To Managed Disks Redeploy Perform Maintenance Capture Run Command Create Or Update Extensions - Update Extensions - Delete Reimage Update Run Commands - Update Run Commands - Delete Run Commands - Create Or Update	4	12	500	1,500
Delete VM (Delete VMs)	Delete Simulate Eviction Deallocate	4	12	500	1,500
Low Cost Get VM (Get information on single VM)	Get Instance View Extensions - Get List Available Sizes Retrieve Boot Diagnostics Data Run Commands - Get By Virtual Machine Run Commands - List By Virtual Machine	12	36	8,000	24,000
High Cost Get VM¹ (Get information on multiple VMs)	List List All List By Location	NA	NA	300	900
Get Operation (Get information on async VM operations)	Status of asynchronous operations	15	45	5,000	15,000
VM Guest Patch Operations (Assess & install guest patches)	Assess Patches Install Patches	2	6	200	600

¹ Only subscription level policies are applicable.
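
For planning purposes, the limits in the Virtual Machines table can be kept as data and used to estimate a request budget. The sketch below is illustrative: `VM_LIMITS` and `max_requests` are hypothetical names, the numbers are copied from the table, and the estimate assumes a bucket that starts full, is drained every minute, and receives its refill at the start of each subsequent minute.

```python
# (bucket refill rate, maximum bucket capacity) per policy category.
# None marks "NA": no resource-level policy applies.
VM_LIMITS = {
    "Put VM": {"resource": (4, 12), "subscription": (500, 1500)},
    "Update VM": {"resource": (4, 12), "subscription": (500, 1500)},
    "Delete VM": {"resource": (4, 12), "subscription": (500, 1500)},
    "Low Cost Get VM": {"resource": (12, 36), "subscription": (8000, 24000)},
    "High Cost Get VM": {"resource": None, "subscription": (300, 900)},
    "Get Operation": {"resource": (15, 45), "subscription": (5000, 15000)},
    "VM Guest Patch Operations": {"resource": (2, 6), "subscription": (200, 600)},
}


def max_requests(policy: str, level: str, minutes: int) -> int:
    """Most requests accepted over `minutes`, starting from a full bucket."""
    if minutes < 1:
        raise ValueError("minutes must be at least 1")
    limits = VM_LIMITS[policy][level]
    if limits is None:
        raise ValueError(f"{policy} has no {level}-level policy")
    refill, capacity = limits
    # Full bucket in the first minute, then one refill per later minute.
    return capacity + refill * (minutes - 1)


print(max_requests("Update VM", "resource", 5))    # 12 + 4 * 4 = 28
print(max_requests("Put VM", "subscription", 10))  # 1500 + 500 * 9 = 6000
```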

Throttling limits for Virtual Machine Scale Sets

API requests for Virtual Machine Scale Sets (Uniform and Flex) are categorized into five distinct policies. Each policy has its own limits, depending on how resource-intensive the API requests under that policy are. These policies apply to both Flex and Uniform orchestration modes. The following table lists these policies, the corresponding REST APIs, and their respective throttling limits:

Policy category	REST APIs	Resource level bucket refill rate (per min)	Resource level maximum bucket capacity (per min)	Subscription level bucket refill rate (per min)	Subscription level maximum bucket capacity (per min)
Put (Create new scale set)	Create	4	12	125	375
Update (Update existing scale set)	Update Start² Restart² Redeploy² Perform Maintenance² Reimage² Reimage All² Create Or Update Rolling Upgrades - Cancel Extensions - Create Extensions - Update Extensions - Delete Force Recovery Service Fabric Platform Update Domain Walk Convert To Single Placement Group Set Orchestration Service State	4	12	500	1,500
Delete (Delete scale set)	Delete Power Off² Deallocate	4	12	175	525
Low Cost Get (Get information on single scale set)	Get List Skus Rolling Upgrades - Get Latest Get OS Upgrade History	12	36	800	2,400
High Cost Get (Get resource intensive information)	Get Instance View List² List All² List By Location²	10	30	360	1,080

² Only subscription level policies are applicable.

Throttling limits for Virtual Machine Scale Set Virtual Machines

API requests for Virtual Machine Scale Set Virtual Machines are categorized into three distinct policies. Each policy has its own limits, depending on how resource-intensive the API requests under that policy are. The following table lists these policies, the corresponding REST APIs, and their respective throttling limits:

Policy category	REST APIs	Resource level bucket refill rate (per min)	Resource level maximum bucket capacity (per min)	Subscription level bucket refill rate (per min)	Subscription level maximum bucket capacity (per min)
Update scale set VMs (Update existing VMs in a scale set)	Start Restart Reimage ReimageAll Update SimulateEviction Extensions - Create Or Update RunCommands - Create Or Update RunCommands - Update	4	12	500	1,500
Delete scale set VMs (Delete scale set VMs)	Delete PowerOff Deallocate Extensions - Delete RunCommands - Delete	4	12	500	1,500
Get scale set VMs (Get information on scale set VMs)	Get Get Instance View Extensions - Get RunCommands - Get Retrieve Boot Diagnostics Data	12	36	2,000	6,000

Troubleshooting guidelines

If users still face challenges due to Compute throttling, they can refer to Troubleshooting throttling errors in Azure - Virtual Machines. That article explains how to troubleshoot throttling issues and lists best practices to avoid being throttled.
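
One common way to cope with HTTP 429 responses is to wait and retry. The sketch below is a generic pattern, not guidance taken from this article: `call_with_retry` and `send` are hypothetical names, `send` stands in for whatever function issues the API request and returns a status code and a headers dictionary, and the use of a `Retry-After` header (in seconds) on throttled responses is an assumption. The 60-second fallback mirrors the one-minute throttling window.

```python
import time


def call_with_retry(send, max_attempts=4, default_wait=60.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send is a hypothetical callable returning (status_code, headers).
    Returns the last status code observed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = None
    for attempt in range(1, max_attempts + 1):
        status, headers = send()
        if status != 429 or attempt == max_attempts:
            break
        # Prefer the server's hint when present (assumed header, in seconds);
        # otherwise wait for one full throttling window.
        wait = float(headers.get("Retry-After", default_wait))
        sleep(wait)
    return status
```

Passing `sleep` as a parameter keeps the function easy to exercise without real delays. Note that the header lookup here is case-sensitive, so a real client would need to normalize header names.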

FAQs

Is there any action required from users?

Users don’t need to change anything in their configuration or workloads. All existing APIs continue to work as is.

What benefits do the throttling policies provide?

The throttling policies offer several benefits:

All Compute resources have a uniform throttling window of one minute. Users can successfully invoke API calls again one minute after being throttled.
No single resource can use up all the limits under a subscription, because limits are also defined at the resource level.
Microsoft Compute is introducing a new algorithm, the Token Bucket Algorithm, for determining the limits. The algorithm gives customers extra buffer when they make a high number of API requests.

Does the customer get an alert when they're about to reach their throttling limits?

As part of every response, Microsoft Compute returns the x-ms-ratelimit-remaining-resource header, which can be used to determine the remaining limits against the policies. A list of applicable throttling policies is returned in the response, as described in Call rate informational headers.
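
A client can read the x-ms-ratelimit-remaining-resource header to see how close it is to a limit. The sketch below assumes the header value is a comma-separated list of `policy;remaining` pairs; that format, the function name `parse_remaining`, and the policy names in the sample are assumptions for illustration and should be checked against real responses.

```python
def parse_remaining(header_value: str) -> dict[str, int]:
    """Parse 'PolicyA;count,PolicyB;count' into {policy: remaining}."""
    remaining = {}
    for entry in header_value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the last ';' so the policy name may contain any prefix.
        name, _, count = entry.rpartition(";")
        remaining[name] = int(count)
    return remaining


# Hypothetical header value; the policy names are placeholders.
sample = "Microsoft.Compute/ExamplePolicyA;11,Microsoft.Compute/ExamplePolicyB;1480"
for policy, left in parse_remaining(sample).items():
    print(policy, left)
```

A caller could compare each remaining count with a threshold of its own choosing and slow down before requests start to be throttled.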

Token bucket example table columns: number of tokens at the beginning of the minute (A); requests per minute (B); throttled requests (C); remaining tokens at the end of the period, D = max(A - B, 0).