Compute throttling limits

Applies to: ✔️ Linux VMs ✔️ Windows VMs ✔️ Flexible scale sets ✔️ Uniform scale sets

Microsoft Compute implements a throttling mechanism to protect the overall performance of the service and to give customers a consistent experience. API requests that exceed the maximum allowed limits are throttled, and the caller receives an HTTP 429 error. All Compute throttling policies are implemented on a per-region basis.

How do the throttling policies work?

Microsoft Compute implements throttling policies that limit the number of API requests made per resource and per subscription per region per minute. If the number of API requests exceeds these limits, the requests are throttled. Here's how these limits work:

  • Per Resource Limit – Each resource, such as a virtual machine (VM), has a specific limit for API requests. For instance, assume that a user creates 10 VMs in a subscription. The user can invoke up to 12 update requests for each VM in one minute. If the user exceeds the limit for a VM, further API requests for that VM are throttled. This limit ensures that a few resources don’t consume the subscription-level limits and cause other resources to be throttled.

  • Subscription Limit – In addition to resource limits, there's an overarching limit on the number of API requests across all resources within a subscription. Any API requests beyond this limit are throttled, regardless of whether the limit for an individual resource has been reached. For instance, assume that a user has 200 VMs in a subscription. Even though the user is entitled to initiate up to 12 Update VM requests for each VM, the aggregate limit for Update VM API requests is capped at 1,500 per minute. Any Update VM API requests for the subscription exceeding 1,500 are throttled.
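
The arithmetic behind the two examples above can be sketched as follows. This is a simplified illustration, not the service's implementation: it assumes both limits are full at the start of the minute and that requests rejected at the VM level do not count against the subscription limit. The function and constant names are hypothetical.

```python
# Values taken from the examples above; names are illustrative only.
PER_VM_LIMIT = 12          # Update VM requests allowed per VM in one minute
SUBSCRIPTION_LIMIT = 1500  # Update VM requests allowed per subscription in one minute


def admitted_requests(vm_count: int, requests_per_vm: int) -> tuple[int, int]:
    """Return (allowed, throttled) for one minute of Update VM traffic.

    Assumption: every VM sends the same number of requests, and requests
    rejected by the per-VM limit are not charged to the subscription limit.
    """
    attempted = vm_count * requests_per_vm
    # Each VM can get at most PER_VM_LIMIT requests through.
    within_vm_limits = vm_count * min(requests_per_vm, PER_VM_LIMIT)
    # The subscription-wide cap then applies to the total.
    allowed = min(within_vm_limits, SUBSCRIPTION_LIMIT)
    return allowed, attempted - allowed


print(admitted_requests(10, 12))   # (120, 0): per-VM limits are the only constraint
print(admitted_requests(200, 12))  # (1500, 900): the subscription cap takes over
```

With 200 VMs, the per-VM limits alone would permit 2,400 requests, so the subscription limit is the one that applies.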

How does Microsoft Compute determine throttling limits?

To determine the limits for each resource and subscription, Microsoft Compute uses the Token Bucket Algorithm. This algorithm creates a bucket for each limit and holds a specific number of tokens in each bucket. The number of tokens in a bucket represents the throttling limit at any given minute.

At the start of the throttling window, when the resource is created, the bucket is filled to its Maximum Capacity. Each API request initiated by the user consumes one token. When the token count depletes to zero, subsequent API requests are throttled. The bucket is replenished with new tokens every minute at a consistent rate, called the Bucket Refill Rate, which is defined separately for a resource and for a subscription.

For instance, consider the throttling policy for the VM Update API, which stipulates a Bucket Refill Rate of four tokens per minute and a Maximum Bucket Capacity of 12 tokens. The user invokes the Update VM API for a virtual machine (VM) as shown in the following table. Initially, the bucket is filled with 12 tokens at the start of the throttling window. In the second minute, the user consumes eight tokens, and the bucket refills to its 12-token maximum by the fourth minute. In the fourth minute, the user sends 13 requests: 12 of them consume all available tokens, leaving the bucket empty, and one request is throttled. In the fifth minute, the bucket is replenished with four new tokens in accordance with the Bucket Refill Rate. So, four of the five API requests made in the fifth minute succeed, while Microsoft Compute throttles one API request due to insufficient tokens.

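| Minute | 1st | 2nd | 3rd | 4th | 5th | 6th |
|---|---|---|---|---|---|---|
| Number of tokens at the beginning (A) | 12 | 12 | 8 | 12 | 4 | 4 |
| Requests per minute (B) | 0 | 8 | 0 | 13 | 5 | 0 |
| Throttled requests (C) | 0 | 0 | 0 | 1 | 1 | 0 |
| Remaining tokens at the end of the period, D = Max(A-B, 0) | 12 | 4 | 8 | 0 | 0 | 4 |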
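
The worked example can be replayed with a few lines of Python. This is a minimal sketch of a token bucket under one assumption read off the table: the refill is applied once at the start of each minute and is capped at the maximum capacity. The actual service may refill differently, and the function name `simulate` is hypothetical.

```python
def simulate(requests_per_minute, refill_rate=4, capacity=12):
    """Replay per-minute request counts through a token bucket.

    Returns one tuple per minute:
    (minute, tokens_at_start, requests, throttled, tokens_at_end)
    """
    rows = []
    tokens = capacity  # the bucket starts full
    for minute, requested in enumerate(requests_per_minute, start=1):
        if minute > 1:
            # Assumed refill model: add refill_rate tokens each minute, capped.
            tokens = min(tokens + refill_rate, capacity)
        tokens_at_start = tokens
        served = min(requested, tokens)   # each served request costs one token
        throttled = requested - served    # the rest would receive HTTP 429
        tokens -= served
        rows.append((minute, tokens_at_start, requested, throttled, tokens))
    return rows


for row in simulate([0, 8, 0, 13, 5, 0]):
    print(row)
```

With the request pattern from the table (0, 8, 0, 13, 5, 0), the starting tokens come out as 12, 12, 8, 12, 4, 4 and the throttled counts as 0, 0, 0, 1, 1, 0, matching rows A and C.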

A similar process is followed to determine the throttling limits at the subscription level. The following sections detail the bucket refill rate and maximum bucket capacity used to determine throttling limits for Virtual Machines, Virtual Machine Scale Sets, and Virtual Machine Scale Set VMs.

Throttling limits for Virtual Machines

API requests for Virtual Machines are categorized into seven distinct policies. Each policy has its own limits, depending on how resource-intensive the API requests under that policy are. The following table contains a comprehensive list of these policies, the corresponding REST APIs, and their respective throttling limits:

| Policy category | REST APIs | Resource level: bucket refill rate (per min) | Resource level: maximum bucket capacity (per min) | Subscription level: bucket refill rate (per min) | Subscription level: maximum bucket capacity (per min) |
|---|---|---|---|---|---|
| Put VM (Create new VMs) | Create | 4 | 12 | 500 | 1,500 |
| Update VM (Update existing VMs) | Update, Reapply, Restart, Power Off, Start, Generalize, Convert To Managed Disks, Redeploy, Perform Maintenance, Capture, Run Command, Create Or Update, Extensions - Update, Extensions - Delete, Reimage, Update, Run Commands - Update, Run Commands - Delete, Run Commands - Create Or Update | 4 | 12 | 500 | 1,500 |
| Delete VM (Delete VMs) | Delete, Simulate Eviction, Deallocate | 4 | 12 | 500 | 1,500 |
| Low Cost Get VM (Get information on single VM) | Get, Instance View, Extensions - Get, List Available Sizes, Retrieve Boot Diagnostics Data, Run Commands - Get By Virtual Machine, Run Commands - List By Virtual Machine | 12 | 36 | 8,000 | 24,000 |
| High Cost Get VM¹ (Get information on multiple VMs) | List, List All, List By Location | NA | NA | 300 | 900 |
| Get Operation (Get information on async VM operations) | Status of asynchronous operations | 15 | 45 | 5,000 | 15,000 |
| VM Guest Patch Operations (Assess & install guest patches) | Assess Patches, Install Patches | 2 | 6 | 200 | 600 |

¹ Only subscription-level policies are applicable.
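
To get a feel for what these numbers allow over time, the resource-level limits can be encoded as data and combined with the refill model from the earlier worked example. The sketch below assumes a full bucket at the start and one capped refill per minute, so the most requests that can succeed over n minutes is capacity + refill rate × (n − 1). The dictionary and function names are hypothetical, and the values are copied from the resource-level columns of the Virtual Machines limits.

```python
# (bucket refill rate per min, maximum bucket capacity) at resource level.
# High Cost Get VM is omitted because only subscription-level limits apply to it.
VM_RESOURCE_LIMITS = {
    "Put VM": (4, 12),
    "Update VM": (4, 12),
    "Delete VM": (4, 12),
    "Low Cost Get VM": (12, 36),
    "Get Operation": (15, 45),
    "VM Guest Patch Operations": (2, 6),
}


def max_requests(policy: str, minutes: int) -> int:
    """Upper bound on successful requests to one resource over `minutes`.

    Assumes the bucket starts full and gains `refill` tokens at the start
    of every minute after the first.
    """
    if minutes <= 0:
        return 0
    refill, capacity = VM_RESOURCE_LIMITS[policy]
    return capacity + refill * (minutes - 1)


print(max_requests("Update VM", 1))         # 12
print(max_requests("Update VM", 5))         # 28
print(max_requests("Low Cost Get VM", 10))  # 144
```

The capacity determines how large a burst can be, while the refill rate determines the sustained request rate once the burst is spent.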

Throttling limits for Virtual Machine Scale Sets

API requests for Virtual Machine Scale Sets (Uniform and Flex) are categorized into five distinct policies. Each policy has its own limits, depending on how resource-intensive the API requests under that policy are. These policies apply to both Flex and Uniform orchestration modes. The following table contains a comprehensive list of these policies, the corresponding REST APIs, and their respective throttling limits:

| Policy category | REST APIs | Resource level: bucket refill rate (per min) | Resource level: maximum bucket capacity (per min) | Subscription level: bucket refill rate (per min) | Subscription level: maximum bucket capacity (per min) |
|---|---|---|---|---|---|
| Put (Create new scale set) | Create | 4 | 12 | 125 | 375 |
| Update (Update existing scale set) | Update, Start², Restart², Redeploy², Perform Maintenance², Reimage², Reimage All², Create Or Update, Rolling Upgrades - Cancel, Extensions - Create, Extensions - Update, Extensions - Delete, Force Recovery Service Fabric Platform Update Domain Walk, Convert To Single Placement Group, Set Orchestration Service State | 4 | 12 | 500 | 1,500 |
| Delete (Delete scale set) | Delete, Power Off², Deallocate | 4 | 12 | 175 | 525 |
| Low Cost Get (Get information on single scale set) | Get, List Skus, Rolling Upgrades - Get Latest, Get OS Upgrade History | 12 | 36 | 800 | 2,400 |
| High Cost Get (Get resource intensive information) | Get Instance View, List², List All², List By Location² | 10 | 30 | 360 | 1,080 |

² Only subscription-level policies are applicable.

Throttling limits for Virtual Machine Scale Set Virtual Machines

API requests for Virtual Machine Scale Set Virtual Machines are categorized into three distinct policies. Each policy has its own limits, depending on how resource-intensive the API requests under that policy are. The following table contains a comprehensive list of these policies, the corresponding REST APIs, and their respective throttling limits:

| Policy category | REST APIs | Resource level: bucket refill rate (per min) | Resource level: maximum bucket capacity (per min) | Subscription level: bucket refill rate (per min) | Subscription level: maximum bucket capacity (per min) |
|---|---|---|---|---|---|
| Update scale set VMs (Update existing VMs in a scale set) | Start, Restart, Reimage, Reimage All, Update, Simulate Eviction, Extensions - Create Or Update, Run Commands - Create Or Update, Run Commands - Update | 4 | 12 | 500 | 1,500 |
| Delete scale set VMs (Delete scale set VMs) | Delete, Power Off, Deallocate, Extensions - Delete, Run Commands - Delete | 4 | 12 | 500 | 1,500 |
| Get scale set VMs (Get information on scale set VMs) | Get, Get Instance View, Extensions - Get, Run Commands - Get, Retrieve Boot Diagnostics Data | 12 | 36 | 2,000 | 6,000 |

Troubleshooting guidelines

If you still encounter problems caused by Compute throttling, see Troubleshooting throttling errors in Azure - Virtual Machines. It explains how to troubleshoot throttling issues and lists best practices to avoid being throttled.

FAQs

Is there any action required from users?

Users don’t need to change anything in their configuration or workloads. All existing APIs continue to work as is.

What benefits do the throttling policies provide?

The throttling policies offer several benefits:

  • All Compute resources have a uniform throttling window of 1 minute. Users can successfully invoke API calls 1 minute after being throttled.

  • No single resource can use up all the limits under a subscription, because limits are also defined at the resource level.

  • Microsoft Compute is introducing a new algorithm, the Token Bucket Algorithm, for determining the limits. The algorithm provides extra buffer to customers who make a high number of API requests.
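
Because a throttled caller can succeed again after the one-minute window, a client can wrap its calls in a simple retry loop. The following is a minimal sketch, not an official client. `send` is a hypothetical callable that performs one API request and returns a status code and a headers dictionary. The sketch assumes the response may carry the standard HTTP `Retry-After` header in seconds, and falls back to a 60-second wait otherwise.

```python
import time


def call_with_retry(send, max_attempts=3, default_wait=60.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    send: hypothetical callable returning (status_code, headers_dict).
    Returns (final_status_code, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        status, headers = send()
        if status != 429 or attempt == max_attempts:
            return status, attempt
        # Prefer the server's hint when it is a number of seconds;
        # otherwise wait one full throttling window.
        try:
            wait = float(headers.get("Retry-After", default_wait))
        except ValueError:
            wait = default_wait
        sleep(wait)


# Usage with canned responses instead of real HTTP calls:
responses = iter([(429, {"Retry-After": "5"}), (429, {}), (200, {})])
waits = []
print(call_with_retry(lambda: next(responses), sleep=waits.append))  # (200, 3)
print(waits)  # [5.0, 60.0]
```

Passing `sleep` as a parameter keeps the example testable without real delays. A production client would typically also cap the total wait time.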

Does the customer get an alert when they're about to reach their throttling limits?

Every response from Microsoft Compute includes the x-ms-ratelimit-remaining-resource header, which can be used to determine the remaining limits against the applicable throttling policies. For the list of applicable throttling policies, see Call rate informational headers.
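
A client can read this header to see how close it is to a limit. The sketch below assumes the header value is a comma-separated list of `policy;remaining` pairs. Treat that layout as an assumption to verify against the Call rate informational headers reference. The policy names in the sample string are made up for illustration, and `parse_remaining` is a hypothetical helper.

```python
def parse_remaining(header_value: str) -> dict[str, int]:
    """Parse 'PolicyA;11,PolicyB;1499' into {'PolicyA': 11, 'PolicyB': 1499}.

    Assumed format: comma-separated 'policy;remaining' pairs.
    Entries without a ';' separator are skipped.
    """
    remaining = {}
    for item in header_value.split(","):
        name, sep, count = item.strip().rpartition(";")
        if not sep:
            continue
        remaining[name] = int(count)
    return remaining


# Hypothetical header value; real policy names will differ.
sample = "Microsoft.Compute/ExamplePolicyA;11,Microsoft.Compute/ExamplePolicyB;1499"
limits = parse_remaining(sample)
print(limits)
print(min(limits.values()))  # the tightest remaining allowance: 11
```

Logging the smallest remaining value, or alerting when it drops below a chosen threshold, gives early warning before requests start returning HTTP 429.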