Applies to: ✔️ Linux VMs ✔️ Windows VMs ✔️ Flexible scale sets ✔️ Uniform scale sets
Microsoft Compute implements throttling mechanism to help with the
overall performance of the service and to give a consistent experience
to the customers. API requests that exceed the maximum allowed limits
are throttled and users get an HTTP 429
error.
All Compute throttling policies are implemented on a per-region basis.
How do the throttling policies work?
Microsoft Compute implements throttling policies that limit the
number of API requests made per resource and per subscription per region
per minute. If the number of API requests exceeds these limits, the
requests are throttled. Here's how these limits work:
Per Resource Limit – Each resource, such as a virtual machine
(VM), has a specific limit for API requests. For instance, let us
assume that a user creates 10 VMs in a subscription. The user
can invoke up to 12 update requests for each VM in one minute. If the
user exceeds the limit for the VM, API requests are throttled.
This limit ensures that a few resources don’t consume the
subscription level limits and throttle other resources.
Subscription Limit – In addition to resource limits, there's
an overarching limit on the number of API requests across all
resources within a subscription. Any API requests beyond this limit
are throttled, regardless of whether the limit for an individual resource has been reached. For instance, let us assume that a user has 200 VMs in a subscription. Even though user is entitled to initiate up to 12
Update VM requests for each VM, the aggregate limit for Update VM
API requests is capped at 1500 per min. Any Update VM API requests
for the subscription exceeding 1500 are throttled.
How does Microsoft Compute determine throttling limits?
To determine the limits for each resource and subscription, Microsoft
Compute uses Token Bucket Algorithm. This algorithm creates buckets
for each limit and holds a specific number of tokens in each bucket. The
number of tokens in a bucket represent the throttling limit at any given
minute.
At the start of throttling window, when the resource is created, the
bucket is filled to its Maximum Capacity. Each API request initiated
by the user consumes one token. When the token count depletes to zero,
subsequent API requests are throttled. Bucket is replenished with
new tokens every minute at a consistent rate called Bucket Refill Rate
for a resource and a subscription.
For Instance: Let us consider the 'throttling policy for VM Update API'
that stipulates a Bucket Refill Rate of four tokens per minute, and a
Maximum Bucket Capacity of 12 tokens. The user invokes the Update VM
API request for a virtual machine (VM) as per the following table. Initially, the
bucket is filled with 12 tokens at the start of the throttling window.
By the fourth minute, the user utilizes all 12 tokens, leaving the
bucket empty. In the fifth minute, the bucket is replenished with four new
tokens in accordance with the Bucket Refill Rate. So, four API
requests can be made in the fifth minute, while Microsoft Compute throttles one API request due to insufficient tokens.
(min)
1st
2nd
3rd
4th
5th
6th
Number of tokens in the beginning (A)
12
12
8
12
4
4
Requests per minute (B)
0
8
0
13
5
0
Throttled requests (C)
0
0
0
1
1
0
Remaining tokens at the end of period D = Max(A-B, 0)
12
4
8
0
0
4
Similar process is followed for determining the throttling limits at
subscription level. The following sections detail the Bucket refill rate
and Maximum bucket capacity that is used to determine throttling limits for
Virtual
Machines,
Virtual Machine Scale
Sets
and Virtual Machines Scale Set
VMs.
Throttling limits for Virtual Machines
API requests for Virtual Machines are categorized into seven distinct
policies. Each policy has its own limits, depending upon how
resource intensive the API requests under that policy are. Following table contains a
comprehensive list of these policies, the corresponding REST APIs, and
their respective throttling limits:
1 Only subscription level policies are applicable.
Throttling limits for Virtual Machine Scale Sets
API requests for Virtual Machine Scale Set(Uniform & Flex) are categorized into 5
distinct policies. Each policy has its own limits, depending upon how resource intensive the API requests under that policy are. These
policies are applicable to both Flex and Uniform orchestration modes.
Following table contains a comprehensive list of these policies, the corresponding REST
APIs, and their respective throttling limits:
2 Only subscription level policies are applicable.
Throttling limits for Virtual Machine Scale Set Virtual Machines
API requests for Virtual Machine Scale Set Virtual Machines are categorized into 3 distinct
policies. Each policy has its own limits, depending upon how
resource intensive the API requests under that policy are. Following table contains a
comprehensive list of these policies, the corresponding REST APIs, and
their respective throttling limits:
Policy category
REST APIs
Resource Level
Resource Level
Subscription Level
Subscription Level
Bucket refill rate (Per Min)
Maximum Bucket capacity (Per Min)
Bucket refill rate (Per Min)
Maximum Bucket capacity (Per Min)
Update scale set VMs (Update existing VMs in a scale set)
Users don’t need to change anything in their configuration or workloads. All existing APIs continue to work as is.
What benefits do the throttling policies provide?
The throttling policies offer several benefits:
All Compute resources have a uniform window of 1 min. Users
can successfully invoke API calls, 1 min after getting
throttled.
No single resource can use up all the limits under a subscription as
limits are defined at resource level.
Microsoft Compute is introducing a new algorithm, Token Bucket Algorithm, for determining the limits. The algorithm provides extra buffer to the customers, while making high number of API requests.
Does the customer get an alert when they're about to reach their throttling limits?
As part of every response, Microsoft Compute returns
x-ms-ratelimit-remaining-resource which can be used to determine the
throttling limits against the policies. A list of applicable throttling
policies is returned as a response to Call rate informational
headers.
Protect your backend APIs from information exposure and implement throttling (rate limiting) to prevent resource exhaustion with policies in Azure API Management.