Rate limits

Azure DevOps Services

Azure DevOps, like many software-as-a-service solutions, uses multi-tenancy to reduce costs and improve performance. This design leaves users vulnerable to performance issues and even outages when other users of their shared resources have spikes in their consumption. To combat these problems, Azure DevOps limits the resources individuals can consume and the number of requests they can make to certain commands. When these limits are exceeded, subsequent requests might be delayed or blocked.

When a user's requests are delayed by a significant amount, that user receives an email and sees a warning banner in the web interface. For the build service account and others without an email address, members of the Project Collection Administrators group receive the email. For more information, see Usage monitoring.

When an individual user's requests get blocked, the user receives responses with HTTP code 429 (Too Many Requests) and a message similar to the following:

TF400733: The request has been canceled: Request was blocked due to exceeding usage of resource <resource name> in namespace <namespace ID>.
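
For illustration, a client can check for this condition explicitly. The following Python sketch uses the `requests` library; the organization URL, API path, and token are hypothetical placeholders rather than values from this article:

```python
import requests

# Hypothetical placeholders; substitute your own organization URL and a
# personal access token (PAT) with appropriate scopes.
ORG_URL = "https://dev.azure.com/your-organization"
PAT = "your-pat-here"

response = requests.get(
    f"{ORG_URL}/_apis/build/builds?api-version=7.1",
    auth=("", PAT),  # Azure DevOps accepts a PAT as the Basic-auth password
)

if response.status_code == 429:
    # Blocked by rate limiting: the body carries the TF400733 message, and
    # Retry-After indicates how many seconds to wait before trying again.
    print(response.text)
    print("Retry after:", response.headers.get("Retry-After"), "seconds")
```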

Current rate limits

Azure DevOps currently has a global consumption limit. This limit delays requests from individual users beyond a threshold when shared resources are in danger of being overwhelmed.

Global consumption limit

This limit is focused exclusively on avoiding outages when shared resources are close to being overwhelmed. Individual users typically have their requests delayed only when both of the following occur:

  • One of their shared resources is at risk of being overwhelmed
  • Their personal usage exceeds 200 times the consumption of a typical user within a (sliding) five-minute window

The length of the delay depends on the user's sustained level of consumption. Delays range from a few milliseconds per request up to 30 seconds. Once consumption drops to zero or the resource is no longer overwhelmed, the delays stop within five minutes. If consumption remains high, delays can continue indefinitely to protect the resource.

Azure DevOps throughput units (TSTUs)

Azure DevOps users consume many shared resources, and consumption depends on many factors. For example:

  • Uploading a large number of files to version control creates a large amount of load on databases and storage accounts.
  • Complex work item tracking queries create database load based on the number of work items they search through.
  • Builds drive load by downloading files from version control, producing log output, and so on.
  • All operations consume CPU and memory on various parts of the service.

To account for all of these factors, Azure DevOps expresses resource consumption in abstract units called Azure DevOps throughput units, or TSTUs.

TSTUs will eventually incorporate a blend of the following measures:

  • Azure SQL Database DTUs as a measure of database consumption
  • Application tier and job agent CPU, memory, and I/O as a measure of compute consumption
  • Azure Storage bandwidth as a measure of storage consumption

For now, TSTUs are primarily focused on Azure SQL Database DTUs, since Azure SQL Databases are the shared resources most commonly overwhelmed by excessive consumption.

A single TSTU is the average load we expect a single normal user of Azure DevOps to generate per five minutes. Normal users also generate spikes in load. These spikes are typically 10 or fewer TSTUs per five minutes. Less frequently, spikes go as high as 100 TSTUs. The global consumption limit is 200 TSTUs within a sliding five-minute window.
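
To make the sliding window concrete, here's an illustrative client-side approximation in Python. The class name and the 80% warning fraction are assumptions made for this sketch; the real accounting happens inside Azure DevOps and isn't exposed for clients to replicate exactly:

```python
import time
from collections import deque

GLOBAL_LIMIT_TSTUS = 200  # global consumption limit per sliding window
WINDOW_SECONDS = 5 * 60   # five-minute sliding window

class SlidingWindowUsage:
    """Rough client-side estimate of TSTU consumption in a sliding window."""

    def __init__(self):
        self._events = deque()  # (timestamp, estimated TSTUs) pairs

    def record(self, tstus):
        """Record an estimated cost for a request just issued."""
        self._events.append((time.monotonic(), tstus))

    def current_usage(self):
        """Sum the recorded TSTUs that fall inside the current window."""
        cutoff = time.monotonic() - WINDOW_SECONDS
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()
        return sum(tstus for _, tstus in self._events)

    def near_limit(self, fraction=0.8):
        """True once estimated usage passes the given fraction of the limit."""
        return self.current_usage() >= fraction * GLOBAL_LIMIT_TSTUS
```

A client could call `record` after each request with a cost estimate and slow down once `near_limit` returns true, before the service starts imposing delays.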

Pipelines

Rate limiting is similar for Azure Pipelines. Each pipeline is treated as an individual entity with its own resource consumption tracked. Even if build agents are self-hosted, they generate load in the form of cloning repositories and sending logs.

We apply a 200-TSTU limit for an individual pipeline in a sliding five-minute window. This limit is the same as the global consumption limit for users. If a pipeline is delayed or blocked by rate limiting, a message appears in the attached logs.

API client experience

When requests are delayed or blocked, Azure DevOps returns response headers to help API clients react. While not fully standardized, these headers are broadly in line with other popular services.

The following table lists the headers available and what they mean. Except for X-RateLimit-Delay, all of these headers get sent before requests start getting delayed. This design gives clients the opportunity to proactively slow down their rate of requests.

| Header name | Description |
|-------------|-------------|
| Retry-After | The RFC 6585-specified header, sent to tell you how long to wait before sending your next request so that you fall under the detection threshold. Units: seconds. |
| X-RateLimit-Resource | A custom header indicating the service and type of threshold that was reached. Threshold types and service names can vary over time and without warning. We recommend displaying this string to a human, but not relying on it for computation. |
| X-RateLimit-Delay | How long the request was delayed. Units: seconds, with up to three decimal places (milliseconds). |
| X-RateLimit-Limit | Total number of TSTUs allowed before delays are imposed. |
| X-RateLimit-Remaining | Number of TSTUs remaining before being delayed. If requests are already being delayed or blocked, this value is 0. |
| X-RateLimit-Reset | Time at which, if all resource consumption stopped immediately, tracked usage would return to 0 TSTUs. Expressed in Unix epoch time. |
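
As a rough sketch, these headers can be collected into a small structure for logging or throttling decisions. The names below (`RateLimitInfo`, `parse_rate_limit_headers`) are illustrative, not part of any Azure DevOps client library:

```python
from dataclasses import dataclass
from typing import Mapping, Optional

@dataclass
class RateLimitInfo:
    """Parsed view of the rate-limit headers; any of them may be absent."""
    resource: Optional[str]               # X-RateLimit-Resource (display only)
    delay_seconds: Optional[float]        # X-RateLimit-Delay
    limit_tstus: Optional[float]          # X-RateLimit-Limit
    remaining_tstus: Optional[float]      # X-RateLimit-Remaining
    reset_epoch: Optional[int]            # X-RateLimit-Reset (Unix epoch time)
    retry_after_seconds: Optional[float]  # Retry-After

def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    def as_float(name: str) -> Optional[float]:
        value = headers.get(name)
        return float(value) if value is not None else None

    reset = headers.get("X-RateLimit-Reset")
    return RateLimitInfo(
        resource=headers.get("X-RateLimit-Resource"),
        delay_seconds=as_float("X-RateLimit-Delay"),
        limit_tstus=as_float("X-RateLimit-Limit"),
        remaining_tstus=as_float("X-RateLimit-Remaining"),
        reset_epoch=int(reset) if reset is not None else None,
        retry_after_seconds=as_float("Retry-After"),
    )
```

With the `requests` library, for example, you could call `parse_rate_limit_headers(response.headers)` after each request.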


Recommendations

We recommend that you at least respond to the Retry-After header. If you detect a Retry-After header in any response, wait until that amount of time has passed before sending another request. Doing so helps your client application experience fewer enforced delays. Keep in mind that a delayed response still returns HTTP 200, so you don't need to apply retry logic to the request itself.
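
A minimal Python sketch of that recommendation, assuming an authenticated `requests.Session` (the function name is illustrative):

```python
import time

import requests

def get_honoring_retry_after(session: requests.Session, url: str) -> requests.Response:
    """Issue a GET and, if the response carries Retry-After, pause before
    returning so that the next request falls under the detection threshold."""
    response = session.get(url)
    retry_after = response.headers.get("Retry-After")
    if retry_after is not None:
        # A delayed request still succeeds (typically HTTP 200), so the
        # response is usable as-is; we only slow down subsequent requests.
        time.sleep(float(retry_after))
    return response
```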

If possible, we further recommend that you monitor the X-RateLimit-Remaining and X-RateLimit-Limit headers. Doing so allows you to approximate how quickly you're approaching the delay threshold, so your client can intelligently react by spreading out its requests over time.
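
One possible approach, sketched below with an illustrative function name and an assumed maximum pause, is to back off gradually once half the budget is consumed:

```python
import time

def throttle_from_headers(headers, max_pause: float = 5.0) -> None:
    """Pause progressively longer as the remaining TSTU budget shrinks.

    Does nothing while the headers are absent (usage is well under the
    threshold) or while more than half of the budget remains.
    """
    limit = headers.get("X-RateLimit-Limit")
    remaining = headers.get("X-RateLimit-Remaining")
    if limit is None or remaining is None:
        return
    used_fraction = 1.0 - float(remaining) / float(limit)
    if used_fraction > 0.5:
        # Scale the pause linearly from 0 (half the budget used) up to
        # max_pause (budget exhausted).
        time.sleep(max_pause * (used_fraction - 0.5) * 2.0)
```

Calling this after each response, before issuing the next request, spreads your client's load out over time instead of waiting for enforced delays to begin.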