Q: What is the difference between autoscale and dynamic autoscale in Azure Cosmos DB?

Autoscale or autoscale provisioned throughput scales workloads based on the most active region and partition. In contrast, dynamic autoscale allows your workloads’ regions and partitions to scale independently based on usage. We recommend dynamic autoscale to all customers who plan to use autoscale.

Q: Can normalized RU/s consumption be 100 percent even if Provisioned Throughput or Autoscaled RU metrics do not scale to the maximum RU/s?

Yes. For more information, see Monitor normalized RU/s .

Q: How quickly does autoscale scale up based on increases in traffic?

With autoscale, the system scales the throughput (RU/s) T up or T down within the range of 0.1 × Tmax to Tmax based on incoming traffic. Because the scaling is automatic and instantaneous, at any point in time, you can consume up to the provisioned Tmax with no delay.

Q: What's the pricing for autoscale?

Each hour, you're billed for the highest throughput T the system scaled to within that hour. If your resource had no requests during the hour or didn't scale beyond 0.1 × Tmax , you're billed for the minimum of 0.1 × Tmax . For details, see the Azure Cosmos DB pricing page .

Question 1

What is the difference between autoscale and dynamic autoscale in Azure Cosmos DB?

Accepted Answer

Autoscale or autoscale provisioned throughput scales workloads based on the most active region and partition. In contrast, dynamic autoscale allows your workloads’ regions and partitions to scale independently based on usage. We recommend dynamic autoscale to all customers who plan to use autoscale.

Question 2

How can I enable dynamic autoscale on an account programatically?

Accepted Answer

You can use a Resource Manager template with preview version 2023-11-15-preview or later and stable version 2024-11-15 or later to set the property enablePerRegionPerPartitionAutoscale to true. You can use a Resource Provider REST API version 2024-11-15 or later to set enablePerRegionPerPartitionAutoscale to true.
You can also use the Azure CLI or PowerShell.

PowerShell
Azure CLI

// Add Azure Cosmos DB extension 1.18.0 or later for PowerShell
Install-Module -Name Az.CosmosDB -RequiredVersion 1.18.0

// update the account using this command to enable or disable the property
Update-AzCosmosDBAccount -EnablePerRegionPerPartitionAutoscale $true -ResourceGroupName "" -Name ""

// Run this command to see the enablement or disablement status:
Get-AzCosmosDBAccount -ResourceGroupName "" -Name ""

// Check if the azure cli version is 2.71.0 or later
// update the account using this command to enable or disable the property
az cosmosdb update --enable-prpp-autoscale true --name '' --resource-group ''

// Run this command to see the enablement or disablement status:
az cosmosdb show --name '' --resource-group ''

Question 3

Can normalized RU/s consumption be 100 percent even if Provisioned Throughput or Autoscaled RU metrics do not scale to the maximum RU/s?

Accepted Answer

Yes. For more information, see Monitor normalized RU/s.

Question 4

What metric should be used to determine whether the autoscale max RU/s or manual provisioned RU/s can be scaled up or down programatically?

Accepted Answer

We recommended taking advantage of Azure Cosmos DB's native dynamic scaling capability to manage RU/s for your resource. However, if needed, you should use the Normalized RU Consumption metric in Azure Monitor to make programmatic scaling decisions. The Normalized RU Consumption metric is a metric between 0% to 100% that is used to help measure the utilization of the autoscale max RU/s or manual provisioned throughput on a database or container. It is the most accurate way to determine whether scale-up or scale-down is possible. Other approaches, like using the ReadThroughputAsync() call in the Azure Cosmos DB SDKs to get the ProvisionedThroughput, or using ProvisionedThroughput in Azure Monitor are not recommended and will lead to inaccurate results. These metrics represent billed throughput with a delay and shouldn't be used for scaling decisions.

Question 5

How to interpret ProvisionedThroughput in dynamic autoscale and autoscale?

Accepted Answer

You can use the Provisioned Throughput to see the RU/s you're billed for in each hour.

In dynamic autoscale, Azure monitor Provisioned Throughput metric and the various SDK API metrics listed below reflect the aggregated highest RU/s scaled to per hour across each partition and region.

In autoscale, Azure monitor Provisioned Throughput metric and the various SDK API metrics listed below reflect the RU/s of the most active partition, uniformly applied across all partitions and regions, aggregated per hour.

SDK Type	Provisioned Throughput Represented by
.NET V3	Container.ReadThroughputAsync() Container.ReadThroughputAsync(requestOptions).Resource.Throughput
Java V4	cosmosContainer.readThroughput().getProperties().getManualThroughput()
Python	azure.cosmos.throughputproperties.offer_throughput
Node JS	azure.cosmos.OfferDefinition.offerThroughput

Question 6

What happens to databases or containers that were created in the earlier autopilot tier model?

Accepted Answer

Resources that were created in the earlier tier model are automatically supported in the new autoscale custom maximum RU/s model. The upper bound of the tier becomes the new maximum RU/s, which results in the same scale range.

For example, if you previously selected the tier that scaled between 400 RU/s and 4,000 RU/s, the database or container now shows a maximum RU/s of 4,000 RU/s, which scales between 400 RU/s and 4,000 RU/s. Then, you can change the maximum RU/s to a custom value based on your workload.

Question 7

What's the entry point RU/s for autoscale?

Accepted Answer

Starting in April 2022, you can set autoscale with a maximum RU/s as low as 1,000 RU/s (scales between 100 RU/s and 1,000 RU/s). You also can set a scale range of 200 RU/s to 2,000 RU/s or 300 RU/s to 3,000 RU/s. Previously, the entry point was 400 RU/s to 4,000 RU/s.

We recommend this configuration for workloads that have low throughput requirements, but which still might scale to the maximum RU/s.

Question 8

How quickly does autoscale scale up based on increases in traffic?

Accepted Answer

With autoscale, the system scales the throughput (RU/s) T up or T down within the range of 0.1 × Tmax to Tmax based on incoming traffic. Because the scaling is automatic and instantaneous, at any point in time, you can consume up to the provisioned Tmax with no delay.

Question 9

How do I determine what RU/s the system is currently scaled to?

Accepted Answer

Use Azure Monitor metrics to monitor both the provisioned autoscale maximum RU/s and the current throughput (RU/s) the system is scaled to.

Question 10

What's the pricing for autoscale?

Accepted Answer

Each hour, you're billed for the highest throughput T the system scaled to within that hour. If your resource had no requests during the hour or didn't scale beyond 0.1 × Tmax, you're billed for the minimum of 0.1 × Tmax. For details, see the Azure Cosmos DB pricing page.

Question 11

How does autoscale show up on my bill?

Accepted Answer

In single-write region accounts, the autoscale rate per 100 RU/s is 1.5 times the rate of standard (manual) provisioned throughput. Your bill shows the existing standard provisioned throughput meter. The quantity of this meter is multiplied by 1.5. For example, if the highest RU/s the system scaled to within an hour was 6,000 RU/s, you're billed 60 × 1.5 = 90 units of the meter for that hour.

In accounts that have multiple-write regions, the autoscale rate per 100 RU/s is the same as the rate for standard (manual) provisioned multiple-write region throughput. Your bill shows the existing multiple-write regions meter. Because the rates are the same, if you use autoscale, you see the same quantity as for standard throughput.

Question 12

Does autoscale work with reserved capacity?

Accepted Answer

Yes. With reserved capacity for accounts with single-write regions, the reservation discount for autoscale resources is applied to the meter usage at a ratio of 1.5 times the ratio of the specific region. For example, if you want to use reserved capacity to cover 10,000 autoscale RU/s, you should plan to purchase 15,000 RU/s of reserved capacity overall.

Multi-write region reserved capacity works the same for autoscale and standard (manual) provisioned throughput. For more information, see Azure Cosmos DB reserved capacity.

Question 13

Does autoscale work with the Azure Cosmos DB free tier?

Accepted Answer

Yes. In the free tier, you can use autoscale throughput on a database or on a container. Learn more about how free tier billing works with autoscale.

Question 14

Is autoscale supported for all APIs?

Accepted Answer

Yes. Autoscale is supported for all APIs: NoSQL, Gremlin, Table, Cassandra, and MongoDB.

Question 15

Is autoscale supported for multi-region write accounts?

Accepted Answer

Yes. The maximum RU/s is available in each region that you add to the Azure Cosmos DB account.

Question 16

How do I enable autoscale on new databases or containers?

Accepted Answer

Learn how to enable autoscale.

Question 17

Can I enable autoscale on an existing database or container?

Accepted Answer

Yes. You can also switch between autoscale and standard (manual) provisioned throughput. Currently, for all APIs, you can use the Azure portal, Azure CLI, PowerShell or one of the Azure Management SDKs to do these operations. You can't use the Azure Cosmos DB client SDKs or an Azure Resource Manager template to migrate between manual provisioned throughput and autoscale. However, you can use client SDKs or an Azure Resource Manager template to create new autoscale resources and to change the maximum RU/s on an existing autoscale resource.

Question 18

How does the migration between autoscale and standard (manual) provisioned throughput work?

Accepted Answer

Conceptually, changing the throughput type is a two-stage process. First, you send a request to change the throughput settings to use either autoscale or manual provisioned throughput. In both cases, the system automatically determines and sets an initial RU/s value based on current throughput settings and storage. During this step, no user-provided RU/s value is accepted. Then, after the update is complete, you can change the RU/s to accommodate your workload.

Migrate from standard (manual) provisioned throughput to autoscale

For a container, use the following formula to estimate the initial autoscale maximum RU/s:

MAX(1,000, current manual provisioned RU/s, maximum RU/s ever provisioned / 10, storage in GB × 10) rounded to the nearest 1,000 RU/s.

The actual initial autoscale maximum RU/s might vary depending on your account configuration.

Example #1: You have a container that has a 10,000 RU/s manual provisioned throughput and 25 GB of storage. When you enable autoscale, the initial autoscale maximum RU/s is 10,000 RU/s, which can scale between 1,000 RU/s and 10,000 RU/s.

Example #2: You have a container that has a 50,000 RU/s manual provisioned throughput and 25,000 GB of storage. When you enable autoscale, the initial autoscale maximum RU/s is 250,000 RU/s, which can scale between 25,000 RU/s and 250,000 RU/s.

Migrate from autoscale to standard (manual) provisioned throughput

The initial manual provisioned throughput is equal to the current autoscale maximum RU/s.

Example: You have an autoscale database or container that has a maximum RU/s of 20,000 RU/s (scales between 2,000 RU/s and 20,000 RU/s). When you update to use manual provisioned throughput, the initial throughput is 20,000 RU/s.

If you need to migrate a large number of throughput resources, consider using the Azure CLI.

Question 19

Can I use the Azure CLI, PowerShell, or Azure Resource Manager to manage databases or containers that use autoscale?

Accepted Answer

Yes. To programmatically enable autoscale on an existing database or container, you can use the Azure CLI or PowerShell.

To create a new database or container that uses autoscale, you can use the Azure CLI, PowerShell, or an Azure Resource Manager template.

Question 20

Is autoscale supported for shared throughput databases?

Accepted Answer

Yes. To enable autoscale for a shared throughput database, when you create the database, select autoscale and the Provision throughput option.

Question 21

How many containers are allowed per shared throughput database when autoscale is enabled?

Accepted Answer

Azure Cosmos DB enforces a maximum of 25 containers in a shared throughput database. The maximum applies to databases that have either autoscale or standard (manual) throughput.

Question 22

How does autoscale affect the database consistency level?

Accepted Answer

Autoscale has no effect on the consistency level of a database.

For more information, see Consistency levels.

Question 23

What storage limit is associated with each maximum RU/s option?

Accepted Answer

The storage limit in GB for each maximum RU/s is the maximum RU/s of the database or container divided by 10. For example, if the maximum RU/s is 20,000 RU/s, the resource can support 2,000 GB of storage.

For available maximum RU/s and storage options, see Provision throughput autoscale limits.

Question 24

What happens if I exceed the storage limit associated with my maximum throughput?

Accepted Answer

If the storage limit associated with the maximum throughput of the database or container is exceeded, Azure Cosmos DB automatically increases the maximum throughput to the next highest RU/s that can support that level of storage.

For an example scenario, if you start with a maximum RU/s of 50,000 RU/s (scales between 5,000 RU/s and 50,000 RU/s), you can store up to 5,000 GB of data. If your storage size increases to 5,001 GB, storage is now 6,000 GB and the new maximum RU/s is 60,000 RU/s (scales between 6,000 RU/s and 60,000 RU/s).

Question 25

Can I change the maximum RU/s on a database or container?

Accepted Answer

Yes. For more information, see How to provision autoscale throughput.

When you change the maximum RU/s, depending on the requested value, the asynchronous operation might take 4 to 6 hours to finish. Learn more.

Question 26

How do I increase the maximum RU/s?

Accepted Answer

When you send a request to increase the maximum RU/s Tmax, depending on the maximum RU/s selected, the service provisions more resources to support the higher maximum RU/s. While this is happening, your existing workload and operations aren't affected. The system continues to scale your database or container between the previous 0.1 × Tmax and Tmax until the new scale range of 0.1 × Tmax_new to Tmax_new is ready.

Question 27

How do I lower the maximum RU/s?

Accepted Answer

When you lower the maximum RU/s, the minimum value you can set it to is MAX(1,000, highest maximum RU/s ever provisioned / 10, current storage in GB × 10) rounded to the nearest 1,000 RU/s.

Example #1: You have an autoscale container that has a maximum RU/s of 20,000 RU/s (scales between 2,000 RU/s and 20,000 RU/s) and 1,500 GB of storage. The lowest, minimum value you can set maximum RU/s to is MAX(1,000, 20,000 / 10, 1,500 × 10) = 15,000 RU/s (scales between 1,500 RU/s and 15,000 RU/s).

Example #2: You have an autoscale container that has a maximum RU/s of 100,000 RU/s and 100 GB of storage. Now, you scale maximum RU/s up to 150,000 RU/s (scales between 15,000 RU/s and 150,000 RU/s). The lowest, minimum value you can now set maximum RU/s to is MAX(1,000, 150,000 / 10, 100 × 10) = 15,000 RU/s (scales between 1,500 RU/s and 15,000 RU/s).

For a shared throughput database, when you lower the maximum RU/s, the minimum value you can set it to is MAX(1,000, highest maximum RU/s ever provisioned / 10, current storage in GB × 10, 1,000 + (MAX(Container count - 25, 0) × 1,000)) rounded to the nearest 1,000 RU/s.

These formulas and examples apply to the minimum autoscale maximum RU/s you can set. They're separate from the 0.1 × Tmax to Tmax range that the system automatically scales to. Regardless of the maximum RU/s, the system always scales between 0.1 × Tmax and Tmax.

Question 28

How does TTL work with autoscale?

Accepted Answer

Time to Live (TTL) operations don't affect the scaling of RU/s in autoscale. Any RUs that are consumed due to TTL aren't part of the billed RU/s of the autoscale container.

For example, for an autoscale container that has 400 RU/s to 4,000 RU/s:

Hour 1: T=0: The container has no usage (no TTL or workload requests). The billable RU/s is 400 RU/s.
Hour 1: T=1: TTL is enabled.
Hour 1: T=2: The container starts to get requests. The requests consume 1,000 RU/s in 1 second. 200 RU/s worth of TTL are used. The billable RU/s is still 1,000 RU/s. Regardless of when the TTL deletes occur, they don't affect the autoscale scaling logic.

Question 29

How does maximum RU/s map to physical partitions?

Accepted Answer

When you first select the maximum RU/s, Azure Cosmos DB provisions by dividing the maximum RU/s by 10,000 RU/s to get the number of physical partitions that are required. Each physical partition can support up to 10,000 RU/s and 50 GB of storage. As storage size increases, Azure Cosmos DB automatically splits partitions to add more physical partitions to handle the storage increase. If storage exceeds the associated limit, Azure Cosmos DB increases the maximum RU/s.

The maximum RU/s of the database or container is divided evenly across all physical partitions. The total throughput that any single physical partition can scale to is the maximum RU/s of the database or container divided by the number of physical partitions.

Question 30

What happens if incoming requests exceed the maximum RU/s of the database or container?

Accepted Answer

If the overall consumed RU/s exceeds the maximum RU/s of the database or container, requests that exceed the maximum RU/s are throttled and return a code 429 status. Requests that result in more than 100 percent normalized utilization are throttled. Normalized utilization is defined as the maximum of the RU/s utilization across all physical partitions.

For example, your maximum throughput is 20,000 RU/s and you have two physical partitions, P_1 and P_2. Each partition is capable of scaling to 10,000 RU/s. In any given second, if P_1 used 6,000 RUs and P_2 used 8,000 RUs, the normalized utilization is MAX(6,000 RU / 10,000 RU, 8,000 RU / 10,000 RU) = 0.8.

Note

The Azure Cosmos DB client SDKs and data import tools (Azure Data Factory, the bulk executor library) automatically retry after a code 429 error is returned, so occasional code 429 errors aren't problematic. A sustained high number of code 429 errors might indicate that you need to increase the maximum RU/s or review your partitioning strategy to include a hot partition.

Question 31

Can throttling or rate limiting errors occur when autoscale is enabled?

Accepted Answer

Yes. It's possible to see code 429 errors in two scenarios.

First, when the overall consumed RU/s exceeds the maximum RU/s of the database or container, the service throttles requests accordingly.

Second, if a logical partition key value has a disproportionately higher number of requests compared to other partition key values, like in a hot partition, the underlying physical partition might exceed its RU/s budget. As a best practice, to avoid hot partitions, choose a good partition key that results in an even distribution of both storage and throughput.

For example, if you select the 20,000 RU/s maximum throughput option and you have 200 GB of storage, if you have four physical partitions, each physical partition can be autoscaled up to 5,000 RU/s. If a hot partition is on a specific logical partition key, you'll see code 429 errors when the underlying physical partition it resides in exceeds 5,000 RU/s or 100 percent normalized utilization.

Seeing code 429 errors when you use autoscale doesn't necessarily indicate an issue with your database or container. Generally for a production workload, if between 1 percent and 5 percent of requests have code 429 errors and your end-to-end latency is acceptable, the errors are a healthy sign that the RU/s is being fully utilized. No action is required.

Learn how to interpret and debug code 429 rate limiting errors.

Share via

What is the difference between autoscale and dynamic autoscale in Azure Cosmos DB?

How can I enable dynamic autoscale on an account programatically?

Can normalized RU/s consumption be 100 percent even if Provisioned Throughput or Autoscaled RU metrics do not scale to the maximum RU/s?

What metric should be used to determine whether the autoscale max RU/s or manual provisioned RU/s can be scaled up or down programatically?

How to interpret ProvisionedThroughput in dynamic autoscale and autoscale?

What happens to databases or containers that were created in the earlier autopilot tier model?

What's the entry point RU/s for autoscale?

How quickly does autoscale scale up based on increases in traffic?

How do I determine what RU/s the system is currently scaled to?

What's the pricing for autoscale?

How does autoscale show up on my bill?

Does autoscale work with reserved capacity?

Does autoscale work with the Azure Cosmos DB free tier?

Is autoscale supported for all APIs?

Is autoscale supported for multi-region write accounts?

How do I enable autoscale on new databases or containers?

Can I enable autoscale on an existing database or container?

How does the migration between autoscale and standard (manual) provisioned throughput work?

Can I use the Azure CLI, PowerShell, or Azure Resource Manager to manage databases or containers that use autoscale?

Is autoscale supported for shared throughput databases?

How many containers are allowed per shared throughput database when autoscale is enabled?

How does autoscale affect the database consistency level?

What storage limit is associated with each maximum RU/s option?

What happens if I exceed the storage limit associated with my maximum throughput?

Can I change the maximum RU/s on a database or container?

How do I increase the maximum RU/s?

How do I lower the maximum RU/s?

How does TTL work with autoscale?

How does maximum RU/s map to physical partitions?

What happens if incoming requests exceed the maximum RU/s of the database or container?

Can throttling or rate limiting errors occur when autoscale is enabled?

Next steps

Share via

Frequently asked questions about autoscale throughput in Azure Cosmos DB

What is the difference between autoscale and dynamic autoscale in Azure Cosmos DB?

How can I enable dynamic autoscale on an account programatically?

Can normalized RU/s consumption be 100 percent even if Provisioned Throughput or Autoscaled RU metrics do not scale to the maximum RU/s?

What metric should be used to determine whether the autoscale max RU/s or manual provisioned RU/s can be scaled up or down programatically?

How to interpret ProvisionedThroughput in dynamic autoscale and autoscale?

What happens to databases or containers that were created in the earlier autopilot tier model?

What's the entry point RU/s for autoscale?

How quickly does autoscale scale up based on increases in traffic?

How do I determine what RU/s the system is currently scaled to?

What's the pricing for autoscale?

How does autoscale show up on my bill?

Does autoscale work with reserved capacity?

Does autoscale work with the Azure Cosmos DB free tier?

Is autoscale supported for all APIs?

Is autoscale supported for multi-region write accounts?

How do I enable autoscale on new databases or containers?

Can I enable autoscale on an existing database or container?

How does the migration between autoscale and standard (manual) provisioned throughput work?

Can I use the Azure CLI, PowerShell, or Azure Resource Manager to manage databases or containers that use autoscale?

Is autoscale supported for shared throughput databases?

How many containers are allowed per shared throughput database when autoscale is enabled?

How does autoscale affect the database consistency level?

What storage limit is associated with each maximum RU/s option?

What happens if I exceed the storage limit associated with my maximum throughput?

Can I change the maximum RU/s on a database or container?

How do I increase the maximum RU/s?

How do I lower the maximum RU/s?

How does TTL work with autoscale?

How does maximum RU/s map to physical partitions?

What happens if incoming requests exceed the maximum RU/s of the database or container?

Can throttling or rate limiting errors occur when autoscale is enabled?

Next steps

Feedback

Additional resources