Hierarchical partition not distributed equally across physical partitions

Louis Lavens 26 Reputation points
2023-03-24T10:52:52+00:00

I have been working with hierarchical partition keys and noticed that documents with the same primary partition key are always put into the same physical partition, even though they all have a different secondary partition key and I have more than one physical partition provisioned.

I was able to reproduce this by creating a completely new container with 20K RU/s provisioned and a two-level hierarchical partition key: customer id, then resource id. When I upsert data for a single customer but for different resources, a single physical partition is maxed out while the other available partitions sit at 0%. When I upsert data for multiple customers, I can see load on multiple physical partitions at the same time.

This doesn't seem like the expected behavior: I would expect multiple logical partitions to be distributed equally across the available physical partitions.

I currently have a use case where I need to support more than 1,000 document updates per second for a single customer. If those documents always land in a single physical partition, I will hit the throughput limit of that physical partition. This is also low-volume data, so a single customer is unlikely to fill up an entire physical partition.
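
To make the concern concrete, here is a rough back-of-the-envelope sketch. It assumes the container's 20K RU/s is split evenly over two physical partitions and that one upsert costs about 10 RU; both numbers are assumptions for illustration, and the real charge per upsert has to be measured. The 10,000 RU/s ceiling per physical partition is the documented service limit.

```python
# Assumed figures for illustration only; measure the real RU charge per upsert.
PER_PARTITION_MAX_RU = 10_000  # documented ceiling for one physical partition


def per_partition_budget(total_ru: int, physical_partitions: int) -> float:
    """Provisioned RU/s are divided evenly across physical partitions."""
    share = total_ru / physical_partitions
    return min(share, PER_PARTITION_MAX_RU)


def max_updates_per_second(budget_ru: float, ru_per_update: float) -> float:
    """Upper bound on updates/s that a single physical partition can absorb."""
    return budget_ru / ru_per_update


budget = per_partition_budget(total_ru=20_000, physical_partitions=2)
print(budget)                                            # 10000.0
print(max_updates_per_second(budget, ru_per_update=10))  # 1000.0
```

Under these assumptions a single customer pinned to one physical partition tops out at roughly 1,000 updates per second, which is exactly the rate I need to exceed.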

Is what I'm describing above expected behavior of the hierarchical partitions? Is there any way to change this behavior so the documents are distributed across the physical partitions?

Azure Cosmos DB
An Azure NoSQL database service for app development.

Accepted answer
  1. Vahid Ghafarpour 23,385 Reputation points Volunteer Moderator
    2023-06-04T18:32:16.4133333+00:00

    If you need to distribute the load across multiple physical partitions for a single customer, you have a few options:

    Modify your partitioning strategy: Consider a partition key scheme that distributes the load more evenly. For example, you could make the first-level key more granular than the customer id alone, such as a synthetic value that combines the customer id with a suffix, so that one customer maps to multiple logical partitions. However, changing the partition key may require migrating existing data and can affect query performance, because customer-wide queries then have to cover several key values.

    Increase the provisioned throughput: If you anticipate a high update rate for a single customer, you can provision higher throughput (RU/s) for the container. Provisioned throughput is divided evenly across physical partitions, so this raises the share available to the partition that holds the customer's documents, up to the limit of 10,000 RU/s per physical partition.

    Consider using multiple containers: If distributing the load evenly across physical partitions is critical, you can consider using multiple containers with separate partition keys for different customers. This approach ensures that each customer's data is stored in separate physical partitions.
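
    As a minimal sketch of the first option, the snippet below derives a bucketed first-level key from the customer id and a stable hash of the resource id. The bucket count, the helper names, and the `customerBucket` field are hypothetical and not part of any Cosmos DB API. Which physical partition each key value lands on is still decided by the service, so this only increases the number of distinct first-level values that can be spread out; it does not guarantee an even distribution.

```python
import hashlib

BUCKETS = 8  # assumed bucket count; more buckets spread writes further


def bucketed_customer_key(customer_id: str, resource_id: str,
                          buckets: int = BUCKETS) -> str:
    """Return a first-level key such as 'customer-42_3'.

    The bucket comes from a stable hash of the resource id, so the same
    resource always maps to the same key and point reads remain possible.
    """
    digest = hashlib.sha256(resource_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % buckets
    return f"{customer_id}_{bucket}"


def all_customer_keys(customer_id: str, buckets: int = BUCKETS) -> list:
    """Every key value that a customer-wide query has to cover (fan-out)."""
    return [f"{customer_id}_{b}" for b in range(buckets)]


# Hypothetical document shape; 'customerBucket' would become the first level
# of the partition key instead of 'customerId'.
doc = {"id": "res-001", "resourceId": "res-001", "customerId": "customer-42"}
doc["customerBucket"] = bucketed_customer_key(doc["customerId"], doc["resourceId"])
```

    The trade-off is on the read side: a query for everything belonging to one customer has to cover every value returned by `all_customer_keys`, so the bucket count should stay as small as the write rate allows.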
