
Cosmos DB Optimization

Bubba Jones 221 Reputation points
2022-07-04T12:29:37.687+00:00

I am new to Cosmos DB and am trying to optimize my database design as much as possible, but I sometimes find conflicting information about the best approach. I have read that frequently updated fields should be separated from the main container and put in a separate container. On the other hand, that means more RUs are needed to get the same information.

Approach 1 is to put everything in one container. In the following example, AverageRating, Views, and Favorites will be updated very frequently:

Example 1:

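    using System;
    using Newtonsoft.Json;

    public class ForumPost
    {
        // Cosmos DB requires every item to have an "id" property (a Guid serializes as a string).
        [JsonProperty(PropertyName = "id")]
        public Guid Id { get; set; }

        // The post's own id is reused as its partition key value.
        [JsonProperty(PropertyName = "partitionKey")]
        public Guid PartitionKey => Id;

        public string ForumText { get; set; }
        public bool Approved { get; set; }

        // The following fields are updated frequently
        public double AverageRating { get; set; }
        public int Views { get; set; }
        public int Favorites { get; set; }
    }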

With this design I can get all the relevant data from a single read of the container, which is what Cosmos DB is supposed to be about: avoiding joins across multiple tables. However, I have read that fields that need to be updated virtually all the time should be put in a separate container, like the ForumPostInfo container below:

Example 2:

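    using System;
    using Newtonsoft.Json;

    public class ForumPost
    {
        // Cosmos DB requires every item to have an "id" property (a Guid serializes as a string).
        [JsonProperty(PropertyName = "id")]
        public Guid Id { get; set; }

        // The post's own id is reused as its partition key value.
        [JsonProperty(PropertyName = "partitionKey")]
        public Guid PartitionKey => Id;

        public string ForumText { get; set; }
        public bool Approved { get; set; }
    }

    public class ForumPostInfo
    {
        // Required "id"; in a separate container this can simply equal ForumPostId.
        [JsonProperty(PropertyName = "id")]
        public Guid Id { get; set; }

        // Partitioned by the id of the post it describes.
        [JsonProperty(PropertyName = "partitionKey")]
        public Guid ForumPostId { get; set; }

        // The following fields are updated frequently
        public double AverageRating { get; set; }
        public int Views { get; set; }
        public int Favorites { get; set; }
    }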

The problem with this approach is that reading all the relevant information now doubles my RUs, because I have to read data from both containers.

So I am not sure which approach is best, given that AverageRating, Views, and Favorites will be updated frequently. Maybe it's a non-issue and I can safely keep them in the bigger container that has all the relevant info (Example 1). Any suggestions?

Azure Cosmos DB

Answer accepted by question author

Anurag Sharma 17,636 Reputation points
2022-07-06T07:49:18.437+00:00

Hi @BubbaJones-9922, thanks for providing more details.

Building on the details provided by Mark, splitting the document means creating two separate documents, as below:

1st document (the ForumPost):

    {
        "id": "1",
        "ForumText": "Random Text",
        "Approved": true
    }

2nd document (this one will get updated a LOT):

    {
        "AverageRating": 4.3,
        "Views": 1024,
        "Favorites": 117
    }

However, both documents must contain the same partition key value so that they can be retrieved together efficiently. Each should also carry one more property, a discriminator, that distinguishes the two, so that a query can retrieve only the specific document it needs.
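A minimal sketch of that layout, in C# with System.Text.Json.Nodes (.NET 6 or later) so that it runs without a Cosmos DB account. The discriminator property name `type`, its values `post` and `postStats`, and the id `1-stats` are hypothetical choices for illustration; the list simply stands in for the items stored under one partition key value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json.Nodes;

// Both documents for post "1" share the same partition key value.
// "type" is a hypothetical discriminator property.
var post = new JsonObject
{
    ["id"] = "1",
    ["partitionKey"] = "1",
    ["type"] = "post",
    ["ForumText"] = "Random Text",
    ["Approved"] = true
};

// The frequently updated counters live in their own small document.
var stats = new JsonObject
{
    ["id"] = "1-stats",          // hypothetical id; ids must be unique within a logical partition
    ["partitionKey"] = "1",
    ["type"] = "postStats",
    ["AverageRating"] = 4.3,
    ["Views"] = 1024,
    ["Favorites"] = 117
};

// In-memory stand-in for the items stored under partition key "1".
var partition = new List<JsonObject> { post, stats };

// Filtering on the discriminator picks out just one document, much as a query like
//   SELECT * FROM c WHERE c.partitionKey = "1" AND c.type = "postStats"
// would in the container.
List<JsonObject> ItemsOfType(IEnumerable<JsonObject> items, string type) =>
    items.Where(i => i["type"]?.GetValue<string>() == type).ToList();

var onlyStats = ItemsOfType(partition, "postStats");
Console.WriteLine(onlyStats[0]["Views"]!.GetValue<int>()); // prints 1024
```

Deriving the stats id from the post id (here with a suffix) is only one possible convention; its advantage is that the id can be computed for a point read without running a query first.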

You can also look at partial document update, which might suit your requirements as well. Please review it, and we can discuss further if needed.

Partial document update in Azure Cosmos DB
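In the .NET SDK (Microsoft.Azure.Cosmos), partial document update is exposed through `Container.PatchItemAsync`, with operations created by methods such as `PatchOperation.Increment` and `PatchOperation.Set`. The sketch below does not call the service; it only mimics, on an in-memory document, what incrementing the counters looks like, so that the hot fields change while the rest of the document is untouched. The `Increment` helper and the document shape are hypothetical.

```csharp
using System;
using System.Text.Json.Nodes;

// Hypothetical counters document.
var stats = new JsonObject
{
    ["id"] = "1-stats",
    ["partitionKey"] = "1",
    ["AverageRating"] = 4.3,
    ["Views"] = 1024,
    ["Favorites"] = 117
};

// Hypothetical local helper that mimics an increment-style patch operation:
// add `delta` to one top-level integer property and leave everything else as-is.
void Increment(JsonObject doc, string path, int delta)
{
    string name = path.TrimStart('/');         // "/Views" -> "Views"
    int current = doc[name]!.GetValue<int>();
    doc[name] = current + delta;                // only this property is replaced
}

Increment(stats, "/Views", 1);
Increment(stats, "/Favorites", 1);

Console.WriteLine(stats["Views"]!.GetValue<int>());     // prints 1025
Console.WriteLine(stats["Favorites"]!.GetValue<int>()); // prints 118
```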

1 additional answer

  1. Mark Brown - MSFT 2,771 Reputation points Microsoft Employee Moderator
    2022-07-04T15:47:53.32+00:00

    Our guidance is to put highly updated properties into a separate document, not a separate collection. The two documents should share the same logical partition key value and include a discriminator property so that you can inspect each document and deserialize it into its corresponding data model class. This lets you query all of the properties with a single request and also execute transactions across this data.
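The "single request, then deserialize by discriminator" idea can be sketched the same way. The list below stands in for what one query scoped to partition key "1" would return; the `type` discriminator values are hypothetical, and value tuples stand in for the ForumPost and ForumPostInfo classes so that the sketch stays in one file and needs no Cosmos DB connection.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json.Nodes;

// In-memory stand-in for the items a single-partition query such as
//   SELECT * FROM c WHERE c.partitionKey = "1"
// would return: both documents arrive in one response.
var results = new List<JsonObject>
{
    new JsonObject { ["id"] = "1", ["partitionKey"] = "1", ["type"] = "post",
                     ["ForumText"] = "Random Text", ["Approved"] = true },
    new JsonObject { ["id"] = "1-stats", ["partitionKey"] = "1", ["type"] = "postStats",
                     ["AverageRating"] = 4.3, ["Views"] = 1024, ["Favorites"] = 117 }
};

// Value tuples stand in for the ForumPost and ForumPostInfo model classes.
(string Text, bool Approved)? postModel = null;
(double AverageRating, int Views, int Favorites)? statsModel = null;

foreach (var item in results)
{
    // Inspect the discriminator first, then read the fields of the matching shape.
    switch (item["type"]?.GetValue<string>())
    {
        case "post":
            postModel = (item["ForumText"]!.GetValue<string>(),
                         item["Approved"]!.GetValue<bool>());
            break;
        case "postStats":
            statsModel = (item["AverageRating"]!.GetValue<double>(),
                          item["Views"]!.GetValue<int>(),
                          item["Favorites"]!.GetValue<int>());
            break;
        default:
            throw new InvalidOperationException("Unknown document type");
    }
}

Console.WriteLine($"{postModel?.Text}: {statsModel?.Views} views"); // prints "Random Text: 1024 views"
```

Throwing on an unknown `type` is a deliberate choice in this sketch so that a new kind of document is not silently ignored; it is not a Cosmos DB requirement.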

    However, whether you do this depends entirely on whether it saves throughput. The only way to determine that is to test both designs and measure. Document size matters: larger documents gain more from shredding than smaller ones.
