Cosmos DB Optimization

Question

Bubba Jones 221

I am new to Cosmos DB and trying to optimize my database design as much as possible, but the guidance on the best approach sometimes seems contradictory. I have read that frequently updated fields should be separated from the main container and put in a separate container. On the other hand, that means more RUs are needed to read the same information.

Approach 1 is to put everything in one container. In the following example, AverageRating, Views and Favorites will be updated very frequently:

Example 1:

    public class ForumPost   
    {  
        [JsonProperty(PropertyName = "partitionKey")]  
        public Guid Id { get; set; }  
        public string ForumText { get; set; }  
        public bool Approved { get; set; }  
        // The following fields are updated frequently
        public double AverageRating { get; set; }
        public int Views { get; set; }
        public int Favorites { get; set; }  
    }

This means I can get all the relevant data with a single read from that container. This is what Cosmos DB is supposed to be all about: reducing joins across multiple tables and so on. However, I have read that fields that need to be updated almost constantly should be put in a separate container, as with the ForumPostInfo container below:

Example 2:

    public class ForumPost   
    {  
        [JsonProperty(PropertyName = "partitionKey")]  
        public Guid Id { get; set; }  
        public string ForumText { get; set; }  
        public bool Approved { get; set; }  
    }  
    public class ForumPostInfo  
    {  
        [JsonProperty(PropertyName = "partitionKey")]  
        public Guid ForumPostId { get; set; }  
        // The following fields are updated frequently
        public double AverageRating { get; set; }  
        public int Views { get; set; }  
        public int Favorites { get; set; }  
    }

The problem with this approach is that when I need to read all the relevant information, I now double my RUs, because I have to read data from both containers.

So at this point I am not sure which approach to take, given that in the example above AverageRating, Views and Favorites will be frequently updated. Maybe it's a non-issue that they are frequently updated, and I can safely keep them in the bigger container (example 1) that has all the relevant info. Any suggestions?

Answer accepted by question author

Anurag Sharma 17,636

Hi @BubbaJones-9922, thanks for providing more details.

To expand on the details Mark provided, splitting the document into two means creating two separate documents, as below:

1st Document (ForumPost):

    {
        "id": "1",
        "ForumText": "Random Text",
        "Approved": true
    }

2nd Document (this will get updated a LOT):

    {
        "AverageRating": 4.3,
        "Views": 1024,
        "Favorites": 117
    }

However, the partition key must be present in both documents for faster retrieval, and there should also be one more property that distinguishes the two documents, so that only the specific document you need is retrieved.

You can also look at partial document update, which might suit your requirements as well. Please take a look, and we can discuss further if needed.

Partial document update in Azure Cosmos DB
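
As a rough illustration of what a partial document update could look like for the counters in this thread, here is a minimal sketch using the .NET SDK's PatchItemAsync and PatchOperation.Increment (available in Microsoft.Azure.Cosmos 3.23.0 and later). The ForumPostStats class, the method name and its parameters are hypothetical, and the "/Views" path assumes the property is serialized under its C# name. Treat it as a starting point to test and measure, not a definitive implementation.

```csharp
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class ForumPostStats
{
    // Hypothetical helper: bumps the Views counter on the frequently updated
    // document without first reading it or replacing the whole document.
    // Patch still needs both the document id and its partition key value.
    public static async Task<double> IncrementViewsAsync(
        Container container, string statsDocumentId, string partitionKeyValue)
    {
        ItemResponse<object> response = await container.PatchItemAsync<object>(
            statsDocumentId,
            new PartitionKey(partitionKeyValue),
            new[] { PatchOperation.Increment("/Views", 1) });

        // RequestCharge reports the RUs consumed by this single operation,
        // which is useful when comparing designs.
        return response.RequestCharge;
    }
}
```

Returning the request charge makes it easy to compare the cost of a patch against a full read-and-replace of the same document.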

Bubba Jones 221 Reputation points

2022-07-07T15:22:59.15+00:00

Hello AnuragSharma,

Thank you for the clarification. I am now working on a sample where I created two separate documents within the same container. The partition key shared between both documents is the Id of the first document. I am not sure if that's best practice.

Regarding your suggestion that there should be one more property to distinguish the documents, I set a boolean flag for this (true for document #1, false for document #2), though I am not convinced this is the best approach either.

Thank you for your patching suggestion. I happened to stumble across it as well, and I have now successfully patched document #2. This is a bit off topic, but the problem I see with patching is that you need the id of the document before you can patch it. The only solution I could come up with was to store the id of document #2 in document #1, since document #1 is the document I will be doing most of my work with. Again, I'm not sure if this is best practice. The only alternative is to first read document #2, get its id, and then patch document #2, but this approach requires more RUs.

Bubba Jones 221 Reputation points

2022-07-08T08:09:20.973+00:00

@AnuragSharma

I found a workaround for my true/false flag issue by using an attribute in document #2 that does not exist in document #1. As for having the id of document #2, I use Guids which I create locally and store in the respective objects before creating them in Cosmos DB. That way document #1 already has the Guid of document #2 before the latter is created.

Thanks again for the more detailed response; I needed the example you gave me. I now consider this topic closed.

Answer 1

Mark Brown - MSFT 2,771 Microsoft Employee Moderator

Our guidance is to put highly updated properties into a separate document, not a separate collection. The two documents should share the same logical partition key value and include a discriminator property that lets you inspect each document and deserialize it into its corresponding data model class. This lets you both query all properties with a single request and execute transactions across this data.

However, whether you do this or not depends entirely on whether doing so saves on throughput consumption. The only way to determine that is to test both designs and measure. Document size matters: larger documents gain more benefit from shredding than smaller ones.
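
To make the discriminator idea concrete, here is a minimal sketch of two model classes that share a partition key value and carry a type property, plus a helper that inspects that property before deserializing. It uses Newtonsoft.Json, as the code elsewhere in this thread does. The property name "type", the values "post" and "postInfo", the string ids and the ForumDocuments helper are all assumptions made for illustration, not names prescribed by Cosmos DB.

```csharp
using System;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public class ForumPost
{
    [JsonProperty("id")] public string Id { get; set; }
    // Same value in both documents, so they land in one logical partition.
    [JsonProperty("partitionKey")] public string PartitionKey { get; set; }
    // Hypothetical discriminator property.
    [JsonProperty("type")] public string Type { get; set; } = "post";
    public string ForumText { get; set; }
    public bool Approved { get; set; }
}

public class ForumPostInfo
{
    [JsonProperty("id")] public string Id { get; set; }
    [JsonProperty("partitionKey")] public string PartitionKey { get; set; }
    [JsonProperty("type")] public string Type { get; set; } = "postInfo";
    // Frequently updated fields live in this smaller document.
    public double AverageRating { get; set; }
    public int Views { get; set; }
    public int Favorites { get; set; }
}

public static class ForumDocuments
{
    // Inspects the discriminator and deserializes into the matching class.
    public static object Deserialize(string json)
    {
        JObject document = JObject.Parse(json);
        string type = (string)document["type"];
        switch (type)
        {
            case "post": return document.ToObject<ForumPost>();
            case "postInfo": return document.ToObject<ForumPostInfo>();
            default:
                throw new InvalidOperationException($"Unknown document type '{type}'.");
        }
    }
}
```

With this shape, a single query filtered on the shared partition key value can return both documents, and each result can be passed through a helper like ForumDocuments.Deserialize.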

Bubba Jones 221

So by keeping them in the same document, can I assume you mean the following?

ForumPost document:

    {
        "id": "1",
        "ForumText": "Random Text",
        "Approved": true,
        // This will get updated a LOT
        "ForumPostInfo":
        [
            {
                "AverageRating": 4.3,
                "Views": 1024,
                "Favorites": 117
            }
        ]
    }

Below is the C# equivalent (observe the comment within the code).

     public class ForumPost   
     {  
         [JsonProperty(PropertyName = "partitionKey")]  
         public Guid Id { get; set; }  
         public string ForumText { get; set; }  
         public bool Approved { get; set; }  
         //   THIS IS NEW  
         public ForumPostInfo forumPostInfo { get; set; }  
     }  

     public class ForumPostInfo  
     {  
         public double AverageRating { get; set; }  
         public int Views { get; set; }  
         public int Favorites { get; set; }  
     }

Can you please confirm that the above is what you meant by your reply? If my above example is not what you are getting at, could you please provide an example of what you mean by separate document?

Bubba Jones 221 Reputation points

2022-07-05T11:54:05.58+00:00

I would really appreciate a response to my comment on your reply. I have posted another sample to see if we are on the same page.
