Cosmos DB design and index / partition optimization

randomrabbit 116 Reputation points
2022-04-20T11:16:12.447+00:00

Hello,

I'm about to use Cosmos DB to store streaming data, where each item is a JSON object whose content changes. Which of the examples below will be a) faster to query and b) more RU-efficient?

Option 1:

{
  _id: <string>,
  data: {
    fieldname: <any type>
  }
}

where "fieldname" changes depending on the data sent to the DB.

Option 2:

{
  _id: <string>,
  data: {
    name: <string>,
    value: <any type>
  }
}

So in option 1, I would have a data object whose field name can change. In option 2, data.name always exists, but its value changes.
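To make the structural difference concrete, here is a minimal C# sketch (System.Text.Json, .NET 6 or later, top-level statements) that stores one hypothetical reading, "temperature" = 21.5, in each shape and reads it back. The helpers `ReadOption1` and `ReadOption2` are illustrative names, not part of any SDK. In option 1 the reader has to enumerate the keys of `data` to discover the name; in option 2 both properties sit at fixed paths.

```csharp
using System;
using System.Linq;
using System.Text.Json.Nodes;

// Option 1: the reading's name is the property key under "data".
var option1 = new JsonObject
{
    ["_id"] = "1",
    ["data"] = new JsonObject { ["temperature"] = 21.5 }
};

// Option 2: the name is a value stored under the fixed key "name".
var option2 = new JsonObject
{
    ["_id"] = "2",
    ["data"] = new JsonObject { ["name"] = "temperature", ["value"] = 21.5 }
};

Console.WriteLine(ReadOption1(option1));
Console.WriteLine(ReadOption2(option2));

// Option 1: the key is not known in advance, so enumerate the properties of "data".
// Assumes (hypothetically) exactly one reading per document.
static (string Name, double Value) ReadOption1(JsonObject doc)
{
    var entry = doc["data"]!.AsObject().Single();
    return (entry.Key, entry.Value!.GetValue<double>());
}

// Option 2: both properties live at fixed, known paths.
static (string Name, double Value) ReadOption2(JsonObject doc)
{
    var data = doc["data"]!.AsObject();
    return (data["name"]!.GetValue<string>(), data["value"]!.GetValue<double>());
}
```

With many different field names, option 1 also means documents carry different sets of property paths, which is what the indexing and partitioning points in the answers hinge on.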

What if I need to extend the model to store arrays of data? Does that change the answer?

Edit: My queries will be based on filtering by the data names.
Edit: Can option 1 support partitioning based on a changing fieldname?
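Since the queries filter by data name, here is a sketch of the query text each shape would need, written as small C# helpers with hypothetical names. In option 1 the name is part of the property path, so this sketch splices it into the query text after validating it; in option 2 the name is an ordinary value, so the query text stays fixed and the name is passed as a parameter. The array variant assumes option 2 extended to `data: [ { name, value }, ... ]`.

```csharp
using System;
using System.Text.RegularExpressions;

// Option 2: fixed query text; the name is supplied as the @name parameter.
const string Option2Query = "SELECT * FROM c WHERE c.data.name = @name";

// Option 2 with "data" as an array of { name, value } objects.
const string Option2ArrayQuery =
    "SELECT * FROM c WHERE EXISTS(SELECT VALUE d FROM d IN c.data WHERE d.name = @name)";

Console.WriteLine(Option1Query("temperature"));
Console.WriteLine(Option2Query);
Console.WriteLine(Option2ArrayQuery);

// Option 1: the field name becomes part of the path c.data.<fieldName>.
// Only plain identifiers are accepted so untrusted input cannot alter the query.
static string Option1Query(string fieldName)
{
    if (!Regex.IsMatch(fieldName, "^[A-Za-z_][A-Za-z0-9_]*$"))
        throw new ArgumentException($"Unsupported field name: {fieldName}", nameof(fieldName));
    return $"SELECT * FROM c WHERE IS_DEFINED(c.data.{fieldName})";
}
```

With the .NET SDK, the parameterized form would be passed as `new QueryDefinition(Option2Query).WithParameter("@name", "temperature")`. Under the default index-everything policy, option 1 also adds a new indexed path for every distinct field name, whereas option 2 keeps the indexed paths fixed (`/data/name`, `/data/value`).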

Thanks!

Azure Cosmos DB

Accepted answer
  1. Anurag Sharma 17,386 Reputation points
    2022-04-20T14:21:49.09+00:00

    Hi @randomrabbit , welcome to Microsoft Q&A forum.

    If I understand correctly, you want to know which of these sample documents would be faster to update and consume fewer RUs.

    Updating specific properties in the documents you provided should not show any significant difference, as long as both documents are of similar size. Normally we retrieve the entire document, change its properties, and then replace it. If you extend the models, the document size increases, and so do the RU charges.

    The RU charge depends on multiple factors, including:

    • Item size (in your case size looks similar for both)
    • Item indexing
    • Type of reads
    • Indexed properties

    There are many other factors that affect RU consumption and query speed (refer to the article). However, point reads, which fetch a single item by its id and partition key, are faster and cheaper than queries.

    Another way to reduce RUs is to use the partial document update feature, which lets us update specific fields/properties in a single document without performing a full document read-replace operation.

    This applies to both of your sample documents, because in each case we only need to address the specific property and replace it. For example:

    {
      _id: <string>,
      data: {
        fieldname: <any type>
      }
    }

    For this document, we only need '_id' and '/data/fieldname'.

    {
      _id: <string>,
      data: {
        name: <string>,
        value: <any type>
      }
    }

    For this document, we only need '_id' and '/data/value'.

    Now, to update this document by id with the .NET SDK, we can write something like the following:

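    // Requires the Microsoft.Azure.Cosmos (v3) SDK; `container` is an existing Container.
    // Replace fails if /data/value does not exist; PatchOperation.Set would create it.
    ItemResponse<object> response = await container.PatchItemAsync<object>(
        id: "b864acc7-8cfe-47ff-b04c-c3f2a92a34df",
        partitionKey: new PartitionKey("test"),
        patchOperations: new[] { PatchOperation.Replace("/data/value", "value2") });
    Console.WriteLine($"Request charge: {response.RequestCharge} RU");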

    Even if we extend the document into something much more complex, the patch request itself stays the same, because we only send the properties we need to update (the write is still charged based on the stored document and its indexing):

    Benefits of partial document update:

    • Reduced network payload, since the whole document is not sent on the wire
    • No extra READ operation for the client's optimistic concurrency control (OCC) check, which saves the extra read RU charge
    • Significant savings on end-to-end latency for modifying the document
    • No extra CPU cycles on the client side to read the document, perform OCC checks, patch it locally, and then send it over the wire as a replace API call
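    To put a rough number on the first benefit, this sketch compares the bytes a full replace would send with the bytes of a patch request body for the same one-property change. The document, including its 1,000-element `history` array, is hypothetical, and the patch body is modeled on the `{"operations": [{"op", "path", "value"}]}` shape of the Cosmos DB REST patch request. It measures payload size only, not RU charges.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.Json.Nodes;

// Hypothetical document: a small changing value plus a large unchanged array.
var document = new JsonObject
{
    ["id"] = "b864acc7-8cfe-47ff-b04c-c3f2a92a34df",
    ["data"] = new JsonObject { ["name"] = "temperature", ["value"] = "value1" },
    ["history"] = new JsonArray(Enumerable.Range(0, 1000).Select(i => (JsonNode?)i).ToArray())
};

// Read-modify-replace: change one property, then send the whole document back.
document["data"]!["value"] = "value2";
int replaceBytes = Encoding.UTF8.GetByteCount(document.ToJsonString());

// Patch: send only the list of operations.
var patchBody = new JsonObject
{
    ["operations"] = new JsonArray(new JsonObject
    {
        ["op"] = "replace",
        ["path"] = "/data/value",
        ["value"] = "value2"
    })
};
int patchBytes = Encoding.UTF8.GetByteCount(patchBody.ToJsonString());

Console.WriteLine($"replace payload: {replaceBytes} bytes, patch payload: {patchBytes} bytes");
```

    The gap grows with the size of the unchanged part of the document, which is why the benefit matters more as the model is extended.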

    The only limitation is that a patch targets a single document by its id (and partition key), so we need to know both up front rather than selecting documents with a query.

    In any case, having a well designed partition key is most important.

    Please let us know if this helps or else we can discuss further.


1 additional answer

  1. Hasan Savran 321 Reputation points MVP
    2022-04-20T12:54:52.38+00:00

    Both models you shared are simple and I don't think it will change that much when it comes to RU.
    "Faster to query" and "more RU efficient" will mostly depend on your partition key and what is indexed in your situation.
    If you are planning to have more than 50 GB of data in this container, take your time to pick the right partition key.
    "name" sounds like a good candidate for the partition key if its cardinality is high. Just make sure that no single name will reach the 20 GB limit per logical partition.
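    One way to sanity-check that before committing to `/data/name` as the partition key is to take a representative sample of documents, group it by the candidate key value, and project each value's share of storage onto the expected container size. The sample data, the 50 GB projection, and the helper `KeysAtRisk` below are hypothetical, and the projection assumes every name keeps growing at the same rate as in the sample.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Logical partition size limit (20 GB), treated here as 20 GiB.
const long LogicalPartitionLimit = 20L * 1024 * 1024 * 1024;
// Hypothetical projected container size.
const long ProjectedContainerSize = 50L * 1024 * 1024 * 1024;

// Hypothetical sample of (data.name, document size in bytes).
var sample = new List<(string Name, long SizeBytes)>
{
    ("temperature", 1_200), ("temperature", 1_150),
    ("humidity", 900), ("pressure", 1_000),
};

foreach (var key in KeysAtRisk(sample, ProjectedContainerSize, LogicalPartitionLimit))
    Console.WriteLine($"Partition key value '{key}' is projected to exceed the limit");

// Returns the key values whose projected logical partition size exceeds the limit.
static List<string> KeysAtRisk(
    IEnumerable<(string Name, long SizeBytes)> docs, long projectedTotal, long limit)
{
    var list = docs.ToList();
    double sampleTotal = list.Sum(d => d.SizeBytes);
    return list
        .GroupBy(d => d.Name)
        // Scale each key's share of the sample up to the projected container size.
        .Where(g => g.Sum(d => d.SizeBytes) / sampleTotal * projectedTotal > limit)
        .Select(g => g.Key)
        .OrderBy(k => k)
        .ToList();
}
```

    With only a few distinct names, most of the data lands in a few logical partitions, which is exactly the low-cardinality case to avoid.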
    If you will not use the "value" property in your WHERE clauses, you can exclude it from the indexing policy; this reduces index storage and the RU cost of writes.
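    As a sketch, an indexing policy that keeps the default "index everything" behavior but skips `data.value` could look like the JSON built below. Because `value` can be of any type, it excludes both the scalar form (`/data/value/?`) and anything nested under it (`/data/value/*`); treat the exact path set as an assumption to verify against your data.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Nodes;

// Index every path by default, but skip data.value, which is never filtered on.
var indexingPolicy = new JsonObject
{
    ["indexingMode"] = "consistent",
    ["automatic"] = true,
    ["includedPaths"] = new JsonArray(new JsonObject { ["path"] = "/*" }),
    ["excludedPaths"] = new JsonArray(
        new JsonObject { ["path"] = "/data/value/?" },  // scalar values
        new JsonObject { ["path"] = "/data/value/*" })  // nested objects and arrays
};

Console.WriteLine(indexingPolicy.ToJsonString(new JsonSerializerOptions { WriteIndented = true }));
```

    In the .NET SDK, the same exclusion can be added through `ContainerProperties.IndexingPolicy.ExcludedPaths.Add(new ExcludedPath { Path = "/data/value/?" })` when creating or updating the container.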
