Setup 1
We have Cosmos DB (setup as serverless) that contains approx. 11,000,000 documents in one container. The container is partitioned by the property storageContainerName
(see the document JSON below). Currently we have 36 logical partitions and the number of documents vary from 800,000 to 10,000 in the different partitions (in the longer term we expect the number of documents to be evenly distributed among the partitions).
The document structure is identical to all documents in the database and it looks like below.
{
"id": "5c4eab89-3a16-45af-b54d-bc243c36e5b5:blobs:storage:recordings:20230412:88bc0094-5114-42be-a893-6417ca74fdca-0_C:20230412055549629:V1:20230412062831098-20230412062848128",
"storageContainerName": "5c4eab89-3a16-45af-b54d-bc243c36e5b5",
"sizeInBytes": 676807,
"startTime": "20230412062831098",
"videoSourceId": "88bc0094-5114-42be-a893-6417ca74fdca-0",
"ttl": 2592019,
"_rid": "vpx1AKoCsb2B5QAAAAAAAA==",
"_self": "dbs/vpx1AA==/colls/vpx1AKoCsb0=/docs/vpx1AKoCsb2B5QAAAAAAAA==/",
"_etag": ""1800964f-0000-0c00-0000-64364fa10000"",
"_attachments": "attachments/",
"_ts": 1681280929
}
The indexing policy of the container looks like below (default index everything).
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/*"
}
],
"excludedPaths": [
{
"path": "/"_etag"/?"
}
]
}
Query 1
When executing the following query for the partition 89bd41d7-ad89-46cf-80ad-4c14ff012e70 that contains 11,000 documents
select sum(c.sizeInBytes) from c where c.storageContainerName = "89bd41d7-ad89-46cf-80ad-4c14ff012e70"
we get the following query stats.
Request Charge: 2225.94 RUs
Retrieved document count: 0
Retrieved document size: 0 bytes
Output document count: 1
Output document size: 235 bytes
Index hit document count: 0
Index lookup time: 16605.2899 ms
Document load time: 0 ms
Query engine execution time: 0.050100000000000006 ms
System function execution time: 0 ms
User defined function execution time: 0 ms
Document write time: 0 ms
Query 2
When executing the following query for the partition 89bd41d7-ad89-46cf-80ad-4c14ff012e70 that contains 11,000 documents
select c.startTime from c where c.storageContainerName = "89bd41d7-ad89-46cf-80ad-4c14ff012e70" and c.videoSourceId = "5862401d-d172-49f2-a7b4-e4e284a88c71-0" and c.startTime > "20230508113340124" and c.startTime < "20230509113340124"
we get the following query stats.
Request Charge: 389.08000000000004 RUs
Retrieved document count: 13029
Retrieved document size: 7746492 bytes
Output document count: 5095
Output document size: 173379 bytes
Index hit document count: 5108.88
Index lookup time: 5.46 ms
Document load time: 9351.130000000001 ms
Query engine execution time: 114.16000000000001 ms
System function execution time: 0 ms
User defined function execution time: 0 ms
Document write time: 5.510000000000001 ms
Setup 2
We have set up another Cosmos database with identical configuration as the one in Setup 1 above, but with 281,000 documents in the container. The difference compared to Setup 1 is that we have only one partition so far in that database.
Query 1
When executing the following query for the partition 5c4eab89-3a16-45af-b54d-bc243c36e5b5 that contains 281,000 documents
select sum(c.sizeInBytes) from c where c.storageContainerName = "5c4eab89-3a16-45af-b54d-bc243c36e5b5"
we get the following query stats.
Request Charge: 74.7 RUsRetrieved document count: 0
Retrieved document size: 0 bytes
Output document count: 1
Output document size: 89 bytes
Index hit document count: 0
Index lookup time: 18.3 ms
Document load time: 0 ms
Query engine execution time: 0.02 ms
System function execution time: 0 ms
User defined function execution time: 0 ms
Document write time: 0 ms
Query2
When executing the following query for the partition 5c4eab89-3a16-45af-b54d-bc243c36e5b5 that contains 281,000 documents
select c.startTime from c where c.storageContainerName = "5c4eab89-3a16-45af-b54d-bc243c36e5b5" and c.videoSourceId = "88bc0094-5114-42be-a893-6417ca74fdca-0" and c.startTime > "20230507000035000" and c.startTime < "20230508000035000"
we get the following query stats.
Request Charge: 160.85 RUs
Retrieved document count: 5065
Retrieved document size: 3008610 bytes
Output document count: 5065
Output document size: 172261 bytes
Index hit document count: 5065
Index lookup time: 0.89 ms
Document load time: 73.62 ms
Query engine execution time: 6.88 ms
System function execution time: 0 ms
User defined function execution time: 0 ms
Document write time: 0.62 ms
Comparison
In both setups we run queries against a single partition (i.e. no cross partition queries).
In Setup 1 we run the queries against a partition that contains 11,000 documents.
In Setup 2 we run the queries against a partition that contains 281,000 documents.
Query 1: Despite the partition in Setup 2 has 25 times more documents than the one in Setup 1 the execution time is almost 1,000 times faster in Setup 2 and the RU consumption is 30 times higher in Setup 1!!!
Query 2: Despite the partition in Setup 2 has 25 times more documents than the one in Setup 1 the execution time is almost 10 times faster in Setup 2 and the RU consumption is almost 2.5 times higher in Setup 1!!!
If the RU consumption and the execution times increase dramatically with the number of partitions then the solution will not scale. It seems like the indexes are global and not on a partition level. Is this how it is supposed to work?
N.B.
Another odd behaviour that we have run into is that if you run a sum on a property like below
select sum(c.sizeInBytes) from c where c.storageContainerName = "89bd41d7-ad89-46cf-80ad-4c14ff012e70"
the execution time decreases by a factor 1,000 and the RUs decreases by a factor 30 if the summed property has just a few possible values compared to when the property has no duplicate values at all.