Can I speed up transmission time of large data sets in Cosmos DB?

asked 2021-11-09T09:06:32.467+00:00
Iain White 121 Reputation points

I've got a large set of data and I'm trying to speed up the time it takes to return the records using the Cosmos SDK.

I've read the documentation on indexing and also watched this YouTube video, making sure things like the region are set correctly.

However, it's still taking around 4 seconds to return around 17000 documents from a container.

  • The documents aren't particularly big, with only 5 fields (id, deviceId, time, measurement, errorMessage)
  • I've done experimentation with different partition keys, including having the records all within the same partition (the data I'm experimenting with only includes one device ID) - I've also used the unique 'id' as the partition key.
  • My query looks like this - 'select * from c where c.deviceId = 'XXXX' order by c.time'
  • I've seen that limiting the amount of data (by adding 'OFFSET 0 LIMIT 10') vastly reduces the response time. This leads me to think that finding and sorting the records isn't taking much time, and that most of the time goes into returning them.
  • When I use the portal's data explorer, it is taking roughly the same amount of time to return the results (after I set the 'Query results per page' in the settings to 20000)

No matter what I try I can't seem to get significant gains on the performance and get it below a second as I want.

Is 4 seconds longer than you'd expect? Is there anything else you could suggest I try? Will I be able to get all of these documents back in less than a second? Or is this simply how long you'd expect for this amount of data to be transmitted?

Azure Cosmos DB

1 answer

  1. answered 2021-12-02T21:22:29.97+00:00
    Oury Ba-MSFT 9,561 Reputation points Microsoft Employee

    Hi @Iain White, sorry for the delay in responding to your question.

    You might want to create a composite index on the combination of the deviceId and time fields. Refer to this blog:

    Common troubleshooting steps
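
As a rough illustration of the composite index suggested above (a sketch, not taken from the linked blog), the Python snippet below builds an indexing policy document with one composite index on /deviceId and /time. The helper name `build_composite_index_policy` is hypothetical, and the included and excluded paths are assumed to be the container defaults; adjust them to match the existing policy.

```python
import json


def build_composite_index_policy(paths, order="ascending"):
    """Return an indexing policy dict with a single composite index over `paths`.

    Hypothetical helper for illustration; the included/excluded paths below
    are assumed defaults and may differ from the container's current policy.
    """
    if len(paths) < 2:
        raise ValueError("a composite index needs at least two paths")
    return {
        "indexingMode": "consistent",
        "automatic": True,
        "includedPaths": [{"path": "/*"}],
        "excludedPaths": [{"path": '/"_etag"/?'}],
        # One composite index: deviceId first (the filter), then time (the sort).
        "compositeIndexes": [
            [{"path": p, "order": order} for p in paths]
        ],
    }


policy = build_composite_index_policy(["/deviceId", "/time"])
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the container's indexing policy under Scale & Settings in the portal, or, assuming the azure-cosmos Python SDK is in use, the dict can be passed as the `indexing_policy` argument when creating or replacing the container. Microsoft's indexing policy documentation also notes that when a query filters on one property and orders by another, the filter property may need to come first in the ORDER BY clause (for example `ORDER BY c.deviceId, c.time`) for the composite index to be used, so it is worth comparing the RU charge of both forms of the query.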