Cosmos DB NoSQL LIMIT OFFSET performance - .NET SDK | Azure Portal

VD 66 Reputation points
2024-06-10T10:38:37.92+00:00

Hi Team,

As per MS documentation (Azure Cosmos Db for NoSQL), it is mentioned that OFFSET LIMIT query RU charge significantly increases as OFFSET increases and continuation token is the recommended approach.

https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/query/offset-limit

I was trying to reproduce same results and came across some observations while executing few SQL queries using OFFSET and LIMIT for pagination -

  1. If I execute a simple query (just a normal select with order by) from azure portal with LIMIT and OFFSET it shows higher RU in query metrics as OFFSET gets increased as per documentation - This seems to be because it is always resulting in full scan - The count of Retrieved documents and Output documents from Query Metrics indicates that.
  2. If same simple query is executed with .NET SDK v3, it results in less RU even if OFFSET increases. This might be because the only documents corresponding to LIMIT count seems to be scanned - it can be seen from count of Retrieved documents and Output documents from ServerSideCumulativeMetrics of response.
  3. If I try to execute some complex query e.g. involving join - it results in higher RUs in both cases i.e. from azure portal as well as .NET SDK, which seems to be because of the same reason of full scan - The count of Retrieved documents vs Output documents indicates that in both cases now.

I didn't find any documentation for how .NET SDK optimizes these queries but not when same is executed from azure portal.

There is a link which specifies some optimizations for LIMIT OFFSET, which mentions that these are not applicable for queries containing joins or subqueries, aggregates.

https://devblogs.microsoft.com/cosmosdb/introducing-a-new-system-function-and-optimized-query-operators/

Please let me know if this is correct understanding or is anything missing here?

Thank you!

.NET
.NET
Microsoft Technologies based on the .NET software framework.
3,536 questions
Azure Cosmos DB
Azure Cosmos DB
An Azure NoSQL database service for app development.
1,501 questions
{count} votes

1 answer

Sort by: Most helpful
  1. ShaktiSingh-MSFT 14,191 Reputation points Microsoft Employee
    2024-06-11T03:51:27.55+00:00

    Hi VD •,

    Welcome to Microsoft Q&A forum.

    As I understand, you want to know Cosmos DB NoSQL LIMIT OFFSET performance in .NET SDK and Azure Portal.

    • The RU charge of a query with OFFSET LIMIT increases as the number of terms being offset increases. For queries that have multiple pages of results, we typically recommend using continuation tokens. Continuation tokens are a "bookmark" for the place where the query can later resume. If you use OFFSET LIMIT, there's no "bookmark." If you wanted to return the query's next page, you would have to start from the beginning.
    • You should use OFFSET LIMIT for cases when you would like to skip items entirely and save client resources. For example, you should use OFFSET LIMIT if you want to skip to the 1000th query result and have no need to view results 1 through 999. On the backend, OFFSET LIMIT still loads each item, including those items that are skipped. The performance advantage is measured in reducing client resources by avoiding processing items that aren't needed.

    In the .NET SDK and Java SDK, you can optionally use continuation tokens as a bookmark for your query's progress. Azure Cosmos DB for NoSQL query executions are stateless at the server side and can be resumed at any time using the continuation token. For the Python SDK, continuation tokens are only supported for single partition queries. The partition key must be specified in the options object because it's not sufficient to have it in the query itself.

    Here are some example for using continuation tokens:

    If the query returns a continuation token, then there are extra query results.

    In Azure Cosmos DB for NoSQL's REST API, you can manage continuation tokens with the x-ms-continuation header. As with querying with the .NET or Java SDK, if the x-ms-continuation response header isn't empty, it means the query has extra results.

    As long as you're using the same SDK version, continuation tokens never expire. You can optionally restrict the size of a continuation token. Regardless of the amount of data or number of physical partitions in your container, queries return a single continuation token.

    Refer Tips & Tricks for Query Pagination in Azure Cosmos DB for more insight into query performance.

    Also, take a look at Stackoverflow thread for reference: https://stackoverflow.com/questions/68643111/cosmos-db-net-sdk-v3-query-with-paging-example-needed

    More on query performance tips for Azure Cosmos DB SDKs:

    https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/cosmos-db/nosql/performance-tips-query-sdk.md#query-performance-tips-for-azure-cosmos-db-sdks

    Let me know if this helps or you have more queries.

    Thanks.

    0 comments No comments