Usage of Continuation Token in production CosmosDB

Divyanshu Bains 0 Reputation points
2025-04-24T09:25:06.2233333+00:00

My question relates to how Continuation Token works in CosmosDB. My use case is the following: at some time t1, I want to start fetching a subset of documents from one of the containers in my CosmosDB account. This subset is selected by a simple boolean field, isActive, which must have a value of true. The number of documents is huge (~300M), so fetching them in one shot is not possible. I need some kind of pagination, and Continuation Token seems to provide it. But there are a few issues. First, the account is a production account, and documents are being created, updated, and deleted at all times, even while pages are being fetched. Second, my indexing policy has the following: "excludedPaths": [ { "path": "/*" } ]. It also has some explicitly listed includedPaths, such as "includedPaths": [ { "path": "/myDocId/?" } ].

  1. The issue with this account being a production account is that I'm unsure if, and when, the documents created, updated, or deleted during the document fetching process will appear in my paginated results since I do not know the internal working of Continuation Token. From a requirements perspective, it is acceptable for me to miss the documents which are created, updated, or deleted after time t1 (when the document fetching process starts). However, if a document remains unchanged throughout the document fetching process, it has to appear at least once in my paginated response, i.e. it is acceptable if it appears once or more than once. Does Continuation Token guarantee this? If continuation token internally uses number of documents to skip to find the next page, it may happen that if a document is deleted which had already appeared in one of the previous pages, then the first document in the next page might get skipped.
  2. The issue with my indexing policy is that it explicitly excludes all fields except the specified ones. I'm not sure about this, but maybe it also excludes _rid from indexing, which is believed to be used in Continuation Token. If that is the case, will these fetch page calls become really expensive and slow? If so, can I change my query from "SELECT * FROM c WHERE c.isActive=@isActive" to "SELECT * FROM c WHERE c.isActive=@isActive ORDER BY c.myDocId", where myDocId is indexed and guaranteed to be unique (however, it is not guaranteed to be monotonically increasing or decreasing, if that matters)?
  3. Is there a limit on the size of the continuation token? If yes, what would be the behavior of the pagination?
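
The concern in item 1 can be illustrated with a small in-memory simulation. This is only a sketch of the two resume strategies being contrasted, not a description of how CosmosDB implements its Continuation Token; the helper names, the six sample documents, and the page size are all hypothetical. It deletes doc1 after the first page and compares resuming by skip count with resuming from the last key returned.

```python
PAGE_SIZE = 2  # hypothetical page size for the illustration


def make_docs():
    """Six hypothetical active documents with unique myDocId values."""
    return [{"myDocId": f"doc{i}", "isActive": True} for i in range(1, 7)]


def active_ids(docs):
    """Sorted myDocId values of documents matching isActive = true."""
    return sorted(d["myDocId"] for d in docs if d["isActive"])


def fetch_by_offset(docs, offset, page_size):
    """Skip-count paging: resume by skipping `offset` matching documents."""
    return active_ids(docs)[offset:offset + page_size]


def fetch_after_key(docs, last_key, page_size):
    """Position-based paging: resume strictly after the last key returned."""
    ids = [k for k in active_ids(docs) if last_key is None or k > last_key]
    return ids[:page_size]


def delete(docs, doc_id):
    """Simulate a concurrent delete between two page requests."""
    return [d for d in docs if d["myDocId"] != doc_id]


def run_offset():
    docs, seen, offset, first = make_docs(), [], 0, True
    while True:
        page = fetch_by_offset(docs, offset, PAGE_SIZE)
        if not page:
            return seen
        seen.extend(page)
        offset += len(page)
        if first:  # doc1 was already returned; delete it before page 2
            docs, first = delete(docs, "doc1"), False


def run_by_key():
    docs, seen, last_key, first = make_docs(), [], None, True
    while True:
        page = fetch_after_key(docs, last_key, PAGE_SIZE)
        if not page:
            return seen
        seen.extend(page)
        last_key = page[-1]
        if first:  # same concurrent delete as in run_offset
            docs, first = delete(docs, "doc1"), False


offset_seen = run_offset()
key_seen = run_by_key()
print("skip-count paging saw:", offset_seen)
print("position-based paging saw:", key_seen)
```

In this toy setup the skip-count run never returns doc3, even though doc3 was not modified, while the position-based run returns every unchanged document.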
Azure Cosmos DB

1 answer

  1. Narendra Pakkirigari 475 Reputation points Microsoft External Staff Moderator
    2025-05-05T11:51:49.7666667+00:00

    Hi Divyanshu Bains,

    1. If you use the same query text and the same SDK version for every page request, ContinuationToken guarantees that each matching document appears only once in the query results. ContinuationToken does not use a number of documents to skip to find the next page. (If you are not familiar with ContinuationToken, you can run a local test of the scenario you described, inspect the content of the ContinuationToken, and confirm that each matching document appears only once in the query results.)
    2. As you have noticed, "_rid" appears in the ContinuationToken. "_rid" is a system-defined property and cannot be added to "includedPaths" in the indexing policy. The "_rid" in the ContinuationToken is not there for performance; it is a necessary part of the token that ensures the query can be resumed correctly. An indexing policy that excludes all fields except the specified ones will not make the page calls expensive and slow. We suggest indexing the specific fields that are used in the query filters.
    3. You can set a limit on the size of the ContinuationToken. The ContinuationToken contains both required and optional fields. The required fields are needed to resume execution from where it stopped. The optional fields may contain serialized index lookup work that was done but not yet used. Carrying that work forward avoids redoing it in subsequent continuations and so improves query performance.
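
The indexing suggestion in point 2 might look like the sketch below. It builds the policy as a plain Python dictionary and prints it as JSON. The "/myDocId/?" path and the "/*" exclusion come from the question; adding "/isActive/?" is an assumption based on the filter in the query, and the "indexingMode" and "automatic" values are the usual defaults rather than anything stated in this thread.

```python
import json

# Hypothetical indexing policy: keep the existing exclusion of everything,
# and explicitly include the fields the query filters or orders on.
indexing_policy = {
    "indexingMode": "consistent",  # assumed default, not from the thread
    "automatic": True,             # assumed default, not from the thread
    "includedPaths": [
        {"path": "/myDocId/?"},    # already present in the question's policy
        {"path": "/isActive/?"},   # assumption: index the query's filter field
    ],
    "excludedPaths": [
        {"path": "/*"},            # from the question's policy
    ],
}

print(json.dumps(indexing_policy, indent=2))
```

Changing the indexing policy on a large container triggers an index transformation, so it is worth trying on a test container first.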

    Please let us know if the provided information was helpful. Feel free to reach out if you have any further questions.
