I am trying to extract historical data from an Azure Cosmos DB container using the Azure Python SDK, and I want to implement pagination with query_items_change_feed(). I tried passing max_item_count in the arguments, but the returned pages do not seem to follow the specified value, and there is no proper documentation on this method for the Python SDK.
Is it possible to implement pagination using Azure Python SDK? If yes, how do we implement it?
Code snippet:
I tested this on a sample container with 1000 documents. The code created 20 JSON files, each holding one page of documents. I also noticed that while uploading the 1000 documents to the container, they went in as batches like 60, 40, 39, 60, ..., and the JSON file sizes follow exactly the same order. Why does this behaviour occur?
import json

from azure.cosmos import cosmos_client

with cosmos_client.CosmosClient(self.HOST, self.MASTER_KEY) as client:
    db = client.get_database_client(database=DATABASE_ID)
    container = db.get_container_client(container=CONTAINER_ID)

    change_feed_iterator = container.query_items_change_feed(
        is_start_from_beginning=True,
        max_item_count=5,  # requested page size; not honored in practice
        partition_key_range_id=0
    )

    # Write each page of the change feed to its own JSON file.
    for i, item_page in enumerate(change_feed_iterator.by_page(), start=1):
        with open(f"batch_data/batch_data_test/_{i}.json", "w") as f:
            json.dump(list(item_page), f, indent=4)
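Since the service appears to decide its own batch sizes, one workaround I considered is re-chunking the item stream on the client so that each file holds exactly max_item_count documents regardless of how the server paginates. A minimal sketch, assuming the change feed iterator can be consumed item by item (the helper name rechunk is mine, not part of the SDK):

```python
from itertools import islice


def rechunk(items, page_size):
    """Yield lists of exactly page_size items (the last list may be shorter)."""
    it = iter(items)
    while True:
        page = list(islice(it, page_size))
        if not page:
            return
        yield page
```

With this, the write loop becomes independent of the server's paging, e.g. `for i, page in enumerate(rechunk(change_feed_iterator, 5), start=1): json.dump(page, ...)`, which for 1000 documents would produce 200 files of 5 documents each.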