Customer plans to use Cosmos DB and retain operational data for 10 years. What are the best practices to archive and retrieve data?

Antony Stephen 16 Reputation points Microsoft Employee
2023-05-09T18:18:14.2466667+00:00

The customer plans to use Cosmos DB as a read-optimized store for API calls and to retain operational data for 10 years. What are the best practices to archive and retrieve data? We are thinking of offloading data older than 6 months/1 year into ADLS using ADF or the Change Feed, persisting the older data as Parquet in ADLS, and reading it back in SQL or Databricks.

Daily volume is expected to be about 2 million JSON documents.

What are the best practices for managing, archiving, and retrieving data in Cosmos DB?


1 answer

  1. SSingh-MSFT 16,371 Reputation points Moderator
    2023-05-17T03:28:10.1133333+00:00

    Hi Antony Stephen,

    We have received the following reply to your question:

    For a read-optimized store, the customer should choose a partition key aligned with their most common data access filters. The combination of the document ID and partition key allows them to perform point reads, which are the most optimized read operation and are backed by performance SLAs. More information here and here.
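    For illustration, a minimal point read with the azure-cosmos Python SDK might look like the sketch below; the endpoint, key, database, container, document ID, and partition key value are all placeholders, not the customer's actual names:

    ```python
    # Minimal point-read sketch (azure-cosmos Python SDK).
    # Endpoint, key, and all names/values are placeholders.
    from azure.cosmos import CosmosClient

    client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
    container = client.get_database_client("ops-db").get_container_client("events")

    # A point read addresses one document by id + partition key value.
    # It is the cheapest, SLA-backed read path (~1 RU for a 1 KB document).
    doc = container.read_item(item="order-12345", partition_key="customer-42")
    print(doc["id"], doc["_ts"])
    ```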

    For data archiving the options are:

    1. Use Spark, ADF, or the Change Feed to export data to another data store. These options consume RUs to read the data. To remove the "old" data, customers can use TTL (time to live) to automatically remove documents from the transactional store. (A sketch of this option follows the list below.)

    2. Use Synapse Link, which enables the analytical store and keeps the data there for analytics and infrequent access at no RU cost. The price per GB is $0.03. The customer can also use CDC from the analytical store to export the data to any other data store supported by ADF or Synapse Pipelines; CDC has built-in support for checkpoints (what was the last document exported?), deletes, and updates. This is a no-code option that doesn't consume RUs. The analytical store has a TTL that is independent of the transactional store TTL. Please note that the analytical store is a columnar version of the data, so the data model is different; the customer should run a PoC to test. (See the second sketch below.)
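
    As a rough sketch of option 1, the snippet below drains the change feed with the azure-cosmos Python SDK, writes the batch out as Parquet, and then sets a default TTL on the container. The account details, names, TTL value, and partition key path are all placeholder assumptions, and the change-feed keyword arguments vary slightly across SDK versions:

    ```python
    # Sketch of option 1: export via change feed, expire old docs via TTL.
    # Account, key, names, output path, and TTL value are placeholders.
    import pyarrow as pa
    import pyarrow.parquet as pq
    from azure.cosmos import CosmosClient, PartitionKey

    client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
    database = client.get_database_client("ops-db")
    container = database.get_container_client("events")

    # Read all changes seen so far; this consumes RUs on the container.
    # (Keyword arguments differ between azure-cosmos versions.)
    docs = list(container.query_items_change_feed(is_start_from_beginning=True))

    if docs:
        # Persist the batch as Parquet; in practice the target would be an
        # ADLS path reached via abfs/fsspec or a mounted filesystem.
        pq.write_table(pa.Table.from_pylist(docs), "events-archive.parquet")

    # TTL (~180 days here) then removes aged documents from the transactional
    # store automatically. The partition key path must match the container's
    # existing definition; "/customerId" is a placeholder.
    database.replace_container(
        container,
        partition_key=PartitionKey(path="/customerId"),
        default_ttl=60 * 60 * 24 * 180,
    )
    ```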
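
    And for option 2, once Synapse Link and the analytical store are enabled, a Synapse Spark pool can query the data at no RU cost through the `cosmos.olap` connector. A minimal PySpark sketch, assuming the implicit `spark` session of a Synapse notebook; the linked service name, container name, cutoff timestamp, and ADLS path are placeholders:

    ```python
    # Sketch of option 2: read the analytical store from a Synapse Spark
    # pool (no RUs consumed). Linked service, container, timestamp, and
    # output path are placeholders.
    df = (spark.read
              .format("cosmos.olap")
              .option("spark.synapse.linkedService", "CosmosDbLinkedService")
              .option("spark.cosmos.container", "events")
              .load())

    # The analytical store is columnar, so a filter on _ts (epoch seconds)
    # scans only the needed columns. Here: export documents older than a
    # cutoff back out to ADLS as Parquet.
    (df.filter(df["_ts"] < 1672531200)
       .write.mode("overwrite")
       .parquet("abfss://archive@<storageaccount>.dfs.core.windows.net/events/"))
    ```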

    Hope this information helps on the Cosmos DB part.

    If the answer did not help, please add more context or a follow-up question, and we will help you out. If the answer helped, please click Accept answer so that it can help others in the community looking for help on similar topics.

    Thank you.

