Customer plans to use Cosmos DB and retain operational data for 10 years. What are the best practices to archive and retrieve data?

Antony Stephen 16 Reputation points Microsoft Employee
2023-05-09T18:18:14.2466667+00:00

The customer plans to use Cosmos DB as a read-optimized store for API calls and to retain operational data for 10 years. What are the best practices to archive and retrieve data? We are thinking of offloading data older than 6 months/1 year into ADLS using ADF or the Change Feed, persisting the older data as Parquet in ADLS, and reading it back in SQL or Databricks.

Daily volume is expected to be about 2 million JSON documents.

What are the best practices for managing, archiving, and retrieving data in Cosmos DB?


1 answer

  1. SSingh-MSFT 16,371 Reputation points Moderator
    2023-05-17T03:28:10.1133333+00:00

    Hi Antony Stephen,

    We have received the following reply to your question:

    For a read-optimized store, the customer should choose a partition key aligned with their most common data access filters. The combination of the document ID and partition key allows them to perform point reads, which are the most optimized read operation and are backed by performance SLAs. More information here and here.
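    For illustration, a minimal point read with the azure-cosmos Python SDK might look like the sketch below; the endpoint, key, database, container, document ID, and partition key value are all placeholders, not the customer's actual names:

    ```python
    # Minimal point-read sketch (azure-cosmos Python SDK).
    # Endpoint, key, and all names/values are placeholders.
    from azure.cosmos import CosmosClient

    client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
    container = client.get_database_client("ops-db").get_container_client("events")

    # A point read addresses one document by id + partition key value.
    # It is the cheapest, SLA-backed read path (~1 RU for a 1 KB document).
    doc = container.read_item(item="order-12345", partition_key="customer-42")
    print(doc["id"], doc["_ts"])
    ```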

    For data archiving the options are:

    1. Use Spark, ADF, or the Change Feed to export data to another data store. These options consume RUs to read the data. To remove the "old" data, customers can use TTL (time to live) to automatically remove documents from the transactional store. (A sketch of this option follows the list below.)

    2. Use Synapse Link, which enables the analytical store and keeps the data there for analytics and infrequent access at no RU cost. The price per GB is $0.03. The customer can also use CDC from the analytical store to export the data to any other data store supported by ADF or Synapse Pipelines; CDC has built-in support for checkpoints (what was the last document exported?), deletes, and updates. This is a no-code option that doesn't consume RUs. The analytical store has a TTL that is independent of the transactional store TTL. Please note that the analytical store is a columnar version of the data, so the data model is different; the customer should run a PoC to test. (See the second sketch below.)
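
    As a rough sketch of option 1, the snippet below drains the change feed with the azure-cosmos Python SDK, writes the batch out as Parquet, and then sets a default TTL on the container. The account details, names, TTL value, and partition key path are all placeholder assumptions, and the change-feed keyword arguments vary slightly across SDK versions:

    ```python
    # Sketch of option 1: export via change feed, expire old docs via TTL.
    # Account, key, names, output path, and TTL value are placeholders.
    import pyarrow as pa
    import pyarrow.parquet as pq
    from azure.cosmos import CosmosClient, PartitionKey

    client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
    database = client.get_database_client("ops-db")
    container = database.get_container_client("events")

    # Read all changes seen so far; this consumes RUs on the container.
    # (Keyword arguments differ between azure-cosmos versions.)
    docs = list(container.query_items_change_feed(is_start_from_beginning=True))

    if docs:
        # Persist the batch as Parquet; in practice the target would be an
        # ADLS path reached via abfs/fsspec or a mounted filesystem.
        pq.write_table(pa.Table.from_pylist(docs), "events-archive.parquet")

    # TTL (~180 days here) then removes aged documents from the transactional
    # store automatically. The partition key path must match the container's
    # existing definition; "/customerId" is a placeholder.
    database.replace_container(
        container,
        partition_key=PartitionKey(path="/customerId"),
        default_ttl=60 * 60 * 24 * 180,
    )
    ```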
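
    And for option 2, once Synapse Link and the analytical store are enabled, a Synapse Spark pool can query the data at no RU cost through the `cosmos.olap` connector. A minimal PySpark sketch, assuming the implicit `spark` session of a Synapse notebook; the linked service name, container name, cutoff timestamp, and ADLS path are placeholders:

    ```python
    # Sketch of option 2: read the analytical store from a Synapse Spark
    # pool (no RUs consumed). Linked service, container, timestamp, and
    # output path are placeholders.
    df = (spark.read
              .format("cosmos.olap")
              .option("spark.synapse.linkedService", "CosmosDbLinkedService")
              .option("spark.cosmos.container", "events")
              .load())

    # The analytical store is columnar, so a filter on _ts (epoch seconds)
    # scans only the needed columns. Here: export documents older than a
    # cutoff back out to ADLS as Parquet.
    (df.filter(df["_ts"] < 1672531200)
       .write.mode("overwrite")
       .parquet("abfss://archive@<storageaccount>.dfs.core.windows.net/events/"))
    ```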

    Hope this information helps on the Cosmos DB part.

    If the answer did not help, please add more context or a follow-up question, and we will help you out. If the answer helped, please click Accept answer so that it can help others in the community looking for help on similar topics.

    Thank you.

