How to sync data from Snowflake to Cosmos DB in real time using a CDC mechanism through ADF

mobileapp4.support 0 Reputation points
2024-04-19T04:43:25+00:00

We have a view on Snowflake with approximately 250 columns and millions of records. We are syncing this data from Snowflake to Cosmos DB using an ADF pipeline.

Currently the data is synced based on a particular column, lst_update_time, which queries the whole dataset and reflects all changes from Snowflake to Cosmos DB.

The issue is that only a few of the columns are relevant in Cosmos DB, so we need a solution where ADF syncs a record from Snowflake to Cosmos DB only when one of those columns has changed.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Vinodh247-1375 11,211 Reputation points
    2024-04-19T05:35:27.15+00:00

    Hi mobileapp4.support,

    Thanks for reaching out to Microsoft Q&A.

Yes, this is achievable. The simplest approach I can think of is below. You can try this and let me know.

1. Change Data Capture in Snowflake: Set up CDC in Snowflake (e.g., a stream on the underlying table) to track changes. When there is a change (insert/update/delete), Snowflake captures it, and you can check whether the columns you care about were affected.
2. ADF Pipeline: Create an ADF pipeline that runs periodically (e.g., every few minutes or hours, based on your requirement). Use the Snowflake connector in ADF to read the changed data from Snowflake, and filter the records to those whose relevant columns actually changed (using the CDC info).
3. Cosmos DB Sink: Configure the ADF pipeline to write the filtered records to Azure Cosmos DB, using the Cosmos DB connector in ADF to perform the data transfer.
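    The column-level filter in step 2 can be sketched in Python. This is a minimal, hypothetical example (the column names, the `id` key, and the helper functions are illustrative, not from your pipeline): hash only the columns Cosmos DB cares about, and sync a record only when that hash differs from what was last written.

    ```python
    import hashlib
    import json

    # Hypothetical subset of the ~250 view columns that matter in Cosmos DB.
    RELEVANT_COLUMNS = ["customer_id", "status", "email"]

    def relevant_hash(record: dict) -> str:
        """Hash only the columns that Cosmos DB cares about."""
        subset = {col: record.get(col) for col in RELEVANT_COLUMNS}
        payload = json.dumps(subset, sort_keys=True, default=str)
        return hashlib.sha256(payload.encode()).hexdigest()

    def records_to_sync(changed_rows, last_synced_hashes):
        """From CDC-changed rows, keep only those whose relevant columns
        differ from what was last synced (keyed by a hypothetical 'id')."""
        to_sync = []
        for row in changed_rows:
            h = relevant_hash(row)
            if last_synced_hashes.get(row["id"]) != h:
                to_sync.append(row)
                last_synced_hashes[row["id"]] = h  # remember what we synced
        return to_sync
    ```

    In ADF itself you would express the same idea with a derived column (hash of the relevant columns) and a filter/lookup against the previously synced hash, so rows that only changed in irrelevant columns (e.g., lst_update_time alone) are dropped before the Cosmos DB sink.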

    Please 'Upvote'(Thumbs-up) and 'Accept' as an answer if the reply was helpful. This will benefit other community members who face the same issue.