How can we implement CDC, or copy data from source to destination, when the source is an Azure PostgreSQL database?

Kanwar, Pramod 20 Reputation points
2023-08-25T10:40:33.6833333+00:00

The CDC feature (Preview) is available in Azure Data Factory, but we have noted that it is not accessible for certain connectors; the Oracle and Azure PostgreSQL connectors are two examples.

In short, we are looking for a feature by which changes in the source database are copied to the destination.

We would be grateful if you could recommend the optimal approach in Azure Data Factory for an Azure PostgreSQL database.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. KranthiPakala-MSFT 46,642 Reputation points Microsoft Employee Moderator
    2023-09-11T17:25:07.83+00:00

@Kanwar, Pramod Regarding your follow-up query: yes, you can go with Mapping Data Flow, as it supports auto-incremental copy when you have a delta column that identifies the changes.

    Please refer to this doc: Auto incremental extraction in mapping data flow


    Hope this helps.


    Please don’t forget to Accept Answer and Yes for "was this answer helpful" wherever the information provided helps you, this can be beneficial to other community members.


1 additional answer

Sort by: Most helpful
  1. Amira Bedhiafi 33,071 Reputation points Volunteer Moderator
    2023-08-27T11:52:11.42+00:00

If Azure Data Factory's built-in CDC feature does not support Azure PostgreSQL (this may be a versioning limitation), you can still design a custom CDC mechanism.

PostgreSQL supports logical replication, which can be used for capturing changes. Ensure that logical replication is enabled on your Azure PostgreSQL database.
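As a configuration sketch, assuming an Azure Database for PostgreSQL flexible server (resource and server names here are placeholders), logical replication is typically enabled by setting the `wal_level` server parameter and restarting:

```shell
# Placeholder resource names; replace with your own.
# Enable logical decoding on an Azure Database for PostgreSQL flexible server.
az postgres flexible-server parameter set \
  --resource-group my-resource-group \
  --server-name my-pg-server \
  --name wal_level \
  --value logical

# A restart is required for wal_level changes to take effect.
az postgres flexible-server restart \
  --resource-group my-resource-group \
  --name my-pg-server
```

For a single-server deployment the command group differs, so check the Azure CLI reference for your server type.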

    Maintain a separate table in PostgreSQL to track the changes for each table you wish to capture. The tracking table could have columns such as table_name, last_processed_id, and last_processed_timestamp.
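A minimal sketch of such a tracking table, with all table and column names illustrative and sqlite3 standing in for PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row per source table we want to track; names are illustrative.
conn.execute("""
    CREATE TABLE cdc_tracking (
        table_name               TEXT PRIMARY KEY,
        last_processed_id        INTEGER,
        last_processed_timestamp TEXT
    )
""")

# Seed a row for a hypothetical 'orders' source table: nothing processed yet.
conn.execute(
    "INSERT INTO cdc_tracking VALUES ('orders', 0, '1970-01-01T00:00:00')"
)
conn.commit()

row = conn.execute(
    "SELECT last_processed_id FROM cdc_tracking WHERE table_name = 'orders'"
).fetchone()
print(row[0])  # prints 0: the watermark before any copy has run
```

Each pipeline run reads this row before copying and writes it back afterwards.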

As for the Azure Data Factory pipeline, you can proceed as follows:

    • Lookup Activity: Look up the last processed ID/timestamp from the tracking table.
    • Copy Activity: Copy the new records from the source PostgreSQL table using a query that filters records based on the last processed ID/timestamp. Use a watermark approach where you filter records that have an ID/timestamp greater than the values fetched from the tracking table.
    • Stored Procedure Activity: After copying the data to the destination, update the tracking table with the new last processed ID/timestamp.

For incremental loads, ensure your tables in PostgreSQL have either an auto-incrementing ID or a timestamp column. That way, you can always fetch records that have been inserted or updated after the last fetched ID/timestamp.
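The Lookup → Copy → update sequence above can be simulated locally as a sketch (sqlite3 stands in for both source and destination; all table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Source and destination tables, with an auto-incrementing ID as the watermark.
conn.execute("CREATE TABLE src_orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("CREATE TABLE dst_orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute(
    "CREATE TABLE cdc_tracking (table_name TEXT PRIMARY KEY, last_processed_id INTEGER)"
)
conn.execute("INSERT INTO cdc_tracking VALUES ('src_orders', 0)")

def run_incremental_copy(conn):
    # 1. Lookup: fetch the last processed ID from the tracking table.
    (watermark,) = conn.execute(
        "SELECT last_processed_id FROM cdc_tracking WHERE table_name = 'src_orders'"
    ).fetchone()
    # 2. Copy: only records beyond the watermark.
    rows = conn.execute(
        "SELECT id, amount FROM src_orders WHERE id > ?", (watermark,)
    ).fetchall()
    conn.executemany("INSERT INTO dst_orders VALUES (?, ?)", rows)
    # 3. Update the tracking table with the new high-water mark.
    if rows:
        conn.execute(
            "UPDATE cdc_tracking SET last_processed_id = ? "
            "WHERE table_name = 'src_orders'",
            (max(r[0] for r in rows),),
        )
    conn.commit()
    return len(rows)

conn.executemany("INSERT INTO src_orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
print(run_incremental_copy(conn))  # prints 2: both rows copied
conn.execute("INSERT INTO src_orders VALUES (3, 5.0)")
print(run_incremental_copy(conn))  # prints 1: only the new row is copied
```

In an actual pipeline, the three numbered steps map onto the Lookup, Copy, and Stored Procedure activities respectively, with the tracking table living in the source or a control database.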

    To capture deleted records, you might need a soft delete mechanism in your source system where records are marked as deleted but not actually removed.

For updated records, a timestamp or version column can help detect changes made since the last CDC run.
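The two points above combine naturally: because a soft delete is an UPDATE, it moves the last-modified watermark and is picked up by the same incremental query. A sketch with illustrative names (sqlite3 again standing in for PostgreSQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customers (
        id          INTEGER PRIMARY KEY,
        name        TEXT,
        is_deleted  INTEGER DEFAULT 0,  -- soft-delete marker
        modified_at TEXT                -- change-detection watermark
    )
""")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, 0, ?)",
    [(1, "Ada", "2023-08-01T00:00:00"), (2, "Grace", "2023-08-02T00:00:00")],
)

# Soft-delete customer 1: flag it and advance its modified_at timestamp.
conn.execute(
    "UPDATE customers SET is_deleted = 1, modified_at = '2023-08-03T00:00:00' "
    "WHERE id = 1"
)

# The next incremental extraction sees the deletion as an ordinary change.
last_run = "2023-08-02T12:00:00"
changes = conn.execute(
    "SELECT id, is_deleted FROM customers WHERE modified_at > ?", (last_run,)
).fetchall()
print(changes)  # prints [(1, 1)]: only the soft-deleted row changed
```

The destination can then apply the `is_deleted` flag however it likes (filter it out, or mirror the flag).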

