How to cdc system columns in azure mapping data flow using 'Change data capture'

Amit Trivedi 40 Reputation points
2024-05-23T13:56:28.3133333+00:00

User's image

I am using the 'Change data capture' property of the dataflow in a pipeline to extract data incrementally. This option is working fine but I have to extract the cdc system columns like __$operation, __$start_lsn .
Kindly help , if there is a way to get cdc system columns with the settings given in image.

Note: I cannot use Query or Stored Procedure option(in data flows) to extract data incrementally because both of these options doesn't support 'Data Lineage' and this is mandatory requirement that the pipeline should have data lineage information.
I can write a stored procedure which use cdc to extract data incrementally but stored procedure doesn't support data lineage.

Please suggest some way, where I can extract data incrementally along with the Data Lineage information in the pipeline.
I am

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,941 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Smaran Thoomu 11,770 Reputation points Microsoft Vendor
    2024-05-24T10:01:57.2066667+00:00

    Hi @Amit Trivedi

    Thank you for reaching out to us with your query.

    As you mentioned, the 'Change data capture' option in Azure Data Factory Mapping Data Flow does not support the extraction of CDC system columns like __$operation and __$start_lsn.

    One workaround for this issue is to use the 'Copy data' activity in Azure Data Factory to extract data incrementally along with Data Lineage information in the pipeline. You can create a pipeline in Azure Data Factory and add a 'Copy data' activity to the pipeline. In the 'Copy data' activity, you can select the source and destination data stores and configure the incremental copy settings. You can configure the incremental copy settings by selecting the 'Use query' option and specifying the query that extracts the data incrementally. You can also specify the CDC system columns in the query to extract the required data. Once you have configured the 'Copy data' activity, you can run the pipeline to extract the data incrementally along with Data Lineage information.

    Alternatively, you can use a stored procedure to extract the data incrementally and include the CDC system columns in the output. However, as you mentioned, stored procedures do not support data lineage. Therefore, you can use the 'Data Flow' activity in Azure Data Factory to transform the data before copying it to the destination data store.

    I hope this helps you resolve your query.

    0 comments No comments