Error while using sql query in azure dataflow to capture changes using cdc tables.

Amit Trivedi 40 Reputation points
2023-09-19T10:31:00.27+00:00

I am using using a sql query in the source query of the source of a data flow in azure data factory.

Sql query uses cdc tables to get data incrementally. I am getting below error while executing the query:

"Job failed due to reason: com.microsoft.dataflow.Issues: DF-SQLCDC_003 - Native CDC supports only Table Format - [564 737],

source(

) ~> source1,EXE-0001,Dataflow cannot be analyzed as a graph"

I am using below query in the source of the dataflow activity to get data:

declare @WfStartDateTime varchar(50)

select @WfStartDateTime = '2023-09-15 12:33:44.067'

declare @previousEndDTM varchar(50)

SELECT @previousEndDTM = '2023-01-24 15:50:28.540'

declare @from_lsn binary(10) = sys.fn_cdc_map_time_to_lsn('smallest greater than', @previousEndDTM)

declare @to_lsn binary(10) = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @WfStartDateTime)

***SELECT ****

FROM cdc.fn_cdc_get_net_changes_tblname_DM2(@from_lsn, @to_lsn, 'all')

I googled a lot but didn't find any way to fix this error.

Please help.

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,624 questions
0 comments No comments
{count} votes

Accepted answer
  1. Amira Bedhiafi 33,071 Reputation points Volunteer Moderator
    2023-09-19T16:21:53.29+00:00

    Like the error message indicates, Azure Data Factory's native CDC support requires data to be in table format. This might be the cause of the issue.

    Azure Data Factory might not support using DECLARE statements directly in the source query. Instead, use parameterized inputs where required.

    Instead of hardcoding date and time values in your SQL script, use ADF's parameterized input. This will allow ADF to dynamically manage and pass these values, reducing potential errors.

    Try wrapping your CDC logic into a single SELECT statement without using variables. For instance:

     SELECT * 
        FROM cdc.fn_cdc_get_net_changes_tblname_DM2(
            sys.fn_cdc_map_time_to_lsn('smallest greater than', '2023-01-24 15:50:28.540'),
            sys.fn_cdc_map_time_to_lsn('largest less than or equal', '2023-09-15 12:33:44.067'), 
            'all'
        )
    

    If the complexity of the query grows or if you face continuous challenges, consider wrapping this logic inside a stored procedure in your SQL database, and then call that stored procedure from ADF.

    0 comments No comments

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.