ADF - CDC - enable and disable CDC tables during deployment

Ashwani SRIVASTAVA 60 Reputation points
2024-05-08T04:29:23.9233333+00:00

Hi, I have setup ADF pipeline for data sync of tables between Azure SQL Database to SQL Managed Instance databases. I am using Data Flow and CDC to track changes and sink activity to replicate it to target. However, when we deploy any table changes (add a new column) on CDC enabled tables via DACPAC, it fails with below error:
"Table is under change data capture control and cannot be modified"

Table modification via ALTER statement (t-sql in SSMS) does not throw any error. I looked at few articles and it looks like DACPAC has different approach to table changes and not supported for CDC.

So, the other option is to disable CDC on table before deploying DACPAC and then re-enable CD on the table. But this cause ADF pipeline to fail and throw below error message:

"An insufficient number of arguments were supplied for the procedure or function cdc.fn_cdc_get_net_changes_ ..."

This is due to checkpoint key in data flow step of pipeline as it is associated with pipeline and maintains unique key for table being replicated and once CDC is disabled/re-enabled, it gets invalid. So, need to change this key for pipeline to work again.

Any insights on how to avoid this situation in case we need to have cdc on table enabled/disabled or making schema change work fine on CDC enabled tables when deployed via DACPAC?

User's image

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,702 questions
{count} votes

Accepted answer
  1. BhargavaGunnam-MSFT 27,156 Reputation points Microsoft Employee
    2024-05-10T20:53:31.78+00:00

    Hello Ashwani SRIVASTAVA,

    Please follow the below steps.

    1). Disable and enable the CDC instance on the source database.

    SalesLT.Product table from AdventureWorks is used as an example below.

    disable

    EXEC sys.sp_cdc_disable_table @source_schema = N'SalesLT', @source_name = N'Product', @capture_instance = N'SalesLT_Product' GO

    To verify CDC is disabled (no CDC configuration shows up if disabled on that table)

    exec sys.sp_cdc_help_change_data_capture go

    enable again

    EXEC sys.sp_cdc_enable_table @source_schema = N'SalesLT', @source_name = N'Product', @role_name = NULL, @filegroup_name = NULL, @supports_net_changes = 1

    2). Import the new schema in the ADF mapping data flow.

    Ex:HolidaySalesDeal column, which is added on the database

    1. Add alterRows transformation in the mapping data flow. Condition is [Upsert if] [isInsert(1)]

    4). Override the checkpoint key with unique value.

    1. Run the pipeline and confirm the changed data is populated in the target sink.

    Reference documents:
    https://learn.microsoft.com/en-us/azure/azure-sql/database/change-data-capture-overview?view=azuresql

    https://learn.microsoft.com/en-us/azure/data-factory/concepts-change-data-capture

    https://learn.microsoft.com/en-us/azure/data-factory/control-flow-execute-data-flow-activity#checkpoint-key

    Please note:
    ADF doesn't fully support schema evolution from end to end.

    Schema evolution, which table schema changes such as adding columns, is not fully supported by SQL native CDC. ADF mapping data flow has schema drift and auto mapping feature, but it can’t absorb this feature gap.

    Hence, the added column, changed data are not detected automatically.

    I hope this answers your question. Please let me know in case if you have any further questions.


0 additional answers

Sort by: Most helpful