ADF CDC resource Start and Stop

Oleksandr K 0 Reputation points
2024-12-13T20:36:11.0866667+00:00

Hello. In Data Factory using CDC resource is there a way to monitor (and manually change) the last processed Incremental Column value? (i.e. current state/watermark)?

My both Source and Sink are Azure SQL Database tables. And I have am using CDC resource to keep them in sync.

I would like to know what if I stop the CDC resource and then Start it again, will it continue and process all the data during pause? Or will it start capturing only changes after the Start action?

I want to Stop CDC to create Index on the Target/Sink table, so I am thinking if it's enough just to Stop CDC resource temporarily and Start it again after I created the Index.

Could you advise?

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,655 questions
{count} votes

2 answers

Sort by: Most helpful
  1. Gabriel Santana 175 Reputation points
    2024-12-14T02:00:02.1533333+00:00

    Hello, Oleksandr! Welcome to the Microsoft Q&A Community!

    Q: In Data Factory using CDC resource is there a way to monitor (and manually change) the last processed Incremental Column value? (i.e. current state/watermark)?

    A: In Azure Data Factory (ADF), using the Change Data Capture (CDC) feature, it is possible to monitor and update the last processed incremental column value

    Monitoring the Last Processed Incremental Column Value

    1. Using Metadata Tables:
      • You can query the CDC metadata tables (cdc.lsn_time_mapping in your source or sink database to determine the last processed incremental value.
    2. Using Dynamic Management Views (DMVs):
      • For SQL Server or Azure SQL Managed Instance, the DMV sys.dm_cdc_log_scan_sessions contains information about the CDC log scan, including the watermark.

    Changing the Last Processed Incremental Column Value Manually

    If you need to reset or modify the CDC state/watermark:

    • You can use sys.sp_cdc_change_job to configure the CDC process, but direct modification of the watermark is not supported natively. Instead:
      • Identify the target LSN or timestamp value that you want to set as the new watermark.
      • Use a custom query or process to reset CDC metadata tables, but this is not recommended unless carefully tested, as it could lead to data integrity issues.

    Q: I would like to know what if I stop the CDC resource and then Start it again, will it continue and process all the data during pause? Or will it start capturing only changes after the Start action?

    A: Yes, it will continue and process all the data during the pause, As described Here

    Q: I want to Stop CDC to create Index on the Target/Sink table, so I am thinking if it's enough just to Stop CDC resource temporarily and Start it again after I created the Index.

    Could you advise?

    A: Sure thing, stopping the CDC resource temporarily while creating the index on the target/sink table is a valid approach. Once the index is created, you can start the CDC resource again to resume data capture without issues As Described here

    Please let me know if this helps, and don't forget to mark this as the accepted answer so others facing the same issue can find the solution more easily!


  2. Chandra Boorla 14,675 Reputation points Microsoft External Staff Moderator
    2024-12-16T05:25:00.1066667+00:00

    @Oleksandr K

    Greetings & Welcome to Microsoft Q&A forum! Thanks for posting your query!

    In Azure Data Factory (ADF), when using a Change Data Capture (CDC) resource, it is essential to understand how the system tracks changes and how it handles disruptions. Here’s how the system behaves and what steps you can take:

    In Data Factory using CDC resource is there a way to monitor (and manually change) the last processed Incremental Column value? (i.e. current state/watermark)?

    Currently, Azure Data Factory does not provide a direct way to manually monitor or adjust the last processed Incremental Column value of a CDC resource. This value is automatically tracked by the CDC service and used to determine which changes to process. The CDC resource in ADF tracks the last processed state (watermark) automatically using the __$start_lsn and __$end_lsn (Log Sequence Numbers).

    For more details, please refer: Change Data Capture - sys.dm_cdc_log_scan_sessions

    To monitor the current state, query the CDC metadata or logs in the source Azure SQL Database:

    Example:

    • sys.fn_cdc_get_max_lsn - Returns the maximum log sequence number (LSN) from the start_lsn column in the cdc.lsn_time_mapping system table. You can use this function to return the high endpoint of the change data capture timeline for any capture instance.
    • sys.fn_cdc_get_min_lsn - Returns the start_lsn column value for the specified capture instance from the cdc.change_tables system table. This value represents the low endpoint of the validity interval for the capture instance.

    Please refer this https://learn.microsoft.com/en-us/sql/relational-databases/system-functions/sys-fn-cdc-get-max-lsn-transact-sql?view=sql-server-ver16 document for additional information.

    I would like to know what if I stop the CDC resource and then Start it again, will it continue and process all the data during pause? Or will it start capturing only changes after the Start action?

    When you stop the Change Data Capture resource in Azure Data Factory (ADF) and then restart it, it will not automatically process all the data during the pause. Instead, it will resume capturing only the changes after the "Start" action from the last processed Log Sequence Number (LSN) before the stop.

    I want to Stop CDC to create Index on the Target/Sink table, so I am thinking if it's enough just to Stop CDC resource temporarily and Start it again after I created the Index.

    Yes, stopping the CDC resource temporarily to create an index on the target/sink table should generally a common approach. However, there are a few important things to consider ensuring this approach works smoothly and does not cause any issues.

    • Stop the CDC Pipeline - This will pause the CDC process, preventing new data from being processed.
    • Create the Index - Execute the necessary SQL commands to create the desired index on the target table.
    • Start the CDC Pipeline - Resume the CDC process, which will continue to capture and process changes from the last tracked point.

    Key Considerations:

    Retention period - Ensure that the CDC retention period on the source database is configured appropriately to cover the time it takes to create the index and restart the CDC resource.

    Data integrity - After restarting, validate that the changes captured during the downtime (if within the retention period) are correctly applied to the sink table.

    Test the Process - Test the stop-restart flow in a non-production environment before applying it in production.

    Backup Data - Back up the sink table before making structural changes, if possible.

    For additional information please refer: Troubleshoot the Change data capture resource in Azure Data Factory

    I hope this information helps. Please do let us know if you have any further queries.

    Thank you.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.