How to accurately sequence Change Data Capture data?

Andrew HB 321 Reputation points
2023-07-11T13:26:10.82+00:00

We are using the Sql Server Change Data Capture feature to allow us to process data into our data warehouse which is working very well however, I have a quick question with regards to sequencing the data as I've read conflicting documents.

We are currently using the inbuilt function cdc.fn_cdc_get_all_changes_<capture_instance> to retrieve changes between two LSN values and we then sequence that data using the __$start_lsn and __$seqval columns to ensure the records are returned in the correct sequence. Our testing using transactions would suggest that this is correct and gives us the desired result however, we then read this article: -

https://learn.microsoft.com/en-us/sql/relational-databases/system-tables/cdc-capture-instance-ct-transact-sql?view=sql-server-ver16&viewFallbackFrom=sqlallproducts-allversions

This document makes the following statement: -

We recommend that you do not query the system tables directly. Instead, execute the cdc.fn_cdc_get_all_changes_<capture_instance> and cdc.fn_cdc_get_net_changes_<capture_instance> functions.

It then states the following with regards to the __$seqval column: -

Sequence of the operation as represented in the transaction log. Should not be used for ordering. Instead, use the __$command_id column.

The problem with these two statements is that the cdc.fn_cdc_get_all_changes_<capture_instance> function does not return the __$command_id column in its result set.

My question is this, is the sequencing of the CDC data accurate if using __$start_lsn and __$seqval and the references to __$command_id are now out of date and are no longer relevant when using version: -

Microsoft SQL Azure (RTM) - 12.0.2000.8 Jun 1 2023 13:36:49 Copyright (C) 2022 Microsoft Corporation

Or

Are we simply not able to use the cdc.fn_cdc_get_all_changes_<capture_instance> function to achieve what we want and we must revert to directly accessing the raw tables?

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
12,852 questions
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. Amira Bedhiafi 16,071 Reputation points
    2023-07-20T13:36:12.9166667+00:00

    About sequencing the CDC data using the __$start_lsn and __$seqval columns, the information from the Microsoft documentation stating that __$seqval should not be used for ordering and to use __$command_id instead seems to conflict with your actual experience, where using __$start_lsn and __$seqval appears to give you the correct sequencing.

    Since the Microsoft documentation is not fully aligned with your experience, and you are using a relatively recent version of SQL Server, verifying the behavior of CDC in your specific version and environment might be worthwhile. You can do this by conducting further testing, referring to the official documentation for your exact SQL Server version (if available), or contacting Microsoft support for clarification.

    It's also important to keep in mind that accessing system tables directly, as mentioned in the article you referenced, is generally discouraged because the underlying structures might change in future updates, which could lead to issues and incompatibilities. The recommended approach is to use the provided functions like cdc.fn_cdc_get_all_changes_<capture_instance> and cdc.fn_cdc_get_net_changes_<capture_instance> to access CDC data.

    If you find that using __$start_lsn and __$seqval works correctly for sequencing the data in your specific version of SQL Server, it might be a viable solution for your current needs. However, it's always good to be cautious about potential future changes and ensure compatibility with new SQL Server versions.