SQL Server 2008 - Change Data Capture - Part I

Hello everyone .. It's been a long time since I haven't blogged. So today being a weekend again I have some time for jotting down this blog. Writing gives me a great pleasure and since last few months I didn't get this pleasure to get in touch with you through my blogs.

Today I am planning to write on Change Data Capture feature of SQL Server 2008. I hope you would like this article as well like the other articles you've been liking so far.

Change Data Capture (referred as CDC going forward) is a cool feature inducted in SQL Server 2008. Have you ever wondered that somehow you could get the stream of data that has been changed since a point in time. People have been looking for this feature for quite some time now. Imagine that you have a Data Mart / Data Warehouse that holds the data that is the replica of the data in OLTP tables. The OLTP Database is changed by the numerous transactions that happen daily, finally you find that the data in Data Mart / Data Warehouse is not in synch with the OLTP Tables. In such scenario the CDC feature of the SQL Server 2008 provide a steady stream of the data that has changed. CDC is designed to capture the inserts / updates / deletes applied to SQL Tables.

How it works

The following diagram explains it all :

CDC

Within the OLTP Database you have the tables, whenever you have some Insertions / Deletions / Modifications to the Table, it is written to the Log simultaneously. The Change Data Capture Process picks up the changes to the source tables and stores in the change tables. When we want to load the Data Warehouse we will use the CDC functions and it will give the steady stream of changes happened to those tables and using the ETL (Extract Transform and Load) process we will get those changes in the Data Warehouse.

Preparing a database for using Change Data Capture

NOTE: CDC will be available in SQL Server 2008 Enterprise, Developer and Evaluation editions.

First of all we would check the CDC is enabled on the database. I would use the AdventureWorks Database for code samples.

/*

I am using AdventureWorks sample database

you can specify the database on which you would

like to enable Change Data Capture

*/

--This query will return 0 if the CDC is

--not enabled else it will return 1

GO

SELECT is_cdc_enabled FROM sys.databases

WHERE [name] = 'AdventureWorks'

GO

If CDC is not enabled on the database then we will enable Change Data Capture on the database. When we enable the CDC on the Database the schema named 'cdc' is created and the following five system tables are also automatically created in the schema.

  1. cdc.captured_columns
  2. cdc.change_tables
  3. cdc.ddl_history
  4. cdc.index_columns
  5. cdc.lsn_time_mapping

/*

This query will enable CDC on your database

and will created a schema with the name of

CDC and following tables will be created automatically

Table Created are:

1) cdc.captured_columns

2) cdc.change_tables

3) cdc.ddl_history

4) cdc.index_columns

5) cdc.lsn_time_mapping

*/

GO

USE AdventureWorks

GO

EXEC sys.sp_cdc_enable_db

GO

Now when the CDC is enabled on the Database we would set up Change Data Capture on the selected tables as per our requirement. I am using the 'HumanResources.Employee' Table in the AdventureWorks DB. The following query will set up the CDC on 'HumanResources.Employee' Table and create a database role by the name of 'cdcAdmin' as this role is not already present in the database. The owner of this role will be the schema 'cdc'. The capture instance name can be selected as per user convenience. I am using the name 'InstanceHumanResourcesEmployee'. When I fire the query, it will automatically create a new table in the 'cdc' schema with the name 'cdc.InstanceHumanResourcesEmployee_CT'. You will also find that a new table valued function with the following name pattern will be created cdc.fn_cdc_get_all_changes_ <Capture Instance> . The part of the function name provided in red color depends on the capture instance name we specify. In our case the instance name is "InstanceHumanResourcesEmployee" so the new Table-valued function will be created as cdc.fn_cdc_get_all_changes_InstanceHumanResourcesEmployee

GO

/*

This table will enable CDC on the

HumanResources.Employee Table in the

Adventureworks database.

It will create a table called

cdc.InstanceHumanResourcesEmployee_CT

It will also create a new role called cdcAdmin.

NOTE: Make sure the SQL Agent is started.

*/

EXEC sys.sp_cdc_enable_table

@source_schema = 'HumanResources',

@source_name = 'Employee',

@role_name = 'cdcAdmin',

@capture_instance = 'InstanceHumanResourcesEmployee'

GO

We will fire the following query to find out the details of the CDC just set on the Employee table.

GO

/*

This query provides the details of the CDC that has been

set on the HumanResources.Employee Table.

*/

EXEC sys.sp_cdc_help_change_data_capture

@source_schema = 'HumanResources',

@source_name = 'Employee'

GO

 

If you would like to find out which tables are being monitored by CDC then run the following query. This will show the HumanResources.Employee table is being monitored by CDC.

--This will show all the tables tracked by CDC

GO

SELECT * FROM sys.tables

WHERE is_tracked_by_cdc = 1

GO

The next step is to insert a new sample record in the HumanResources.Employee table and see if that is captured by the CDC.

/*

This query will insert a record in the

HumanResources.Employee Table

*/

INSERT INTO [HumanResources].[Employee](

      [NationalIDNumber],

      [ContactID],

      [LoginID],

      [ManagerID],

      [Title],

      [BirthDate],

      [MaritalStatus],

      [Gender],

      [HireDate],

      [SalariedFlag],

      [VacationHours],

      [SickLeaveHours],

      [CurrentFlag],

      [rowguid],

      [ModifiedDate])

VALUES (

      '114459800',

      1244,

      'adventure-works\ravi0 ',

      3,

      'Mechanical Engineer',

      '1980-01-23',

      'S',

      'M',

      '03/03/2006',

      0,

      48,

      80,

      1,

      newid(),

      '03/03/2006')

GO

Now we will use the following query to find out the changes.

GO

/*

This query will get the changes happened since last day the same time till today the present time.

The function sys.fn_cdc_map_time_to_lsn gets the LSN number suitable to the time range.

At last we fire cdc.fn_cdc_get_all_changes_InstanceHumanResourcesEmployee to get all the changes.

*/

DECLARE           @Yesterday_Time smalldatetime,

                  @to_lsn binary(10),

                  @from_lsn binary(10);

SELECT            @Yesterday_Time = DATEADD(DAY, -1, GETDATE())

 

SELECT @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @Yesterday_Time);

SELECT @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', GETDATE());

SELECT * FROM cdc.fn_cdc_get_all_changes_InstanceHumanResourcesEmployee (@from_lsn, @to_lsn ,'all')

GO

 
Function sys.fn_cdc_map_time_to_lsn

In the above query the sys.fn_cdc_map_time_to_lsn is used. This function is used to get the LSN number that matches a timestamp. The function sys.fn_cdc_map_time_to_lsn takes two parameters.

Parameter 1: Relational Operator

Options for this parameter :

    • 'largest less than'
    • 'largest less than or equal'
    • 'smallest greater than'
    • 'smallest greater than or equals'

Parameter 2: Tracking Time

Any standard Date and Time could be specified.

Function cdc.fn_cdc_get_all_changes_<Capture Instance>

In the above query the second function used is cdc.fn_cdc_get_all_changes_InstanceHumanResourcesEmployee. I have already described above how this function is named and how it comes to existence. It takes three parameters.

Parameter 1: From LSN

This is the LSN Value from where we need start getting the changes i.e. low endpoint of the resultset.

Parameter 2: To LSN

This is the LSN Value where we need to stop getting the changes i.e. high endpoint of the resultset.

Parameter 3: Row Option Filter

Options for this parameter :

    • 'all' : Returns all the changes within the specified LSN Range. For changes due to an update operation, this option only return the row containing the new values after the update is applied.
    • 'all update old' :Returns all changes within the specified LSN Range. For changes due to an update operation, this option returns both row containing the column values before the update and the row containing the column values after update.

Download the SQL Script

Thank you for reading this blog. Friends that are many more options and functions available with Change Data Capture. I will write more about this in the Part II, very soon. So keep reading and keep posting your questions & comments. Bye !

 

CDC.sql