Obtaining Changes by Using the Change Tracking Functions
This topic describes the change tracking functions in SQL Server 2008 and how to use them to obtain the changes that are made in a database and information about the changes.
About the Change Tracking Functions
Applications can use the following functions to obtain the changes that are made in a database and information about the changes:
CHANGETABLE(CHANGES …) function
This rowset function is used to query for change information. The function queries the data stored in the internal change tracking tables. The function returns a results set that contains the primary keys of rows that have changed together with other change information such as the operation, columns updated and version for the row.
CHANGETABLE(CHANGES …) takes a last synchronization version as an argument. The last synchronization version is obtained using the @last_synchronization_version variable, as shown in the examples in this topic. The semantics of the last synchronization version are as follows:
The calling client has obtained changes and knows about all changes up to and including the last synchronization version.
CHANGETABLE(CHANGES …) will therefore return all changes that have occurred after the last synchronization version.
The following illustration shows how CHANGETABLE(CHANGES …) is used to obtain changes.
Is used to obtain the current version that will be used the next time when querying changes. This version represents the version of the last committed transaction.
Is used to obtain the minimum valid version that a client can have and still obtain valid results from CHANGETABLE(). The client should check the last synchronization version against the value thatis returned by this function. If the last synchronization version is less than the version returned by this function, the client will be unable to obtain valid results from CHANGETABLE() and will have to reinitialize.
Obtaining Initial Data
Before an application can obtain changes for the first time, the application must send a query to obtain the initial data and the synchronization version. The application must obtain the appropriate data directly from the table, and then use CHANGE_TRACKING_CURRENT_VERSION() to obtain the initial version. This version will be passed to CHANGETABLE(CHANGES …) the first time that changes are obtained.
The following example shows how to obtain the initial synchronization version and the initial data set.
-- Obtain the current synchronization version. This will be used next time that changes are obtained. SET @synchronization_version = CHANGE_TRACKING_CURRENT_VERSION(); -- Obtain initial data set. SELECT P.ProductID, P.Name, P.ListPrice FROM SalesLT.Product AS P
Using the Change Tracking Functions to Obtain Changes
To obtain the changed rows for a table and information about the changes, use CHANGETABLE(CHANGES…). For example, the following query obtains changes for the SalesLT.Product table.
SELECT CT.ProductID, CT.SYS_CHANGE_OPERATION, CT.SYS_CHANGE_COLUMNS, CT.SYS_CHANGE_CONTEXT FROM CHANGETABLE(CHANGES SalesLT.Product, @last_synchronization_version) AS CT
Usually, a client will want to obtain the latest data for a row instead of only the primary keys for the row. Therefore, an application would join the results from CHANGETABLE(CHANGES …) with the data in the user table. For example, the following query joins with the SalesLT.Product table to obtain the values for the Name and ListPrice columns. Note the use of OUTER JOIN. This is required to make sure that the change information is returned for those rows that have been deleted from the user table.
SELECT CT.ProductID, P.Name, P.ListPrice, CT.SYS_CHANGE_OPERATION, CT.SYS_CHANGE_COLUMNS, CT.SYS_CHANGE_CONTEXT FROM SalesLT.Product AS P RIGHT OUTER JOIN CHANGETABLE(CHANGES SalesLT.Product, @last_synchronization_version) AS CT ON P.ProductID = CT.ProductID
To obtain the version for use in the next change enumeration, use CHANGE_TRACKING_CURRENT_VERSION(), as shown in the following example.
SET @synchronization_version = CHANGE_TRACKING_CURRENT_VERSION()
When an application obtains changes, it must use both CHANGETABLE(CHANGES…) and CHANGE_TRACKING_CURRENT_VERSION(), as shown in the following example.
-- Obtain the current synchronization version. This will be used the next time CHANGETABLE(CHANGES...) is called. SET @synchronization_version = CHANGE_TRACKING_CURRENT_VERSION(); -- Obtain incremental changes by using the synchronization version obtained the last time the data was synchronized. SELECT CT.ProductID, P.Name, P.ListPrice, CT.SYS_CHANGE_OPERATION, CT.SYS_CHANGE_COLUMNS, CT.SYS_CHANGE_CONTEXT FROM SalesLT.Product AS P RIGHT OUTER JOIN CHANGETABLE(CHANGES SalesLT.Product, @last_synchronization_version) AS CT ON P.ProductID = CT.ProductID
A database that has change tracking enabled has a version counter that increases as changes are made to change tracked tables. Each changed row has a version number that is associated with it. When a request is sent to an application to query for changes, a function is called that supplies a version number. The function returns information about all the changes that have been made since that version. In some ways, change tracking version is similar in concept to the rowversion data type.
Validating the Last Synchronized Version
Information about changes is maintained for a limited time. The length of time is controlled by the CHANGE_RETENTION parameter that can be specified as part of the ALTER DATABASE.
Be aware that the time specified for CHANGE_RETENTION determines how frequently all applications must request changes from the database. If an application has a value for last_synchronization_version that is older than the minimum valid synchronization version for a table, that application cannot perform valid change enumeration. This is because some change information might have been cleaned up. Before an application obtains changes by using CHANGETABLE(CHANGES …), the application must validate the value for last_synchronization_version that it plans to pass to CHANGETABLE(CHANGES …). If the value of last_synchronization_version is not valid, that application must reinitialize all the data.
The following example shows how to verify the validity of the value of last_synchronization_version for each table.
-- Check individual table. IF (@last_synchronization_version < CHANGE_TRACKING_MIN_VALID_VERSION( OBJECT_ID('SalesLT.Product'))) BEGIN -- Handle invalid version and do not enumerate changes. -- Client must be reinitialized. END
As the following example shows, the validity of the value of last_synchronization_version can be checked against all tables in the database.
-- Check all tables with change tracking enabled IF EXISTS ( SELECT * FROM sys.change_tracking_tables WHERE min_valid_version > @last_synchronization_version ) BEGIN -- Handle invalid version & do not enumerate changes -- Client must be reinitialized END
Using Column Tracking
Column tracking enables applications to obtain the data for only the columns that have changed instead of the whole row. For example, consider the scenario in which a table has one or more columns that are large, but rarely change; and also has other columns that frequently change. Without column tracking, an application can only determine that a row has changed and would have to synchronize all the data that includes the large column data. However, by using column tracking, an application can determine whether the large column data changed and only synchronize the data if it has changed.
Column tracking information appears in the SYS_CHANGE_COLUMNS column that is returned by the CHANGETABLE(CHANGES …) function.
Column tracking can be used so that NULL is returned for a column that has not changed. If the column can be changed to NULL, a separate column must be returned to indicate whether the column changed.
In the following example, the CT_ThumbnailPhoto column will be NULL if that column did not change. This column could also be NULL because it was changed to NULL - the application can use the CT_ThumbNailPhoto_Changed column to determine whether the column changed.
DECLARE @PhotoColumnId int = COLUMNPROPERTY( OBJECT_ID('SalesLT.Product'),'ThumbNailPhoto', 'ColumnId') SELECT CT.ProductID, P.Name, P.ListPrice, -- Always obtain values. CASE WHEN CHANGE_TRACKING_IS_COLUMN_IN_MASK( @PhotoColumnId, CT.SYS_CHANGE_COLUMNS) = 1 THEN ThumbNailPhoto ELSE NULL END AS CT_ThumbNailPhoto, CHANGE_TRACKING_IS_COLUMN_IN_MASK( @PhotoColumnId, CT.SYS_CHANGE_COLUMNS) AS CT_ThumbNailPhoto_Changed CT.SYS_CHANGE_OPERATION, CT.SYS_CHANGE_COLUMNS, CT.SYS_CHANGE_CONTEXT FROM SalesLT.Product AS P INNER JOIN CHANGETABLE(CHANGES SalesLT.Product, @last_synchronization_version) AS CT ON P.ProductID = CT.ProductID AND CT.SYS_CHANGE_OPERATION = 'U'
Obtaining Consistent and Correct Results
Obtaining the changed data for a table requires multiple steps. Be aware that inconsistent or incorrect results could be returned if certain issues are not considered and handled.
For example, to obtain the changes that were made to a Sales table and SalesOrders table, an application would perform the following steps:
Validate the last synchronized version by using CHANGE_TRACKING_MIN_VALID_VERSION().
Obtain the version that can be used to obtain change the next time by using CHANGE_TRACKING_CURRENT_VERSION().
Obtain the changes for the Sales table by using CHANGETABLE(CHANGES …).
Obtain the changes for the SalesOrders table by using CHANGETABLE(CHANGES …).
Two processes are occurring in the database that can affect the results that are returned by the previous steps:
The cleanup process runs in the background and removes change tracking information that is older than the specified retention period.
The cleanup process is a separate background process that uses the retention period that is specified when you configure change tracking for the database. The issue is that the cleanup process can occur in the time between when the last synchronization version was validated and when the call to CHANGETABLE(CHANGES…) is made. A last synchronization version that was just valid might no longer be valid by the time the changes are obtained. Therefore, incorrect results might be returned.
Ongoing DML operations are occurring in the Sales and SalesOrders tables, such as the following operations:
Changes can be made to the tables after the version for next time has been obtained by using CHANGE_TRACKING_CURRENT_VERSION(). Therefore, more changes can be returned than expected.
A transaction could commit in the time between the call to obtain changes from the Sales table and the call to obtain changes from the SalesOrders table. Therefore, the results for the SalesOrder table could have foreign key value that does not exist in the Sales table.
To overcome the previously listed challenges, we recommend that you use snapshot isolation. This will help to ensure consistency of change information and avoid race conditions that are related to the background cleanup task. If you do not use snapshot transactions, developing an application that uses change tracking could require significantly more effort.
Using Snapshot Isolation
Change tracking has been designed to work well with snapshot isolation. Snapshot isolation must be enabled for the database. All the steps that are required to obtain changes must be included inside a snapshot transaction. This will ensure that all changes that are made to data while obtaining changes will not be visible to the queries inside the snapshot transaction.
To obtain data inside a snapshot transaction, perform the following steps:
Set the transaction isolation level to snapshot and start a transaction.
Validate the last synchronization version by using CHANGE_TRACKING_MIN_VALID_VERSION().
Obtain the version to be used the next time by using CHANGE_TRACKING_CURRENT_VERSION().
Obtain the changes for the Sales table by using CHANGETABLE(CHANGES …)
Obtain the changes for the Salesorders table by using CHANGETABLE(CHANGES …)
Commit the transaction.
Some points to remember as all steps to obtain changes are inside a snapshot transaction:
If cleanup occurs after the last synchronization version is validated, the results from CHANGETABLE(CHANGES …) will still be valid as the delete operations performed by cleanup will not be visible inside the transaction.
Any changes that are made to the Sales table or the SalesOrders table after the next synchronization version is obtained will not be visible, and the calls to CHANGETABLE(CHANGES …) will never return changes with a version later than that returned by CHANGE_TRACKING_CURRENT_VERSION(). Consistency between the Sales table and the SalesOrders table will also be maintained, because the transactions that were committed in the time between calls to CHANGETABLE(CHANGES …) will not be visible.
The following example shows how snapshot isolation is enabled for a database.
-- The database must be configured to enable snapshot isolation. ALTER DATABASE AdventureWorksLT2008 SET ALLOW_SNAPSHOT_ISOLATION ON;
A snapshot transaction is used as follows:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT; BEGIN TRAN -- Verify that version of the previous synchronization is valid. -- Obtain the version to use next time. -- Obtain changes. COMMIT TRAN
For more information about snapshot transactions, see Using Row Versioning-based Isolation Levels.
Alternatives to Using Snapshot Isolation
There are alternatives to using snapshot isolation, but they require more work to make sure all application requirements are met. To make sure the last_synchronization_version is valid and data is not removed by the cleanup process before changes are obtained, do the following:
Check last_synchronization_version after the calls to CHANGETABLE().
Check last_synchronization_version as part of each query to obtain changes by using CHANGETABLE().
Changes can occur after the synchronization version for the next enumeration has been obtained. There are two ways to handle this situation. The option that is used depends on the application and how it can handle the side-effects of each approach:
Ignore changes that have a version larger than the new synchronization version.
This approach has the side effect that a new or updated row would be skipped if it was created or updated before the new synchronization version, but then updated afterward. If there is a new row, a referential integrity problem might occur if there was a row in another table that was created that referenced the skipped row. If there is an updated existing row, the row will be skipped and not synchronized until the next time.
Include all changes, even those that have a version larger than the new synchronization version.
The rows that have a version larger than the new synchronization version will be obtained again on the next synchronization. This must be expected and handled by the application.
In addition to the previous two options, you can devise approach that combines both options, depending on the operation. For example, you might want an application for which it is best to ignore changes newer than the next synchronization version in which the row was created or deleted, but updates are not ignored.
Choosing the approach that will work for the application when you are using change tracking (or any custom tracking mechanism), requires significant analysis. Therefore, it is much simpler to use snapshot isolation.
Corrected a typo in the code sample for checking the validity of last_synchronization_version against all tables in the database.