Real Time Data Integration with Service Broker and Other SQL Techniques

This article discusses how to use several SQL Server technologies to accomplish real time data integration between SQL Server instances. It provides sample code to help users with their own development. The document focuses on how each technology is used within the data integration service. Please refer to the provided links for detailed information about the technologies.

Real time data integration definition

Real time data integration supports event-driven data movement and transformation between SQL Server instances that host databases with different schemas. The data integration should be transparent to the source systems and should not significantly impact them when events are captured and delivered. The technique also uses an intermediate format, which decouples the schemas of the source and destination systems so that either system can change its schema without breaking the application in the other. The data integration delivers data to the destination quickly and efficiently in an event-driven model, without polling the source system for new data.


Sales data integration

The real time data integration demo shows sales data integration between two databases, AdventureWorks (AW) and AdventureWorksDW (AWDW). The data integration service captures sales data changes on AW and transforms the data from the AW schema into a general XML format. The service then sends the XML to AWDW, where it is transformed to match the AWDW schema.

The demo uses the SQL Server sample databases. Please refer to the following link for detailed information about the databases [https://msdn.microsoft.com/en-us/library/ms124659.aspx]. Users can download and install the databases for SQL Server 2008 from the following link [https://technet.microsoft.com/en-us/library/ms124501(SQL.100).aspx].


Techniques

Change tracking

Change tracking provides a mechanism to query for changes to data and to access information related to the changes. It answers the following questions: Which rows of a user table have changed? What is the latest data in those rows? Change Tracking requires only a small amount of storage for each changed row, but it can only return the latest data. Please refer to the following link for detailed information about Change Tracking [https://msdn.microsoft.com/en-us/library/bb933874(SQL.100).aspx].

If an application requires information about all the changes, including the intermediate values of the changed data, then it should use Change Data Capture (CDC). Please refer to the following document for a comparison of the two techniques [https://msdn.microsoft.com/en-us/library/cc280519(SQL.100).aspx]. We plan to write another document which shows how to use CDC as a change tracking option. The following code block shows how to enable Change Tracking at the database and table levels.

ALTER DATABASE AdventureWorks
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE [AdventureWorks].[Sales].[SalesOrderHeader]
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = ON);

ALTER TABLE [AdventureWorks].[Sales].[SalesOrderDetail]
ENABLE CHANGE_TRACKING
WITH (TRACK_COLUMNS_UPDATED = ON);
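Once change tracking is enabled, the changes can be queried with the CHANGETABLE function. The following is a minimal sketch, not part of the original demo, that checks whether a stored baseline version is still valid and then lists the changed sales order headers. The starting value of @last_sync_version is a hypothetical placeholder; a real service would read it from wherever it persists its baseline.

```sql
USE AdventureWorks;
GO

-- Hypothetical baseline; 0 returns every change that is still retained.
DECLARE @last_sync_version BIGINT = 0;

-- Read the current version first, so that it can become the next baseline.
DECLARE @next_baseline BIGINT = CHANGE_TRACKING_CURRENT_VERSION();

-- Oldest version for which change information is still available.
DECLARE @min_valid_version BIGINT =
    CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID(N'Sales.SalesOrderHeader'));

IF (@last_sync_version < @min_valid_version)
BEGIN
    -- Cleanup has removed part of the history since the baseline was taken.
    RAISERROR(N'Baseline is older than the retained change history; a full reload is needed.', 16, 1);
END
ELSE
BEGIN
    -- One row per changed primary key, with the net operation (I, U or D).
    SELECT ct.SalesOrderID, ct.SYS_CHANGE_OPERATION, ct.SYS_CHANGE_VERSION
    FROM CHANGETABLE(CHANGES [Sales].[SalesOrderHeader], @last_sync_version) AS ct
    ORDER BY ct.SYS_CHANGE_VERSION;
END

SELECT @next_baseline AS next_baseline;
```

Checking the minimum valid version matters here because the database was configured with a two-day retention period and automatic cleanup.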

Changed data in XML

After change tracking is enabled on a database and its tables, SQL Server records change information whenever data is inserted, deleted or updated in those tables. The data integration service uses the following code block to fetch the change information and create an XML document describing the data change. It uses the CHANGETABLE function to obtain the change tracking information for the tables ‘SalesOrderHeader’ and ‘SalesOrderDetail’, and generates the XML document with FOR XML in PATH mode. In the XML document the root, top-level element is named ‘Sales’, and each sales order header corresponds to an element named ‘SalesOrderHeader’. A ‘SalesOrderHeader’ element contains a ‘SalesOrderDetail’ element for each changed row of the ‘SalesOrderDetail’ table that belongs to the order. The INNER JOIN clauses match each change record with its current row in the base table; a row that has been deleted has no match and is therefore left out of the document.

SET @changeReportXML =
(
    SELECT
        SYS_CHANGE_OPERATION, c_soh.SalesOrderID,
        (
            SELECT SYS_CHANGE_OPERATION, c_sod.SalesOrderID,
                   c_sod.SalesOrderDetailID
            FROM CHANGETABLE
                 (
                     CHANGES [AdventureWorks].[Sales].[SalesOrderDetail],
                     @last_sync_version
                 ) AS c_sod
            INNER JOIN [AdventureWorks].[Sales].[SalesOrderDetail] sod
                ON sod.SalesOrderDetailID = c_sod.SalesOrderDetailID
            WHERE c_soh.SalesOrderID = c_sod.SalesOrderID
            FOR XML PATH ('SalesOrderDetail'), TYPE, ELEMENTS XSINIL
        )
    FROM CHANGETABLE
         (
             CHANGES [AdventureWorks].[Sales].[SalesOrderHeader],
             @last_sync_version
         ) AS c_soh
    INNER JOIN [AdventureWorks].[Sales].[SalesOrderHeader] soh
        ON soh.SalesOrderID = c_soh.SalesOrderID
    WHERE @salesOrderID = c_soh.SalesOrderID
    FOR XML PATH ('SalesOrderHeader'), ROOT ('Sales'), ELEMENTS XSINIL
);

Change notification

SQL Server provides several mechanisms for notifying an application of data changes, for example triggers [https://msdn.microsoft.com/en-us/library/ms189599.aspx] and Query Notifications (QN) [https://msdn.microsoft.com/en-us/library/ms130764.aspx]. A trigger provides a simple way to deliver the notification, but it supports only a synchronous mechanism. QN supports asynchronous notification and rich filtering semantics; however, QN cannot be configured in TSQL within SQL Server. In the real time data integration demo we use a technique that combines Service Broker and a trigger. It provides a simple way to implement event notification with asynchronous semantics in TSQL within SQL Server. The following code block shows the event notification in the demo.

CREATE TABLE ConversationHandle
(
    conversationHandle uniqueidentifier
);

--Create a dialog to send all the transactions on
BEGIN TRANSACTION;

DECLARE @conversationHandle uniqueidentifier;

--Create a new conversation and store its handle in the table
BEGIN DIALOG @conversationHandle
    FROM SERVICE AsynchTriggerInitiatorService
    TO SERVICE N'AsynchTriggerTargetService'
    ON CONTRACT [AsynchTriggerContract]
    WITH ENCRYPTION = OFF;

INSERT ConversationHandle (conversationHandle)
VALUES (@conversationHandle);

COMMIT;
GO

-- TRIGGER for initiating the change tracking demo
-- (CREATE TRIGGER must be the first statement in its batch)
CREATE TRIGGER ChangeTrackingTrigger
ON [AdventureWorks].[Sales].[SalesOrderHeader]
AFTER INSERT, DELETE, UPDATE
AS
BEGIN TRANSACTION;

    DECLARE @conversationHandle uniqueidentifier;

    --Reuse the stored conversation for every notification
    SELECT TOP (1) @conversationHandle = conversationHandle
    FROM ConversationHandle;

    --The notification carries no body; the message type alone signals a change
    SEND ON CONVERSATION @conversationHandle
        MESSAGE TYPE [AsynchTriggerMessageType];

COMMIT;
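The code above assumes that the Service Broker objects it names already exist. The following is a minimal sketch of how they might be created; it is not the demo's actual setup script. The message type, contract and service names are taken from the code above, while the two queue names and the EMPTY validation setting are assumptions made for illustration.

```sql
-- Run in the database that hosts the trigger, before BEGIN DIALOG is executed.

-- The trigger sends a message without a body, so EMPTY validation fits (assumption).
CREATE MESSAGE TYPE [AsynchTriggerMessageType]
    VALIDATION = EMPTY;

-- Only the initiator (the trigger side) sends this message type.
CREATE CONTRACT [AsynchTriggerContract]
    ([AsynchTriggerMessageType] SENT BY INITIATOR);

-- Hypothetical queue names.
CREATE QUEUE AsynchTriggerInitiatorQueue;
CREATE QUEUE AsynchTriggerTargetQueue;

-- The initiator service needs no contract list.
CREATE SERVICE AsynchTriggerInitiatorService
    ON QUEUE AsynchTriggerInitiatorQueue;

-- The target service must list every contract it accepts.
CREATE SERVICE AsynchTriggerTargetService
    ON QUEUE AsynchTriggerTargetQueue ([AsynchTriggerContract]);
```

Service Broker must also be enabled in the database (ALTER DATABASE ... SET ENABLE_BROKER) before any message is delivered.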

Reliable data movement

Service Broker provides asynchronous and reliable data movement. It supports a TSQL programming model built on the SQL Server database engine. Please refer to the following link for detailed information about Service Broker [https://technet.microsoft.com/en-us/sqlserver/bb671396.aspx].

The data integration service uses multiple conversations for message delivery to increase throughput. Using multiple dialogs brings data parallelism to the receiving side, because multiple threads can receive and process the messages in different dialogs independently. However, initiating conversations adds load to the system, so the number of conversations should be chosen carefully. In the real time data integration demo we use four conversations: the service initiates four dialogs and stores their handles in a table, and the demo uses these dialogs to send messages about changed data. The dialog creation itself is part of the complete demo code. The following code block presents the procedure for sending an XML message using Service Broker. The procedure distributes messages evenly across the four conversations based on the sales order ID (SET @dialogHandleID = @salesOrderID % 4). Because all messages for the same sales order ID are sent on a single conversation, they are guaranteed to be delivered exactly once and in order.

CREATE PROCEDURE SendChanges
AS
BEGIN
    DECLARE @last_sync_version bigint;
    DECLARE @salesOrderID bigint;
    DECLARE @dialogHandleID INT;
    DECLARE @dialogHandle uniqueidentifier;
    DECLARE @changeReportXML XML;
    DECLARE @next_baseline bigint;
    DECLARE @TotalDialogs INT;
    DECLARE @logMsg VARCHAR(MAX);

    BEGIN TRANSACTION;

    SELECT TOP (1) @last_sync_version = lastVersion
    FROM LastVersion;

    SET @TotalDialogs = 4;

    --Create a cursor on the change table for [SalesOrderHeader]
    DECLARE cursorChangeOrderHeader
    CURSOR FORWARD_ONLY READ_ONLY
    FOR SELECT SalesOrderID FROM
        CHANGETABLE (
            CHANGES [AdventureWorks].[Sales].[SalesOrderHeader],
            @last_sync_version) AS Cursor_CH
        ORDER BY SYS_CHANGE_VERSION;

    --Open the cursor on the change table for [SalesOrderHeader]
    --Loop for each changed sales order id
    OPEN cursorChangeOrderHeader;

    WHILE (1=1)
    BEGIN
        FETCH NEXT FROM cursorChangeOrderHeader INTO @salesOrderID;

        --If there is no more changed sales order then exit
        IF (@@FETCH_STATUS != 0) BREAK;

        --<Fetching changed data and creating XML file.>
        --<Please refer to the code block in the section 'Changed data in XML'.>

        --Find the conversation handle for the sales order
        --from the dialog handle table
        SET @dialogHandleID = @salesOrderID % @TotalDialogs;

        SELECT @dialogHandle = dialogHandle
        FROM DialogHandles
        WHERE ID = @dialogHandleID;

        --Capture last version info
        SELECT @next_baseline = SYS_CHANGE_VERSION
        FROM CHANGETABLE (
            CHANGES [AdventureWorks].[Sales].[SalesOrderHeader],
            @last_sync_version) AS c_soh
        WHERE @salesOrderID = SalesOrderID;

        --Send the message using Broker
        SEND ON CONVERSATION @dialogHandle
            MESSAGE TYPE [RealTimeDImessagetype] (@changeReportXML);
    END

    CLOSE cursorChangeOrderHeader;
    DEALLOCATE cursorChangeOrderHeader;

    --Advance the baseline only if at least one change was processed;
    --otherwise @next_baseline is NULL and would overwrite the stored version
    IF (@next_baseline IS NOT NULL)
        UPDATE LastVersion SET lastVersion = @next_baseline;

    COMMIT;
END;
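SendChanges looks up its conversation handles in the DialogHandles table. The following is a minimal sketch of how the four dialogs could be created and stored so that the IDs match the result of @salesOrderID % 4 (0 through 3). The table and column names come from the procedure above and the initiator service name from the pseudo-code later in this article; the target service and contract names are hypothetical placeholders, since the article does not show them.

```sql
-- One row per dialog; ID must cover 0..3 to match @salesOrderID % 4.
CREATE TABLE DialogHandles
(
    ID           INT              NOT NULL PRIMARY KEY,
    dialogHandle UNIQUEIDENTIFIER NOT NULL
);
GO

DECLARE @TotalDialogs INT = 4;
DECLARE @id INT = 0;
DECLARE @dialogHandle UNIQUEIDENTIFIER;

BEGIN TRANSACTION;

WHILE (@id < @TotalDialogs)
BEGIN
    -- Target service and contract names are hypothetical placeholders.
    BEGIN DIALOG CONVERSATION @dialogHandle
        FROM SERVICE [RealTime_DI_Initiator_Service]
        TO SERVICE N'RealTime_DI_Target_Service'
        ON CONTRACT [RealTimeDIcontract]
        WITH ENCRYPTION = OFF;

    -- Store the handle under the ID that the modulo expression will produce.
    INSERT INTO DialogHandles (ID, dialogHandle)
    VALUES (@id, @dialogHandle);

    SET @id = @id + 1;
END

COMMIT;
```

Keeping the dialogs open and reusing them avoids the cost of starting a new conversation for every message, which is the load the text above warns about.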

Activation

Activation allows message processing logic to be launched when a message arrives on a Service Broker queue. With internal activation, a stored procedure is associated with a Service Broker queue and is invoked on a background thread when a message arrives. A user can also specify an external executable for processing the messages (external activation). For example, SQL Server Integration Services (SSIS) can be used as an externally activated program to process messages. Please refer to the following link for the code sample and documentation of the External Activator [https://www.codeplex.com/SQLSrvSrvcBrkr/Release/ProjectReleases.aspx?ReleaseId=3853].

In the real time data integration demo we use internal activation to process event notification messages on the initiator service as well as changed data messages on the target service. The following sections briefly describe how the services process the messages in their activation procedures.
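Internal activation is configured on the queue itself. The following is a minimal sketch that attaches the target-side procedure shown later in this article to the target queue. The queue and procedure names come from the article's code; the dbo schema, the reader count and the EXECUTE AS setting are assumptions chosen for illustration, not the demo's actual settings.

```sql
-- Launch ProcessMessagesDW automatically when messages arrive on the queue.
ALTER QUEUE [RealTimeDItargetqueue]
WITH ACTIVATION
(
    STATUS = ON,
    PROCEDURE_NAME = [dbo].[ProcessMessagesDW],  -- schema is an assumption
    MAX_QUEUE_READERS = 4,                       -- assumed: one reader per conversation
    EXECUTE AS OWNER                             -- assumed security context
);
```

MAX_QUEUE_READERS caps how many copies of the procedure Service Broker starts concurrently, so a value matching the number of conversations lets messages on different conversations be processed in parallel.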

  • Message processing in the initiator

The real time data integration initiator handles messages from two different services: the asynchronous event notification service and the real time data integration target service. A single service in the initiator handles the messages from both sources, distinguishing them by message type and service name. The following pseudo-code block describes the message processing logic in the initiator.


WHILE there is any message on ‘RealTime_DI_Initiator_queue’
    RECEIVE a message FROM the queue
    IF message type is ‘EndDialog’ THEN
        END CONVERSATION
    ELSE IF message type is ‘ERROR’ THEN
        IF service name is ‘RealTime_DI_Initiator_Service’ THEN
            Raise error;
            Create a new dialog;
            Resend pending messages using the dialog;
            Replace old dialog with the new one;
            END CONVERSATION (old dialog);
        IF service name is ‘Asynchronous_Trigger_Target_Service’ THEN
            Raise error;
            END CONVERSATION;
    ELSE IF message type is ‘Asynchronous triggering’ THEN
        RECEIVE
            WHERE conversation_handle is identical with this message’s handle
        EXEC SendChanges PROCEDURE
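The dispatch on message type in the pseudo-code above could be sketched in TSQL roughly as follows. This is a simplified illustration, not the demo's procedure: the procedure name is hypothetical, the queue name is taken from the pseudo-code, the trigger message type from the trigger code earlier in this article, and the steps that recreate a dialog, resend pending messages, and drain further notifications on the same conversation are left out.

```sql
CREATE PROCEDURE ProcessInitiatorMessagesSketch  -- hypothetical name
AS
BEGIN
    DECLARE @handle UNIQUEIDENTIFIER;
    DECLARE @messageType SYSNAME;
    DECLARE @serviceName SYSNAME;

    WHILE (1 = 1)
    BEGIN
        BEGIN TRANSACTION;

        -- Wait up to one second for the next message on the initiator queue.
        WAITFOR (
            RECEIVE TOP (1)
                @handle = conversation_handle,
                @messageType = message_type_name,
                @serviceName = service_name
            FROM [RealTime_DI_Initiator_queue]
        ), TIMEOUT 1000;

        IF (@@ROWCOUNT = 0)
        BEGIN
            COMMIT;
            BREAK;
        END

        IF (@messageType = N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog')
        BEGIN
            END CONVERSATION @handle;
        END
        ELSE IF (@messageType = N'http://schemas.microsoft.com/SQL/ServiceBroker/Error')
        BEGIN
            -- Dialog recreation and resending of pending messages are omitted here.
            RAISERROR(N'Service Broker error received by service %s.', 16, 1, @serviceName);
            END CONVERSATION @handle;
        END
        ELSE IF (@messageType = N'AsynchTriggerMessageType')
        BEGIN
            -- A change notification from the trigger: send the pending changes.
            EXEC SendChanges;
        END

        COMMIT;
    END
END;
```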

  • Message processing in the target

The message processing procedure in the real time data integration target receives messages from a target queue and transforms them from the XML format into the destination schema. A simple and straightforward way to process messages is to receive and transform them one at a time until the queue is empty. However, this may hurt the performance of the data integration target. Instead of receiving and transforming a single message at a time, the data integration target service uses a cursor-based processing mechanism. It receives the available messages from the target queue and stores them in a table variable. A cursor then iterates over the table, fetching each message and converting it from the XML format to the desired schema. The following code block shows the activation procedure on the target.

CREATE PROCEDURE ProcessMessagesDW
AS
BEGIN
    DECLARE @handle uniqueidentifier;
    DECLARE @messageBody XML;

    DECLARE @tableMessages TABLE(
        queuing_order BIGINT,
        conversation_handle UNIQUEIDENTIFIER,
        message_body VARBINARY(MAX));

    DECLARE cursorMessages CURSOR
        FORWARD_ONLY
        READ_ONLY
    FOR SELECT conversation_handle,
               message_body
        FROM @tableMessages
        ORDER BY queuing_order;

    WHILE (1=1)
    BEGIN
        BEGIN TRANSACTION;

        WAITFOR(RECEIVE
                    queuing_order,
                    conversation_handle,
                    message_body
                FROM [RealTimeDItargetqueue]
                INTO @tableMessages), TIMEOUT 1000;

        IF (@@ROWCOUNT = 0)
        BEGIN
            COMMIT;
            BREAK;
        END

        OPEN cursorMessages;

        WHILE (1=1)
        BEGIN
            FETCH NEXT FROM cursorMessages
            INTO @handle, @messageBody;

            IF (@@FETCH_STATUS != 0)
                BREAK;

            -- <Message transformation>
        END

        CLOSE cursorMessages;

        DELETE FROM @tableMessages;

        COMMIT;
    END

    DEALLOCATE cursorMessages;
END

Data transformation

After receiving messages, the target service transforms them and populates tables with the changed data they carry. The received messages are in XML format. The service processes each message to extract the required information using TSQL and its integrated XML support. The following code block shows a sample of the transformation using TSQL. In this example, the transformation occurs only for data insert events.

-- Sample only: a subset of the FactInternetSales columns is shown.
INSERT INTO [AdventureWorksDW].[dbo].[FactInternetSales]
SELECT
    N1.SOH.value('CustomerID[1]', 'int')
        AS [CustomerKey]
    ,N2.SOD.value('SpecialOfferID[1]', 'int')
        AS [PromotionKey]
    ,N2.SOD.value('CarrierTrackingNumber[1]', 'NCHAR(9)')
        AS CarrierTrackingNumber
    ,N1.SOH.value('PurchaseOrderNumber[1]', 'NVARCHAR(25)')
        AS [CustomerPONumber]
FROM @messageBody.nodes('/Sales/SalesOrderHeader') N1(SOH)
CROSS APPLY N1.SOH.nodes('SalesOrderDetail') N2(SOD)
WHERE N1.SOH.value('CustomerType[1]', 'CHAR(1)') = 'I'
    AND N2.SOD.value('SYS_CHANGE_OPERATION[1]', 'CHAR(1)') = 'I'
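To see how nodes() and value() shred such a message, the following self-contained sketch can be run in any database. The XML literal is a hand-written sample in the shape produced by the FOR XML query earlier in this article, and the ID values in it are made up for illustration.

```sql
-- Hand-written sample message; the IDs are illustrative, not real data.
DECLARE @messageBody XML = N'
<Sales>
  <SalesOrderHeader>
    <SYS_CHANGE_OPERATION>I</SYS_CHANGE_OPERATION>
    <SalesOrderID>1001</SalesOrderID>
    <SalesOrderDetail>
      <SYS_CHANGE_OPERATION>I</SYS_CHANGE_OPERATION>
      <SalesOrderID>1001</SalesOrderID>
      <SalesOrderDetailID>1</SalesOrderDetailID>
    </SalesOrderDetail>
    <SalesOrderDetail>
      <SYS_CHANGE_OPERATION>U</SYS_CHANGE_OPERATION>
      <SalesOrderID>1001</SalesOrderID>
      <SalesOrderDetailID>2</SalesOrderDetailID>
    </SalesOrderDetail>
  </SalesOrderHeader>
</Sales>';

-- One output row per SalesOrderDetail element whose operation is an insert.
SELECT
    N1.SOH.value('SalesOrderID[1]', 'INT')               AS SalesOrderID,
    N2.SOD.value('SalesOrderDetailID[1]', 'INT')         AS SalesOrderDetailID,
    N2.SOD.value('SYS_CHANGE_OPERATION[1]', 'NCHAR(1)')  AS DetailOperation
FROM @messageBody.nodes('/Sales/SalesOrderHeader') AS N1(SOH)
CROSS APPLY N1.SOH.nodes('SalesOrderDetail') AS N2(SOD)
WHERE N2.SOD.value('SYS_CHANGE_OPERATION[1]', 'NCHAR(1)') = N'I';
```

The CROSS APPLY evaluates the inner nodes() call relative to each header node, which is why the second nodes() call is written against N1.SOH rather than against the message variable.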

SQL Server Integration Services (SSIS) also provides data transformations. SSIS supports various forms of data transformation between heterogeneous sources. Please refer to the following link for more detailed information about SSIS [https://technet.microsoft.com/en-us/sqlserver/bb671392.aspx].

This document discusses a real-time data integration service built by coordinating a set of powerful SQL Server technologies. The service provides reliable and transparent data integration between instances. It is composed of the following technologies.

Data tracking: Change tracking

Change notification: Triggers and Service Broker

Reliable data movement: Service Broker

Activation: Internal, Blocking with WAITFOR RECEIVE

Transformation: TSQL with XML support

The complete code list for the demo can be found on the following link [https://www.codeplex.com/SQLSrvSrvcBrkr/Release/ProjectReleases.aspx?ReleaseId=15139].

Comments

  • Anonymous
    February 01, 2009
    You may notice that from SQL Server 2005 version onwards MSDB system database has got more importance

  • Anonymous
    November 18, 2009
    Hi, I did a simple test to know how big is the impact on the operational database: I compare how much time it takes to insert 500 rows before and after implementing the solution. I have a big big difference. What can explain it? Is it supposed to be like that? Please help

  • Anonymous
    March 09, 2010
    There is some useful information provided in this post, along with the code. But there are many errors in the code, especially in the creation of Service Broker service objects. This makes it difficult to work out what you were meaning to do. It's really annoying, and even an amateur developer can test code before releasing it, to see that it works. It's just unbelievable that so many posts on the internet have code that doesn't work. Yet another person who doesn't do things properly.

  • Anonymous
    March 11, 2010
    Damian, Thanks for your comment on this article. Even though we ran the application many times before posting it on this blog, there are still some errors. I believe that the errors come from the following issues.

  1. A user may not have the proper setup to run the application. The scripts assume that the sample databases, AdventureWorks and AdventureWorksDW, exist on the user's server.
  2. A user may not run the scripts in the right order. The scripts depend on one another, and if the run order is violated a script throws errors.

    We will update the demo scripts with information about the sample databases and the run order. Of course, we are going to run the application in various user environments many times before announcing the update. Thanks again for your valuable feedback.

  • Anonymous
    October 05, 2010
    We had tried to collect some performance data for trigger based solution mentioned in this article. Essentially performance is quite discouraging. I can see that another reader - valerie - has posted her concerns here almost one year ago, and there were no replies. Does it mean that from performance point of view, utilization of SSB in a trigger is just theoretical topic and cannot be used in actual production solution due to low performance? I guess, even if there will be no official answer – it would be helpful for other people who would stumble upon this article to know that several people had experienced poor performance of the solutions mentioned here and were not able to get any performance related advices or hints or “know how” ideas …

  • Anonymous
    October 07, 2010
    The current design of the trigger-based data integration initiates the data movement based on the number of event notifications on the queue (RealTimeDIinitiatorqueue). In other words, the stored procedure (ProcessingMessages) starts to send messages as soon as one or more notifications are enqueued. The send logic can instead be deferred until multiple notifications have arrived, which can improve the throughput of the data integration. Please take a look at the procedure [ProcessingMessages] in the file RealTimeDataIntegrationServiceBrokerAW.sql. The current implementation uses a single notification to initiate the send logic; in other words, a message is sent for each notification, which creates a large overhead on the system. The main idea of this article is to present how to use various SQL technologies in a data integration solution, and the solution presented here may not be the best one from a performance perspective. The solution has a trigger initiate a stored procedure that creates a message about the change and sends it using a dialog. An alternative is for the trigger to create the message directly and send it on the dialog. Please refer to the following articles for the message creation logic in the trigger.

  • Anonymous
    February 01, 2011
    Jang said on the post on 7 Oct 2010 4:23 PM An alternative to the solution is that the trigger directly creates the message, and sends it on the dialog. Does anyone test the performance using that alternative? I'd like to know what difference it could be. I think it won't make much difference between calling in a trigger directly and in a stored procedure called by the trigger.

  • Anonymous
    August 13, 2011
    I've tested asynchronous triggers also and the results were not what I expected.  What I experienced was 15 millisecond inserts using an asynchronous trigger versus 1 millisecond or less for a trigger that dumps to a staging table.  Service Broker itself may be asynchronous; however, speaking to Service Broker is synchronous and has overhead.

  • Anonymous
    May 04, 2014
    I trust this will not send even a single record to AdventureDW coz it is not complete. I wasted my time.