How to: Deliver Changes in Batches (SQL Server)

07/07/2014

This topic describes how to deliver changes in batches for database synchronization in Sync Framework that uses SqlSyncProvider, SqlCeSyncProvider, or DbSyncProvider. The code in this topic focuses on the following Sync Framework classes:

Understanding Batching

By default, Sync Framework delivers changes to each node in a single DataSet object. This object is held in memory as changes are applied to a node. The default behavior works well if there is sufficient memory on the computer where changes are applied and the connection to the computer is reliable. Some applications, however, can benefit from having changes divided into batches. Consider the following scenario for a synchronization application:

A large number of clients that use SqlCeSyncProvider synchronize periodically with a server that uses SqlSyncProvider.
Each client has a limited amount of memory and disk space.
The connections between the server and clients are low bandwidth and intermittent, often resulting in long synchronization times and dropped connections.
The size of the changes (in KB) for a typical synchronization session is large.

Batching changes is ideal for this type of scenario because it provides the following capabilities:

Enables the developer to control the amount of memory (the memory data cache size) that is used to store changes on the client. This can eliminate out-of-memory errors on the client.
Enables Sync Framework to restart a failed synchronization operation from the start of the current batch, rather than the start of the entire set of changes.
Can reduce or eliminate the need to re-download changes or re-enumerate changes on the server due to failed operations.

Batching is simple to configure for 2-tier and n-tier applications, and it can be used for the initial synchronization session and for subsequent sessions.

Configuring and Using Batching

Batching in Sync Framework works as follows:

1. The application specifies the memory data cache size for each provider that is participating in the synchronization session.

   If both providers specify a cache size, Sync Framework uses the smaller value for both providers. The actual cache size will be no more than 110% of the smallest specified size. During a synchronization session, if a single row is larger than 110% of this size, the session terminates with an exception.

   A value of 0 (the default) disables batching. If one provider has batching enabled and the other provider does not, batching is enabled for both upload and download.
2. The application specifies the location of the spooling files for each provider. By default, spooling files are written to the temp directory for the account under which the synchronization process runs.
3. The application calls Synchronize.
4. Sync Framework enumerates changes one row at a time. If the memory data cache size for the source provider is reached, changes are persisted to a local spooling file, and the in-memory data is flushed. This process continues until all changes are enumerated.
5. For n-tier scenarios, service and proxy code in the application streams the spooling files to the destination. For more information, see Code Specific to N-Tier in this topic. For two-tier scenarios, the local file is already at the destination because in this case all synchronization code runs at the destination.
6. Sync Framework deserializes changes from the spooling files and applies those changes. This process continues until all changes are applied to the destination.

   All batches are applied in one transaction. That transaction is not created until the last batch is received by the destination provider.
7. For two-tier scenarios, Sync Framework cleans up the spooling file. For n-tier scenarios, Sync Framework cleans up spooling files on the computer on which synchronization is initiated, but files on the middle tier should be cleaned up by the proxy (demonstrated in the sample Cleanup() method later in this topic). To handle cases in which a session is aborted, the middle tier should also use a process to clean up files that are older than a certain date.
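The cache-size rules and the spooling loop described in these steps can be illustrated with a small Python sketch. This is not the Sync Framework API, which is .NET. The function names, the JSON file format, and the batch_N.json file names are all hypothetical. The sketch applies three rules: the smaller non-zero cache size wins, 0 means batching is disabled, and a row larger than 110% of the cache size aborts the session. Rows are enumerated one at a time and flushed to a spool file whenever the next row would overflow the cache.

```python
import json
import os


def effective_cache_size_kb(source_kb: int, destination_kb: int) -> int:
    """Cache size used by both providers; 0 means batching is disabled."""
    enabled = [size for size in (source_kb, destination_kb) if size > 0]
    # If only one provider enables batching, batching is on for both directions;
    # if both enable it, the smaller value wins.
    return min(enabled) if enabled else 0


def spool_changes(rows, cache_size_kb, spool_dir):
    """Enumerate rows one at a time and spool a file whenever the cache fills.

    Only meaningful when batching is enabled (cache_size_kb > 0).
    """
    if cache_size_kb <= 0:
        raise ValueError("batching is disabled; changes stay in one in-memory set")
    limit = cache_size_kb * 1024
    hard_limit = int(limit * 1.1)  # no single row may exceed 110% of the cache
    files, batch, batch_bytes = [], [], 0

    def flush():
        # Hypothetical file naming; the real spooling format is internal.
        path = os.path.join(spool_dir, f"batch_{len(files) + 1}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(batch, f)
        files.append(path)

    for row in rows:
        size = len(json.dumps(row).encode("utf-8"))
        if size > hard_limit:
            raise ValueError(f"row of {size} bytes exceeds 110% of the cache size")
        if batch and batch_bytes + size > limit:
            flush()  # persist the cached changes, then empty the memory cache
            batch, batch_bytes = [], 0
        batch.append(row)
        batch_bytes += size
    if batch:
        flush()
    return files
```

Because a flush happens before a row would overflow the cache, a spooled batch exceeds the configured size only when a single row falls between 100% and 110% of it.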

Note

The data changes that will be applied to a node are available from the Context property of the DbChangesSelectedEventArgs object. When data is not batched, the ChangesSelected event fires only one time, and all changes are available from the Context property. When data is batched, ChangesSelected fires for each batch and only the changes from that batch are available at that time. If you require changes from all batches, respond to each ChangesSelected event and store the data that is returned.
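When batching is on, a ChangesSelected handler has to copy each batch as it arrives. Only the current batch is visible in the context at that point. The following Python sketch shows that accumulation pattern. ChangesSelectedArgs is a hypothetical stand-in for DbChangesSelectedEventArgs, and its context is modeled as a plain dictionary of table name to rows rather than the real DbSyncContext.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ChangesSelectedArgs:
    """Hypothetical stand-in for DbChangesSelectedEventArgs.

    context maps a table name to the rows selected for the current batch only.
    """
    context: dict


class ChangeCollector:
    """Keeps the changes from every ChangesSelected event, batch after batch."""

    def __init__(self):
        self.rows_by_table = defaultdict(list)
        self.events = 0

    def on_changes_selected(self, args: ChangesSelectedArgs) -> None:
        self.events += 1
        for table, rows in args.context.items():
            # Copy now: with batching, the next event replaces this batch.
            self.rows_by_table[table].extend(rows)
```

Without batching the handler runs once and collects everything. With batching it runs once per batch, and rows_by_table ends up holding the union of all batches.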

The following table describes the types and members that are related to batching. The only property required for batching is MemoryDataCacheSize, but it is also recommended to set BatchingDirectory.

Type or Member	Description
BatchingDirectory	Gets or sets the directory in which batch files are spooled to disk. The path specified must be a directory that is local to the provider or proxy that is executing. UNC file paths and non-file URI paths are not supported. Important: Spooling files contain raw database data. The directory to which files are written must be protected with the appropriate access controls.
CleanupBatchingDirectory	Gets or sets whether to clean up batching files after the changes in the files have been applied to the destination. The default is to clean up the files.
MemoryDataCacheSize	Gets or sets the maximum amount of memory, in KB, that Sync Framework uses to cache changes before spooling those changes to disk. Note: This setting affects only the size of the data and metadata that are held in memory for changes that are sent to the destination. It does not limit the memory used by other Sync Framework components or user application components.
BatchApplied	The event that occurs after each batch of changes has been applied to the destination.
BatchSpooled	The event that occurs after each batch of changes has been written to disk.
DbBatchAppliedEventArgs	Provides data for the BatchApplied event, including the current batch number and the total number of batches to apply.
DbBatchSpooledEventArgs	Provides data for the BatchSpooled event, including the current batch number and batch size.
BatchFileName	Gets or sets the name of the file to which spooled changes are written.
IsDataBatched	Gets or sets whether data is sent in multiple batches or in a single DataSet object.
IsLastBatch	Gets or sets whether the current batch is the last batch of changes.
BatchedDeletesRetried	Gets or sets the number of delete operations that were retried during a synchronization session in which changes were batched. Deletes are retried for batches because of the ordering of primary key and foreign key deletes. If a foreign key delete does not exist in the current batch or an earlier batch, the corresponding primary key delete fails. Failed deletes are retried once after all batches are applied.
SelectIncrementalChangesCommand (relevant only for DbSyncProvider)	Gets or sets the query or stored procedure that is used to select incremental changes from the local database. Note: It is recommended that the specified query include the clause ORDER BY [sync_row_timestamp]. Ordering rows by timestamp value ensures that if a synchronization session is restarted, the provider will begin to enumerate from the highest timestamp watermark (individual table watermarks are persisted with each batch) and not miss any changes.
DataTable	Gets or sets the DataTable object that contains the changes to be synchronized. If batching is enabled, accessing this property deserializes the spooled file from disk. Any changes made to the tables are then persisted back to the spooled file.
DataSet	Gets or sets a DataSet object that contains the selected rows from the peer database. Returns null if IsDataBatched is true.
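The ORDER BY [sync_row_timestamp] recommendation for SelectIncrementalChangesCommand matters because a restarted session resumes from the persisted watermark. The following Python sketch uses sqlite3 as a stand-in for SQL Server. The table orders_tracking and its columns are hypothetical. The sketch assumes timestamp values are unique, as SQL Server rowversion values are. It selects changes newer than the watermark in timestamp order and advances the watermark per batch, so resuming from a saved watermark picks up exactly the rows that were not yet delivered.

```python
import sqlite3


def select_incremental_changes(conn, watermark, batch_rows):
    """Next batch of changes newer than the watermark, oldest first."""
    return conn.execute(
        "SELECT id, value, sync_row_timestamp FROM orders_tracking "
        "WHERE sync_row_timestamp > ? "
        "ORDER BY sync_row_timestamp LIMIT ?",
        (watermark, batch_rows),
    ).fetchall()


def enumerate_batches(conn, watermark=0, batch_rows=2):
    """Yield (batch, watermark); the caller persists the watermark per batch."""
    while True:
        batch = select_incremental_changes(conn, watermark, batch_rows)
        if not batch:
            return
        # Because rows are ordered, the last row holds the highest timestamp,
        # so every row at or below the watermark has already been delivered.
        watermark = batch[-1][2]
        yield batch, watermark
```

Without the ORDER BY clause, the highest timestamp in a batch would not imply that all lower timestamps had been sent, and a restart could skip rows.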

Common Code for Two-Tier and N-Tier

The code examples in this section demonstrate how to handle batching in 2-tier and n-tier scenarios. This code is taken from two of the samples that are included in the Sync Framework SDK: SharingAppDemo-CEProviderEndToEnd and WebSharingAppDemo-CEProviderEndToEnd. Each example is introduced with the location of the code, such as SharingAppDemo/CESharingForm. In terms of batching, the key difference between the two applications is the additional code required in the n-tier case to upload and download the spooled files and to create directories for each node that enumerates changes.

The following code example from the synchronizeBtn_Click event handler in SharingAppDemo/CESharingForm sets the memory data cache size and the directory to which spooling files should be written. The path specified for BatchingDirectory must be a directory that is local to the provider or proxy that is executing; UNC file paths and non-file URI paths are not supported. This path is the root directory. For each synchronization session, Sync Framework creates a unique subdirectory in which to store spooling files for that session. This directory is unique for the current source-destination combination, which isolates the files of different sessions. Consider potential side effects when choosing a batching directory. For example, when the provider is hosted by Internet Information Services (IIS), do not use an IIS virtual directory as the batching directory. IIS can trigger a restart when changes are made to items in a virtual directory, which causes synchronization to fail.
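The SDK sample itself is C#. Separately, the following hypothetical Python sketch illustrates the directory layout this paragraph describes: a local root, one folder per source-destination pair, and a unique subfolder per session. The folder naming scheme is an assumption, not the one Sync Framework uses. The sketch rejects UNC paths and non-file URIs, and it creates the session folder with owner-only permissions on POSIX systems. Windows ACLs would have to be set separately.

```python
import uuid
from pathlib import Path


def session_spool_directory(batching_root, source_id, destination_id):
    """Create a unique per-session folder under the batching root directory."""
    root = str(batching_root)
    # BatchingDirectory must be local: reject UNC paths and non-file URIs.
    if root.startswith("\\\\") or "://" in root:
        raise ValueError("batching directory must be a local file path")
    # Group by source-destination pair, then isolate each session with a GUID.
    session_dir = Path(root) / f"{source_id}_{destination_id}" / uuid.uuid4().hex
    # 0o700 restricts access on POSIX systems; on Windows, set ACLs separately.
    session_dir.mkdir(mode=0o700, parents=True)
    return session_dir
```

Two sessions between the same pair of peers get sibling folders, so cleaning up one session never touches the files of another.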

The following code example from the synchronizeBtn_Click event handler in WebSharingAppDemo/CESharingForm sets the same properties, but the batching directory for the destination is set for the proxy, rather than directly for the provider as it is in the 2-tier scenario:

The following code examples from the SynchronizationHelper file in both applications create methods to handle the BatchSpooled and BatchApplied events that are raised by a provider during change enumeration and change application:
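As a language-neutral illustration of this event wiring (not the SynchronizationHelper code), the following Python sketch models the two events and their argument types. BatchSpooledArgs and BatchAppliedArgs are hypothetical analogues of DbBatchSpooledEventArgs and DbBatchAppliedEventArgs. Their field names, and the use of bytes as the batch size unit, are assumptions. The handlers simply record progress messages.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BatchSpooledArgs:  # hypothetical analogue of DbBatchSpooledEventArgs
    current_batch_number: int
    batch_size_bytes: int


@dataclass
class BatchAppliedArgs:  # hypothetical analogue of DbBatchAppliedEventArgs
    current_batch_number: int
    total_batches: int


class SyncEvent:
    """Minimal multicast event: every subscribed handler receives the args."""

    def __init__(self):
        self._handlers: List[Callable] = []

    def subscribe(self, handler: Callable) -> None:
        self._handlers.append(handler)

    def fire(self, args) -> None:
        for handler in self._handlers:
            handler(args)


class ProgressReporter:
    """Example handlers that log batching progress."""

    def __init__(self):
        self.messages: List[str] = []

    def on_batch_spooled(self, args: BatchSpooledArgs) -> None:
        self.messages.append(
            f"Spooled batch {args.current_batch_number} ({args.batch_size_bytes} bytes)")

    def on_batch_applied(self, args: BatchAppliedArgs) -> None:
        self.messages.append(
            f"Applied batch {args.current_batch_number} of {args.total_batches}")
```

The spooled event fires on the enumerating side and the applied event fires on the destination side, so a progress display typically subscribes to both.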

Code Specific to N-Tier

The remainder of the code examples apply only to the n-tier scenario in WebSharingAppDemo. The relevant n-tier code is contained in three files:

The service contract: IRelationalSyncContract
The Web service: RelationalWebSyncService
The proxy: RelationalProviderProxy

The two providers SqlSyncProvider and SqlCeSyncProvider both inherit from RelationalSyncProvider, so this code applies to both providers. Additional store-specific functionality is separated into proxy and service files for each type of provider.

To understand how batching works in an n-tier scenario, consider a synchronization session in which the server is the source and the client is the destination. After changes have been written to the local directory on the server, the following process occurs for downloaded changes:

1. The GetChangeBatch method is called on the client proxy. As demonstrated later in the sample code, this method should include specific code to handle batching.
2. The service gets a batch file from SqlSyncProvider. The service removes the complete path information and sends only the file name over the network. This prevents exposing the directory structure of the server to the clients.
3. The proxy call to GetChangeBatch returns.
   a. The proxy detects that changes are batched, so it calls DownloadBatchFile, passing the batch file name as an argument.
   b. The proxy creates a unique directory (if one doesn’t exist for the session) under RelationalProviderProxy.BatchingDirectory to hold these batch files locally. The directory name is the replica ID of the peer that is enumerating changes. This ensures that the proxy and service have one unique directory for each enumerating peer.
   c. The proxy downloads the file and stores it locally. The proxy replaces the file name in the context with the new full path to the batch file on the local disk.
4. The proxy returns the context back to the orchestrator.

Repeat steps 1 through 4 until the last batch is received by the proxy.
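The download loop can be simulated in Python to show the two key moves: the service sends only a bare file name, and the proxy rewrites the context to a local full path under a folder named for the enumerating replica. DownloadService, DownloadProxy, download_all, and the dictionary used as the context are hypothetical stand-ins for RelationalWebSyncService, RelationalProviderProxy, and DbSyncContext. No network transport is modeled.

```python
from pathlib import Path


class DownloadService:
    """Hypothetical middle tier that hands out spooled batches one at a time."""

    def __init__(self, replica_id, spool_dir, file_names):
        self.replica_id = replica_id
        self.spool_dir = Path(spool_dir)
        self.pending = list(file_names)  # assumed non-empty for this sketch

    def get_changes(self):
        name = self.pending.pop(0)
        # Only the bare file name crosses the network, never the server path.
        return {"is_data_batched": True, "batch_file_name": name,
                "replica_id": self.replica_id, "is_last_batch": not self.pending}

    def download_batch_file(self, name):
        if Path(name).name != name:  # refuse anything that is not a bare name
            raise ValueError("expected a bare file name")
        return (self.spool_dir / name).read_bytes()


class DownloadProxy:
    def __init__(self, service, batching_directory):
        self.service = service
        self.batching_directory = Path(batching_directory)

    def get_change_batch(self):
        context = self.service.get_changes()
        if context["is_data_batched"]:
            # One local directory per enumerating peer, named by its replica ID.
            local_dir = self.batching_directory / context["replica_id"]
            local_dir.mkdir(parents=True, exist_ok=True)
            local_path = local_dir / context["batch_file_name"]
            local_path.write_bytes(
                self.service.download_batch_file(context["batch_file_name"]))
            # Hand the orchestrator the full path of the local copy.
            context["batch_file_name"] = str(local_path)
        return context


def download_all(proxy):
    """Repeat GetChangeBatch until the last batch has been received."""
    contexts = []
    while True:
        contexts.append(proxy.get_change_batch())
        if contexts[-1]["is_last_batch"]:
            return contexts
```

Validating that DownloadBatchFile receives a bare name complements the path stripping on the way out: a client cannot use the download call to read files outside the spool directory.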

The following process occurs for uploaded changes:

1. The orchestrator calls ProcessChangeBatch on the proxy.
2. The proxy determines that it is a batch file, so it performs the following steps:
   a. Removes the complete path information and sends only the file name over the network.
   b. Calls HasUploadedBatchFile to determine whether the file has already been uploaded. If it has, step c is not necessary.
   c. If HasUploadedBatchFile returns false, calls UploadBatchFile on the service and uploads the batch file contents.

      The service receives the call to UploadBatchFile and stores the batch locally. Directory creation is similar to the download process above: batches are stored in a directory named for the replica ID of the enumerating peer.
   d. Calls ApplyChanges on the service.
3. The server receives the ApplyChanges call and determines that it is a batch file. It replaces the file name in the context with the new full path to the batch file on the local disk.
4. The server passes the DbSyncContext to the local SqlSyncProvider.

Repeat steps 1 through 4 until the last batch is sent.
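The upload side hinges on the HasUploadedBatchFile check. A retried ProcessChangeBatch call, for example after a dropped connection, does not send the same file twice. The following Python sketch simulates that flow. UploadService and UploadProxy are hypothetical stand-ins for the service and proxy classes, and the dictionary context stands in for DbSyncContext. ApplyChanges refuses a batch whose file was never uploaded.

```python
from pathlib import Path


class UploadService:
    """Hypothetical middle tier that receives uploaded batches and applies them."""

    def __init__(self, batching_directory):
        self.batching_directory = Path(batching_directory)
        self.applied = []  # stands in for handing contexts to SqlSyncProvider

    def _local_path(self, replica_id, name):
        if Path(name).name != name:
            raise ValueError("expected a bare file name")
        return self.batching_directory / replica_id / name

    def has_uploaded_batch_file(self, replica_id, name):
        return self._local_path(replica_id, name).exists()

    def upload_batch_file(self, replica_id, name, contents):
        path = self._local_path(replica_id, name)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(contents)

    def apply_changes(self, context):
        if context["is_data_batched"]:
            path = self._local_path(context["replica_id"], context["batch_file_name"])
            if not path.exists():
                raise FileNotFoundError("batch file must be uploaded before ApplyChanges")
            context = dict(context, batch_file_name=str(path))
        self.applied.append(context)
        return context


class UploadProxy:
    def __init__(self, service):
        self.service = service
        self.uploads = 0

    def process_change_batch(self, context):
        if context["is_data_batched"]:
            local_path = Path(context["batch_file_name"])
            name, replica = local_path.name, context["replica_id"]
            # Skip the upload if an earlier, interrupted attempt already sent it.
            if not self.service.has_uploaded_batch_file(replica, name):
                self.service.upload_batch_file(replica, name, local_path.read_bytes())
                self.uploads += 1
            context = dict(context, batch_file_name=name)  # strip the local path
        return self.service.apply_changes(context)
```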

The following code example from IRelationalSyncContract specifies upload and download methods that are used to transfer spooled files to and from the middle tier:

The following code examples from RelationalWebSyncService expose the UploadBatchFile and DownloadBatchFile methods defined in the contract, and include additional batching related logic in the following methods:

Cleanup: cleans up any spooled files from a specified directory or the temp directory if one is not specified.
GetChanges: checks whether data is batched and, if so, removes the directory path of the spooled file so that the path is not sent over the network. In n-tier scenarios, sending full directory paths over a network connection is a security risk. The file name is a GUID.
HasUploadedBatchFile: returns whether a particular batch file has already been uploaded to the service.
ApplyChanges: checks if data is batched, and if so it checks if the expected batch file has already been uploaded. If the file has not been uploaded, an exception is thrown. The client should have uploaded the spooled file prior to calling ApplyChanges.
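The Cleanup method, combined with the earlier advice to purge middle-tier files older than a certain date, can be sketched in Python. This sketch falls back to the temp directory when no directory is given, as the description above states. Two details are assumptions: the *.batch file pattern, and the default age of one day. Restricting deletion to a pattern keeps the fallback from removing unrelated temp files.

```python
import tempfile
import time
from pathlib import Path


def cleanup(directory=None, max_age_seconds=24 * 3600, pattern="*.batch", now=None):
    """Delete spooled batch files older than max_age_seconds.

    Falls back to the temp directory when no directory is given. Only files
    matching pattern are touched; returns the paths that were removed.
    """
    root = Path(directory) if directory else Path(tempfile.gettempdir())
    now = time.time() if now is None else now
    removed = []
    # Materialize the listing first so deletions do not disturb the traversal.
    for path in list(root.rglob(pattern)):
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path)
    return removed
```

Running this on a schedule on the middle tier covers sessions that were aborted before the proxy could clean up its files.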

The following code examples from RelationalProviderProxy set properties and call methods on the Web service:

BatchingDirectory: enables the application to set the batching directory for the middle tier.
EndSession: cleans up any spooled files from a specified directory.
GetChangeBatch: downloads change batches by calling the DownloadBatchFile method.
ProcessChangeBatch: uploads change batches by calling the UploadBatchFile method.
