How to Sync Large SQL Server Databases to SQL Azure
Over the past few days I have seen a number of posts from people who have been trying to synchronize large databases to SQL Azure and running into issues. In many of these posts, people have tried tools like SSIS, BCP and Sync Framework and have hit problems such as SQL Azure closing the transaction due to throttling of the connection (because the data took too long to apply) and, occasionally, local out-of-memory errors as the data is sorted.
For today’s post I wanted to spend some time discussing how to synchronize large data sets from SQL Server to SQL Azure using Sync Framework. By large I mean databases that are larger than 500MB. If you are synchronizing smaller databases you may still find some of these techniques useful, but for the most part you should be just fine with the simple method I explained here.
For this situation, I believe there are three very useful capabilities within the Sync Framework:
1) MemoryDataCacheSize: This limits the amount of memory that Sync Framework allocates while creating the data batches and sorting the data, which typically takes care of any out-of-memory issues. The value is specified in KB, so I generally start by allocating 100MB (100000); if you have more or less free memory available, or if you still run out of memory, you can adjust this number.
RemoteProvider.MemoryDataCacheSize = 100000;
2) ApplicationTransactionSize (MATS): This tells the Sync Framework how much data to apply to the destination database (SQL Azure in this case) at one time. We typically call this batching. Batching works around the issue where SQL Azure starts to throttle (or disconnect) us if it takes too long to apply a large set of data changes. MATS also allows sync to pick up where it left off if a connection drops (I will talk more about this in a future post), and it lets me add progress events to track how much data has been applied. Best of all, it does not seem to affect sync performance. Like MemoryDataCacheSize, the value is specified in KB; I typically set it to 50MB (50000), which is an amount of data that SQL Azure can commit easily, yet is small enough that if I need to resume sync after a network disconnect I do not have too much data to resend.
RemoteProvider.ApplicationTransactionSize = 50000;
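To illustrate the resume behavior, here is a minimal sketch (my own example, not part of the original code below) of wrapping the synchronize call in a simple retry loop around a SyncOrchestrator configured like the one in the Sync() method further down. Because each batch is committed on SQL Azure as it is applied, re-running the session after a dropped connection only has to send the batches that were not yet committed. The retry count, the wait time and the choice of exception to catch are assumptions for illustration only.
static SyncOperationStatistics SynchronizeWithRetry(SyncOrchestrator orch, int retryLimit)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            // Batches already committed on SQL Azure are not re-applied on a retry,
            // so the session resumes roughly where it left off.
            return orch.Synchronize();
        }
        catch (SqlException ex)
        {
            // Catching SqlException is an assumption; depending on where the drop
            // happens, Sync Framework may surface a different exception type.
            Console.WriteLine("Sync attempt " + attempt + " failed: " + ex.Message);
            if (attempt >= retryLimit) throw;
            System.Threading.Thread.Sleep(TimeSpan.FromSeconds(30));
        }
    }
}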
3) ChangesApplied Progress Event: The Sync Framework database providers have an event called ChangesApplied. Although it does not improve performance, it is useful when synchronizing a large data set. Combined with ApplicationTransactionSize, I can have my application write a message whenever a batch (or chunk of data) has been applied, which helps me track how much data has been sent to SQL Azure and how much is left.
RemoteProvider.ChangesApplied += new EventHandler<DbChangesAppliedEventArgs>(RemoteProvider_ChangesApplied);
When I put all of this together, I get the following code for a command line application that syncs data from SQL Server to SQL Azure. Please make sure to update the connection strings and the tables to be synchronized.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data.SqlClient;
using Microsoft.Synchronization.Data.SqlServer;
using Microsoft.Synchronization.Data;
using Microsoft.Synchronization;
namespace SQLAzureDataSync
{
class Program
{
public static string sqlazureConnectionString = "Server=[Your SQL Azure Server].database.windows.net;Database=AdventureWorksLTSQLAzure;User ID=[Your SQL Azure User Name]@[Your SQL Azure Server];Password=[Your SQL Azure Password];Trusted_Connection=False;Encrypt=True;";
public static string sqllocalConnectionString = "Server=(local);Database=AdventureWorksLT2008;Trusted_Connection=True";
public static readonly string scopeName = "AllTablesSyncGroup";
public static int transactionCount;
public static uint BatchSize = 50000;
public static uint MemorySize = 100000;
static void Main(string[] args)
{
// Test if input arguments were supplied:
if (args.Length == 0)
{
System.Console.WriteLine("Please enter an argument.");
System.Console.WriteLine("Usage: SyncTest.exe -setup");
System.Console.WriteLine(" SyncTest.exe -sync");
}
else if (args[0] == "-setup")
Setup();
else if (args[0] == "-sync")
Sync();
}
public static void Setup()
{
try
{
SqlConnection sqlServerConn = new SqlConnection(sqllocalConnectionString);
SqlConnection sqlAzureConn = new SqlConnection(sqlazureConnectionString);
DbSyncScopeDescription myScope = new DbSyncScopeDescription(scopeName);
DbSyncTableDescription TestSchema1 = SqlSyncDescriptionBuilder.GetDescriptionForTable("TestSchema1", sqlServerConn);
// Add the tables from above to the scope
myScope.Tables.Add(TestSchema1);
// Setup SQL Server for sync
SqlSyncScopeProvisioning sqlServerProv = new SqlSyncScopeProvisioning(sqlServerConn, myScope);
sqlServerProv.CommandTimeout = 60 * 30;
if (!sqlServerProv.ScopeExists(scopeName))
{
// Apply the scope provisioning.
Console.WriteLine("Provisioning SQL Server for sync " + DateTime.Now);
sqlServerProv.Apply();
Console.WriteLine("Done Provisioning SQL Server for sync " + DateTime.Now);
}
else
Console.WriteLine("SQL Server Database server already provisioned for sync " + DateTime.Now);
// Setup SQL Azure for sync
SqlSyncScopeProvisioning sqlAzureProv = new SqlSyncScopeProvisioning(sqlAzureConn, myScope);
if (!sqlAzureProv.ScopeExists(scopeName))
{
// Apply the scope provisioning.
Console.WriteLine("Provisioning SQL Azure for sync " + DateTime.Now);
sqlAzureProv.Apply();
Console.WriteLine("Done Provisioning SQL Azure for sync " + DateTime.Now);
}
else
Console.WriteLine("SQL Azure Database server already provisioned for sync " + DateTime.Now);
sqlAzureConn.Close();
sqlServerConn.Close();
}
catch (Exception ex)
{
Console.WriteLine(ex);
}
}
public static void Sync()
{
try
{
SqlConnection sqlServerConn = new SqlConnection(sqllocalConnectionString);
SqlConnection sqlAzureConn = new SqlConnection(sqlazureConnectionString);
SqlSyncProvider RemoteProvider = new SqlSyncProvider(scopeName, sqlAzureConn);
SqlSyncProvider LocalProvider = new SqlSyncProvider(scopeName, sqlServerConn);
//Set memory allocation to the database providers
RemoteProvider.MemoryDataCacheSize = MemorySize;
LocalProvider.MemoryDataCacheSize = MemorySize;
//Set application transaction size on destination provider.
RemoteProvider.ApplicationTransactionSize = BatchSize;
//Count transactions
RemoteProvider.ChangesApplied += new EventHandler<DbChangesAppliedEventArgs>(RemoteProvider_ChangesApplied);
SyncOrchestrator orch = new SyncOrchestrator();
orch.RemoteProvider = RemoteProvider;
orch.LocalProvider = LocalProvider;
orch.Direction = SyncDirectionOrder.UploadAndDownload;
Console.WriteLine("ScopeName={0} ", scopeName.ToUpper());
Console.WriteLine("Starting Sync " + DateTime.Now);
ShowStatistics(orch.Synchronize());
sqlAzureConn.Close();
sqlServerConn.Close();
}
catch (Exception ex)
{
Console.WriteLine(ex);
}
}
public static void RemoteProvider_ChangesApplied(object sender, DbChangesAppliedEventArgs e)
{
transactionCount += 1;
// BatchSize (ApplicationTransactionSize) is specified in KB, so this gives an
// approximate running total of the data applied so far.
int totalKB = transactionCount * Convert.ToInt32(BatchSize);
Console.WriteLine("Changes Applied event fired: approximately " + totalKB + " KB applied.");
}
public static void ShowStatistics(SyncOperationStatistics syncStats)
{
string message;
message = "\tSync Start Time :" + syncStats.SyncStartTime.ToString();
Console.WriteLine(message);
message = "\tSync End Time :" + syncStats.SyncEndTime.ToString();
Console.WriteLine(message);
message = "\tUpload Changes Applied :" + syncStats.UploadChangesApplied.ToString();
Console.WriteLine(message);
message = "\tUpload Changes Failed :" + syncStats.UploadChangesFailed.ToString();
Console.WriteLine(message);
message = "\tUpload Changes Total :" + syncStats.UploadChangesTotal.ToString();
Console.WriteLine(message);
message = "\tDownload Changes Applied :" + syncStats.DownloadChangesApplied.ToString();
Console.WriteLine(message);
message = "\tDownload Changes Failed :" + syncStats.DownloadChangesFailed.ToString();
Console.WriteLine(message);
message = "\tDownload Changes Total :" + syncStats.DownloadChangesTotal.ToString();
Console.WriteLine(message);
}
}
}
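If you need to synchronize more than the single TestSchema1 table shown above, one approach (a sketch only; the extra table names are placeholders, not from the original post) is to loop over the table names inside Setup() and add each description to the scope before provisioning:
// Sketch: add several tables to the scope before calling Apply().
// Replace the placeholder table names with your own; sqlServerConn is the
// local SqlConnection created in Setup().
string[] tablesToSync = { "TestSchema1", "Customer", "SalesOrderHeader" };
DbSyncScopeDescription myScope = new DbSyncScopeDescription(scopeName);
foreach (string tableName in tablesToSync)
{
    DbSyncTableDescription tableDesc =
        SqlSyncDescriptionBuilder.GetDescriptionForTable(tableName, sqlServerConn);
    myScope.Tables.Add(tableDesc);
}
The same scope description is then used to provision both the SQL Server and SQL Azure databases, exactly as in the Setup() method above.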
Liam Cavanagh
Comments
Anonymous
September 26, 2010
The comment has been removed
Anonymous
September 27, 2010
Hi Pini, You mention that the source has only a few MB – so it looks like the local office only needs to synchronize a subset of files on the server. If that is true, each local office endpoint will sync with the server with a different scope. Also, it looks like you only do one-way sync where the local office is always the source and the file server always the destination? Will changes from one local office flow down to another local office? Do you expect conflicts when syncing from source to destination? If you make sure that you define a different scope for each local office, and the scope on the server is only a subset of the whole data, you should see smaller metadata files. Thanks, sid
Anonymous
September 27, 2010
The comment has been removed
Anonymous
September 28, 2010
A couple more questions: if local office A uploads data to the file server, does that data then flow down to local office B? Does anyone change the copy of the data on the file server?
Anonymous
September 28, 2010
No. The data is transferred only in one direction, from the offices up to the main storage server. On the file server more data is added, since this storage also works with the same web application the offices are working with. The full story is that each office has an exact copy of the web application but a local file storage, so they can work OCA (occasionally connected).
Anonymous
October 22, 2010
I have been using Sync Framework since its CTP1. It works well for small amounts of data (either files or databases). For a large DB or TBs of files it is always a problem.
Anonymous
November 03, 2010
Thanks for the post, but I'm afraid this is not the full story to it. We have been using the November CTP of the AzureSyncProvider. It has always worked fine for us, with good performance. Given the posts about Azure being supported now natively by SyncFx 2.1, and the fact that I dislike using CTP code in a production environment, we decided to upgrade. Since then the performance has gone down dramatically.
We synchronize SqlCe to Azure. The problem becomes apparent when a large number of new rows are present in SqlCe (e.g. 50.000). Change enumeration is fast enough (a few seconds), however the change application to Azure takes almost two minutes. Normally when we sync we have about 150.000 - 200.000 new rows, so you can see how this takes too much time.
By performing a query on the change tracking table of Azure I noticed a significant difference between elements that were added by the AzureSyncProvider CTP and the ones that were added by the 2.1 SqlSyncProvider: the [last_change_datetime] column of entries that we synced using the AzureSyncProvider all have the same timestamp. This suggests that a single transaction was used. In the elements synced using the SqlSyncProvider, however, at most 75 entries share the same timestamp. If those timestamps are indeed caused by using separate transactions then I can imagine why it takes a long time to sync (1)50.000 entries.
For testing I have created both a new SqlCe database and a new SQL Azure database and provisioned these using SyncFx 2.1 to make sure this was not caused by missing stored procedures or anything like that. Is this a bug that I need to report, or is it by design somehow? And more important: is there a solution?
Anonymous
March 09, 2011
Hi, We have built an application to sync data using MS Sync Framework 2.1. The application runs successfully. However, there is a case where the rowcount in a table is as large as 7.2 million. We have implemented batching in this application so that it runs successfully, but it takes around 1.25 hours to sync this table. We need to reduce this sync time. Can anyone suggest a workaround for a faster application?
Anonymous
April 15, 2011
Setup() is OK but initial Sync() is very slow (hours). Using MS Sync FW 2.1 . . . 1.5 MM rows are already staged on local and remote (SQLAzure) servers.
Anonymous
May 03, 2011
One issue we have run into is the extra tables created by MS Sync 2.1. These tables and their indices are quite large and cause us to bump into our SQL Azure size limits. Is there any way around this?
Anonymous
October 20, 2013
Hi, I followed all the steps in the article, but I still see network issues after inserting many records. In the log file I can see that many records are inserted to the database in a transaction and then fail with a network issue, and the data is rolled back. The same issue reproduces after many retries. Please let me know if you have any other ideas to fix this issue.
Anonymous
February 12, 2014
Unfortunately, even using these additional tuning parameters, my data sync between Azure DB 1 and Azure DB 2 was still just batching files after 7 days. I don't think that this approach will work for larger Azure databases without being able to easily do the following:
- Batch and then transmit and transact. This way, you don't spend a day or two waiting for all of the batches to be batched and transmitted and then rolled back near the end due to a SQL timeout or throttling issue. If one batches and transacts in small chunks, one can slowly get the databases synchronized.
- Have an easy way to specify how much data to retrieve from each table for each batch, based on, for example, date/time rows. Many of our tables reference other tables that have the appropriate row. So, this mechanism would need to be fancier than the current Microsoft Data Sync filtering method, which I believe only allows one to specify columns in the table that one is currently synchronizing.
I've now probably spent over 2 weeks of real developer time on: a) Issues with the SQL Azure Data Sync preview (total failure for synchronizing a larger DB to an on-premises empty copy). b) More issues with a SQL Data Sync Framework-based application (see above). So, I've gone back to an old and proven technology -- SSIS. Thankfully, we have an on-premises SQL Server and the knowledge of how to write an SSIS package that grabs X amount of data at a time, synchronizes it, and then repeats based on a SQL Server Agent job schedule. It took me a few days to write the SSIS package. But, it succeeded the very first time. PS. Our Azure DB is currently ~ 60 GB in size.
Katy Leonard, Senior Software Development Engineer, Unify Square
Anonymous
May 12, 2014
Dear All, When I run SyncDB.exe -sync I get the error below: Microsoft.Synchronization.Data.DbNotProvisionedException: The current operation could not be completed because the database is not provisioned for sync or you do not have permission to the sync configuration tables. I used the sa account but I still get that error. Could you help please?
Anonymous
October 23, 2015
I am syncing SQL data to Azure, about 1 GB, and I didn't have too many problems. One problem is that sometimes it shows "the semaphore timeout period has expired" and the sync stops. And sometimes not all tables get synced: I have made an array of all the tables and apply a foreach over it, but it is not syncing all of them.