February 2011

Volume 26 Number 02

Forecast: Cloudy - Branch-Node Synchronization with SQL Azure, Part 2: Service-Based Sync

By Joseph Fultz | February 2011

Last month, I focused on the general architecture and design for synchronizing corporate databases with SQL Azure and, from there, with various destination nodes. I talked about using filtering and geographic distribution, separately or in conjunction, to optimize the overall data distribution and collection network.

This month, I’m going to pull in Azure to host a synchronization service and focus on synchronizing via a service interface in the cloud. This will provide a means of scaling the synchronization mechanism to handle many more end nodes than could be handled by direct-to-database synchronization. I’ll use the October 2010 Community Technology Preview (CTP) release of the Microsoft Sync Framework 4.0 (bit.ly/dpyMP8), which is built on top of the 2.1 framework used in January’s issue.

A synchronization service can be built directly on top of version 2.1, and a good sample and walk-through can be found at bit.ly/bibIdl and bit.ly/epyImQ. However, with the availability of the 4.0 CTP and the Internet-focused elements of the release, it makes good sense to leverage it for the Azure Sync Service. A fair amount of code still needs to be written to make a functional solution, but the payoff is a synchronization service that could be consumed by any device using OData.

Synchronizing at Internet Scale

I covered some ideas last month on how to scale direct-to-database sync. In some cases, if not many, however, the scale problem isn’t so easily solved. Giving even cursory thought to what the previously described approaches won’t cover, one can easily come up with the following:

  1. Due to the relationships within the data, it can’t easily be split.
  2. There isn’t a segmentation that makes sense, so any split would be arbitrary and would likely lead to unforeseen hot spots in the various partitions of the solution.
  3. The amount of data that would need to be replicated would drive the cost beyond being cost-effective if it had to exist in multiple places.
  4. Synchronization comes in bursts; for example, end-of-day processing for hundreds or thousands of retail locations creates contention regardless of the partitioning.

This obviously doesn’t exhaust all of the possible reasons that would necessitate a design other than straight-to-SQL Azure synchronization, but it’s a good enough list to broach the topic and take a look at how to solve the problem. As with most things in computer science, I’ll attempt to resolve the issues above by inserting a layer of indirection. In this case, it will be a service layer hosted in an Azure Web Role that serves as the synchronization point instead of synchronizing directly with the SQL Azure instance. I’ve updated the end-state diagram from last month by adding a placeholder for the Sync Service hosted in Azure, arriving at a logical design such as the one shown in Figure 1.

image: Typical Corporate Architecture

Figure 1 Typical Corporate Architecture

Getting Started

The Sync Framework 4.0 is particularly suited to help solve this problem, though it requires a bit more work than the simple model of synchronizing directly between databases. The 4.0 CTP ships with a good sample and walk-through in the help file, titled “Creating a Sync Service in Azure,” and I’ll use it as the basis for discussing the Sync Service implementation. The client code is a bit more difficult, because in version 4.0 there isn’t a client-side runtime library to assist, due to the abstraction created to open up synchronization to any platform that will work with OData. However, the 4.0 samples include one for Windows Mobile 6.5 using SQL Server CE; I’ve co-opted the code needed and modified it to work with standard SQL Server.

To start, it helps to be familiar with the set of objects the October 2010 4.0 CTP uses to perform the synchronization activity. The client application consumes a CacheController, which is responsible for communication with the Sync Service using OData. On the local side, the CacheController uses an OfflineSyncProvider, which is the data-store-specific (and likely per-target-platform) interface between the application and the data (see Figure 2). In this implementation, being based on the sample, a StorageHandler object handles the local data access. The OfflineSyncProvider is a known type that’s used by the CacheController, but the StorageHandler is custom code written to handle all of the back-end store interaction. Think of the OfflineSyncProvider as the intelligence on top of the data-access library and the StorageHandler as the data-access library itself. Notably, the 4.0 CTP ships with a built-in CacheController only for Isolated Storage in a Silverlight client, which leaves me with some work to do in order to use standard SQL Server. The layout of the objects and the interaction boundaries are represented at a high level in Figure 2.

image: Sync Framework 4.0 Client Synchronization Objects

Figure 2 Sync Framework 4.0 Client Synchronization Objects

Developing the Cloud Sync Service

I was always told to deliver the bad news first and the good news last. That way, the conversation, and hopefully the spirits of those participating in it, ends on a positive note. In this case, however, I’m going to reverse the delivery order, hoping to sell the solution on the merits of the easy part. The bulk of the work comes on the client side, but the framework provides a lot of assistance for the server side. In a design session I led, someone who sold death-care services (funerals, plots, coffins and so on) once told me that they’d never make a single sale if they focused on “what it is”; instead, the focus needed to be on “what it does.” In the case of death care, peace of mind was the real commodity being bought, not a coffin and a hole in the ground. Such is the case with the Sync Framework. The Sync Framework 2.1 took care of a lot of things for the developer, but it fell a little short of the goal when it came to service-based synchronization: it didn’t at all address the plethora of devices and platforms that might want to synchronize with the data made available through an Internet-facing synchronization service. With the now popularly termed “consumerization of IT,” my customers find themselves having to deal with many devices in the hands of people at all levels of the enterprise. The Sync Framework 4.0 CTP is aimed at helping with this type of challenge, particularly in regard to synchronizing data to those devices.

Getting the server side of this solution up and going is quite simple. Basically, it comes down to these steps:

  1. Define the database
  2. Create a configuration file for it
  3. Use the SyncServiceUtil to provision the database using the config file
  4. Use the SyncServiceUtil to generate classes required for the synchronization service
  5. Create an Azure-based Web Role to host the service
  6. Deploy

If you’re like me when you read this summary, you think, “What configuration file?” The schema for the file can be found in the MSDN Library at bit.ly/h2FJod. Using that schema and referencing the ListDB database and its related config file that ship with the 4.0 samples, one can put together a custom config file for a database with minimal confusion. Once this file exists, the creation of the Azure-based services is a snap. First, the target database (in this case, the ListDB sample in the 4.0 SDK) needs to be created in SQL Azure. Once that’s done, the new SyncServiceUtil can be used to provision the database using a command similar to:

SyncSvcUtil /mode:provision /scopeconfig:listdbconfig.xml
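To give a sense of the overall shape of such a file, here’s an abbreviated sketch modeled on the ListDB sample config that ships with the CTP. Treat the element names as approximations to be verified against the published schema:

<!-- Abbreviated sketch only; verify element names against the schema at bit.ly/h2FJod -->
<SyncConfiguration>
  <SyncScopes>
    <SyncScope Name="DefaultScope" IsTemplateScope="true">
      <SyncTables>
        <SyncTable Name="[dbo].[List]" GlobalName="List" IncludeAllColumns="true" />
        <SyncTable Name="[dbo].[Item]" GlobalName="Item" IncludeAllColumns="true" />
      </SyncTables>
    </SyncScope>
  </SyncScopes>
  <Databases>
    <!-- The <TargetDatabase /> element discussed next goes here -->
  </Databases>
</SyncConfiguration>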

The one thing that will have to be set in the config file is the connection to the SQL Azure database. Near the end of the config file is the <TargetDatabase /> element, which will need to be configured properly for the cloud:

<Databases>
  <TargetDatabase Name="listdb" DbServer="[URI for the SQL Azure DB Instance]"
    DbName="listdb" UserName="[username]" Password="[password]"
    UseIntegratedAuth="false" />
</Databases>

A second run of the utility, this time in codegen mode, generates two files: DefaultScopeEntities.cs and DefaultScopeSyncServices.svc. The “DefaultScope” part of the names comes from the config file and is found in the <SyncScope /> element:

<SyncScope Name="DefaultScope" IsTemplateScope="true">
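Based on the client-side flags shown later in this article, that server-side generation call presumably takes a form like the following (the /target value here is my assumption; check the SyncSvcUtil help output):

SyncSvcUtil /mode:codegen /scopeconfig:listdbconfig.xml /target:server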

The entities file is pretty much as described, but the DefaultScopeSyncServices.svc file is somewhat more significant, as it contains the partial class that allows me to intercept service calls and add custom logic (something new in 4.0). The base synchronization logic is all included as part of the base object. Figure 3 shows the DefaultScopeSyncService class and the related entities class as the type parameter for the generic SyncService class.
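To give a feel for the generated service, here’s a sketch of the kind of wiring the walk-through has you do inside it. The member names are as I recall them from the CTP documentation, so verify them against your own generated file:

// Sketch only: names per the 4.0 CTP walk-through as best recalled;
// confirm against the generated DefaultScopeSyncServices.svc code.
public partial class DefaultScopeSyncService
{
  public static void InitializeService(ISyncServiceConfiguration config)
  {
    // Point the service at the provisioned SQL Azure database...
    config.ServerConnectionString = "[SQL Azure connection string]";
    // ...and enable the sync scope defined in the config file.
    config.SetEnableScope("DefaultScope");
  }
}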

image: Object Browser View of SyncServices Generated Code

Figure 3 Object Browser View of SyncServices Generated Code

Note, on the right-hand side of Figure 3, the abbreviated list of service interfaces that are exposed to do the synchronizing (as compared to what would need to be exposed using Sync Framework 2.1 directly). If I wanted to add any custom logic to the synchronization process, I’d simply open the DefaultScopeSyncServices.svc file, pick the method interceptor and write to my heart’s content. To implement basic synchronization via the service interface that was just created, I simply need to associate the service/Web project containing the files with a Web Role and, in the WebRole.OnStart method, add a line to create the activation context:

public override bool OnStart()
{
  DiagnosticMonitor.Start("DiagnosticsConnectionString");
  // For information on handling
  // configuration changes, see the MSDN topic at 
  // go.microsoft.com/fwlink/?LinkId=166357
  RoleEnvironment.Changing += RoleEnvironmentChanging;
  Microsoft.Samples.Synchronization.ActivationContext.
    CreateActivationContext();
  return base.OnStart();
}

I then make a couple of configuration changes to ensure that the Sync Framework binaries are set to Copy Always. To get the new service interface goodness, I ensure that the 4.0 Microsoft.Synchronization.dll is both referenced and set to be published with the package. Then I publish to my Web Role and I’m ready to go. I can run a simple test by requesting the currently available sync scopes, entering a request such as jofultz.cloudapp.net/defaultscopeSyncService.svc/$syncscopes into my browser. I get back the following response, which gives me some confidence that the service is working:

<service xml:base="http://rd00155d3a1a55:20000/defaultscopesyncservice.svc/"
  xmlns:atom="http://www.w3.org/2005/Atom" xmlns="http://www.w3.org/2007/app">
  <workspace>
    <atom:title>SyncScopes</atom:title>
    <collection href="defaultscope">
      <atom:title>defaultscope</atom:title>
    </collection>
  </workspace>
</service>

I could also request other data, and if there are changes, I’ll get them as OData by default. I can do this in a browser or via a tool. Using the OData Viewer Tool on CodePlex (dataservicestool.codeplex.com/releases/view/52805), I issue the request to download changes: jofultz.cloudapp.net/defaultscopeSyncService.svc/DefaultScope/DownloadChanges?userid=BA9152CC-4280-4DAC-B32D-1782E2E8C3D3, which gives me the results as shown in Figure 4.

image: OData Viewer Tool DownloadChanges Result

Figure 4 OData Viewer Tool DownloadChanges Result
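The same request can be issued from code just as easily. Here’s a minimal sketch using the standard .NET Web classes, with the same user ID as the request above:

// Fetch the DownloadChanges feed and dump the OData ATOM payload to the console.
using System;
using System.IO;
using System.Net;

class SyncProbe
{
  static void Main()
  {
    const string url =
      "http://jofultz.cloudapp.net/defaultscopeSyncService.svc/" +
      "DefaultScope/DownloadChanges?userid=BA9152CC-4280-4DAC-B32D-1782E2E8C3D3";
    var request = WebRequest.Create(url);
    using (var response = request.GetResponse())
    using (var reader = new StreamReader(response.GetResponseStream()))
    {
      Console.WriteLine(reader.ReadToEnd()); // OData ATOM by default
    }
  }
}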

The great news here is that the additions in Sync Framework 4.0 CTP provide the simplified synchronization interfaces with results retrievable in OData ATOM and OData JSON. This opens up the synchronization to other platforms from the client standpoint and relegates proprietary data formats to being legacy formats—and all I had to do was run a utility, configure a project and add a line of code.

Implementing Client Synchronization

Here’s the point in the implementation where the headphones go on and multitasking gives way to focus. The cloud service was almost a matter of configuration, but the client piece takes a bit more work if you’re starting from scratch. Because the Sync Framework 4.0 CTP ships with a CacheController for Isolated Storage, if Silverlight is the target client platform, the client implementation will be as easy as the cloud service implementation. However, my target is a Windows client running SQL Server Standard/Express, and that will require some effort. The SyncServiceUtil still assists by generating the needed entities, but a custom CacheController and OfflineSyncProvider will have to be created. More importantly, the data store will need to be modified to facilitate change tracking. One might do this with a version 2.1-provisioned database or with a custom change-tracking schema. Such an implementation could add significant work and complexity to the overall application, in terms of both a more complicated database implementation and a more complicated code base, but it needs to be done in order to leverage the rest of the framework. In describing this to others, I get asked, “Why not just do it all yourself?” The answer is simple: doing it this way reduces the body of work and opens the implementation to synchronization by other 2.1 and 4.0 framework sync clients/agents (including non-Windows platforms).

Take a look at the segmentation of work shown in Figure 5 for just the client and service pieces being discussed. You can see that using the framework reduces the amount of work by roughly 60 percent or more depending on the target client platform.

Figure 5 Segmentation of Work for Client and Service

Unit of Work                               Effort
DB Schema for Sync (Server)                Configuration
Service Implementation                     Generated + 1 line of code
Customize Validation Hooks in Sync         Hooks generated; only have to write value-add code
DB Schema for Sync (Client)                Could use 2.1 provisioning or custom
Sync Implementation for Non-Silverlight    Custom
Sync Client for Silverlight                Configuration + generation

Working with the Windows Mobile 6.5 and SQL CE sample provides an example of what you might do to the database in order to implement the client synchronization; note the IsDirty, IsTombstone and Metadata fields seen in Figure 6.

image: Columns to Support Custom Synchronization Implementation

Figure 6 Columns to Support Custom Synchronization Implementation
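For reference, the tracking support amounts to adding columns like these to each synchronized table. This is a sketch only; the exact names and types should be taken from the sample’s schema (the _MetadataID name below matches the commands shown later in Figure 9):

-- Sketch: change-tracking columns along the lines of those shown in Figure 6
ALTER TABLE [Item] ADD
  [IsDirty] bit NOT NULL DEFAULT 0,        -- set when the row changes locally
  [IsTombstone] bit NOT NULL DEFAULT 0,    -- marks a logical (soft) delete
  [_MetadataID] uniqueidentifier NULL;     -- ties the row to its sync metadata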

With a schema in place, I need a number of other things:

  1. CacheController implementation as mentioned before
    1. Local store interaction
    2. Service interaction
    3. Sync Conflict Handler
    4. Sync Error Handler
  2. Some code to generate and consume the OData
  3. Code definition for the entities being synchronized
  4. OfflineSyncProvider for the local SQL Server database

For items 1 and 2, I use the code provided for the 6.5 sample (see Figure 7) and place it in my own CacheController project, which consists entirely of code borrowed from the sample.

image: Files Used from the 6.5 Sample

Figure 7 Files Used from the 6.5 Sample

I use the SyncServiceUtil to generate the entities using the same config file as before, along with the “/mode:codegen” and “/target:client” flags. This generates a DefaultScopeEntities.cs file that holds my client-side objects. Because I’m stealing from the 6.5 sample, I copy settings.cs, utility.cs, SqlCeOfflineSyncProvider.cs, DataStoreHelper.cs and SqlCeStorageHandler.cs to my Windows Forms project. To minimize my coding effort, I make the changes shown in Figure 8.
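Spelled out, that client-side generation command looks like this:

SyncSvcUtil /mode:codegen /scopeconfig:listdbconfig.xml /target:client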

Figure 8 Changes to Sample Code Made to Minimize Coding Effort

DefaultScopeEntities.cs: Rename the class to SqlCeOfflineEntity in order to match the expected type name in the borrowed files, and add [Microsoft.Samples.Synchronization.ClientServices.KeyAttribute] in each place where [System.ComponentModel.DataAnnotations.KeyAttribute()] exists, as the former is used within the CacheController implementation.

My new CacheController project: Replace all namespaces with namespace Microsoft.Samples.Synchronization.ClientServices.Custom.

SqlCeOfflineSyncProvider.cs: Replace using Microsoft.Samples.Synchronization.ClientServices; with using Microsoft.Samples.Synchronization.ClientServices.Custom; to reference my custom CacheController implementation.

SqlCeStorageHandler.cs: Comment out all connection transaction commands in the file; working against SQL Server requires a slightly different implementation than SQL CE, and these would need to be added back properly for a real implementation.

DataStoreHelper.cs: Change the connection string to point to my local SQL Server instance.

Settings.cs: Assign SyncServiceUrl the URI for my Azure Sync Service (https://jofultz.cloudapp.net/DefaultScopeSyncService.svc/).

Utility.cs: Replace using Microsoft.Samples.Synchronization.ClientServices; with using Microsoft.Samples.Synchronization.ClientServices.Custom; to reference my custom CacheController implementation.

By leveraging the sample code and making those changes, I’m able to write a small console app that calls the Utility.Sync function, which in turn instantiates the OfflineSyncProvider and CacheController to perform the synchronization:

var localProvider = new SqlCeOfflineSyncProvider();
var controller = new CacheController(
  new Uri(Settings.SyncServiceUrl), Settings.SyncScope, localProvider);
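In the sample’s Utility.Sync, those lines are followed by wiring up the conflict and error handlers and then by the call that actually starts the exchange (step 2c in the chain of operations below):

// Kicks off the upload/download exchange with the Azure-hosted service.
controller.Refresh();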

So, one might ask, where’s the code to do things like fetch changed records from the local store? All of that lives in the StorageHandler implementation. Take a look at Figure 9 to see a piece of it.

Figure 9 Local Store Data Commands

internal class SqlCeStorageHandler : IDisposable
{
  #region SQL CE Commands

  private const string GET_ALL_PRIORITY =
    "SELECT [ID], [Name], [_MetadataID] FROM [Priority] WHERE [IsTombstone] = 0";
  private const string GET_ALL_STATUS =
    "SELECT [ID], [Name], [_MetadataID] FROM [Status] WHERE [IsTombstone] = 0";
  private const string GET_ALL_TAGS =
    "SELECT [ID], [Name], [_MetadataID] FROM [Tag] WHERE [IsTombstone] = 0";
  private const string GET_ALL_LISTS =
    "SELECT [ID], [Name], [Description], [UserID], [CreatedDate], " +
    "[IsTombstone], [_MetadataID] FROM [List] WHERE [IsTombstone] = 0";
  private const string GET_ALL_ITEMS =
    "SELECT ID, ListID, UserID, Name, Description, Priority, Status, " +
    "StartDate, EndDate, IsTombstone, [_MetadataID] FROM [Item] " +
    "WHERE [IsTombstone] = 0 AND [ListID] = @ListID";
  private const string SELECT_ITEM_CHANGES =
    "SELECT ID, ListID, UserID, Name, Description, Priority, Status, " +
    "StartDate, EndDate, IsTombstone, [_MetadataID] FROM [Item] WHERE IsDirty = 1";
  private const string SELECT_LIST_CHANGES =
    "SELECT ID, Name, Description, UserID, CreatedDate, IsTombstone, " +
    "[_MetadataID] FROM [List] WHERE IsDirty = 1";
  private const string SELECT_TAGITEMMAPPING_CHANGES =
    "SELECT TagID, ItemID, UserID, IsTombstone, [_MetadataID] FROM " +
    "[TagItemMapping] WHERE IsDirty = 1";

  // ... (remaining commands and members elided)
  #endregion
}

Thus, the forward chain of operations works as follows:

  1. Client app calls arbitrary sync function
  2. Sync function
    1. Instantiates OfflineSyncProvider
    2. Instantiates CacheController (this one is custom), passing the service URI and the OfflineSyncProvider
    3. Finally calls CacheController.Refresh()
  3. CacheController creates a CacheRequestHandler that will handle the communication with the Sync Service in Azure
  4. CacheController asks the OfflineSyncProvider for the local changeset
  5. The OfflineSyncProvider uses the StorageHandler to retrieve the changes from the local SQL Server
  6. CacheController uses the changeset to create a request and pass it to the CacheRequestHandler
  7. The CacheRequestHandler uses the appropriate formatter (OData ATOM here) to create a proper request and sends it to the Sync Service URI
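To make steps 4 and 5 concrete, here’s a simplified, hypothetical sketch of that handoff. The method names are illustrative only (the actual CTP signatures differ), but the shape matches the sample: the provider owns a storage handler and defers all local data access to it:

using System.Collections.Generic;

// Hypothetical shape of the provider-to-storage-handler handoff (steps 4 and 5).
public class SqlOfflineSyncProvider
{
  private readonly SqlCeStorageHandler storage = new SqlCeStorageHandler();

  // Step 4: the CacheController asks the provider for the local changeset.
  public IEnumerable<SqlCeOfflineEntity> GetLocalChanges()
  {
    // Step 5: the handler executes the SELECT ... WHERE IsDirty = 1
    // commands from Figure 9 against the local SQL Server database.
    return storage.GetChanges(); // GetChanges is an illustrative name
  }
}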

Of course, unpacking the data and getting it back to the client is pretty much the same process in reverse. Figure 4 shows the OData package as it flows back from the service.

Final Thoughts

Obviously, removing transaction support and keeping misnomers such as the SqlCe-prefixed type names isn’t the way to go for any real implementation, but it served its purpose here: getting a client version working without writing all-new code. Anyone wanting to create a SQL Server CacheController could easily start with the 6.5 sample and refactor and rename, with the major work coming in the commands inside the StorageHandler, which would need to be specific to the target data store.

My primary goal here was to demonstrate a service-based synchronization architecture. I purposefully neglected caching and other optimizations that would need to happen for it to scale, but that’s typically well understood. Also, I wanted to convey what’s there, what isn’t and what’s possible while familiarizing the reader with the Sync Framework 4.0 CTP. I hope I’ve accomplished these things.

With the SQL Azure Data Sync CTP 2 underway, there’s the promise of being able to set all of this up—including the client piece—via configuration and the download of a client-side agent. Of course, that would be for Windows-based machines, but if the goal was to reach a broader set of platforms, using the Sync Framework 4.0 directly might be the better choice.

I encourage you to download the latest Sync Framework SDK and at least follow the tutorial to set up the sync service in Azure using a SQL Azure database, then follow the example for the Silverlight client to get a feel for it. For those who are a little braver, grab the files described above from the Windows Mobile 6.5 sample in the 4.0 CTP (there are two projects) and use them to create your own Windows-based synchronization client.


Joseph Fultz is an architect at the Microsoft Technology Center in Dallas, where he works with both enterprise customers and ISVs designing and prototyping software solutions to meet business and market demands. He’s spoken at events such as Tech·Ed and similar internal training events.

Thanks to the following technical expert for reviewing this article: Ganeshan Iyer