May 2009

Volume 24 Number 05

Foundations - Versioning Workflows

By Matthew Milner | May 2009

Code download available

Contents

The Issues
.NET Versioning
Versioning With XOML Workflow Definitions
Activity Versioning
Versioning Workflow Services
Dynamic Update

In any application, there is one thing you can be certain of: change happens. One of the most common issues I find developers struggling with is how to deal with versioning workflows and their related classes. In this month's column, I will discuss the core issues related to workflow versioning and provide my recommendations for making changes to workflow definitions, activities, and workflow services.

Figure 1 Simple Workflow That May Persist

The Issues

If you build a .NET application today—say, a Windows Presentation Foundation (WPF) or ASP.NET application—you might think about versioning, or you might not. When you want to deploy updates to an ASP.NET application, you typically have a process in place to simply copy over all or some of the components including pages, binaries, and configuration. In some scenarios, you may have concerns about this deployment, such as making sure no requests are currently executing, but this process is generally manageable. So why is deploying new versions of workflow applications so hard?

Workflows let you model long-running business processes or business logic. In addition, Windows Workflow Foundation (WF) provides persistence services that allow the state of individual instances of a business process to be saved to a durable store, such as a Microsoft SQL Server database. That saved state is made up of serialized .NET objects, and therein lies the problem.

Consider a workflow that uses some .NET types as data—say, an Order class. The workflow is started with an Order object passed in as a parameter, then persisted to the database, its state containing the serialized Order object. Now change the Order object class and rebuild your application library. When that persisted workflow needs to resume because a delay has expired or some event is raised to the workflow, the Order object needs to be deserialized. Unfortunately, because that type has changed, the deserialization fails, throwing an exception.

Some changes cause problems with deserialization because the standard .NET binary serialization fails and other problems occur because of the WF implementation of serialization. Regardless of the cause, the end result is the same: an exception when you try to load a workflow that references types that have changed. These types might be classes or interfaces you create or they might be custom activities you have written.

To provide a concrete example, consider the workflow in Figure 1. It simply writes some data to the console using a custom WriteLine activity, delays (which also allows it to persist), and then writes more data to the console.

Now, if I start an instance of this workflow and let it run to the idle point—the delay—it will persist in the database. I can shut down the runtime and begin work on my version 1.1 of the workflow. I'll simply add an activity to the end of the workflow definition to write more data to the console, as shown in Figure 2.

Starting up the runtime without starting another workflow instance will cause the persistence service to poll the database looking for expired timers. When it finds the existing instance, the persistence service will attempt to load the object and will get an exception. I can detect this error by registering an event handler for the ServicesExceptionNotHandled event on WorkflowRuntime. What I'll see is that the persistence service will raise an event for the exception with the message "Index was outside the bounds of the array."
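A minimal sketch of a host that surfaces this error might look like the following. The connection string, database name, and polling values are assumptions for illustration only; the persistence database is assumed to have been created with the scripts that ship with WF.

```csharp
using System;
using System.Workflow.Runtime;
using System.Workflow.Runtime.Hosting;

class Host
{
    static void Main()
    {
        // Assumption: a local persistence database named WorkflowPersistence.
        string connectionString =
            "Initial Catalog=WorkflowPersistence;Data Source=localhost;Integrated Security=SSPI;";

        using (WorkflowRuntime runtime = new WorkflowRuntime())
        {
            runtime.AddService(new SqlWorkflowPersistenceService(
                connectionString,
                true,                       // unload instances when they go idle
                TimeSpan.FromMinutes(1),    // instance ownership duration
                TimeSpan.FromSeconds(5)));  // how often to poll for expired timers

            // Failures inside runtime services, such as a persisted instance
            // that cannot be deserialized, are reported through this event.
            runtime.ServicesExceptionNotHandled += (sender, e) =>
                Console.WriteLine("Instance {0} could not be processed: {1}",
                    e.WorkflowInstanceId, e.Exception.Message);

            runtime.StartRuntime();
            Console.WriteLine("Runtime started; press Enter to exit.");
            Console.ReadLine();
        }
    }
}
```

Without a handler like this, the failure is easy to miss because no workflow-level event fires for an instance that never loads.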

Figure 2 Modified Workflow Definition

The problem here is that I've changed the definition of my workflow type and it no longer matches the definition that existed when the workflow was persisted. When the persistence service tries to deserialize the workflow, I get exceptions because the new type definition contains fields that did not exist when the instance was serialized. The deserialization code is reading the fields on the new type and attempting to load those values from the serialized data. Because some fields did not exist when the instance was serialized, an exception is raised when attempting to access that data.

.NET Versioning

One way to work around the versioning issues is to take advantage of the versioning system built into the .NET Framework. Since version 1.0 of the .NET Framework, versioning of assemblies has been supported and was the only way to enable side-by-side support for multiple versions of assemblies. In short, each assembly can have a version number attached to it and when assemblies are signed with a key, multiple versions of the assembly can be deployed to the Global Assembly Cache (GAC). For more on assembly versions and strong naming assemblies, see the MSDN documentation on assemblies.

This enables you to build your workflows and activities into projects with a version of 1.0, sign them, and deploy them to the GAC. Now, when you start a workflow and it persists, the version information is part of the serialized state. If you change the workflow, it is important to change the version of the assembly and any other assemblies the workflow depends on if those have changed, too. For example, if you had the workflow from the previous example and you made a change to the WriteLine activity and the workflow definition, then you would need to increment the version number for each project and deploy the updates to the GAC.
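As a rough illustration, incrementing the version usually means editing the assembly-level attributes in each affected project. The version numbers here are only examples, and signing is assumed to be configured separately (for instance, through the Signing tab of the project properties).

```csharp
// Properties/AssemblyInfo.cs in the workflow project, and in any
// changed project it depends on (such as the activity library).
using System.Reflection;

// The assembly version is part of the strong name, so 1.0.0.0 and 1.1.0.0
// are treated as distinct assemblies and can sit side by side in the GAC.
[assembly: AssemblyVersion("1.1.0.0")]

// The file version is informational and does not affect type resolution.
[assembly: AssemblyFileVersion("1.1.0.0")]
```

After rebuilding, the signed 1.1 assembly is installed to the GAC alongside the 1.0 assembly rather than replacing it.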

At this point, you would have both version 1.0 and version 1.1 of the workflow deployed into the GAC. When a version 1.0 workflow is loaded from the persistence store, the runtime will be able to resolve the type definition for version 1.0 and use that type information when deserializing the workflow. Any version 1.1 workflows that were also persisted would work since the runtime would be able to find the definition for the version 1.1 types, too. Essentially, as long as the assemblies containing the types for the workflow and its related classes can be resolved and match the original definition, the workflow will deserialize correctly and processing can continue. You can see how this process works in Figure 3, where the different versions are loaded from the database and their types resolve correctly when found in the GAC.

Figure 3 Side by Side .NET Versioning

One important point to notice is that version 1.0 workflows will continue working and processing, but they are still based on the version 1.0 definition of the workflow. This means any changes you introduced in version 1.1 of the workflow will not be present in the workflow definition. In other words, any activities you added, removed, or otherwise changed will not be present in the workflow definition and you will not see the impact of those changes in the version 1.0 workflows. Shortly, I will discuss a feature known as dynamic update that provides the ability to change the existing, in-process, version 1.0 workflows.

Simply having side-by-side versions of the workflows and related assemblies does not mean that things roll along nicely, however. Your host application is likely built against a particular, and probably the most recent, version of the workflows so that it can interact with them. This can cause problems as a host built against version 1.1 of the workflow will encounter problems when attempting to interact with persisted version 1.0 workflows that were built against version 1.0 of the related classes.

One area where this trouble often comes up for developers is when using ExternalDataExchangeService in conjunction with HandleExternalEvent and CallExternalMethod activities for local communications. Many people, thinking they are doing the right thing for versioning, follow the previous steps so that both the 1.0 and 1.1 versions are available. However, when the host attempts to send data to the workflow through an event, the type of the interface being used no longer matches. The host is using version 1.1 of the interface, while the workflow instance was built against version 1.0.

To fully understand why this is a problem, recall how communications really work in WF. (For more on workflow communications see the September 2007 installment of Foundations.) The HandleExternalEvent activity creates a queue to receive data and the name of that queue includes the type information for the interface. When data is sent from the host, it must be sent to the correct queue. ExternalDataExchangeService (EDS) uses the type information and the event name to create what it believes is the appropriate queue name. Unfortunately, since the queue was created based on version 1.0 of the interface and the EDS is creating the queue name based on version 1.1, the two IComparable objects will not match and the queue will not be found.

To fully work around this problem, your host application needs to raise events on the version 1.0 interface. Two possible ways to accomplish this include reflection and inheritance. The first option involves using reflection to load the original version of your local service and add it to the EDS, after which you raise an event on that interface, again using reflection.

The second option involves building a new assembly where you create version 1.1 of your interface and have it derive from the version 1.0 interface, allowing you to add events to the derived type while not changing the base type. In this way, if you raise an event that was defined on version 1.0 of the interface, the correct queue name will get created and the message delivered even if you invoke that operation through the derived type. The key is that version 1.0 of the interface must remain deployed and unchanged, and your new interface and service must derive from the version 1.0 interface. Figure 4 shows this approach.

Figure 4 Derived Interfaces for Local Communications

Notice that only the derived service and interface are added to the EDS. This enables the host to interact with a single service but to raise events based on multiple interfaces. The goal is to keep the version 1.0 interface around and in use, so the EDS creates queue names that match what the workflows will be creating. Also notice that this involves creating a whole new project rather than just creating a new version of the communication library. Untenable is the word that comes to mind when thinking about using this approach long term.
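The inheritance approach might be sketched as follows. All of the type and event names (IOrderEvents, IOrderEventsV11, OrderEventsService, OrderApproved, OrderCancelled) are hypothetical, and whether the derived interface also needs the ExternalDataExchange attribute is an assumption on my part rather than something shown in Figure 4.

```csharp
using System;
using System.Workflow.Activities;

// Lives in the version 1.0 assembly, which stays deployed and unchanged.
[ExternalDataExchange]
public interface IOrderEvents
{
    event EventHandler<ExternalDataEventArgs> OrderApproved;
}

// Lives in a NEW assembly; adds events without touching the base interface.
[ExternalDataExchange]
public interface IOrderEventsV11 : IOrderEvents
{
    event EventHandler<ExternalDataEventArgs> OrderCancelled;
}

// The single service the host registers with the EDS, for example:
//   ExternalDataExchangeService eds = new ExternalDataExchangeService();
//   runtime.AddService(eds);
//   eds.AddService(new OrderEventsService());
public class OrderEventsService : IOrderEventsV11
{
    public event EventHandler<ExternalDataEventArgs> OrderApproved;
    public event EventHandler<ExternalDataEventArgs> OrderCancelled;

    // Declared on the 1.0 interface, so 1.0 workflow instances can receive it.
    public void Approve(Guid instanceId)
    {
        Raise(OrderApproved, instanceId);
    }

    // Declared only on the derived interface, for 1.1 workflow instances.
    public void Cancel(Guid instanceId)
    {
        Raise(OrderCancelled, instanceId);
    }

    private static void Raise(EventHandler<ExternalDataEventArgs> handler, Guid instanceId)
    {
        if (handler != null)
        {
            // A null sender avoids having to serialize the service itself.
            handler(null, new ExternalDataEventArgs(instanceId));
        }
    }
}
```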

Generally, my recommendation around communications is to avoid the local communications activities for this very reason and use custom activities that create queues based on simpler names, such as strings, removing much of the type dependence between the workflow and the host. The other suggestion is to use workflow services that provide additional capabilities around receiving requests and routing them to the correct version of the workflow, as I will discuss shortly.
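To make the first suggestion concrete, here is a minimal sketch of a custom activity that waits on a queue named by a plain string. The activity name and the queue name "OrderApproval" are hypothetical; a production activity would typically also handle cancellation and expose the received data through a property.

```csharp
using System;
using System.Workflow.ComponentModel;
using System.Workflow.Runtime;

public class WaitForApprovalActivity : Activity
{
    // A string queue name carries no interface type or version information.
    public const string QueueName = "OrderApproval";

    protected override ActivityExecutionStatus Execute(ActivityExecutionContext executionContext)
    {
        WorkflowQueuingService queuing = executionContext.GetService<WorkflowQueuingService>();
        WorkflowQueue queue = queuing.Exists(QueueName)
            ? queuing.GetWorkflowQueue(QueueName)
            : queuing.CreateWorkflowQueue(QueueName, true);

        if (queue.Count > 0)
        {
            // Data arrived before the activity started waiting.
            Console.WriteLine("Received: {0}", queue.Dequeue());
            queuing.DeleteWorkflowQueue(QueueName);
            return ActivityExecutionStatus.Closed;
        }

        // Stay in the Executing state until the host enqueues an item.
        queue.QueueItemAvailable += OnItemAvailable;
        return ActivityExecutionStatus.Executing;
    }

    private void OnItemAvailable(object sender, QueueEventArgs e)
    {
        ActivityExecutionContext context = (ActivityExecutionContext)sender;
        WorkflowQueuingService queuing = context.GetService<WorkflowQueuingService>();
        WorkflowQueue queue = queuing.GetWorkflowQueue(e.QueueName);

        queue.QueueItemAvailable -= OnItemAvailable;
        Console.WriteLine("Received: {0}", queue.Dequeue());
        queuing.DeleteWorkflowQueue(e.QueueName);
        context.CloseActivity();
    }
}
```

The host would then deliver data with something like instance.EnqueueItem(WaitForApprovalActivity.QueueName, data, null, null), which works the same way no matter which version of the workflow created the queue.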

Versioning With XOML Workflow Definitions

The first step to making life easier when it comes to versioning is to use XOML-based workflows without compiling them. This truly declarative model provides many benefits in addition to versioning. See the May 2008 issue of MSDN Magazine for details ("Loading Workflow Models in WF"). I will focus on the benefits for versioning here.

When building workflows via a code approach, you are defining a new type. For example, if you create a new code workflow using the Visual Studio project or item template, you are defining a new class that derives from a type in the WF assemblies. This is the primary reason for the issues described previously because when you change the workflow by adding or removing activities, you are changing the type. This does not happen when you build XOML workflows.

A XOML workflow that is not compiled is simply an XML document that describes a collection of existing types. This may seem like a subtle difference, but describing existing types instead of creating a new type makes a big difference.

Here is a XOML workflow definition:

<SequentialWorkflowActivity x:Name="Workflow2"
    xmlns:ns0="clr-namespace:SimpleWorkflows"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/workflow">
  <ns0:WriteLineActivity x:Name="Hello" OutputText="Hello" />
  <DelayActivity TimeoutDuration="00:00:03" x:Name="SmallDelay" />
  <ns0:WriteLineActivity x:Name="World" OutputText="World" />
</SequentialWorkflowActivity>

You can see that the root element is SequentialWorkflowActivity, which is defined in the System.Workflow.Activities assembly. This type is both versioned, as 3.0, and nonchanging. If you build a workflow using this type as your root activity and the workflow is serialized and persisted, you can be assured that, when the workflow is loaded and deserialized, the runtime will be able to find version 3.0 of the SequentialWorkflowActivity in the GAC.

What about changing the workflow definition by adding or removing activities? In a XOML workflow definition, you are simply changing some XML, which does not change any types. Even if you change the root element in the workflow definition to some other type, you are not changing a type on which the existing workflow instances depend. In fact, once a workflow is created from the XOML, that XOML would not be referenced again by the runtime for the life of that instance.
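For reference, starting an instance from uncompiled XOML could look roughly like this. The file name Workflow2.xoml and the assembly name SimpleWorkflows are assumptions; the TypeProvider registration is included because the XOML refers to custom activities by CLR namespace only.

```csharp
using System;
using System.Reflection;
using System.Workflow.ComponentModel.Compiler;
using System.Workflow.Runtime;
using System.Xml;

class XomlHost
{
    static void Main()
    {
        using (WorkflowRuntime runtime = new WorkflowRuntime())
        {
            // Tell the runtime where to find the custom activity types
            // referenced from the XOML (assumed assembly name).
            TypeProvider typeProvider = new TypeProvider(runtime);
            typeProvider.AddAssembly(Assembly.Load("SimpleWorkflows"));
            runtime.AddService(typeProvider);

            runtime.StartRuntime();

            // The XOML is read once, when the instance is created. Editing
            // the file later does not affect instances already started.
            using (XmlReader reader = XmlReader.Create("Workflow2.xoml"))
            {
                WorkflowInstance instance = runtime.CreateWorkflow(reader);
                instance.Start();
            }

            Console.WriteLine("Workflow started; press Enter to exit.");
            Console.ReadLine();
        }
    }
}
```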

Using XOML-based workflows means that you do not have to version your workflows because you are not defining types. Because you do not have to strongly name and version your workflows, you also do not have to strongly name and version the other types, such as local communications interfaces and business entities. It is much easier to make nonbreaking changes, such as adding members or methods to types, without much work in your application.

The downside to working with XOML workflows is that you have little to no access to code, other than what is encapsulated in your activities, so you end up writing more activities. But since activities are simply classes, this is not a major blocker and allows you to capture methods that can be used to provide event handling logic or condition checking for activities in a XOML workflow.

Additionally, since the SequentialWorkflowActivity class does not have any properties of use to your business case defined on it, you won't be able to pass parameters to the CreateWorkflow method. Fortunately, you can either create your own root activity type, deriving from one of the included workflow types, and add your required properties there, or begin your workflow with a receive activity or custom activity so you can send in initial data right after starting the workflow.
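The first of those options, a custom root activity, might be sketched like this. OrderWorkflowRoot, CustomerName, and the XomlStarter helper are hypothetical names; the XOML root element would need to be changed to the custom type for the parameter to bind.

```csharp
using System.Collections.Generic;
using System.Workflow.Activities;
using System.Workflow.Runtime;
using System.Xml;

// A compiled root type that exists only to carry input properties.
// The workflow structure itself still lives in the XOML file.
public class OrderWorkflowRoot : SequentialWorkflowActivity
{
    public string CustomerName { get; set; }
}

public static class XomlStarter
{
    public static WorkflowInstance Start(WorkflowRuntime runtime, string xomlPath, string customerName)
    {
        // Each key must match a public, writable property on the root activity.
        Dictionary<string, object> arguments = new Dictionary<string, object>();
        arguments.Add("CustomerName", customerName);

        using (XmlReader reader = XmlReader.Create(xomlPath))
        {
            // The second argument is an optional rules reader; none is used here.
            WorkflowInstance instance = runtime.CreateWorkflow(reader, null, arguments);
            instance.Start();
            return instance;
        }
    }
}
```

Note that the root type is now a compiled type again, so changes to its properties are subject to the activity versioning concerns covered in the next section.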

Finally, looking forward, fully declarative workflows based on XAML are the primary modeling option coming in WF 4.0. Many improvements are being made to the capabilities, including replacing events with what is currently being called an ActivityAction. Essentially, these will provide you with a way to model delegates using activities and have the activity execute those activities rather than invoke a delegate.

Activity Versioning

Workflows are activities, so it follows that many of the issues discussed so far with workflow types apply to activity types as well. In fact, just as a workflow type causes problems if you add activities, if you add properties to an activity type and do not version that activity assembly, you will get exceptions upon deserialization of the workflow. Unfortunately, activities need to be classes and must be compiled types, so there is no easy XOML solution as with workflows.

The short answer for activities is that you should version them using .NET versioning techniques when you change the interface for the activity in any way. This enables existing workflows and new workflows to reference the correct version and execute correctly. Keep in mind, however, that if you are fixing a defect in a method within the activity that does not change the interface, you can make the fix and deploy the assembly with the same version number. When that activity is loaded and executed by existing or new workflows, the correct code will run.

The long answer involves a deep understanding of the serialization mechanics in the .NET Framework, which is beyond the scope of this article. However, there are two key things I can share that may provide the most benefit with the least amount of investment.

First, if you are adding properties or fields to your activity, you can mark the fields with the NonSerialized attribute to avoid the Index Out of Bounds exception upon deserialization of workflows built against the previous version of your activity.

Second, you can override the OnActivityExecutionContextLoad method to initialize any state that was not automatically deserialized. This method gets called when the activity is recreated from persisted state or when a new execution context is created. Both of these techniques together can help you make changes to activities without requiring that you strongly name them and change version numbers.
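Combining the two techniques might look like the sketch below. This is only a guess at the shape of a WriteLine activity, not the version from the code download; the Prefix member and its default value are hypothetical additions standing in for a "version 1.1" change.

```csharp
using System;
using System.Workflow.ComponentModel;

public class WriteLineActivity : Activity
{
    // Present in the original version, so it exists in persisted state.
    private string outputText;

    // Added later. NonSerialized keeps it out of the serialized state, so
    // instances persisted before the change can still be deserialized.
    [NonSerialized]
    private string prefix;

    public string OutputText
    {
        get { return outputText; }
        set { outputText = value; }
    }

    public string Prefix
    {
        get { return prefix; }
        set { prefix = value; }
    }

    protected override void OnActivityExecutionContextLoad(IServiceProvider provider)
    {
        base.OnActivityExecutionContextLoad(provider);

        // Rebuild any state that was not restored from the persistence store.
        if (prefix == null)
        {
            prefix = "> ";
        }
    }

    protected override ActivityExecutionStatus Execute(ActivityExecutionContext executionContext)
    {
        Console.WriteLine(prefix + outputText);
        return ActivityExecutionStatus.Closed;
    }
}
```

The tradeoff is that a nonserialized field never survives persistence, so its value must always be recomputable at load time.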

Versioning Workflow Services

Workflow services present challenges similar to those of local communications, as well as additional opportunities for decoupling messages from workflows. Because a workflow service is built using a service contract, any versioning of the service or data contracts means the workflow will need to be rebuilt against the new versions of the contracts. When a message arrives at a service endpoint, it will be received by a service using a contract that has a particular version. If the workflow for which the message is intended was built against the same version of the contract, then things work well. However, if the workflow was built against the earlier version of the contract and then persisted, then the message cannot be delivered. To support two versions of the contracts, you will need to deploy two versions of the service.

While it is unfortunate that you cannot take full advantage of the versioning capabilities inherent in Windows Communication Foundation (WCF) that would enable a new version of the service to process messages from clients built against the old version, WCF does provide a mechanism to manage this scenario. At the level of WCF, clients and services are simply exchanging messages and .NET types do not cross the wire. The versioning issues I am speaking about are purely at the .NET level and can be hidden from clients using a messaging router.

The benefit with workflow services is that it is possible to publish two services side by side, with different endpoint addresses, yet publish a façade that makes both services accessible at the same address. Using this approach, when you deploy version 1.0 of your service, you can deploy it behind a pass-through WCF router that simply takes in messages and passes them on to the service. When you are ready to deploy version 1.1 of the service, you can update the routing information so that messages get sent to either the version 1.0 service or the version 1.1 service, depending on the message or the context in which it was received. Figure 5 shows this router concept with the two versions of a service behind a router and a client interacting with them.

Figure 5 Using a Router for Versioning Services

The key to ensuring that a solution with a router will work is to deploy it with version 1.0 of your service. Then when you move to the next version, the router is not something new you are adding to the deployment and testing process. The .NET Framework 3.5 SDK provides a sample router, and you can also follow a hands-on article about using WF rules to make routing decisions.

Dynamic Update

After all the discussion about how to best manage changing the definition of workflows and dealing with new and old versions running side-by-side, there is often a question about how to handle those old workflows. You made a change to your workflow for a reason. Perhaps it was to fix a defect in logic, or maybe you had compliance regulations that needed to be followed. Whatever the reason, it is often the case that allowing all of the existing workflows to complete as they were defined is not a tenable outcome. In those cases, you might consider taking advantage of a powerful feature in WF: dynamic update.

In short, dynamic update allows you to make changes to a running workflow instance. Those changes can consist of adding or removing activities from the workflow structure. That is all you can do, but it can completely change the definition of your workflow. For example, in a state machine, you have the ability to add entire states, along with the events they listen to and the transitions defined. You can change state transitions by removing and re-adding SetState activities. In a sequential workflow, you can add or remove steps as your business process changes.

To make changes to a workflow, you first need to create a WorkflowChanges object based on the workflow. At that point, the WorkflowChanges object provides a clone of the workflow structure that you can manipulate through its TransientWorkflow property. You make changes to the clone by adding and removing activities. When your changes are complete, you call the ApplyWorkflowChanges method on the WorkflowInstance class, passing the WorkflowChanges to represent the changes you want to apply to the actual instance.

The following code takes the example workflow shown earlier, which contained two WriteLine activities separated by a delay activity, and adds another WriteLine activity to the end of the sequence. This code runs when the workflow goes idle, which happens when the delay begins execution.

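// Start from the definition of the running instance.
WorkflowChanges changes = new WorkflowChanges(instance.GetWorkflowDefinition());

// TransientWorkflow is an editable clone of the root activity.
CompositeActivity root = changes.TransientWorkflow;

// Build the new activity and append it to the end of the sequence.
WriteLineActivity wl = new WriteLineActivity
{
    Name = "newWriteLine",
    OutputText = "dynamically added"
};
root.Activities.Add(wl);

// Apply the edits to the running instance.
instance.ApplyWorkflowChanges(changes);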

This simple example shows what is possible, but much more advanced scenarios can be built to apply many different changes to a running workflow. This allows you to take a version 1.0 workflow that has already been started and make changes to the instance so that it works more like a version 1.1 workflow.

Send your questions and comments to mmnet30@microsoft.com.

Matt Milner is a member of the technical staff at Pluralsight, where he focuses on connected systems technologies (WCF, Windows WF, BizTalk, "Dublin" and the Azure Services Platform). Matt is also an independent consultant specializing in Microsoft .NET application design and development. Matt regularly shares his love of technology by speaking at local, regional, and international conferences such as Tech Ed. Microsoft has recognized Matt as an MVP for his community contributions around connected systems technology.