WF4 Versioning Spike: Planning for Change

In my investigation of versioning issues with Windows Workflow Foundation I’ve come to one important conclusion.  It isn’t a new one: at TechEd 2008, when I gave my very first “chalk talk” about WF3.5, I said that versioning was going to be something you must plan for.

As architects and developers, we are often so focused on getting to Version 1 that we rarely think about what will happen when we need to move to Version 2.  Of course, we know Version 2 is going to happen (unless Version 1 is so bad that nobody wants Version 2).  So let’s think about this for a moment…

What is this versioning problem?

Let’s think about this problem outside of the context of workflow for a moment. Remember when you first learned about objects?  They told you that objects consist of three elements: identity, state and behavior.

image

Over time you learned that you could make a copy of the state and save it somewhere else (like a database) and then at some later point, load the state back into your object.  When you did this you opened the door to the problem of versioning.

Your object (v1) creates state which we could think of as (s1). 

image

Then at some later time you load the state (s1) back into your object (v1) and everything is great.  But then change happens…
You change your class definition in some way and when you load your state…

image

Ok – well maybe it doesn’t go boom but it could.  Of course it depends on the kind of change you made to the behavior of your object.  State is not just data, it is data with meaning and the meaning it has depends (to some degree) on your intention as the designer of the behavior.  And obviously if you changed the behavior in a very significant way, the state might go boom when you try to load it.
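Here is a minimal sketch of that situation, written in Python only to keep it short. The `OrderV1` and `OrderV2` classes and their fields are hypothetical names made up for illustration; the point is simply that state written by one version of a class carries assumptions the next version may not share.

```python
import json


class OrderV1:
    """Version 1 of the class: its state is a single running total."""

    def __init__(self, total=0.0):
        self.total = total

    def save(self):
        # Copy the state (s1) out of the object so it can be stored elsewhere.
        return json.dumps({"total": self.total})


class OrderV2:
    """Version 2: the behavior changed, so the state now needs two fields."""

    def __init__(self, subtotal=0.0, tax=0.0):
        self.subtotal = subtotal
        self.tax = tax

    @classmethod
    def load(cls, blob):
        data = json.loads(blob)
        # State written by OrderV1 has no "subtotal" key, so this raises KeyError.
        return cls(data["subtotal"], data["tax"])


s1 = OrderV1(total=108.0).save()

try:
    OrderV2.load(s1)
    outcome = "loaded"
except KeyError as missing:
    outcome = "boom: missing " + str(missing)

print(outcome)
```

Because you own the loading code in this case, you could catch the old shape and translate it. That is exactly the opportunity you lose when something else manages the state for you.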

Now let’s consider a Workflow in the same situation.  The biggest difference is that with a workflow you don’t create the instance.  In fact, the only thing you create is the Workflow Definition and you hand that to the Workflow Runtime which creates the Workflow Instance (which you never actually see because it is hidden inside the Workflow Runtime).

image

What happens when you change your workflow definition and load a persisted instance?

image

Yes.. things go boom.  And unlike the object example there is very little you can do about this one.  The state is created and managed by the workflow runtime.  You don’t have the same opportunity to get in the middle of this and fix it as you do when you write the code that manages the state.

In the next release of .NET we are adding a feature called Dynamic Update, which offers one path to fixing up the state so that it does not go boom, but here today I’m speaking only of how .NET 4 behaves.

Planning for Change

Now that we know what the problem is, we can come up with a way to think about change.  The first truth to consider is that we must plan for change.  We know that change will happen, and I am asserting that planned change is always better than unplanned change (at least when it comes to software).

In my previous post on this versioning spike I described two scenarios for change.  The first question we want to ask will help us decide which scenario we are facing.

Can we allow persisted instances to complete?

Side By Side Version Aware Routing

Suppose we have 1000 persisted workflow instances which were created with V1.  We need to decide if these workflow instances can complete using the V1 definition.  If we can allow them to complete, then our versioning problem becomes a message routing problem.  What we must do is figure out a way to route messages meant for V1 instances to the V1 definition and all other messages to V2. If you want to see an example of doing this, check out the AppFabric Reference Implementation: Managing the LifeCycle of a WorkFlow Service.
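To make the routing idea concrete, here is a small language-neutral sketch in Python. It is not how the AppFabric reference implementation works; `VersionAwareRouter` and its members are hypothetical names, and the only assumption is that you can record which definition version created each instance.

```python
import uuid


class VersionAwareRouter:
    """Remembers which definition version created each instance."""

    def __init__(self, current_version):
        self.current_version = current_version
        self.instance_versions = {}  # instance id -> definition version

    def start(self):
        # New instances always start on the currently deployed version.
        instance_id = str(uuid.uuid4())
        self.instance_versions[instance_id] = self.current_version
        return instance_id

    def route(self, instance_id):
        # Messages for an existing instance go to the version that created it.
        return self.instance_versions[instance_id]


router = VersionAwareRouter("V1")
old_instance = router.start()

router.current_version = "V2"  # V2 is deployed side by side with V1
new_instance = router.start()

print(router.route(old_instance), router.route(new_instance))
```

The V1 definition has to stay deployed until the last V1 instance completes, which is the price of this approach.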

If that solves your problem then GoTo(EndOfBlog) Else ReadOn()

Are the changes breaking?

Maybe we decide that we can’t allow our persisted instances to complete because of a policy change or perhaps there is a bug in our workflow.  If that is the case we need to figure out if there is a way we can make the necessary change without causing the workflow to go boom when we load it.

A Change Is Breaking If…

  1. It causes the workflow to go boom when you load it
  2. It produces an incorrect result

Our official position on the Workflow team is that any change is a breaking change. Why? Because the truth is that there are so many ways your change could break things that we can’t possibly say that any given change is safe. However, as you saw in my previous post, there are some changes which might work, but you can’t know for sure until you test them. You do have tests, don’t you?

Sometimes the change is breaking because the logic of the workflow is broken by the change.  Imagine a workflow which loops and accumulates a result.  I can look in the Instance Store and see that I have a persisted instance which is in the middle of that loop.  If I make a change to the way the result is accumulated, is that a breaking change?

The answer depends on which iteration of the loop we are in.  If the instance was persisted before the first iteration completed, then the change is not breaking because we have not accumulated a result yet.  If it is in any later iteration, then it is not safe to change the expression which calculates the result, because part of the result will have been accumulated under the old behavior and part under the new behavior.  There is no way we can produce a correct result in this scenario.
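A tiny worked example shows why. This is a plain Python sketch, not a workflow; the two accumulation rules are invented for illustration (V1 adds each item, V2 adds twice each item).

```python
def accumulate_v1(total, item):
    # Old behavior: add the item as-is.
    return total + item


def accumulate_v2(total, item):
    # Hypothetical new behavior: each item counts double.
    return total + item * 2


def run(items, step, total=0, start=0):
    """Run the loop from index `start`, continuing from a persisted `total`."""
    for item in items[start:]:
        total = step(total, item)
    return total


items = [1, 2, 3, 4]
all_v1 = run(items, accumulate_v1)
all_v2 = run(items, accumulate_v2)

# Instance persisted after two iterations under V1, then resumed under V2.
persisted_total = run(items[:2], accumulate_v1)
mixed = run(items, accumulate_v2, total=persisted_total, start=2)

# Instance persisted before the first iteration, then resumed under V2.
fresh = run(items, accumulate_v2, total=0, start=0)

print(all_v1, all_v2, mixed, fresh)
```

The instance resumed mid-loop ends with a total that matches neither the all-V1 result nor the all-V2 result, while the instance that had not started the loop yet is indistinguishable from a pure V2 run.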

If The Change Is Breaking

If the change is breaking then there is no safe means to upgrade persisted instances.  That leaves us only one option… Delete and Resubmit.

How do we delete a persisted instance?

First you have to find it.  If you look in the InstancesTable (if you are using SqlWorkflowInstanceStore) you will see a lot of Guids and Binary data.  It is difficult to know which Instance you are looking for unless you have planned for it.  You need to store a human readable identifier of some kind that you can link to the InstanceId.  That way you can find and delete the correct record.

Where should you store this identifier?

You can store it in the InstanceStore or you can store it in another database.  SqlWorkflowInstanceStore provides a mechanism for doing this called PromotedProperties.  This is the good news.  The bad news is that using PromotedProperties is not as easy or simple as it should be.  Fortunately we do have sample code in the WF4 Samples which includes a Property Promotion activity.

If you use this technique you can get the InstanceId and whatever other values you need to make a decision and these values will end up in the InstancePromotedProperties table.  Then when the workflow completes these values will get cleaned up like the rest of the instance data.

Of course, you can always store these values in some other database if you like.  Then you have to come up with a way to clear out the values as workflows complete.  I tried both ways, and once I got comfortable with PromotedProperties I came to prefer that approach.
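Whichever store you choose, the lookup has the same shape: go from the human readable identifier to the InstanceId, then remove the records for that instance. The sketch below uses an in-memory SQLite database from Python purely as a stand-in. The `instances` and `promoted` tables, the `order_number` column and the sample ids are all hypothetical and are not the SqlWorkflowInstanceStore schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE instances (instance_id TEXT PRIMARY KEY, state BLOB);
    CREATE TABLE promoted  (instance_id TEXT PRIMARY KEY, order_number TEXT);
""")
db.executemany("INSERT INTO instances VALUES (?, ?)",
               [("guid-1", b"\x00"), ("guid-2", b"\x01")])
db.executemany("INSERT INTO promoted VALUES (?, ?)",
               [("guid-1", "PO-1001"), ("guid-2", "PO-1002")])


def find_instance_id(order_number):
    """Map the human readable identifier back to the instance id."""
    row = db.execute(
        "SELECT instance_id FROM promoted WHERE order_number = ?",
        (order_number,),
    ).fetchone()
    return row[0] if row else None


def delete_instance(instance_id):
    """Remove the lookup row and the persisted state for one instance."""
    db.execute("DELETE FROM promoted WHERE instance_id = ?", (instance_id,))
    db.execute("DELETE FROM instances WHERE instance_id = ?", (instance_id,))


target = find_instance_id("PO-1002")
delete_instance(target)
remaining = [row[0] for row in db.execute("SELECT instance_id FROM instances")]
print(target, remaining)
```

Without the lookup table, the only thing you would have to go on is the id and the binary state, which is why the identifier has to be planned for up front.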

How do you resubmit?

In order to resubmit you have to save the data that started your workflow somewhere.  Once again using PromotedProperties is a pretty good approach for doing this.  If you decide you need to resubmit you simply grab the values that started the workflow and send them in to start a new instance.

But.. be careful.  Your workflow may have already done some work under the old (V1) definition.  Now that work will be repeated under (V2) unless you take some action to prevent it.  Ideally you should make your work idempotent; then it won’t matter if it gets done more than once.
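One common way to get idempotent work is to record a key for each side effect and skip the effect when the key is already present. Here is a minimal sketch in Python under that assumption; `send_confirmation`, the `completed` set and the saved `start_args` are hypothetical, and in a real system the record of completed work would have to live somewhere durable.

```python
completed = set()  # keys of side effects already performed
sent = []          # stands in for the real side effect (e.g. sending an email)


def send_confirmation(order_number):
    """Perform the side effect at most once per order number."""
    key = ("confirmation", order_number)
    if key in completed:
        return False  # already done under an earlier run, so skip it
    sent.append(order_number)
    completed.add(key)
    return True


# The values that started the workflow, saved so the instance can be resubmitted.
start_args = {"order_number": "PO-1002", "amount": 250}

first = send_confirmation(start_args["order_number"])   # done by the V1 instance
second = send_confirmation(start_args["order_number"])  # repeated after resubmit to V2

print(first, second, sent)
```

The resubmitted instance reaches the same step again, but the confirmation goes out only once.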

To see how I worked through this problem, watch the following video.

Happy Coding!
Ron Jacobs
https://blogs.msdn.com/rjacobs
Twitter: @ronljacobs https://twitter.com/ronljacobs