Disaster Recovery and Scope Restore in Workflow Manager 1.0

 

Updated: September 17, 2015

This topic provides an overview of disaster recovery options and procedures for Workflow Manager 1.0. It covers procedures for handling database server failures and application server failures, and for restoring a corrupted or deleted scope.

  • Disaster Recovery

  • Scope Restore

Disaster Recovery

Workflow Manager 1.0 enables you to prepare for disaster scenarios. In this topic, a disaster is any event that causes an extended disruption to the availability of the server, possibly accompanied by the loss of all or part of the original cluster that was set up for the server.

Disaster Recovery vs. High Availability

Disaster Recovery is often conflated with High Availability. High Availability is the problem of keeping the service available under normal circumstances, which includes building enough redundancy into the system and eliminating single points of failure. Disaster Recovery, by contrast, is the problem of continuing to provide the same level of service when the primary service goes down due to external circumstances (such as a natural disaster).

High level disaster recovery model

The process of preparing for and responding to a disaster can be broken down into stages. The following diagram illustrates these stages and calls out the responsibilities of the user versus the capabilities provided by Workflow Manager.

Workflow Manager 1.0 Manageability Diagram

Different types of disaster scenarios for Workflow Manager

A Workflow Manager installation should be prepared for the following disaster scenarios.

  1. A disaster that results in the loss of one or more databases used by Workflow Manager.

    1. This could have been caused by a hardware failure or a data center-wide disaster event.
  2. A disaster that results in the loss of the application servers where Workflow Manager binaries have been deployed.

  3. A disaster that results in the loss of the entire cluster, where both the app servers and the databases are lost.

Because Workflow Manager contains information about users' workflows, activities, and instances, a key part of Workflow Manager Disaster Recovery is the ability to restore a Workflow Manager installation's data from backup copies. In most of these scenarios, disaster recovery is mainly about restoring the data from backups and ensuring that the data is consistent across the various subsystems of Workflow Manager.

Preparation for disaster recovery

Disaster Recovery is all about being prepared for an emergency, should one occur. With Workflow Manager, you might want to preserve the data related to all the activities, workflows, and instances even in the event of a disaster.

Workflow Manager stores all of its data in SQL Server databases. An important prerequisite to disaster recovery is therefore setting up periodic backups and/or data redundancy solutions, so that if a disaster strikes your data center, there is a recent copy of your databases that you can use to restore your Workflow Manager installation.

Your Workflow Manager installation uses the following databases.

Database Name              Description
WFManagementDB             Workflow Manager Farm Management Database
SbManagementDB             Service Bus Farm Management Database
WFResourceManagementDB     Workflow Manager Resource Management Store
WFInstanceManagementDB     Workflow Manager Instance Management Store
SbGatewayDatabase          Service Bus Gateway Database
SBMessageContainer01 - n   Service Bus Message Container Databases

Depending on how critical the workflow data in your Workflow Manager installation is, you can choose from various disaster recovery preparation options. Because all Workflow Manager data is stored in the SQL Server databases listed above, any SQL Server based high availability and backup strategy applies to Workflow Manager as well.

For more information about implementing high availability and disaster recovery for SQL Server, see Selecting a High Availability Solution and Description of disaster recovery options for Microsoft SQL Server.

Note

Regardless of the option you choose for backing up these databases, ensure that these backups are not far apart in time. For example, it is difficult for Workflow Manager to be restored properly if the backups of these individual databases are hours or days apart from each other.
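
The following is a minimal sketch of how you might check that a set of backups is close enough in time before relying on it. The function name Test-BackupSkew, the BackupTimes hashtable, and the 15-minute default tolerance are illustrative assumptions and are not part of Workflow Manager; choose a tolerance that suits your environment.

```powershell
# Hypothetical helper (not a Workflow Manager cmdlet): returns $true when all
# backup timestamps fall within $MaxSkew of each other.
function Test-BackupSkew {
    param(
        # Maps a database name to the [datetime] at which its backup was taken.
        [Parameter(Mandatory = $true)] [hashtable] $BackupTimes,
        # Assumed tolerance; the 15-minute default is an illustration only.
        [timespan] $MaxSkew = [timespan]::FromMinutes(15)
    )
    $sorted = @($BackupTimes.Values | Sort-Object)
    if ($sorted.Count -lt 2) { return $true }   # nothing to compare
    # Spread between the newest and the oldest backup.
    return (($sorted[-1] - $sorted[0]) -le $MaxSkew)
}

# Example usage with made-up timestamps.
$times = @{
    WFResourceManagementDB = [datetime]'2014-05-11T12:30:00'
    WFInstanceManagementDB = [datetime]'2014-05-11T12:34:00'
    SbGatewayDatabase      = [datetime]'2014-05-11T12:36:00'
}
Test-BackupSkew -BackupTimes $times
```

A check like this could run at the end of a backup job so that a set of backups that drifted apart is flagged before it is needed for a restore.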

The following diagram lists the different components of a Workflow Manager installation.

Workflow Manager 1.0 Server Farm Administrator Vie

From the server farm administrator's viewpoint, there are two parts of Workflow Manager that could break in the event of a disaster: one or more of the databases involved, or one or more of the application server nodes. Different combinations of application and database servers could be down, but at a high level, the data tier and the compute tier are the failure points.

  • Data Tier

  • Compute Tier (Workflow/Messaging Tier)

Data Tier

Workflow Manager 1.0 stores its data in the following SQL Server databases.

Database Name              Description
WFManagementDB             Workflow Manager Farm Management Database
SbManagementDB             Service Bus Farm Management Database
WFResourceManagementDB     Workflow Manager Resource Management Store
WFInstanceManagementDB     Workflow Manager Instance Management Store
SbGatewayDatabase          Service Bus Gateway Database
SBMessageContainer01 - n   Service Bus Message Container Databases

Regarding the data tier, there are three important tasks associated with disaster recovery:

  1. Preparation: making sure that you have the right backup / replication strategy for your databases so that you do not lose data in the event of a disaster that involves your databases.

    To recover from a disaster, you must be prepared for it. In particular, to recover from a disaster that involves the loss of databases, you must keep a copy of the data in a different location. Because these databases are standard SQL Server databases, the recommendation is to use established SQL Server techniques such as:

    1. SQL Mirroring

    2. SQL Replication

    3. Simple backups as well as a combination of backups and log shipping

    You can choose any of these techniques depending on the nature of your business and the level of data fidelity you want between your backup and the primary databases.

    In essence, you, as the administrator of your Workflow Manager farm, are expected to create backups of these databases by using a backup strategy that suits your needs. Workflow Manager does not provide any capability to help create these backup databases.

  2. Restoring the backup databases

    Depending upon your data replication strategy, you would have to use an appropriate restore tool / mechanism to restore your backup databases. There are industry-standard SQL tools and techniques that you could use to restore your SQL databases.

  3. Restoring the Workflow Manager farm

    This step refers to the process of making sure the Workflow Manager farm is restored to a consistent state and can function properly. Workflow Manager provides the necessary PowerShell scripts and guidance to perform this step.
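
As an illustration of the "simple backups" option in the preparation step, the following sketch generates one T-SQL BACKUP DATABASE statement per database, all stamped with the same time. The function name New-BackupStatement, the backup folder, and the file naming scheme are assumptions made for this example; Workflow Manager provides nothing of the kind. The generated statements would still have to be executed against your SQL Server with a tool of your choice.

```powershell
# Hypothetical helper (not a Workflow Manager cmdlet): emits one
# BACKUP DATABASE statement per database name.
function New-BackupStatement {
    param(
        [Parameter(Mandatory = $true)] [string[]] $Database,
        # Folder on the SQL Server machine; assumed to exist already.
        [Parameter(Mandatory = $true)] [string] $BackupFolder,
        [datetime] $Timestamp = (Get-Date)
    )
    # One shared stamp, so that the files of a single run are easy to match up.
    $stamp = $Timestamp.ToString('yyyyMMdd_HHmmss')
    foreach ($name in $Database) {
        $file = '{0}\{1}_{2}.bak' -f $BackupFolder.TrimEnd('\'), $name, $stamp
        "BACKUP DATABASE [{0}] TO DISK = N'{1}' WITH CHECKSUM;" -f $name, $file
    }
}

# Example usage; add every SBMessageContainer database in your farm.
$databases = 'WFManagementDB', 'SbManagementDB', 'WFResourceManagementDB',
             'WFInstanceManagementDB', 'SbGatewayDatabase', 'SBMessageContainer01'
New-BackupStatement -Database $databases -BackupFolder 'D:\Backups'
```

Generating the statements in one pass is a design choice that keeps the backups of the individual databases as close together in time as a simple sequential job allows.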

Compute Tier (Workflow/Messaging Tier)

You can create a secondary farm in a secondary location to help in the case of disaster recovery scenarios. You can create the secondary farm either before the disaster or after the disaster. There are three models for consideration.

  1. Cold Standby

    In this model, you could re-create the farm after the disaster has occurred. This impacts the time taken to recover the farm as you would be provisioning new compute nodes and installing Workflow Manager on these nodes afresh.

  2. Warm Standby

    You would typically opt in to this model if you want your secondary farm created and tested before a disaster strikes. In this model, you provision compute nodes in the new farm before the disaster. After establishing the database pairing relationship, you point this new farm to the secondary databases.

    After setting up this new farm, you turn off the compute nodes so that they are not running idle. As part of disaster recovery, you have to run the database consistency scripts provided by Workflow Manager.

    Note

    This model assumes that new Service Bus container databases are not being created in the primary after the initial setup is made. If a new Service Bus Container database is created in the primary, then additional steps must be taken during recovery.

  3. Hot Standby

    This is an improvement over Warm Standby where the compute nodes can be on. This would speed up the time taken to recover from disaster.

    Warning

    Hot Standby is not supported by Workflow Manager.

Disaster recovery process

This section describes the actual disaster recovery process used for various disaster scenarios previously explained. At a high level, the recommended process is to restore the required databases from a backup (that you have taken using any of the standard SQL Server techniques) and use the restore cmdlets provided by Workflow Manager to restore your farm.

Note

The following steps describe the process of discarding the previous farm management databases and recreating them.

Process to run the restore commands

  1. Export both the Service Bus farm certificate with private key and the Service Bus encryption certificate with private key. Import both into the Local Computer\Personal folder of the new server. Also import the root certificate(s) into the Local Computer\Trusted Root Certification Authorities folder of the new server. You can identify the farm certificate and the encryption certificate from the Get-SBFarm output.

    Note

    The import works only if the Service Bus encryption certificate from the old Workflow Manager/Service Bus server(s) meets one of the following conditions:

    • It was auto-generated during the old farm configuration by the Configuration Tool.

    • It is a custom certificate that is a wildcard certificate for your domain; that is, the "Subject Alternative Name" field in the certificate was created with a value such as *.mydomainname.com.

    If the old Service Bus certificate is not imported, the Restore-WFFarm cmdlet in the following steps fails with an error similar to the following.

    Token provider returned message: '<Error><Code>400</Code><Detail>The namespace 'WorkflowDefaultNamespace' does not have a valid issuer that can be used to issue tokens. Add a valid issuer with a valid signature to the namespace.

  2. Open an elevated PowerShell (RunAs Administrator) window on the new machine.

  3. Call the Restore-SBFarm cmdlet passing the following parameters. This cmdlet creates a new Service Bus Farm Management database. The old Service Bus Farm Management database can then be deleted.

    Restore-SBFarm -FarmCertificateThumbprint <String> -GatewayDBConnectionString <String> -SBFarmDBConnectionString <String> [-AdminApiCredentials <PSCredential> ] [-AdminGroup <String> ] [-AmqpPort <Int32> ] [-AmqpsPort <Int32> ] [-EncryptionCertificateThumbprint <String> ] [-FarmDns <String> ] [-Force] [-HttpsPort <Int32> ] [-InternalPortRangeStart <Int32> ] [-MessageBrokerPort <Int32> ] [-RPHttpsPort <Int32> ] [-RunAsAccount <String> ] [-TcpPort <Int32> ] [-TenantApiCredentials <PSCredential> ] [-Confirm] [-WhatIf] [ <CommonParameters>]  
    

    Note

    If you used custom wildcard certificates in the old Service Bus configuration, with two different certificates for FarmCertificate and EncryptionCertificate, you must import both of them on each new node and provide the FarmCertificateThumbprint and EncryptionCertificateThumbprint parameters in the above cmdlet accordingly.

    The following snippet shows an example of calling Restore-SBFarm.

    Restore-SBFarm -RunAsAccount 'farm\test' -FarmCertificateThumbprint 41FED42EC87EA556FB64A41572111B96D13FBFC2 -GatewayDBConnectionString 'Data Source=DBServer;Initial Catalog=SbGatewayDatabase;Integrated Security=True;Encrypt=False' -SBFarmDBConnectionString 'Data Source= DBServer;Initial Catalog=SbManagementDB;Integrated Security=True;Encrypt=False' -AdminGroup 'BUILTIN\Administrators' -EncryptionCertificateThumbprint 41FED42EC87EA556FB64A41572111B96D13FBFC2  
    
  4. Call the Restore-SBGateway cmdlet on one of the farm nodes with the following parameters.

    Parameter                   Description
    SBFarmDBConnectionString    Connection string of the Service Bus farm database that was created in the previous step.
    GatewayDBConnectionString   Connection string of the restored gateway database.

    The following snippet shows an example of calling Restore-SBGateway.

    Restore-SBGateway -GatewayDBConnectionString 'Data Source= DBServer;Initial Catalog=SbGatewayDatabase;Integrated Security=True;Encrypt=False' -SBFarmDBConnectionString 'Data Source= DBServer;Initial Catalog=SbManagementDB;Integrated Security=True;Encrypt=False'  
    
  5. For each container database, call the Restore-SBMessageContainer cmdlet with the following parameters. Run this cmdlet on one of the farm machines.

    Parameter                     Description
    SBFarmDBConnectionString      Connection string of the Service Bus farm database that was created by Restore-SBFarm in step 3.
    ContainerDBConnectionString   Connection string of the container database.
    Id                            ID of the restored message container.

    Obtain the ID of the restored message container from the gateway database's [dbo].[ContainersTable] table, which contains the IDs, connection strings, database server names and database names of all message containers. Pick the ID of the container whose database name matches the original container database name.

    The following snippet is an example of calling the Restore-SBMessageContainer cmdlet.

    Restore-SBMessageContainer -ContainerDBConnectionString "Data Source=localhost;Initial Catalog=SBMessageContainer01;Integrated Security=SSPI;Asynchronous Processing=True" -SBFarmDBConnectionString "Data Source=localhost;Initial Catalog=SBManagementDB;Integrated Security=SSPI;Asynchronous Processing=True" -Id 1  
    
  6. Call the Add-SBHost cmdlet and pass the following parameters.

    Parameter                      Description
    SBFarmDBConnectionString       Connection string of the Service Bus farm database that was created by Restore-SBFarm in step 3.
    CertificateAutoGenerationKey   Key used for auto-generation of Service Bus certificates.
    RunAsPassword                  SecureString that contains the password of the account under which the Service Bus processes run.
    EnableFirewallRules            True if the firewall rules of the host should be updated to allow Service Bus data to traverse the firewall; False otherwise.

    The following example demonstrates how to invoke the cmdlet.

    $myPassword=convertto-securestring 'ereee' -asplaintext -force  
    
    Add-SBHost -EnableFirewallRules $TRUE -RunAsPassword $myPassword -SBFarmDBConnectionString 'Data Source= DBServer;Initial Catalog=SbManagementDB;Integrated Security=True;Encrypt=False'  
    
  7. Call the Restore-WFFarm cmdlet, passing the connection strings of the restored Resource Management and Instance Management databases.

    The following example demonstrates how to invoke the cmdlet.

    $mykey=convertto-securestring 'etwegff' -asplaintext -force  
    Restore-WFFarm  -RunAsAccount 'farm\test' -InstanceDBConnectionString 'Data Source= DBServer;Initial Catalog=WFInstanceManagementDB;Integrated Security=True;Asynchronous Processing=True;Encrypt=False' -ResourceDBConnectionString 'Data Source= DBServer;Initial Catalog=WFResourceManagementDB;Integrated Security=True;Asynchronous Processing=True;Encrypt=False' -WFFarmDBConnectionString 'Data Source= DBServer;Initial Catalog=WFManagementDB;Integrated Security=True;Encrypt=False' -InstanceStateSyncTime 'Sunday, May 11, 2014 12:30:00 PM' -ConsistencyVerifierLogPath 'c:\log.txt' -CertificateAutoGenerationKey $myKey  
    

    Note

    The InstanceStateSyncTime value must follow the exact format shown in the previous example. ConsistencyVerifierLogPath should be the path to a folder where this cmdlet writes logs related to the restore process.

  8. Call the Add-WFHost cmdlet.

    The following example demonstrates how to invoke the cmdlet.

    Add-WFHost -WFFarmDBConnectionString 'Data Source= DBServer;Initial Catalog=WFManagementDB;Integrated Security=True;Asynchronous Processing=True;Encrypt=False' -RunAsPassword $myPassword -EnableFirewallRules $TRUE -CertificateAutoGenerationKey $myKey  
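
Because step 7 requires the InstanceStateSyncTime value in the exact format of the example, it can help to generate the string from a DateTime value rather than typing it by hand. The following is a minimal sketch under that assumption; the function name Format-InstanceStateSyncTime is hypothetical, and the format string is inferred from the example value 'Sunday, May 11, 2014 12:30:00 PM' rather than taken from a documented specification.

```powershell
# Hypothetical helper (not a Workflow Manager cmdlet): renders a DateTime in
# the same shape as the InstanceStateSyncTime value used in step 7.
function Format-InstanceStateSyncTime {
    param([Parameter(Mandatory = $true)] [datetime] $Time)
    # The invariant culture keeps day and month names in English regardless
    # of the regional settings of the machine.
    $Time.ToString('dddd, MMMM d, yyyy h:mm:ss tt',
        [System.Globalization.CultureInfo]::InvariantCulture)
}

# Example usage: reproduces the value used in the Restore-WFFarm example.
$syncTime = Format-InstanceStateSyncTime -Time ([datetime]'2014-05-11T12:30:00')
$syncTime   # Sunday, May 11, 2014 12:30:00 PM
```

The resulting string can then be passed as the -InstanceStateSyncTime argument in place of the hard-coded literal.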
    

Scenario 1 - Disaster affects the entire cluster

In this scenario, the entire cluster is affected because of a disaster. To recover from this disaster scenario, the entire cluster must be rebuilt, using the most recent database backups.

  1. Install Workflow Manager 1.0 on a new machine.

    Note

    Install Workflow Manager 1.0 by using the installer, but do not start configuring the farm.

  2. Restore the backed up primary databases by using SQL Server Restore features. This step varies depending on the choice of your backup solution.

    Only the following databases should be restored.

    • WFResourceManagementDB

    • WFInstanceManagementDB

    • SbGatewayDatabase

    • SBMessageContainer*

    Important

    Do not restore the WFManagementDB and SbManagementDB databases, because they are recreated as part of the restore operation.

  3. Follow the steps described in Process to run the restore commands.
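
The rule in step 2 can be sketched as a simple filter over the databases present in a backup set. The function name Select-DatabaseToRestore is hypothetical and only illustrates the selection; the actual restore still uses the SQL Server restore features of your backup solution.

```powershell
# Hypothetical helper (not a Workflow Manager cmdlet): drops the two farm
# management databases, which the restore cmdlets recreate, from a list of
# backed-up database names.
function Select-DatabaseToRestore {
    param([Parameter(Mandatory = $true)] [string[]] $BackupDatabase)
    $recreated = 'WFManagementDB', 'SbManagementDB'
    # -notcontains compares case-insensitively, so 'SbManagementDb' is also excluded.
    $BackupDatabase | Where-Object { $recreated -notcontains $_ }
}

# Example usage with a backup set that contains all six databases.
$backupSet = 'WFManagementDB', 'SbManagementDB', 'WFResourceManagementDB',
             'WFInstanceManagementDB', 'SbGatewayDatabase', 'SBMessageContainer01'
Select-DatabaseToRestore -BackupDatabase $backupSet
```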

Scenario 2 - Disaster affects the SQL Server databases alone

In this case, one or more databases used by Workflow Manager are lost or inaccessible. This could have been caused by a hardware failure or any other disaster localized to the SQL Servers alone.

Note

The steps in this scenario could also be followed to migrate from one data center to another, by transferring the most recent backup of your database to the new data center, and using the process described in this section.

  1. Uninstall Workflow Manager 1.0 from one of the existing app server nodes.

  2. Re-install Workflow Manager 1.0 on the server from the previous step.

    Note

    Install Workflow Manager by using the installer, but do not start configuring the farm.

  3. Restore the backed up primary databases by using SQL Server Restore features. This step varies depending on the choice of your backup solution. You can restore to the existing SQL Server or to a different SQL Server depending on the nature of the disaster.

  4. Follow the steps described in Process to run the restore commands.

When these steps are complete, you will have a farm with one node that uses the existing databases. This farm has been restored with your backup copies of the original databases and this farm has been brought to a consistent state for it to be fully functional.

For each node that was part of the primary farm, perform the following steps.

  1. Uninstall Workflow Manager 1.0.

  2. Reinstall Workflow Manager 1.0.

  3. Run the Add-SBHost and Add-WFHost cmdlets as described in Process to run the restore commands.

Scenario 3 - Disaster affects the Application servers alone

In some cases, only your app servers crash or are lost due to a localized disaster, and your database servers remain intact. Although this scenario is rare in a data center, it is fairly easy to recover from. Because you haven't lost your databases, you would typically continue with the primary location and add new nodes to the existing farm. If you prefer to move to the secondary location for any reason, you can copy the databases to the secondary location and refer to the new databases when performing the recovery steps.

To recover from an application server disaster scenario, perform the following steps.

  1. Install Workflow Manager 1.0 on a new machine.

  2. Drop the following databases.

    • WFManagementDB

    • SbManagementDB

  3. Follow the procedure described in Process to run the restore commands.

    Note

    If you moved the databases, refer to the new databases when you perform these steps; otherwise refer to the original databases.

When these steps are complete, you have a farm with one node that uses the existing (or moved) databases. If desired, you can add more nodes to the farm in the same way that you would add nodes to any Workflow Manager farm.

Scope Restore

You might accidentally delete a particular scope, or a particular scope's contents might become corrupt. If you have a backup of your Workflow Manager databases from a time when the scope's contents were in order, you can restore the contents of this scope alone from that backup copy.

When you restore a scope, the following contents are restored.

  • Scopes and child scopes along with their configurations

  • Activities within the scope hierarchy being restored

  • Workflows within the scope hierarchy along with their configurations

  • Instances for workflows within the scope hierarchy

    • Instances would continue their execution from their last persistence point.
  • Tracking records corresponding to these instances.

  • Any undelivered messages for workflows within this scope hierarchy

Note

When you delete a scope, all its contents (including instances and tracking records) would be cleaned up within a few minutes (the process is asynchronous).

The following table describes the key terminologies used in a scope restore operation.

Term                   Description
Backup databases       It is assumed that you have taken a backup of all the databases used by Workflow Manager and that the scope you plan to restore is available in this backup copy. In other words, these databases act as the source for copying the contents of this scope.
Live databases         The current active databases in your Workflow Manager farm. In other words, these databases are the target of the scope restore process.
Scope to be restored   You can specify any scope within the scope hierarchy to be restored from a backup database.

Workflow Manager provides capabilities to enable this scenario for you. Here are the steps you have to follow to accomplish a scope restore.

  • Scope restore process

  • Scope restore considerations

Scope restore process

The scope you want to restore must not exist in the live database(s). If you are restoring a scope from a backup because your live database contains a corrupt copy of this scope, you must first delete the corrupt scope from your live database.

  1. Restore the SQL Databases: The first step is to restore the SQL databases by using the backups as outlined in Restore a Database Backup (SQL Server Management Studio).

    Important

    You must restore the backup databases to a different server. Do not overwrite the live databases.

  2. Run the Restore-WFScope PowerShell cmdlet, passing the following parameters:

    • Path of the scope to restore

    • Connection string of the backup Resource DB

    • Connection string of the backup Instance DB

    • Time when the backups were created. A rough approximation is acceptable. If the databases were backed up at different points in time, provide the oldest of those timestamps.

    • Connection string of the Gateway DB

    • Connection strings of one or more container databases. Typically, you would only have one container database. In case your server has multiple container databases, make sure you provide all those connection strings to this cmdlet.

    At this point the scope and contents should be restored in the live database, and the newly restored backup databases can be removed.
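
When the backup databases were taken at different times, the backup time passed to Restore-WFScope should be the oldest of them. The following sketch shows one way to pick that value; the function name Get-OldestBackupTime is hypothetical, and the Restore-WFScope call itself is deliberately not shown because its parameter names are not listed in this topic.

```powershell
# Hypothetical helper (not a Workflow Manager cmdlet): returns the earliest
# of the supplied backup timestamps.
function Get-OldestBackupTime {
    param([Parameter(Mandatory = $true)] [datetime[]] $BackupTime)
    # Ascending sort places the oldest timestamp first.
    $BackupTime | Sort-Object | Select-Object -First 1
}

# Example usage with made-up timestamps for the resource, instance,
# gateway, and container database backups.
$backupTimes = [datetime]'2014-05-11T12:36:00',
               [datetime]'2014-05-11T12:30:00',
               [datetime]'2014-05-11T12:34:00',
               [datetime]'2014-05-11T12:41:00'
Get-OldestBackupTime -BackupTime $backupTimes
```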

Scope restore considerations

Although Scope Restore restores your scope from backed-up and restored databases, note the following points about the overall restore process.

  • You can only restore a scope from a previous backup copy of the current live database. In other words, you cannot try to move a particular scope from one Workflow Manager farm to another.

  • Scope Restore only restores the contents belonging to a scope and all its children. It does not restore any contents outside of this scope's child hierarchy.

  • If an activity or a workflow is referencing another activity outside of this scope hierarchy (say, the referenced activity is in a parent scope above the scope being restored), then the referenced activity will not be restored as part of this operation. This means that such workflows would be invalid and any attempts to create an instance of such workflows would result in errors.