Monitoring Workflow Manager 1.0

 

Updated: October 26, 2012

To keep Workflow Manager 1.0 highly available and reliable, monitor your server to confirm that it is healthy and to detect failures as soon as they occur so that corrective action can be taken. This article describes the capabilities available for monitoring your Workflow Manager 1.0 environment.

Monitoring a Workflow Manager 1.0 Server

Typical ways to monitor a server include the following:

  1. Performance counters

  2. Event tracing

  3. PowerShell

  4. System Center Operations Manager Management Pack

Performance Counters

Performance counters report how well the server is performing. They are grouped into counter sets.

Workflow Manager 1.0 emits its own set of performance counters to help you monitor your server. It defines two counter sets, Management and Dispatcher, and the individual counters are defined under their respective counter set. To find these counters, open Performance Monitor on a machine with Workflow Manager 1.0 installed and look at the "Workflow Management" and "Workflow Dispatcher" counter sets.

The following table summarizes the performance counters available in these two sets.

Index | Performance Counter | Details
1 | Management requests per second | Number of requests processed by the front end per second on a given node.
2 | Workflow events per second | Number of successful PublishNotification calls per second on a given node.
3 | Management request failures per second | Number of front-end calls per second on a given node that result in an error response to the caller. The errors can be caused by bad requests, authorization errors, or validation errors.
4 | Authorization errors per second | Number of authorization errors per second on a given node.
5 | Publish workflow event duration | Average latency of publishing a workflow notification.
6 | Episodes outstanding | Number of workflow instances executing on a given backend node.
7 | Episodes failed per second | Number of workflow instance execution errors reported per second on a given backend node.
8 | Events processed per second | Number of workflow notifications successfully processed per second on a given node.

The following is an example of a health model derived from the above performance counters.

Symptom | Source | Content: Cause, Resolution, Summary
Node does not appear to be processing any messages. | RequestsProcessedPerSecond | No activity for 10 minutes.
Workflow instances do not appear to complete. | (EpisodesCompletedPerSecond / RequestsProcessedPerSecond) * 100 | Below N%, where N is user defined (for example, 10).
Workflow instance failure. | RequestsFailedPerSecond | Number of failures.

You can also add Windows performance counters, such as CPU and memory utilization.
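The health model in the table above can be sketched as a small rule evaluator. This is a minimal illustration in Python, not part of Workflow Manager 1.0: the CounterSample type, the evaluate_health function, and the symptom strings are hypothetical names, and the sketch assumes that the three source values from the table have already been sampled by some other means and are sorted by time.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CounterSample:
    """One reading of the three sources used by the example health model."""
    seconds: float             # sample time, in seconds since an arbitrary start
    requests_processed: float  # RequestsProcessedPerSecond
    episodes_completed: float  # EpisodesCompletedPerSecond
    requests_failed: float     # RequestsFailedPerSecond


def evaluate_health(samples: Sequence[CounterSample],
                    idle_window_s: float = 600.0,
                    min_completion_pct: float = 10.0) -> List[str]:
    """Return the symptoms from the example health model shown by the samples.

    Assumes `samples` is sorted by time. Only samples that fall within
    `idle_window_s` of the latest sample are considered.
    """
    symptoms: List[str] = []
    if not samples:
        return symptoms

    latest = samples[-1]
    window = [s for s in samples if latest.seconds - s.seconds <= idle_window_s]
    # The idle rule is only meaningful once a full window has been observed.
    covers_window = latest.seconds - samples[0].seconds >= idle_window_s

    # Symptom 1: no activity for the whole window (10 minutes by default).
    if covers_window and all(s.requests_processed == 0 for s in window):
        symptoms.append("node not processing messages")

    # Symptom 2: (completed / processed) * 100 is below N percent.
    processed = sum(s.requests_processed for s in window)
    completed = sum(s.episodes_completed for s in window)
    if processed > 0 and (completed / processed) * 100 < min_completion_pct:
        symptoms.append("workflow instances not completing")

    # Symptom 3: at least one sample in the window reported failures.
    if any(s.requests_failed > 0 for s in window):
        symptoms.append("workflow instance failures")

    return symptoms
```

The window length and the N% threshold are parameters because the health model leaves N to the user. A real deployment would also have to decide how often to sample and how to raise alerts, which this sketch leaves out.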

Event Tracing

Workflow Manager 1.0 components use Event Tracing for Windows (ETW) for tracing. ETW is well suited to tracing because it has very low performance overhead, and ETW logs are smaller than logs in other formats. All components of the service use an ETW provider named Microsoft-Workflow.

Workflow Manager 1.0 uses the following ETW channels, which are available by default.

  • Operational Channel: This channel is used for traces that report critical issues requiring operator involvement. Examples include a service faulting or an SLA threshold being reached.

  • Debug Channel: All diagnostic traces use this channel.

  • Analytic Channel: This channel is used for high-value traces, such as the amount of time taken to complete an operation. The events can have additional metadata such as scope or operation name.

A complete list of events generated by Workflow Manager 1.0 can be found in the Microsoft.Workflow.EventDefinitions.man ETW Manifest file located in the [InstallDrive]:\Program Files\Workflow Manager\1.0\Workflow folder.
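To browse the events in a manifest file programmatically, something like the following Python sketch could be used. It assumes that the file follows the standard ETW instrumentation manifest layout, in which each event is an event element with a value attribute holding the event ID. The list_manifest_events function and the SAMPLE_MANIFEST content (including its provider name and symbol) are hypothetical and are not taken from the Workflow Manager 1.0 manifest.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical, hand-written manifest fragment used only to exercise the parser.
SAMPLE_MANIFEST = """<instrumentationManifest xmlns="http://schemas.microsoft.com/win/2004/08/events">
  <instrumentation>
    <events>
      <provider name="Example-Provider">
        <events>
          <event value="289" symbol="ExampleStartupFailed"
                 channel="Operational" level="win:Error"/>
          <event value="1" symbol="ExampleUnhandled"
                 channel="Operational" level="win:Critical"/>
        </events>
      </provider>
    </events>
  </instrumentation>
</instrumentationManifest>"""


def list_manifest_events(manifest_xml: str) -> List[Dict[str, str]]:
    """Return the ID, symbol, channel, and level of every event in a manifest."""
    root = ET.fromstring(manifest_xml)
    events: List[Dict[str, str]] = []
    for element in root.iter():
        # Compare on the local name so the XML namespace prefix does not matter.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "event" and "value" in element.attrib:
            events.append({
                "id": element.attrib["value"],
                "symbol": element.attrib.get("symbol", ""),
                "channel": element.attrib.get("channel", ""),
                "level": element.attrib.get("level", ""),
            })
    return events


if __name__ == "__main__":
    for event in list_manifest_events(SAMPLE_MANIFEST):
        print(event["id"], event["channel"], event["symbol"])
```

For a file on disk, ET.parse(path).getroot() could replace ET.fromstring so that the encoding declared in the file is honored.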

The events in the manifest file that are particularly important for monitoring the health of your server are listed in the following table.

Issue | Event IDs emitted
WF backend startup failed | 289
Unhandled exception | 1, 10, 19
Frequent unhandled exceptions in a particular node | 5 events of 1, 10, or 19 within 30 minutes
Frequent Service Started events | 5 events of 288 or 582 within 30 minutes
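The two "frequent" rows in the table above are sliding-window rules: an alert condition exists when 5 matching events occur within 30 minutes. The following Python sketch shows one way such a rule could be checked. The find_event_bursts function is a hypothetical helper, and the sketch assumes the events have already been collected from the ETW channels as (timestamp, event ID) pairs sorted by time.

```python
from collections import deque
from typing import Deque, Iterable, List, Set, Tuple


def find_event_bursts(events: Iterable[Tuple[float, int]],
                      event_ids: Set[int],
                      threshold: int = 5,
                      window_s: float = 1800.0) -> List[float]:
    """Return the timestamps at which a burst of matching events is detected.

    `events` holds (timestamp_in_seconds, event_id) pairs sorted by time.
    A burst is detected when at least `threshold` events whose ID is in
    `event_ids` fall within `window_s` seconds (30 minutes by default).
    """
    recent: Deque[float] = deque()
    alerts: List[float] = []
    for timestamp, event_id in events:
        if event_id not in event_ids:
            continue
        recent.append(timestamp)
        # Drop matching events that have aged out of the window.
        while timestamp - recent[0] > window_s:
            recent.popleft()
        if len(recent) >= threshold:
            alerts.append(timestamp)
    return alerts


if __name__ == "__main__":
    unhandled_exception_ids = {1, 10, 19}
    service_started_ids = {288, 582}
    log = [(0.0, 1), (300.0, 10), (600.0, 19), (900.0, 288),
           (1200.0, 1), (1500.0, 10)]
    print(find_event_bursts(log, unhandled_exception_ids))
    print(find_event_bursts(log, service_started_ids))
```

The same function covers both rules by changing the set of event IDs. Running it per node is left to the caller, because the table's first "frequent" rule applies to a particular node.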

PowerShell Cmdlets

PowerShell is a convenient way to administer your Workflow Manager 1.0 server. Workflow Manager 1.0 includes cmdlets that report the state of the Workflow farm and its health status. A shortcut that opens the Workflow PowerShell prompt is available in the Workflow Manager 1.0 program group in the Start menu. You can also invoke these cmdlets programmatically by importing the Workflow Manager 1.0 PowerShell modules. All Workflow Manager 1.0 cmdlets are defined in the Microsoft.Workflow.Commands PowerShell module, which is found in the Workflow Manager 1.0 installation directory.

There are two cmdlets that are particularly useful for server monitoring: Get-WFFarm and Get-WFFarmStatus.

Get-WFFarm

The Get-WFFarm cmdlet is a quick way to retrieve the details of your Workflow farm. It returns the following information about your farm.

Value | Description
Hosts | Lists the hosts (or computers) in your farm.
Endpoints | Lists both the HTTP and HTTPS endpoints on the hosts.
WFFarmDBConnectionString | The connection string for the workflow farm database. The workflow farm database contains all of the configuration information for the farm.
RunAsAccount | The account under which the workflow backend service runs.
AdminGroup | The Windows Authentication security group that is configured as the Administrators group for the Workflow farm.
InstanceDBConnectionString | The connection string for the Instance database. The Instance database contains instance information for your persisted workflows. It is highly recommended that you do not update any information in this database. This connection string is only used as input to other offline cmdlets, such as the ones used for disaster recovery.
ResourceDBConnectionString | The connection string for the Resource database. The Resource database contains your workflow and activity definitions. It is highly recommended that you do not update any information in this database. This connection string is only used as input to other offline cmdlets, such as the ones used for disaster recovery.
HttpPort | The HTTP port of the Workflow front end, if the service is configured with HTTP.
HttpsPort | The HTTPS port of the Workflow front end.
OutboundCertificate | The thumbprint of the outbound certificate, and whether this certificate was autogenerated during installation.
SslCertificate | The thumbprint of the SSL certificate, and whether this certificate was autogenerated during installation.

Get-WFFarmStatus

Note

Get-WFFarmStatus is not included in Workflow Manager 1.0, but will be included as part of the 1.0 RTM.

The Get-WFFarmStatus cmdlet provides the basic status of the farm and its nodes.

For each node, Get-WFFarmStatus reports the health of the Workflow Backend Windows service and whether the Workflow front end is reachable on that node.

Management Pack

Note

Workflow Manager 1.0 does not include a Management Pack as part of installation, but it will be available for download separately around the time of our 1.0 RTM. This Management Pack will support Microsoft System Center 2012 as well as System Center 2007 R2.

The performance counters, event traces, and PowerShell cmdlets provide insight into the health of the farm. However, true enterprise-class reliability requires not only constant monitoring of the server but also an alerting mechanism that activates when a failure is detected. The Microsoft System Center Operations Manager Management Pack provides this alerting capability.

The majority of the events and performance counters covered in this article will be supported in the System Center Management Pack. The management pack will target the Workflow Manager 1.0 farm and its nodes, rather than Workflow Manager 1.0 artifacts such as workflow instances.

The following diagram shows a typical health model for Workflow Manager 1.0.

Workflow health model