April 2012

Volume 27 Number 04

Microsoft Office - Integrating Windows Workflow Foundation with the OpenXML SDK

By Rick Spiewak | April 2012

Business processes that involve a workflow often require that corresponding documents be created or processed. This can occur, for example, when an application (for a loan, an insurance policy, redemption of shares and so forth) has been approved or rejected during the workflow process. This might be driven by a program (automatically) or by an underwriter (as a manual step). In that case, a letter might need to be written, or a spreadsheet showing balances produced.

In the June 2008 issue of MSDN Magazine, I showed how to accomplish this using the Microsoft Office application object models (msdn.microsoft.com/magazine/cc534981). Using that article as a basis, this time I’ll show you how to integrate Microsoft Office-compatible documents with Windows Workflow Foundation (WF) without having to interact directly with Office applications. This is achieved by using the OpenXML SDK 2.0 to manipulate word-processing and spreadsheet document types. The corresponding Office applications are, of course, Word and Excel. While each of these has its own OpenXML document model, there are enough similarities to allow the use of a set of interface classes that hide most of the differences for the purpose of workflow integration.

Because WF is included in the Microsoft .NET Framework 4 Client Profile, any .NET Framework 4 installation will include the WF library. And because its use has been greatly simplified in the .NET Framework 4 as compared with the .NET Framework 3.0, any application that requires even basic workflow functions should consider using this in place of custom code. This applies even to cases where the built-in activities need to be supplemented by custom activities.

I’ll show how a custom activity can provide data entry based on an Office document that serves as a prototype. The data entry activity, in turn, passes data to activities for each document type, and the activities use these data fields for filling in target Office documents. I used Visual Studio 2010 to develop classes to support operations such as enumerating named fields, extracting their contents and filling in documents from prototypes. These classes all use the OpenXML SDK rather than directly manipulating Office application object models. I created workflow activities to support data entry and filling in the word processing and spreadsheet document types. Finished documents are displayed by simply invoking the default application for the document type. I wrote the code using Visual Basic.

Overall Design

Any design for integrating workflow and Office has three general requirements: getting data into the workflow; processing the data to create or update Office documents; storing or passing on the output documents. To support these needs, I created a set of classes that provide uniform interfaces to the underlying Office document formats using the OpenXML SDK. These classes provide methods to:

  • Get the names of the potential target fields in the documents. I’ll use the generic term “field” to describe these. In the case of Word, these are bookmarks; Excel uses named ranges. In the most general case, both bookmarks and named ranges can be quite complex, but I’ll use the simple case of a single location for bookmarks and single cells for named ranges. Bookmarks always contain text, whereas spreadsheet cells can contain text, numbers or dates.
  • Fill in target fields in a document by accepting input data and matching the data to the document.

To implement activities to fill in the Office documents, I used the WF 4 CodeActivity model. This model is greatly simplified over WF 3.0 and provides a much clearer implementation. For example, it’s no longer necessary to explicitly declare dependency properties.

The Supporting Cast

Behind the workflow stands a set of classes designed to support the functions of the OpenXML SDK. The classes perform the key functions of loading the prototype document into a memory stream, finding and filling in the fields (bookmarks or named ranges), and saving the resulting output document. Common functions are collected into a base class, including loading and saving the memory stream. I’ve omitted error checking here for clarity, but it’s included in the accompanying code download. Figure 1 shows how to load and save an OpenXML Document.

Figure 1 Loading and Saving an OpenXML Document

Public Sub LoadDocumentToMemoryStream(ByVal documentPath As String)
  Dim Bytes() As Byte = File.ReadAllBytes(documentPath)
  DocumentStream = New MemoryStream()
  DocumentStream.Write(Bytes, 0, Bytes.Length)
End Sub
Public Sub SaveDocument()
  Dim Directory As String = Path.GetDirectoryName(Me.SaveAs)
  Using OutputStream As New FileStream(Me.SaveAs, FileMode.Create)
    DocumentStream.WriteTo(OutputStream)
    OutputStream.Close()
    DocumentStream.Close()
  End Using
End Sub

Getting Data into the Workflow

One of the first challenges I faced in integrating Office documents into workflows was how to pass data destined for fields in these documents. The standard workflow structure relies on advance knowledge of the names of variables associated with activities. Variables can be defined with different scopes to provide visibility into the workflow, and into other activities. I decided that applying this model directly was too rigid, as it required tying the overall workflow design too tightly to the document fields. In the case of Office integration, the workflow activity is acting as a proxy for an Office document. It isn’t realistic to try to predetermine the names of the fields in the document, as this would require matching a custom activity to a particular document.

If you look at the way arguments are passed to activities, you’ll find they’re passed as a Dictionary(Of String, Object). To fill in fields in an Office document, you need two pieces of information: the name of the field and the value to insert. I’ve developed applications using workflow products and Office before, and the general strategy I adopted worked well: I enumerate the named fields in the document and match them to input parameters by name. If the document fields match the pattern in the basic input parameter dictionary (String, Object), they can be passed in directly. However, this approach couples the workflow too tightly to the document.

Rather than naming variables to correspond with fields in the document, I decided to use a generic Dictionary(Of String, String) to convey these field names. I named this parameter Fields, and used it in each of the activities. Each entry in such a dictionary is of type KeyValuePair(Of String, String). The key is used to match the field name; the value is used to fill in the field contents. This Fields dictionary is then one of the parameters inside the workflow.

You can start the workflow with just a few lines of code in a simple Windows Presentation Foundation (WPF) window, and even fewer when added to an existing application:

Imports OfficeWorkflow
Imports System.Activities
Class MainWindow
  Public Sub New()
    InitializeComponent()
    WorkflowInvoker.Invoke(New Workflow2)
    MessageBox.Show("Workflow Completed")
    Me.Close()
  End Sub
End Class

I wanted the activities to be generally useful, and to have more than one strategy available for providing the input document. To allow for this, the activities have a common parameter named InputDocument. These can be attached to variables that, in turn, are connected to outputs of other activities as the needs of the workflow dictate. The parameter contains the path to an input document used as a prototype. However, the code also provides for using a Field parameter whose name is InputDocument if it contains a path to a document type suitable for the target Office application. An additional parameter allows the target activity to be named in an input field named TargetActivity. This allows for multiple activities in a sequence; for example, to evaluate the fields in the original input document for applicability.

Any real workflow will have a source of input data. For demonstration purposes, I used a DataEntry activity, which can draw its input (the fields and default values) from any of the supported Office document types. This activity opens a dialog box containing a DataGrid and buttons for selecting a document and saving the data fields. After the user selects a document as a prototype, the DataGrid is filled in with the available fields from the document, plus the InputDocument, OutputDocument and TargetActivity fields, as shown in Figure 2. (Incidentally, Julie Lerman’s April 2011 Data Points column, “Composing WPF DataGrid Column Templates for a Better User Experience,” which you’ll find at msdn.microsoft.com/magazine/gg983481, has an important tip on the use of FocusManager to enable single-click editing in a grid such as the one in Figure 2.)

The Data Entry Activity Interface
Figure 2 The Data Entry Activity Interface

Processing the Data into the Documents

As I noted earlier, each of the Office document types has its own way of providing a collection of named fields. Each of the activities is written to support a particular document type, but the activities that support the Office document types all follow the same pattern. If the InputDocument property is provided, it’s used as the path to the document. If the InputDocument property is null, the activity looks in the Fields property for an InputDocument value that, if found, is examined to see whether it contains a path with a suffix matching the document type the activity handles. The activity also attempts to find a document by appending a suitable suffix. If these conditions are met, the InputDocument property is set to this value, and processing proceeds.

Each matching entry from the Fields collection is used to fill in the corresponding field in the output document. This is either passed in as the corresponding workflow variable (OutputDocument), or is found in the Fields collection as the OutputDocument KeyValuePair entry. In each case, if the output document has no suffix, an appropriate default suffix is appended. This allows the same value to potentially be used to create different document types, or even multiple documents of different types.

The output document will be stored at the path specified. In most real-world workflow environments, this would be a network share or a SharePoint folder. For simplicity, I used a local path in the code. Also, the code for the Word and Excel activities checks the input document for the corresponding type. In the demonstration workflow, the input document defaults to the prototype selected as the basis for the Fields. The user can change this, so that Fields derived from a Word document can be used to define inputs for an Excel document, or vice versa. Figure 3 shows the code for filling in a Word document.

Figure 3 Filling in a Word-Processing Document in the Activity

Protected Overrides Sub Execute(ByVal context As CodeActivityContext)
  InputFields = Fields.Get(Of Dictionary(Of String, String))(context)
  InputDocumentName = InputDocument.Get(Of String)(context)
  If String.IsNullOrEmpty(InputDocumentName) Then InputDocumentName = _
    InputFields("InputDocument")
  OutputDocumentName = OutputDocument.Get(Of String)(context)
  If String.IsNullOrEmpty(OutputDocumentName) Then OutputDocumentName = _
    InputFields("OutputDocument")
  ' Test to see if this is the right activity to process the input document
  InputFields.TryGetValue(("TargetActivity"), TargetActivityName)
  ' If there is no explicit target, see if the document is the right type
  If String.IsNullOrEmpty(TargetActivityName) Then
    If Not (InputDocumentName.EndsWith(".docx") _
    Or InputDocumentName.EndsWith(".docm")) _
    Then Exit Sub
    'If this is the Target Activity, fix the document name as needed
  ElseIf TargetActivityName = Me.DisplayName Then
    If Not (InputDocumentName.EndsWith(".docx") _
    Or InputDocumentName.EndsWith(".docm")) _
      Then InputDocumentName &= ".docx"
    End If
  Else
    Exit Sub
  End If
  ' This is the target activity, or there is no explicit target and
  ' the input document is a Word document
  Dim InputWordInterface = New WordInterface(InputDocumentName, InputFields)
  If Not (OutputDocumentName.EndsWith(".docx") _
  Or OutputDocumentName.EndsWith(".docm")) _
    Then OutputDocumentName &= ".docx"
  InputWordInterface.SaveAs = OutputDocumentName
  Dim Result As Boolean = InputWordInterface.FillInDocument()
  ' Display the resulting document
  System.Diagnostics.Process.Start(OutputDocumentName)
End Sub

In Figure 3, note in particular the following line:

Dim InputWordInterface = _
  New WordInterface(InputDocumentName, InputFields))

This is where the WordInterface class instance is constructed. It’s passed the path to the document to use as a prototype, along with the field data. These are simply stored in the corresponding properties for use by the class’s methods.

The WordInterface class provides the function that fills in the target document. The input document is used as a prototype, from which an in-memory copy of the underlying OpenXML document is created. This is an important step—the in-memory copy is the one that’s filled in and then saved as the output file. Creation of the in-memory copy is the same for each document type and is handled in the base OfficeInterface class. However, saving the output file is more specific to each type. Figure 4 shows how the Word document is filled in by the WordInterface class.

Figure 4 Using the WordInterface Class to Fill in a Word Document

Public Overrides Function FillInDocument() As Boolean
  Dim Status As Boolean = False
  ' Assign a reference to the existing document body.
  Dim DocumentBody As Body = WordDocument.MainDocumentPart.Document.Body
  GetBuiltInDocumentProperties()
  Dim BookMarks As Dictionary(Of String, String) = Me.GetFieldNames(DocumentBody)
  ' Determine dictionary variables to use -
    based on bookmarks in the document matching Fields entries
  If BookMarks.Count > 0 Then
    For Each item As KeyValuePair(Of String, String) In BookMarks
      Dim BookMarkName As String = item.Key
      If Me.Fields.ContainsKey(BookMarkName) Then
        SetBookMarkValueByElement(DocumentBody, BookMarkName, Fields(BookMarkName))
      Else
        ' Handle special case(s)
        Select Case item.Key
          Case "FullName"
            SetBookMarkValueByElement(DocumentBody, _
            BookMarkName, GetFullName(Fields))
        End Select
      End If
    Next
    Status = True
  Else
    Status = False
  End If
  If String.IsNullOrEmpty(Me.SaveAs) Then Return Status
  Me.SaveDocument(WordDocument)
  Return Status
End Function

I added a special case Field name called FullName. If the document contains a field with this name, I concatenate input fields called Title, FirstName and LastName to fill it in. The logic for this is in a function called GetFullName. Because all the Office document types have similar needs, this function is in the base OfficeInterface class along with some other common properties. I’ve used a Select Case statement to make this an extensibility point. You could, for example, add a FullAddress field that concatenates input fields called Address1, Address2, City, State, ZipCode and Country.

Storing the Output Documents

Each of the activity classes has an OutputDocument property, which can be set by several means. Within the designer, a property can be bound to a workflow-level parameter or to a constant value. At run time, each of the activities will look in its OutputDocument property for the path to save its document. If this isn’t set, it will look within its Fields collection for a key named OutputDocument. If this value ends with an appropriate suffix, it’s used directly. If it doesn’t have an appropriate suffix, one is added. The activity then saves the output document. This allows maximum flexibility in deciding where to put the path to the output document. Because the suffix is omitted, the same value can be used by either of the activity types. Here’s how a Word document is saved, first ensuring that the memory stream is updated and then using the method in the base class:

Public Sub SaveDocument(ByVal document As WordprocessingDocument)
  document.MainDocumentPart.Document.Save()
  MyBase.SaveDocument()
End Sub

Sample Workflows

Figure 5 shows the simple workflow I’ll use to show how the integration works. A second example uses a flowchart, shown in Figure 6. I’ll go through each of the activity types in turn and talk about what they do, but first let’s see what each workflow does.

Simple Workflow with Office Integration
Figure 5 Simple Workflow with Office Integration

Flowchart Workflow with Office Integration
Figure 6 Flowchart Workflow with Office Integration

The workflow consists of a simple sequence that invokes each activity type in turn. Because the Word and Excel activities check the input document types, they won’t try to process the wrong type.

The flowchart workflow in Figure 6 uses a Switch activity to decide which Office activity should be invoked. It uses a simple expression to make this determination:

Right(Fields("InputDocument"), 4)

The cases docm and docx are directed to Word, while xlsx and xlsm are directed to Excel.

I’ll use Figure 5 to describe the actions of each activity, but the action in Figure 6 is similar.

Data Entry

At the top of Figure 5, you can see an instance of the DataEntryActivity class. The DataEntryActivity presents a WPF window with a DataGrid control, which is populated by extracting the named fields from the InputDocument, which in this case is selected by the user. The control allows the user to select a document, regardless of whether one was initially provided. The user can then enter or modify the values in the fields. A separate ObservableCollection class is provided to enable the required TwoWay binding to the DataGrid, as shown in Figure 7.

Figure 7 ObservableCollection for Displaying Fields

Imports System.Collections.ObjectModel
' ObservableCollection of Fields for display in WPF
Public Class WorkflowFields
  Inherits ObservableCollection(Of WorkFlowField)
  'Create the ObservableCollection from an input Dictionary
  Public Sub New(ByVal data As Dictionary(Of String, String))
    For Each Item As KeyValuePair(Of String, String) In data
      Me.Add(New WorkFlowField(Item))
    Next
  End Sub
  Public Function ContainsKey(ByVal key As String) As Boolean
    For Each item As WorkFlowField In Me.Items
      If item.Key = key Then Return True
    Next
    Return False
  End Function
  Public Shadows Function Item(ByVal key As String) As WorkFlowField
    For Each Field As WorkFlowField In Me.Items
      If Field.Key = key Then Return Field
    Next
    Return Nothing
  End Function
End Class
Public Class WorkFlowField
  Public Sub New(ByVal item As KeyValuePair(Of String, String))
    Me.Key = item.Key
    Me.Value = item.Value
  End Sub
  Property Key As String
  Property Value As String
End Class

The InputDocument can be either of the supported Office document types, Word (.docx or .docm) or Excel (.xlsx or .xlsm). To extract the fields from the document, the appropriate OfficeInterface class is called. It loads the target document as an OpenXML object and enumerates the fields (and their contents, if present). The document formats that contain macros are supported, and the macros are carried over to the output document.

One of the fields provided by the DataEntryActivity is the TargetActivity field. This is just the name of an activity whose Fields property is to be populated with the fields collected from the input document. The TargetActivity field is used by the other activities as a way to determine whether to process the data fields.

WordFormActivity

The WordFormActivity operates on a word-processing (Word) document. It matches the Fields entries to any bookmarks in the Word document with the same names. It then inserts the value of the field into the Word bookmark. This activity will accept Word documents with (.docm) or without (.docx) macros.

ExcelFormActivity

The ExcelFormActivity operates on a spreadsheet (Excel) document. It matches the Fields entries to any simple named ranges in the Excel document with the same names. It then inserts the value of the field into the Excel named range. This activity will accept Excel documents with (.xlsm) or without (.xlsx) macros.

Excel document types have some additional special features that require special handling if filled-in data is to be formatted and handled correctly. One of these features is the variety of implicit date and time formats. Fortunately, these are well-documented (see ECMA-376 Part 1, 18.8.30 at bit.ly/fUUJ). When an ECMA format is encountered, it needs to be translated to the corresponding .NET format. For example, the ECMA format mm-dd-yy becomes M/dd/yyy.

In addition, spreadsheets have a concept of shared strings, and special handling is required when inserting a value into a cell that is to contain a shared string. The InsertSharedStringItem method used here is derived from the one contained in the OpenXML SDK:

If TargetCell.DataType.HasValue Then
  Select Case TargetCell.DataType.Value
    Case CellValues.SharedString    ' Handle case of a shared string data type
      ' Insert the text into the SharedStringTablePart.
      Dim index As Integer = InsertSharedStringItem(NewValue, SpreadSheetDocument)
      TargetCell.CellValue.Text = index.ToString
      Status = True
    End Select
End If

Finishing Touches

The example workflow simply announces its own completion. The activity that’s selected, either by document type or by the TargetActivity field, creates its output document in the specified location. From here it can be picked up by other activities for additional processing. For demonstration purposes, each of the activities ends by launching the output document, relying on Windows to open it in the appropriate application:

System.Diagnostics.Process.Start(OutputDocumentName)

If you want to print instead, you can use the following:

Dim StartInfo As New System.Diagnostics.ProcessStartInfo( _
  OutputDocumentName) StartInfo.Verb = "print"
System.Diagnostics.Process.Start(StartInfo)

Because we’re working with only the document format, there’s still no necessity for the workflow application to be aware of the Office version installed!

In a real production environment, more work would usually follow. Database entries might be made, and documents might end up being routed via e-mail or printed and sent to a customer. Production workflow applications would also be likely to take advantage of other services, such as persistence and tracking.

Wrapping Up

I’ve outlined a basic design approach to interfacing Window Workflow Foundation 4 with Office documents using the OpenXML SDK. The sample workflow illustrates this approach, and shows how it can be implemented in a way that customizes Office documents using data in the workflow. The classes from which this solution is built are readily modifiable and extensible to meet a wide variety of similar needs. Though the workflow activities are written to take advantage of WF 4 and the .NET Framework 4, the Office interface libraries are also compatible with the .NET Framework 3.5.


Rick Spiewak is a lead software systems engineer with The MITRE Corp. He works with the U.S. Air Force Electronic Systems Center on Mission Planning. Spiewak has worked with Visual Basic since 1993 and the Microsoft .NET Framework since 2002, when he was a beta tester for Visual Studio .NET 2003. He has long experience integrating Office applications with a variety of workflow tools.

Thanks to the following technical expert for reviewing this article: Andrew Coates