sup { vertical-align:text-top; }

Form Filler

Build Workflows To Capture Data And Create Documents

Rick Spiewak

This article discusses:

  • Building custom activities
  • Interfacing with Microsoft Office
  • Passing data into workflow activities
  • Extracting workflow data to Office documents
This article uses the following technologies:
Windows Workflow Foundation, 2007 Office System , Visual Basic

Contents

Overall Design
Getting Data into the Workflow
Processing the Data
A Sample Workflow
Deciding Which Way to Go
Supporting Players
Finishing Touches

Workflow describes a method for automating business processes. A workflow might be used to handle customer requests, process insurance claims, or redeem shares in a mutual fund. Workflows may start when a document arrives in the mail, a request is received from a Web site, or a customer calls into a support center. Predefined tasks may then be assigned to be completed, either automated or by human actions.

In the past, attempts to provide this type of automation generally relied on monolithic workflow systems, around which the business process was built. Windows® Workflow Foundation (WF) provides a component-based approach that allows you to build workflow to support a business process, rather than the other way around.

Business processes that call for workflow often require that process-related documents be consumed or created. This requirement can occur, for example, when an application (for a loan, say, or redemption of shares) has been approved or rejected during the workflow process. This might occur after examination by a program (automatically) or an underwriter (manually). A letter might need to be written or a spreadsheet showing balances might need to be produced.

In this article, I will talk about techniques for integrating Microsoft® Office applications with WF. I'll discuss how to use InfoPath® forms and other Office documents to capture data, how to pass data to targeted activities, and how to use these data fields for decisions as well as for creating or populating new Office documents.

I used Visual Studio® 2008 to test a number of these WF and Office integration scenarios, including enumerating named fields, extracting their contents, and filling in documents from templates. I was able to write custom workflow activities to support data entry, filling in various document types and displaying finished documents. I also added designer support for these activities so that their appearance is distinct from other WF activities.

Overall Design

Any design for integrating workflow and Office has three basic components: getting data into the workflow, using the data to create or update Office documents, and storing or returning the output documents. To support these needs, I created a set of interfaces to enable support for a handful of discrete workflow tasks over the underlying Office documents and applications.

The first of these tasks was to get the names of data fields in Office documents. While I'll use the generic term field to describe these, in the case of Microsoft Word, these are bookmarks; Microsoft Excel® uses named ranges, InfoPath fields show up as XML nodes, and PowerPoint® has names for shapes. (Prior to the 2007 Office system, there was no way within the UI to rename shapes. In PowerPoint 2007, you can open the Selection Pane to do this.)

Next is the task of populating these fields with data. I chose to accomplish this by accepting an input Dictionary(Of String, String) and matching the contents of the dictionary to fields in the document. In addition, I implemented the IDisposable interface so that I could clean up COM objects.

In implementing activities to fill in the Office documents, I used a base class named OfficeFormFill class. This supports the disposable pattern and includes a shared Designer that lends visual uniformity to the activities. From this base class, I then derived my custom activity classes.

The workflow was built on custom activities called WordFormFill, ExcelFormFill, PowerPointFormFill, and DataEntryActivity. The function of DataEntryActivity is simply to allow for the introduction of workflow-related variables that are not specific to a document into the generic dictionary structure used by the various OfficeFormFill activities. It uses an Office document as a template for this purpose, enumerating the named fields and presenting them to the user to be filled in. The template can be from any of the supported document classes, without regard to the eventual target document or documents to be created and filled in by the workflow.

Getting Data into the Workflow

One of the first challenges I faced in integrating Office documents into workflows was how to surface data contained by an input document from within the workflow. The standard workflow paradigm relies on advance knowledge of the names of properties associated with activities. Dependency properties can be promoted to be made visible to the workflow, and to other activities. I decided that this was too rigid, as it would require that the overall workflow design be tied to specific fields in the input document. In the case of Office integration, the workflow activity is acting as a generic proxy for any Office document. It isn't realistic to try and predetermine the names of the fields in the document, as this would require a custom workflow activity for different types of documents.

Looking at the way arguments are passed into workflows, and by extension activities, you can see that they are passed as a generic Dictionary(Of String, Object). The WF runtime then binds the elements of the dictionary to properties of the workflow itself. To fill in fields in an Office document, you need two pieces of information: the name of the field and the value to insert. The general strategy I have adopted in the past has been to enumerate the named fields in the document and then match them to the WF input parameters dictionary by name. If a document field matches the key in the dictionary, it is passed to the input parameter of the same name. In this case, however, this was not the way I thought they should be handled, because properties would have to be created on the workflow for each of the potential document fields that might be used.

Rather than naming and promoting properties in an activity to correspond with fields in the document, I decided to use a generic Dictionary(Of String, String) to convey these. I named this parameter Fields, and use it in each of the workflow activities. The key is used to match the field name; the value is used to populate the field contents. This Fields dictionary is then passed as one of the entries in the parameters passed to the workflow. So the real parameters, at least as far as Office integration is concerned, are contained in a Dictionary within a Dictionary (see Figure 1).

Figure 1 Passing Fields to the Workflow

Private WithEvents workflowRuntime As WorkflowRuntime = Nothing
Private workflowInstance As WorkflowInstance = Nothing
Private WithEvents waitHandle As New AutoResetEvent(False)
Public Sub main()
   Dim WorkflowType As Type = GetType(Workflow4)
   Dim Params As New Dictionary(Of String, Object)
   Dim Fields As New Dictionary(Of String, String)
   Params.Add("Fields", Fields)
   ' Start the workflow
   workflowRuntime = New WorkflowRuntime
   workflowInstance = _
      workflowRuntime.CreateWorkflow(WorkflowType, Params)
   workflowInstance.Start()
   waitHandle.WaitOne()

I wanted all of the workflow activities to be broadly useful and to have more than one strategy available for specifying the input document or template. To allow for this, all of the activities also have in common the parameter InputDocument. This is a dependency property so it can be promoted as the needs of the workflow dictate. It holds the path to an input document or template. However, the code also provides for using a Field parameter whose name matches the activity's name if it contains a path to a document or template suitable for the target Office application.

Any real workflow will have a source of input data. For demonstration purposes, I used a DataEntry activity, which can draw its template (the fields and default values) from any supported Office document type. It displays an input form, which allows the user to specify a target activity. This can be a particular OfficeFormFill activity, or a composite activity. My example is an IfElseActivity. The code needs knowledge of only the general Fields property used by all OfficeFormFill activities.

Processing the Data

As I noted earlier, each of the Office document types has its own collection of named fields. Each of the OfficeFormFill-derived activities was written to support a particular document type. While it would certainly be possible to combine functions which reference multiple document types (DataEntryActivity was an example of this), there is one caveat: if you reference one of the Office Primary Interop Assemblies (PIAs) in your code and it isn't present on the system your code is ultimately deployed to, you may get an exception even if the execution path doesn't use the specific PIA. For example, a Select Case or If statement that seems to bypass the absent component will still result in an exception. That's why it pays to isolate the calls to Office components. If you decide to create a combined activity, be sure to maintain this isolation. And, of course, always test your application in all potential target environments.

The activities that support each of the Office document types all follow the same pattern. If the InputDocument property is provided, it is used as the path to the document or template. If the InputDocument property is null, the activity looks in the Fields property for a key that matches the name of the activity. If this is found, it is examined to see whether it contains a path with a suffix matching the type of document the activity handles. If these conditions are met, the InputDocument property is set to this value.

Each matching entry from the Fields collection is used to fill in the corresponding field in the document. The result is placed into the output document. This is either passed in as the corresponding dependency property (OutputDocument), or is found in the Fields collection as the Output KeyValuePair entry. In all cases, if the output document has no suffix, an appropriate default suffix is appended. This allows the same value to potentially be used to create different document types or even multiple documents of different types.

The output document will be stored at the path specified. In most workflow environments, this would be a network share or a SharePoint® document library. For simplicity, I used a local path.

Each of the activities also has a field named Visible, to optionally show the document as it's being filled in. Figure 2 shows the code for filling a Word document. Note in particular the line:

WordFiller = New WordInterface(InputDocument, Fields)

Figure 2 Populating a Word Document

<STAThread()> Protected Overrides Function Execute _
  (ByVal executionContext As ActivityExecutionContext) _
  As ActivityExecutionStatus

  Dim Status As ActivityExecutionStatus

  ' Open the target document or template
  If String.IsNullOrEmpty(Me.InputDocument) Then
    If Me.Fields _
      IsNot Nothing AndAlso Me.Fields.ContainsKey(Me.Name) Then  
      ' Use a field named for *this* activity if it is a document 
      Dim NameValue As String = Fields(Me.Name)
      Select Case System.IO.Path.GetExtension(NameValue).ToLowerInvariant
        Case ".doc", ".docx", ".dot", ".dotx"
          InputDocument = NameValue
        Case Else
          Throw New ArgumentException( _ 
   "Input Document Invalid or Missing")
      End Select
    End If
  End If

  ' Create or open the required document from an input 
  ' document or template
  WordFiller = New WordInterface(InputDocument, Fields)

  ' Production workflow may not want to show the document
  If Me.Fields.ContainsKey("Visible") Then
    Boolean.TryParse(Me.Fields("Visible"), WordFiller.Visible)
  End If

  ' Get success or failure from the Word Interface class
  Dim Success As Boolean = WordFiller.FillInDocument()
  WordApp = WordFiller.WordApp

  If Success Then
    ' Find the target output document
    If String.IsNullOrEmpty(Me.OutputDocument) Then
      If Me.Fields.ContainsKey("Output") Then
        Dim NameValue As String = Fields("Output").ToString

        If NameValue.EndsWith(".doc", _
          StringComparison.CurrentCultureIgnoreCase) _
          OrElse NameValue.EndsWith(".docx", _
          StringComparison.CurrentCultureIgnoreCase) Then

          ' Set the document property to the provided name
          Me.OutputDocument = NameValue
        Else 'Force .doc suffix
          Me.OutputDocument = NameValue & ".doc"
        End If
      End If
    End If

    ' Save the output 
    WordApp.ActiveDocument.SaveAs(Me.OutputDocument)

    Cleanup()

    Status = ActivityExecutionStatus.Closed
    executionContext.CloseActivity()
  Else
    Cleanup()

    Status = ActivityExecutionStatus.Closed
    executionContext.CancelActivity(Me)
  End If

  Return Status
End Function

This is where the WordInterface class instance was constructed and passed the path to the document to use as a template, along with the field data. These were simply stored in the corresponding properties for use by the methods of the class.

The WordInterface class provides the functionality to fill in the target document (see Figure 3). If the input document is a template, a new instance of the document is created. If it is a document, it is opened as read-only to prevent accidental overwrite. This class also inherits its Visible property from the base class. Note that the code here omits some error checking for the sake of brevity.

Figure 3 FillInDocument

Public Overrides Function FillInDocument() As Boolean
  Dim Status As Boolean = False
  _WordApp = New Word.Application
  WordApp.Visible = Visible

  ' Check for template
  Select Case Path.GetExtension(Document).ToLowerInvariant
    Case ".dot", ".dotx"
      _WordDocument = WordApp.Documents.Add(Document)
    Case ".doc", ".docx"
      _WordDocument = WordApp.Documents.Open(FileName:=Document, _
        ReadOnly:=True)
  End Select

  ' Determine dictionary variables to use 
  ' based on bookmarks in the document matching Fields entries
  If WordDocument IsNot Nothing _
    AndAlso WordDocument.Bookmarks.Count > 0 Then

    Dim BookMark As Word.Bookmark = Nothing
    For i As Integer = 1 To WordDocument.Bookmarks.Count
      BookMark = WordDocument.Bookmarks(i)
      Dim BookMarkName As String = BookMark.Name
      Dim rng As Word.Range = BookMark.Range
      If Me.Fields.ContainsKey(BookMarkName) Then
        rng.Text = Fields(BookMarkName).ToString   
        'This results in the bookmark being lost, it needs to be replaced
        WordApp.ActiveDocument.Bookmarks.Add(BookMarkName, rng)
      Else
        ' Handle special case(s)
        Select Case BookMark.Name
          Case "FullName"
            rng.Text = GetFullName(Fields)
        End Select
      End If
    Next
      Status = True
  Else
    Status = False
  End If

  Return Status
End Function

I added a special case field name called FullName. If the document contained a field with this name, I concatenate input fields called Title, FirstName, and LastName to populate it. Since all Office document types have similar needs, this function was moved into the base OfficeInterface class along with some other common properties. The OfficeInterface classes also handled the required cleanup logic to ensure the proper disposal of the COM objects created in order to manipulate the Office documents.

Each of the OfficeFormFill classes has an OutputDocument property. These can be directly set by several means. Within the designer, a property can be bound to a workflow-level parameter (including properties promoted from other activities) or to a constant value. At run time, each of the OfficeFormFill types will look in its OutputDocument property for the path to save its document. If this is not set, it will look within its Fields collection for a key named Output. If the value ends with an appropriate suffix, it is used as-is. If it doesn't have an appropriate suffix, one is added. The Activity then saves the output document. This allows maximum flexibility in deciding where to put the path to the output document. Again, because the suffix is omitted, the same value can be used by any of the OfficeFormFill types to save the document in the correct respective format. (In Word and PowerPoint, the suffix I appended is the compatibility mode version. Excel provides a property, Excel8CompatibilityMode, that can be used to distinguish the document type, so you can decide this directly.)

A Sample Workflow

Figure 4 shows the sample workflow I'll use to demonstrate how the integration works. I'll go through each of the activities in turn, and talk about what they do.

fig04.gif

Figure 4 Document Creation Workflow

At the top, EnterCustomerData is an instance of the DataEntryActivity class. It relies on a DataEntryDocument property that was, in this case, simply set at design time (see Figure 5). This property could also have been set by the program that launched the workflow. DataEntryActivity presents a Windows Form with a DataGridView control, which is populated by extracting the named fields from DataEntryDocument. It also shows the path to this document. The user can add or modify the values in the fields.

fig05.gif

Figure 5 Bindings for EnterCustomerData Activity

DataEntryDocument can be any of the supported Office document types. For this example, an InfoPath document template is used. To extract the fields from the document, the appropriate OfficeInterface class is called. It loads the target document into the appropriate application and enumerates the fields (and their contents if present). I used a checkbox, set by default, to exclude the default names for shapes in PowerPoint. These have names like TextBox 1. The code behind DataEntryActivityForm includes a routine to exclude field names based on a regular expression, and this was used to skip including these values and names in the DataGridView.

One of the fields provided by DataEntryActivity is TargetActivity. This is just the name of the activity whose Fields property is to be populated with the fields collected from the input document. DataEntryActivity could then target any of the OfficeFormFill derived activities or any CompositeActivity. For this article, the TargetActivity field is shown as pre-populated with CustomerScenario.

The target activity is passed the data from DataEntryActivity. When the target is a composite activity, this process can require walking up the activity tree to find the appropriate Fields property. I created a WorkflowUtilities class to contain routines for locating activities by name or by desired property.

Deciding Which Way to Go

The CustomerScenario activity is an IfElseActivity (which derives from CompositeActivity). The IfElseActivity type executes each branch in turn looking for a return value of True. Once a branch returns True, other branches are not executed.

The CompositeActivity's structure is maintained within the Extensible Application Markup Language (XAML) file called Workflow4.xoml, and its logic is maintained in the code file called Workflow4.xoml.vb. The first part of the CustomerScenario activity is shown in Figure 6.

Figure 6 Excerpt from CustomerScenario

<IfElseActivity x:Name="CustomerScenario">
  <IfElseBranchActivity x:Name="ProcessApproval">
    <IfElseBranchActivity.Condition>
    <CodeCondition Condition="ifElseBranchActivity1Condition" />
    </IfElseBranchActivity.Condition>
    <CodeActivity x:Name="codeActivity1" ExecuteCode=
      "{ActivityBind Workflow4,Path=codeActivity1_ExecuteCode1}" />
    <ns1:WordFormFill Description="Fill in fields in a Word document"
      x:Name="WriteCustomerLetter" Fields="{x:Null}" 
      OutputDocument="{x:Null}" 
      InputDocument=" C:\MSDN Workflow\Templates\Customer Letter.doc" />
</IfElseBranchActivity>

For the code behind the branch to process the decision, a return value is set in the Result property of the event arguments passed to each branch. The decision to continue in a particular branch is made in the code, based on the value of the Status field passed to the activity. The branch labeled ProcessApproval looks for a value of Approved. The ProcessDisapproval branch looks for Disapproved. Any other value is handled by the ProcessUndecided branch. Figure 7 shows the code executed in the ProcessApproval branch.

Figure 7 ProcessApproval

Public Sub ifElseBranchActivity1Condition(ByVal sender As Object, _
  ByVal e As ConditionalEventArgs)
  ' Look for status matching desired value for *this* branch
  If Me.Fields.ContainsKey("Status") AndAlso _
    Me.Fields("Status") = "Approved" Then

    ' Look for a WordFormFill activity as a child to *this* branch
    Dim target As Activity = _
      WorkflowUtilities.FindActivityByType( _
      GetType(WordFormActivity.WordFormFill), _
      (DirectCast(sender, Activity)))

    If target IsNot Nothing Then
      Dim TargetActivity As WordFormActivity.WordFormFill = _
        DirectCast(target, WordFormActivity.WordFormFill)
      TargetActivity.Fields = Me.Fields
      ' Set input document by using field named for target activity
      TargetActivity.Fields.Add(TargetActivity.Name, Me.InputDocument)
      e.Result = True

    Else
      e.Result = False
    End If

  Else
    e.Result = False
  End If

End Sub

The other branches are similar, but pass their data on to a Power­PointFormFill or ExcelFormFill activity type. In a real business process, they might simply use different Word templates to do something like produce a customer letter rather than interfacing with additional Office document types.

The WordFormFill instance executed in this branch has its input (template) document set at design time as shown in Figure 8. Note that the Fields property is initially null. This is because it is set programmatically.

fig08.gif

Figure 8 WriteCustomerLetter Properties

Supporting Players

Part of the approach used in the example workflow relies on the ability of one activity to find another. This requires that the activities be child activities of a CompositeActivity, which include the major workflow types: sequential, state machine, and rules-driven. This means that any nontrivial workflow will have this type of relationship among its activities.

The WorkFlowUtilities class provides methods to search a tree of activities for a descendant activity by name or type. It also provides functions to search higher in the tree for ancestor activities with a particular property.

These utility methods support finding target activities to pass parameters to, as well as finding the source for input parameters. So, for example, when the DataEntryActivity is passed a TargetActivity parameter in its Fields collection, it can find it anywhere in the activity tree starting with its own container:

Dim EntryActivity As Activity = Nothing
EntryActivity = WorkflowUtilities.FindActivityByName( _
  TargetActivityName, Me.Parent)

Finishing Touches

The example workflow announces the activity's completion. The activity selected by the Status field creates the output document in the specified location. From here, it can could be picked up by other activities for additional processing.

I've outlined a basic design approach to interfacing workflow with Office client applications. By adhering to the principles of object-oriented design, you can create reusable workflow activities and supporting classes that meet a variety of similar needs.

Rick Spiewak is a Lead Software Systems Engineer with The MITRE Corporation. He is working with the U.S. Air Force Electronic Systems Center on Mission Planning. He has worked with Visual Basic since 1993 and the Microsoft .NET Framework since 2002. He was a beta tester for Visual Studio .NET 2003.