October 2009

Volume 24 Number 10

SharePoint and Open XML - Generating Documents from SharePoint with Open XML Content Controls

By Eric White | October 2009

It's often the case that a department manager needs to regularly send a nicely formatted status report to her general manager or that a team leader needs to send a weekly status report to a number of interested parties. To collaborate with others in their organizations, both the manager and the team leader can maintain status information in SharePoint lists. The question for developers is how to include the information in the lists in a document such as a status report.

Open XML, the default file format for Office 2007, is an ISO standard (documented in exacting detail in IS29500). Put simply, Open XML files are Zip files that contain XML, and it is very easy to generate or modify Open XML documents programmatically. All you need is a library to open a Zip file and an XML API. Using the programmability features of Open XML and SharePoint, you can put together a small document-generation system that takes advantage of Open XML content controls and puts people like the manager and team leader in charge of their reports. In this article, I'll present some guidance and sample code for creating a document-generation system that uses SharePoint lists to populate tables in an Open XML word-processing document.

Overview of the Example

The example in this article is a small SharePoint Web Part that generates an Open XML word-processing document from data that lives in SharePoint lists. I've created two custom SharePoint lists, shown in Figure 1, that contain the data we want to insert into the tables.

OpenXML_figure1a

OpenXML_figure1b
Figure 1 Two Custom SharePoint Lists

I've also created a template Open XML word-processing document that contains content controls defining the lists and columns that are the source of the data for the generated document. The controls are shown in Figure 2.

OpenXML_Figure2
Figure 2 The Template Open XML Document That Contains Content Controls

Finally, I created a Web Part that retrieves the list of template documents from a specific document library and presents the list to users. A user selects an item in the list and then clicks the Generate Report button. The Web Part creates an Open XML word-processing document, places it in the Reports document library, and redirects the user to that library so that she can open the report. The Web Part is shown in Figure 3, and the document it generates is shown in Figure 4.

OpenXML_Figure3
Figure 3 A SharePoint Web Part That Allows Users to Select a Template Document

Open XML Content Controls

Before describing the SharePoint solution, I'll cover the basics of Open XML content controls. Open XML content controls provide a facility in word-processing documents that allows you to delineate content and associate metadata with that content. To use content controls, you must enable the Developer tab in Microsoft Office Word 2007. (Click Word Options on the Office menu; then, in the Word Options dialog box, select Show Developer Tab in Ribbon.)

To insert a content control, select some text and then click the button in the Controls area of the Developer tab that creates a plain text content control, shown in Figure 5.

ee532473.OpenXML_Figure4
Figure 4 An Open XML Word-Processing Document That Contains the Generated Report

ee532473.OpenXML_Figure5
Figure 5 Use This Button to Create a Plain Text Content Control

You can set properties for a content control to give it a title and assign it a tag. Click in the content control, and then click the Properties button in the Controls area on the Developer tab. This displays a dialog box you can use to set the title and the tag.

Content controls use the w:sdt element in Open XML markup, which is shown in Figure 6. The contents of the content control are defined in the w:sdtContent element. In the figure, you can also see the title of the content control in the w:alias element and the content control tag in the w:tag element.

Programming for Open XML Using the .NET Framework

You can take a variety of approaches to programming for Open XML using the Microsoft .NET Framework:

  • Use the classes in System.IO.Packaging
  • Use the Open XML SDK with any of the XML programming technologies available in .NET, including XmlDocument, XmlParser, or LINQ to XML. My favorite is LINQ to XML.
  • Use the strongly typed object model of the Open XML SDK version 2.0. You can find a number of articles that introduce how to program with this object model.

Here, I'm going to use the Open XML SDK (either version 1.0 or 2.0) with LINQ to XML. You can download the Open XML SDK from go.microsoft.com/fwlink/?LinkId=127912.

It is useful to encapsulate some Open XML functionality around content controls in a ContentControlManager class. When you approach the problem in this way, you can develop your Open XML functionality in a simple console application. After you have coded and debugged your Open XML functionality, you can then incorporate it in your SharePoint feature with a minimum of effort. It's pretty time-consuming to incur the overhead of deploying the SharePoint feature while debugging Open XML code.

For our SharePoint document-generation example, we want to write some code that retrieves a template document from a specific document library, queries the document for the content controls it contains and uses the metadata stored in each content control to populate the document with data from the appropriate SharePoint list.

If you download the sample and examine the template Open XML document, you'll see that it contains a content control that surrounds each table and that content controls are inserted in each cell in the bottom row of each table. The tag for each content control in the table cells specifies the name of a column in the SharePoint lists. For convenience, I've also set the title of each content control to the same value as the tag. Content controls display their title when the insertion point is inside the control.

When you write the code for a SharePoint feature that generates an Open XML document, the code should first query the document for these content controls. The query returns an XML tree that describes the structure of the content controls, along with the tag for each. If you run this code on the sample document, it produces the following XML:

<ContentControls>
    <Table Name="Team Members">
        <Field Name="TeamMemberName" />
        <Field Name="Role" />
    </Table>
    <Table Name="Item List">
        <Field Name="ItemName" />
        <Field Name="Description" />
        <Field Name="EstimatedHours" />
        <Field Name="AssignedTo" />
    </Table>
</ContentControls>

This XML document shows which SharePoint lists our code needs to query. For each item in the list, you need to retrieve the values of the specified columns. The code to query the Open XML word-processing document, shown in Figure 7, is written as a LINQ to XML query that uses functional construction to form the returned XML.

To use functional construction, the code instantiates an XElement object using its constructor, passing a LINQ to XML query as an argument to the constructor. The LINQ to XML query uses axis methods to retrieve the appropriate elements in the body of the document and uses the Enumerable.Select extension method to form new XML from the results of the query. Functional construction takes a bit of study to understand, but as you can see, once you wrap your head around it, you can do an awful lot with just a little bit of code.

Figure 6 Open XML Markup for a Content Control

<w:p>
    <w:r>
        <w:t xml:space="preserve">Not in content control. </w:t>
    </w:r>
    <w:sdt>
        <w:sdtPr>
            <w:alias w:val="Test"/>
            <w:tag w:val="Test"/>
            <w:id w:val="5118254"/>
            <w:placeholder>
                    <w:docPart w:val="DefaultPlaceholder_22675703"/>
            </w:placeholder>
        </w:sdtPr>
        <w:sdtContent>
            <w:r>
                <w:t>This is text in content control.</w:t>
            </w:r>
        </w:sdtContent>
    </w:sdt>
    <w:r>
        <w:t xml:space="preserve"> Not in content control.</w:t>
    </w:r>
</w:p>

Preatomization of XName and XNamespace Objects

The code in Figure 7 uses an approach called "preatomization" of the LINQ to XML names. This is just a fancy way of saying that you write a static class (see Figure 8) that contains static fields that are initialized to the qualified names of the elements and attributes you are using.

Figure 7 Retrieving the Structure of the Content Controls in the Template Document

public static XElement GetContentControls(
    WordprocessingDocument document)
{
    XElement contentControls = new XElement("ContentControls",
        document
            .MainDocumentPart
            .GetXDocument()
            .Root
            .Element(W.body)
            .Elements(W.sdt)
            .Select(tableContentControl =>
                new XElement("Table",
                    new XAttribute("Name", (string)tableContentControl
                        .Element(W.sdtPr).Element(W.tag).Attribute(
                            W.val)),
                    tableContentControl
                            .Descendants(W.sdt)
                            .Select(fieldContentControl =>
                                    new XElement("Field",
                                        new XAttribute("Name",
                                            (string)fieldContentControl
                                                  .Element(W.sdtPr)
                                                  .Element(W.tag)
                                                  .Attribute(W.val)
                                            )
                                    )
                            )
                )
            )
    );
    return contentControls;
}

There is a good reason to initialize XName and XNamespace objects in this fashion. LINQ to XML abstracts XML names and namespaces into two classes: System.Xml.Linq.XName and System.Xml.Linq.XNamespace, respectively. The semantics of these classes include the notion that if two XNames have the same qualified name (namespace + local name), they will be represented by the same object. This allows for fast comparison of XName objects. Instead of using a string comparison to select XElement objects of a given name, the code needs only to compare objects. When you initialize an XName object, LINQ to XML first looks in a cache to determine whether an XName object with the same namespace and name already exists. If one does, the object is initialized to the existing XName object from the cache. If one doesn't exist, LINQ to XML initializes a new one and adds it to the cache. As you might imagine, if this process is repeated over and over, it can result in performance issues. By initializing these objects in a static class, the work is done only once. In addition, by using this technique you reduce the possibility that an element or attribute name is misspelled in the body of the code. One other advantage is that by using this technique, you get support from IntelliSense, which makes writing Open XML programs using LINQ to XML easier.

Figure 8 A Static Class Containing Static Fields to Preatomize XName and XNamespace Objects

public static class W
{
    public static XNamespace w =
        "https://schemas.openxmlformats.org/wordprocessingml/2006/main";
    public static XName body = w + "body";
    public static XName sdt = w + "sdt";
    public static XName sdtPr = w + "sdtPr";
    public static XName tag = w + "tag";
    public static XName val = w + "val";
    public static XName sdtContent = w + "sdtContent";
    public static XName tbl = w + "tbl";
    public static XName tr = w + "tr";
    public static XName tc = w + "tc";
    public static XName p = w + "p";
    public static XName r = w + "r";
    public static XName t = w + "t";
    public static XName rPr = w + "rPr";
    public static XName highlight = w + "highlight";
    public static XName pPr = w + "pPr";
    public static XName color = w + "color";
    public static XName sz = w + "sz";
    public static XName szCs = w + "szCs";
}

The GetXDocument and PutXDocument Extension Methods

The example presented in this article also uses another little trick to ease programming and improve performance. The Open XML SDK has the ability to put annotations on parts in the document. This means that you can attach any .NET Framework object to an OpenXmlPart object and later retrieve it by specifying the type of the object that you attached.

We can define two extension methods, GetXDocument and PutXDocument, that use annotations to minimize deserialization of the XML from the Open XML part. When we call GetXDocument, it first looks to see whether an annotation of type XDocument exists on the OpenXmlPart. If the annotation exists, GetXDocument returns it. If the annotation doesn't exist, the method populates an XDocument from the part, annotates the part, and then returns the new XDocument.

The PutXDocument extension method also checks whether there is an annotation of type XDocument. If this annotation exists, PutXDocument writes the XDocument (presumably modified after the code calls GetXDocument) back to the OpenXMLPart. The GetXDocument and PutXDocument extension methods are shown in Figure 9. You can see the use of the GetXDocument extension method in the GetContentControls method listed earlier in Figure 7.

Figure 9 Extension Methods Use Open XML SDK Annotations to Minimize Deserialization of XML

public static class AssembleDocumentLocalExtensions
{
    public static XDocument GetXDocument(this OpenXmlPart part)
    {
        XDocument xdoc = part.Annotation<XDocument>();
        if (xdoc != null)
            return xdoc;
        using (Stream str = part.GetStream())
        using (StreamReader streamReader = new StreamReader(str))
        using (XmlReader xr = XmlReader.Create(streamReader))
            xdoc = XDocument.Load(xr);
        part.AddAnnotation(xdoc);
        return xdoc;
    }
    public static void PutXDocument(this OpenXmlPart part)
    {
        XDocument xdoc = part.GetXDocument();
        if (xdoc != null)
        {
            // Serialize the XDocument object back to the package.
            using (XmlWriter xw =
                XmlWriter.Create(part.GetStream
                (FileMode.Create, FileAccess.Write)))
            {
                xdoc.Save(xw);
            }
        }
    }
}

Replacing the Content Controls with Data

Now that we have a method that returns the structure of the content controls in the tables and cells, we need a method (SetContentControls) that creates an Open XML document with specific data (retrieved from the SharePoint lists) inserted in the tables. We can define this method to take an XML tree as an argument. The XML tree is shown in Figure 10, and Figure 11 shows the document that SetContentControls creates when it is passed the XML tree.

Figure 10 An XML Tree That Contains Data to Insert in the Word-Processing Document Tables

<ContentControls>
    <Table Name="Team Members">
        <Field Name="TeamMemberName" />
        <Field Name="Role" />
        <Row>
            <Field Name="TeamMemberName" Value="Bob" />
            <Field Name="Role" Value="Developer" />
        </Row>
        <Row>
            <Field Name="TeamMemberName" Value="Susan" />
            <Field Name="Role" Value="Program Manager" />
        </Row>
        <Row>
            <Field Name="TeamMemberName" Value="Jack" />
            <Field Name="Role" Value="Test" />
        </Row>
    </Table>
    <Table Name="Item List">
        <Field Name="ItemName" />
        <Field Name="Description" />
        <Field Name="EstimatedHours" />
        <Field Name="AssignedTo" />
        <Row>
            <Field Name="ItemName" Value="Learn SharePoint 2010" />
            <Field Name="Description" Value="This should be fun!" />
            <Field Name="EstimatedHours" Value="80" />
            <Field Name="AssignedTo" Value=”All” />
        </Row>
        <Row>
            <Field Name="ItemName" Value=
                "Finalize Import Module Specification" />
            <Field Name="Description" Value="Make sure to handle all document
            formats." />
            <Field Name="EstimatedHours" Value="35" />
            <Field Name="AssignedTo" Value=”Susan" />
        </Row>
        <Row>
            <Field Name="ItemName" Value="Write Test Plan" />
            <Field Name=”Description" Value=
            "Include regression testing items." />
            <Field Name="EstimatedHours" Value="20" />
            <Field Name="AssignedTo" Value="Jack" />
        </Row>
    </Table>
</ContentControls>


Figure 11 The Generated Document

You can see that the single row that contained content controls has been replaced by multiple rows, each containing the data from the XML tree that was passed as an argument to the method. By using an XML tree to pass the data to the code that manipulates the Open XML markup, you achieve a nice separation of the code that uses the SharePoint object model and the Open XML code.

The code to assemble the new document honors any formatting you've applied to the table. For example, if you've configured the table to show different colors for alternating rows, or if you've set the background color of a column, the newly generated document reflects your formatting changes.

If you download and examine the ContentControlManager example, you can see that the code gets a copy of the row that contains content controls and saves it as a prototype row:

// Determine the element for the row that contains the content controls.
// This is the prototype for the rows that the code will generate from data.
XElement prototypeRow = tableContentControl
    .Descendants(W.sdt)
    .Ancestors(W.tr)
    .FirstOrDefault();

Then, for each item retrieved from the SharePoint list, the code clones the prototype row, alters the cloned row with the data from the SharePoint list, and adds it to a collection that is inserted into the document.

After creating the list of new rows, the code removes the prototype row from the list and inserts the collection of newly created rows, as shown here:

XElement tableElement = prototypeRow.Ancestors(W.tbl).First();
prototypeRow.Remove();
tableElement.Add(newRows);

Creating the SharePoint Feature

I used the February 2009 CTP release of the Visual Studio 2008 extensions for Windows SharePoint Services 3.0, v1.3 to build this example. I've built and run this example on both 32-bit and 64-bit versions of WSS. (Kirk Evans has some great webcasts that show how to use these extensions.)

The example contains the code to create the Web Part controls. The code is pretty self-explanatory if you are accustomed to building SharePoint Web Parts. When a user clicks the Generate Report button, the code calls the CreateReport method, which assembles the new Open XML word-processing document from the template document by using the data in the SharePoint lists as configured in the tags of the content controls.There are a few points to note about the code for the CreateReport method. Files in a document library in SharePoint are returned as byte arrays. You need to convert this byte array into a memory stream so that you can open and modify the document using the Open XML SDK. One of the MemoryStream constructors takes a byte array, and you might be tempted to use that constructor. However, the memory stream created with that constructor is a nonresizable memory stream, and the Open XML SDK requires that the memory stream be resizable. The solution is to create a MemoryStream with the default constructor and then write the byte array from SharePoint into the MemoryStream, as shown in Figure 12.

Figure 12 Writing a Byte Array from SharePoint to a MemoryStream

private ModifyDocumentResults CreateReport(SPFile file, Label message)
{
    byte[] byteArray = file.OpenBinary();
    using (MemoryStream mem = new MemoryStream())
    {
        mem.Write(byteArray, 0, (int)byteArray.Length);
        try
        {
            using (WordprocessingDocument wordDoc =
                WordprocessingDocument.Open(mem, true))
            {
                // Get the content control structure from the template
                // document.
                XElement contentControlStructure =
                    ContentControlManager.GetContentControls(wordDoc);
                // Retrive data from SharePoint,
                    constructing the XML tree to
                // pass to the ContentControlManager.SetContentControls
                // method.
                ...
            }
        }
    }
}

The remainder of the code is straightforward. It uses the SharePoint object model to retrieve document libraries and the contents of the libraries, retrieve lists and retrieve values of columns for each row in the list. It assembles the XML tree to pass to ContentControlManager.SetContentControls, and then it calls SetContentControls.

The code assembles the name of the generated report document as Report-yyyy-mm-dd. If the report already exists, the code appends a number to the report name to disambiguate the report from other reports that have already been generated. For instance, if Report-2009-08-01.docx already exists, the report is written to Report-2009-8-2 (1).docx.

Easy Customizations

You will probably want to customize this example to suit your own needs. One possible enhancement is to allow for a content control in the body of the template document that pulls boilerplate content from a specified document stored in SharePoint. You could write the code so that you could place the name of the document that contains the boilerplate text as text in the content control.

Also, this example hard-codes the names of the TemplateReports and the Reports document libraries. You could remove this constraint by specifying this information in a SharePoint list. The code would then know about the name of only this configuration list. The names of the TemplateReports and the Reports document libraries would be driven by data from your configuration list.

SharePoint is a powerful technology that makes it easy for people in organizations to collaborate. Open XML is a powerful emerging technology that is changing the way that we generate documents. Using the two technologies together enables you to build applications in which people can use documents to collaborate in new ways.


Eric White* is a writer at Microsoft specializing in the Office Open XML file formats, Office and SharePoint. Before joining Microsoft in 2005, he worked as a developer for a number of years and then started PowerVista Software, a company that developed and sold a cross-platform grid widget. He has written books on custom control and GDI+ development. Read his blog at blogs.msdn.com/ericwhite.*