Open XML Format SDK 2.0

Hello, my name is Zeyad Rajabi and I am a Program Manager on Brian's team. Over the next several posts I will be talking about the Open XML SDK and showing you how to use it to accomplish real-world scenarios such as document assembly and document manipulation. Expect to see lots of code samples and demos.

In today's post, I am going to talk about the overall design of the Open XML SDK with respect to goals and scenarios. In subsequent posts, I will dive more deeply into the architecture of the SDK and show you lots of sample code. If you want to jump ahead and get started with the SDK, you can download the latest CTP from the SDK Version 2.0 download link listed later in this post. I would also recommend joining the Microsoft Connect site for the SDK to get access to the latest articles, how-to topics, and forums.

What is the Open XML SDK?

The Open XML SDK provides a set of .NET APIs that allow developers to create and manipulate documents in the Open XML Formats, in both client and server environments, without requiring the Office client applications. The SDK should make it easier for you to build solutions on top of the Open XML Formats by letting you perform complex operations, such as creating Open XML packages or adding and deleting tables, with just a few lines of code. Check out the following "hello world" example for a WordprocessingML document:

using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public void HelloWorld(string docName)
{
  // Create a Wordprocessing document.
  using (WordprocessingDocument package = WordprocessingDocument.Create(docName, WordprocessingDocumentType.Document))
  {
    // Add a new main document part.
    package.AddMainDocumentPart();

    // Create the Document DOM.
    package.MainDocumentPart.Document =
      new Document(
        new Body(
          new Paragraph(
            new Run(
              new Text("Hello World!")))));

    // Save changes to the main document part.
    package.MainDocumentPart.Document.Save();
  }
}

The SDK takes care of both the structure of the Open XML package and the XML contained in each of its parts. In other words, with this SDK you will be able to add or remove parts within a package as well as manipulate XML constructs, such as paragraphs and tables.
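To make the two levels concrete, here is a minimal sketch. It assumes the released DocumentFormat.OpenXml package (2.x or later) rather than the CTP discussed here, whose names may differ, and PackageEditSketch and its methods are hypothetical helpers; AddCustomXmlPart, FeedData, DeleteParts and AppendChild are SDK calls. The first two methods work on the package structure (adding and removing a custom XML part), the third on the XML inside the main document part.

```csharp
using System.IO;
using System.Linq;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper class; written against the released SDK, not the CTP.
public static class PackageEditSketch
{
    // Package level: add a custom XML part under the main document part and
    // fill it with the given XML. Returns the number of custom XML parts afterwards.
    public static int AddCustomXml(WordprocessingDocument package, string xml)
    {
        MainDocumentPart mainPart = package.MainDocumentPart;
        CustomXmlPart part = mainPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
        using (var data = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
        {
            part.FeedData(data);
        }
        return mainPart.CustomXmlParts.Count();
    }

    // Package level: remove every custom XML part from the main document part.
    public static void RemoveCustomXml(WordprocessingDocument package)
    {
        MainDocumentPart mainPart = package.MainDocumentPart;
        // Snapshot the parts first so the collection is not modified while enumerated.
        mainPart.DeleteParts(mainPart.CustomXmlParts.ToList());
    }

    // XML level: append a paragraph to the body of the existing document.
    public static void AppendParagraph(WordprocessingDocument package, string text)
    {
        Body body = package.MainDocumentPart.Document.Body;
        body.AppendChild(new Paragraph(new Run(new Text(text))));
        package.MainDocumentPart.Document.Save();
    }
}
```

Both levels go through the same WordprocessingDocument object, so part-level and XML-level changes can be mixed inside one using block.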

The SDK also supports programming in the style of LINQ to XML, which makes coding against XML content much easier than the traditional W3C XML DOM programming model.
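A rough illustration of that style, again a sketch against the released SDK with a hypothetical class name: a LINQ projection is passed straight into an element constructor, the same way XElement trees are built with functional construction in LINQ to XML.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper class; written against the released SDK, not the CTP.
public static class FunctionalConstructionSketch
{
    // Builds a Body with one paragraph per input line. The Select() projection
    // yields Paragraph objects that the Body constructor accepts as its children.
    public static Body BuildBody(IEnumerable<string> lines)
    {
        return new Body(
            lines.Select(line =>
                new Paragraph(
                    new Run(
                        new Text(line)))));
    }
}
```

Assigning the result to a document's Body (for example inside the HelloWorld method) produces one paragraph per line.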

Why Use the Open XML SDK?

Using the Open XML SDK to create solutions that manipulate documents directly has many advantages as compared with automating Microsoft Office applications using macros or VBA. For those of you not familiar with the pains of automating Office applications on the server, check out the following KB article:

The major advantage is that the Open XML SDK is fully supported on the server, unlike automating Office applications. That means you can create managed code solutions that are scalable and stable on the server. Imagine being able to write multi-threaded solutions that build on top of the SDK.

In addition, there is a huge performance advantage when developing solutions with the Open XML SDK, which is especially evident when dealing with large numbers of documents. You will be able to programmatically generate thousands of documents from database data in a matter of seconds rather than hours.

Lastly, the Open XML SDK is a dedicated file format API that specializes in the manipulation and creation of Open XML packages. The SDK is fully aware of the structure and schema of Open XML Formats.

The SDK should be the first thing you use when developing Open XML solutions.

What Can't the Open XML SDK Do?

Before we get into the design of the SDK, I want to point out a few key things the SDK will not do:

  • The Open XML SDK is NOT a replacement for the Office Object Model, and it provides no abstraction on top of the file formats
    • You need to understand the structure of the file formats to leverage the SDK; it doesn't hide that structure from you
  • The SDK does NOT provide functionality to convert Open XML Formats to and from other formats, like HTML or XPS
  • The SDK does NOT guarantee that documents are valid Open XML, whether you build them with the SDK's classes or manipulate the underlying XML directly
    • We are working on providing validation functionality in subsequent CTP releases of version 2.0 of the SDK
  • The SDK does NOT provide application behaviors such as layout (e.g., pagination of WordprocessingML documents) or recalculation

Open XML SDK Roadmap

We decided to release the Open XML SDK as two versions:

  1. Version 1.0 – allows for direct manipulation of the Open XML Package at the part level
  2. Version 2.0 – provides strongly typed class support for the underlying XML content contained in each part

In other words, version 1.0 of the SDK deals with the structure, or skeleton, of the Open XML Formats, while version 2.0 deals with the XML contained within each of the XML parts. I will show you guys some code comparing version 1.0 and version 2.0 in a later post.
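Ahead of that post, here is a rough sketch of the difference, assuming the released SDK's API (the CTP may differ) and a hypothetical PartVsTypedSketch class. The first method treats the main document part as an opaque XML stream, the way a part-level API does, and queries it with LINQ to XML using hand-typed element names and the WordprocessingML namespace; the second runs the same count over the strongly typed classes.

```csharp
using System.IO;
using System.Linq;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper class; written against the released SDK, not the CTP.
public static class PartVsTypedSketch
{
    // The WordprocessingML main namespace, which part-level code must spell out.
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Part-level style: read the raw XML of the main part and count <w:p> elements.
    public static int CountParagraphsRaw(WordprocessingDocument package)
    {
        using (Stream stream = package.MainDocumentPart.GetStream(FileMode.Open, FileAccess.Read))
        {
            XDocument xml = XDocument.Load(stream);
            return xml.Descendants(W + "p").Count();
        }
    }

    // Strongly typed style: the same count over the SDK's Paragraph class.
    public static int CountParagraphsTyped(WordprocessingDocument package)
    {
        return package.MainDocumentPart.Document.Body.Descendants<Paragraph>().Count();
    }
}
```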

Version 1.0 of the SDK was fully released with a "go-live" license in June 2008. With this go-live license you can build and deploy solutions confidently.

A couple of weeks ago we released the first Community Technology Preview (CTP) of version 2.0 of the Open XML SDK. Keep in mind this version of the SDK is still a CTP, so we are expecting to get a lot of customer feedback to polish this API.

SDK Version 1.0 download

SDK Version 2.0 download

MSDN SDK online documentation

MSDN SDK forum

Microsoft Connect for the SDK

What Scenarios Does the Open XML SDK Target?

Suppose you are an XML developer who understands the Open XML standard and is quite comfortable creating and manipulating Open XML files. The Open XML SDK targets the following core scenarios:

Strongly Typed Classes and Objects

Instead of relying on generic XML functionality to manipulate XML, where you need to be aware of element, attribute, and value spelling as well as namespaces, you can use the Open XML SDK to accomplish the same tasks by manipulating objects that represent elements, attributes, and values. All schema types are represented as strongly typed Common Language Runtime (CLR) classes, and attribute values are strongly typed as well, with enumerations for attributes that allow a fixed set of values. In other words, you do not always need to refer to the standard and the Open XML schemas for hierarchy, spelling, and namespaces; instead, you can use Visual Studio's IntelliSense to develop solutions faster and more reliably.
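For example, a centered, bold paragraph can be built without typing a single element name or namespace. This is a minimal sketch against the released SDK (the class name is hypothetical); JustificationValues is the SDK's type for the values of w:jc, so a misspelled value fails at compile time instead of producing invalid markup.

```csharp
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper class; written against the released SDK, not the CTP.
public static class StronglyTypedSketch
{
    // Paragraph properties (w:pPr) carry the alignment; run properties (w:rPr)
    // carry the bold flag. Element order and namespaces come from the classes.
    public static Paragraph CenteredBoldParagraph(string text)
    {
        return new Paragraph(
            new ParagraphProperties(
                new Justification { Val = JustificationValues.Center }),
            new Run(
                new RunProperties(new Bold()),
                new Text(text)));
    }
}
```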

Content Construction, Search, and Manipulation

With the Open XML SDK you can continue to take advantage of your LINQ knowledge, because LINQ support is built directly into the SDK. You can use functional construction and lambda-expression queries directly on objects that represent Open XML elements. In addition, the SDK makes it easy to traverse and manipulate content by exposing collections of objects, such as tables and paragraphs.
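A minimal sketch of querying and manipulating those collections with LINQ, assuming the released SDK and a hypothetical QuerySketch class: one method searches every paragraph (including paragraphs inside table cells) for a term, the other removes every top-level table from the body.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper class; written against the released SDK, not the CTP.
public static class QuerySketch
{
    // Returns the text of every paragraph under the body that contains the term,
    // in document order. Descendants() also reaches paragraphs nested in tables.
    public static List<string> FindParagraphs(Body body, string term)
    {
        return body.Descendants<Paragraph>()
                   .Select(p => p.InnerText)
                   .Where(text => text.Contains(term))
                   .ToList();
    }

    // Removes every direct child table of the body and returns how many were removed.
    public static int RemoveTables(Body body)
    {
        // ToList() takes a snapshot so the tree is not changed while it is enumerated.
        List<Table> tables = body.Elements<Table>().ToList();
        foreach (Table table in tables)
        {
            table.Remove();
        }
        return tables.Count;
    }
}
```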


You will be able to specify which version of the Open XML Formats you are targeting, and the API will take this into account for validation. Note: this scenario will be added in future releases of version 2.0 of the SDK.

Markup Language Specific Scenario

With the SDK you can perform a variety of tasks on all types of Open XML packages and Open XML markup languages. For example, you can construct tables with dynamic data in a WordprocessingML document, extract and analyze data in a SpreadsheetML workbook, search for and report non-compliant content in a PresentationML presentation, or change shape colors in DrawingML.
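As a rough sketch of the first example, assuming the released SDK and a hypothetical TableSketch class, the method below turns rows of data (for instance, records already read from a database) into a WordprocessingML table with single-line borders. Border sizes are measured in eighths of a point, so 4U gives a half-point line.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper class; written against the released SDK, not the CTP.
public static class TableSketch
{
    // Builds a table with one row per record and one cell per value.
    // Every table cell must contain at least one paragraph, so each value is wrapped in one.
    public static Table BuildTable(IEnumerable<string[]> rows)
    {
        var table = new Table(
            new TableProperties(
                new TableBorders(
                    new TopBorder { Val = BorderValues.Single, Size = 4U },
                    new BottomBorder { Val = BorderValues.Single, Size = 4U },
                    new InsideHorizontalBorder { Val = BorderValues.Single, Size = 4U },
                    new InsideVerticalBorder { Val = BorderValues.Single, Size = 4U })));

        foreach (string[] record in rows)
        {
            table.AppendChild(new TableRow(
                record.Select(value =>
                    new TableCell(new Paragraph(new Run(new Text(value)))))));
        }
        return table;
    }
}
```

The returned table can then be added to a document with body.AppendChild(table).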

Next Time

In my next post I am going to talk a bit more about the overall architecture of the SDK as shown in the diagram below.

Let me know if you have any specific questions or comments that you would like me to address here or in future posts.

Zeyad Rajabi