How to Create a List of All Parts in an Open XML Document

Sometimes you need a list of all parts in a package so that you can write generalized code that processes every part. This post presents code that creates a list (List<OpenXmlPart>) of all parts in a package.

This blog is inactive.
New blog: EricWhite.com/blog

Note that the parts in a package don't form a tree; they form a graph. Any part can have a relationship to any other part, which in turn can have a relationship back to the first part. So the recursive code has to check whether a part has already been collected and skip it if so; otherwise a cycle of relationships would make the recursion run forever.
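To make the cycle problem concrete, here is a minimal, self-contained sketch that needs no Open XML SDK. It uses a hypothetical `Node` class as a stand-in for `OpenXmlPart` and a hypothetical `CollectReachable` method with the same shape as the `AddPart` function shown later in this post. The sample graph has A related to B and B related back to A, so a walk without the "already seen" check would never terminate.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for OpenXmlPart: a named node whose outgoing
// relationships may point back to nodes that were already visited.
public class Node
{
    public string Name { get; }
    public List<Node> Related { get; } = new List<Node>();
    public Node(string name) { Name = name; }
}

public static class GraphWalk
{
    // Same shape as AddPart: record the node, then recurse into its
    // relationships, skipping any node that has already been collected.
    public static void CollectReachable(HashSet<Node> visited, Node node)
    {
        if (!visited.Add(node))
            return; // already seen: this check is what breaks cycles
        foreach (Node child in node.Related)
            CollectReachable(visited, child);
    }

    // Builds A->B, B->A (a cycle), A->C, C->B and walks the graph from A.
    public static HashSet<Node> DemoCycle()
    {
        var a = new Node("A");
        var b = new Node("B");
        var c = new Node("C");
        a.Related.Add(b);
        b.Related.Add(a);
        a.Related.Add(c);
        c.Related.Add(b);

        var visited = new HashSet<Node>();
        CollectReachable(visited, a);
        return visited; // contains A, B and C, each exactly once
    }
}
```

Because `Node` doesn't override `Equals`, the `HashSet` compares by reference, so "already seen" means "this exact object was collected before".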

As an example of how this is useful: in the PowerTools for Open XML project, I had to write generalized code that modified the XML in arbitrary parts based on some parameters. I used the following technique:

  • I added an event handler to the XDocument that would be called if any code anywhere modified the XML in the document.
  • When the event was raised, I would add a semaphore annotation to the OpenXmlPart object, and remove the event handler (as it was no longer needed).
  • After all modifications had been made to all parts, I could iterate through all parts in the package, look for the semaphore annotation, and write only the modified parts back to the package.
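The steps above can be sketched with LINQ to XML alone. This is an illustrative sketch, not the PowerTools code: to keep it self-contained, the semaphore annotation is placed on the `XDocument` itself rather than on the `OpenXmlPart` (in Open XML SDK 2.x, `OpenXmlPart` also exposes `AddAnnotation`, so the same pattern applies there). `ModifiedSemaphore`, `TrackChanges`, and `GetModified` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical marker type: its presence as an annotation means "modified".
public class ModifiedSemaphore { }

public static class ChangeTracking
{
    // Subscribes a one-shot handler. XDocument.Changed is raised when the
    // document or any of its descendants changes; on the first change the
    // handler adds the semaphore and then unsubscribes itself.
    public static void TrackChanges(XDocument doc)
    {
        EventHandler<XObjectChangeEventArgs> handler = null;
        handler = (sender, e) =>
        {
            doc.AddAnnotation(new ModifiedSemaphore());
            doc.Changed -= handler; // no longer needed after the first change
        };
        doc.Changed += handler;
    }

    // After all edits are done, keep only the documents carrying the semaphore.
    public static List<XDocument> GetModified(IEnumerable<XDocument> docs)
    {
        return docs.Where(d => d.Annotation<ModifiedSemaphore>() != null).ToList();
    }
}
```

Removing the handler after the first change means later edits cost nothing extra, and each document gets exactly one semaphore no matter how many times it is modified.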

As another example, I needed to serialize all XML parts of a package into one larger XML document so that all parts would be available to an XSLT transform. Once you have all parts from this code, building the larger XML document is straightforward.
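Here is a minimal sketch of that combining step, assuming each XML part has already been loaded into an `XDocument`. The `(Uri, ContentType, Xml)` tuples stand in for data taken from `OpenXmlPart` objects, and the `package`/`part` element names and `uri`/`contentType` attributes are hypothetical choices, not a format defined by the SDK.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class PackageXml
{
    // Wraps each part's root element in a <part> element that records its
    // URI and content type, all under a single <package> root element.
    public static XDocument Combine(
        IEnumerable<(string Uri, string ContentType, XDocument Xml)> parts)
    {
        return new XDocument(
            new XElement("package",
                parts.Select(p => new XElement("part",
                    new XAttribute("uri", p.Uri),
                    new XAttribute("contentType", p.ContentType),
                    // copy the root so the source document is left untouched
                    new XElement(p.Xml.Root)))));
    }
}
```

To feed it from a real package, one option is to take the list returned by `GetAllParts`, skip non-XML parts such as `image/png` (checking whether `ContentType` ends with `xml` is a simple heuristic), and load each remaining part with `XDocument.Load(part.GetStream())`.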

This code uses the Open XML SDK. Download it at https://go.microsoft.com/fwlink/?LinkId=120908.

When you run this code on a small Open XML document, you see something like this:

URI                      Content Type
===                      ============
/docProps/app.xml        application/vnd.openxmlformats-officedocument.extended-properties+xml
/word/theme/theme1.xml   application/vnd.openxmlformats-officedocument.theme+xml
/word/document.xml       application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml
/word/fontTable.xml      application/vnd.openxmlformats-officedocument.wordprocessingml.fontTable+xml
/word/settings.xml       application/vnd.openxmlformats-officedocument.wordprocessingml.settings+xml
/word/styles.xml         application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml
/word/webSettings.xml    application/vnd.openxmlformats-officedocument.wordprocessingml.webSettings+xml
/docProps/core.xml       application/vnd.openxmlformats-package.core-properties+xml
/word/media/image1.png   image/png

Here is the code (also attached to this post):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;

class Program
{
    // Adds a part, then recursively adds every part it has a relationship to.
    // Parts form a graph, not a tree, so a part that is already in the set
    // is skipped; this is also what stops the recursion on cycles.
    private static void AddPart(HashSet<OpenXmlPart> partList, OpenXmlPart part)
    {
        // HashSet.Add returns false if the part was already present.
        if (!partList.Add(part))
            return;
        foreach (IdPartPair p in part.Parts)
            AddPart(partList, p.OpenXmlPart);
    }

    // The following three functions, plus the recursive function above,
    // create a complete list of all parts in a package, sorted by content
    // type and then by URI.
    public static List<OpenXmlPart> GetAllParts(WordprocessingDocument doc)
    {
        // Use a HashSet so that each part is processed only once.
        HashSet<OpenXmlPart> partList = new HashSet<OpenXmlPart>();
        foreach (IdPartPair p in doc.Parts)
            AddPart(partList, p.OpenXmlPart);
        return partList.OrderBy(p => p.ContentType).ThenBy(p => p.Uri.ToString()).ToList();
    }

    public static List<OpenXmlPart> GetAllParts(SpreadsheetDocument doc)
    {
        // Use a HashSet so that each part is processed only once.
        HashSet<OpenXmlPart> partList = new HashSet<OpenXmlPart>();
        foreach (IdPartPair p in doc.Parts)
            AddPart(partList, p.OpenXmlPart);
        return partList.OrderBy(p => p.ContentType).ThenBy(p => p.Uri.ToString()).ToList();
    }

    public static List<OpenXmlPart> GetAllParts(PresentationDocument doc)
    {
        // Use a HashSet so that each part is processed only once.
        HashSet<OpenXmlPart> partList = new HashSet<OpenXmlPart>();
        foreach (IdPartPair p in doc.Parts)
            AddPart(partList, p.OpenXmlPart);
        return partList.OrderBy(p => p.ContentType).ThenBy(p => p.Uri.ToString()).ToList();
    }

    // Prints one line per part: the URI padded to a fixed column, then the content type.
    public static void PrintParts(List<OpenXmlPart> partList)
    {
        int[] tabs = new[] { 25 };
        Console.WriteLine("{0}{1}", "URI".PadRight(tabs[0]), "Content Type");
        Console.WriteLine("{0}{1}", "===".PadRight(tabs[0]), "============");
        foreach (var p in partList)
            Console.WriteLine("{0}{1}", p.Uri.ToString().PadRight(tabs[0]), p.ContentType);
    }

    static void Main(string[] args)
    {
        // Use the file named on the command line, or Test.docx by default.
        string file = args.Length > 0 ? args[0] : "Test.docx";
        if (!File.Exists(file))
        {
            Console.WriteLine("File '{0}' doesn't exist.", file);
            Environment.Exit(1);
        }
        FileInfo fi = new FileInfo(file);
        // Listing parts doesn't modify the package, so open it read-only (isEditable: false).
        switch (fi.Extension.ToLowerInvariant())
        {
            case ".docx":
                using (WordprocessingDocument wp1 = WordprocessingDocument.Open(file, false))
                {
                    List<OpenXmlPart> partList = GetAllParts(wp1);
                    PrintParts(partList);
                }
                break;
            case ".xlsx":
                using (SpreadsheetDocument s1 = SpreadsheetDocument.Open(file, false))
                {
                    List<OpenXmlPart> partList = GetAllParts(s1);
                    PrintParts(partList);
                }
                break;
            case ".pptx":
                using (PresentationDocument p1 = PresentationDocument.Open(file, false))
                {
                    List<OpenXmlPart> partList = GetAllParts(p1);
                    PrintParts(partList);
                }
                break;
            default:
                Console.WriteLine("Unsupported file type '{0}'.", fi.Extension);
                break;
        }
    }
}

 

FindAllParts.cs
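The recursive AddPart is fine for typical documents, whose relationship chains are shallow. If you'd rather avoid recursion, or want one routine for any graph, the same visited-set idea can be written iteratively with an explicit stack. This generic variant is an illustrative alternative, not the code attached to this post; `Reachability`, `GetAllReachable`, and the `related` parameter are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class Reachability
{
    // Returns every node reachable from the roots, each exactly once,
    // using an explicit stack instead of recursion.
    public static HashSet<T> GetAllReachable<T>(
        IEnumerable<T> roots, Func<T, IEnumerable<T>> related)
    {
        var seen = new HashSet<T>();
        var pending = new Stack<T>(roots);
        while (pending.Count > 0)
        {
            T node = pending.Pop();
            if (!seen.Add(node))
                continue; // already collected: skip so cycles don't loop forever
            foreach (T next in related(node))
                pending.Push(next);
        }
        return seen;
    }
}
```

With the SDK types used above, a call could look like `Reachability.GetAllReachable(doc.Parts.Select(p => p.OpenXmlPart), part => part.Parts.Select(p => p.OpenXmlPart))`, followed by the same `OrderBy`/`ThenBy` sort; the traversal order differs from the recursive version, but after sorting the resulting list is the same.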