Get all the text in a slide in a presentation

This topic shows how to use the classes in the Open XML SDK for Office to get all the text in a slide in a presentation programmatically.


Getting a PresentationDocument object

In the Open XML SDK, the PresentationDocument class represents a presentation document package. To work with a presentation document, first create an instance of the PresentationDocument class, and then work with that instance. To create the class instance from the document call the PresentationDocument.Open(String, Boolean) method that uses a file path, and a Boolean value as the second parameter to specify whether a document is editable. To open a document for read/write access, assign the value true to this parameter; for read-only access assign it the value false as shown in the following using statement. In this code, the file parameter is a string that represents the path for the file from which you want to open the document.

    // Open the presentation as read-only.
        using (PresentationDocument presentationDocument = PresentationDocument.Open(presentationFile, false))
    {
        // Insert other code here.
    }

The using statement provides a recommended alternative to the typical .Open, .Save, .Close sequence. It ensures that the Dispose method (internal method used by the Open XML SDK to clean up resources) is automatically called when the closing brace is reached. The block that follows the using statement establishes a scope for the object that is created or named in the using statement, in this case presentationDocument.


Basic Presentation Document Structure

The basic document structure of a PresentationML document consists of a number of parts, among which is the main part that contains the presentation definition. The following text from the ISO/IEC 29500 specification introduces the overall form of a PresentationML package.

The main part of a PresentationML package starts with a presentation root element. That element contains a presentation, which, in turn, refers to a slide list, a slide master list, a notes master list, and a handout master list. The slide list refers to all of the slides in the presentation; the slide master list refers to the entire slide masters used in the presentation; the notes master contains information about the formatting of notes pages; and the handout master describes how a handout looks.

A handout is a printed set of slides that can be provided to an audience.

As well as text and graphics, each slide can contain comments and notes, can have a layout, and can be part of one or more custom presentations. A comment is an annotation intended for the person maintaining the presentation slide deck. A note is a reminder or piece of text intended for the presenter or the audience.

Other features that a PresentationML document can include the following: animation, audio, video, and transitions between slides.

A PresentationML document is not stored as one large body in a single part. Instead, the elements that implement certain groupings of functionality are stored in separate parts. For example, all comments in a document are stored in one comment part while each slide has its own part.

© ISO/IEC29500: 2008.

The following XML code example represents a presentation that contains two slides denoted by the IDs 267 and 256.

    <p:presentation xmlns:p="…" … > 
       <p:sldMasterIdLst>
          <p:sldMasterId
             xmlns:rel="https://…/relationships" rel:id="rId1"/>
       </p:sldMasterIdLst>
       <p:notesMasterIdLst>
          <p:notesMasterId
             xmlns:rel="https://…/relationships" rel:id="rId4"/>
       </p:notesMasterIdLst>
       <p:handoutMasterIdLst>
          <p:handoutMasterId
             xmlns:rel="https://…/relationships" rel:id="rId5"/>
       </p:handoutMasterIdLst>
       <p:sldIdLst>
          <p:sldId id="267"
             xmlns:rel="https://…/relationships" rel:id="rId2"/>
          <p:sldId id="256"
             xmlns:rel="https://…/relationships" rel:id="rId3"/>
       </p:sldIdLst>
           <p:sldSz cx="9144000" cy="6858000"/>
       <p:notesSz cx="6858000" cy="9144000"/>
    </p:presentation>

Using the Open XML SDK, you can create document structure and content using strongly-typed classes that correspond to PresentationML elements. You can find these classes in the DocumentFormat.OpenXml.Presentation namespace. The following table lists the class names of the classes that correspond to the sld, sldLayout, sldMaster, and notesMaster elements.

PresentationML Element Open XML SDK Class Description
sld Slide Presentation Slide. It is the root element of SlidePart.
sldLayout SlideLayout Slide Layout. It is the root element of SlideLayoutPart.
sldMaster SlideMaster Slide Master. It is the root element of SlideMasterPart.
notesMaster NotesMaster Notes Master (or handoutMaster). It is the root element of NotesMasterPart.

How the Sample Code Works

The sample code consists of three overloads of the GetAllTextInSlide method. In the following segment, the first overloaded method opens the source presentation that contains the slide with text to get, and passes the presentation to the second overloaded method, which gets the slide part. This method returns the array of strings that the second method returns to it, each of which represents a paragraph of text in the specified slide.

    // Get all the text in a slide.
    public static string[] GetAllTextInSlide(string presentationFile, int slideIndex)
    {
        // Open the presentation as read-only.
        using (PresentationDocument presentationDocument = PresentationDocument.Open(presentationFile, false))
        {
            // Pass the presentation and the slide index
            // to the next GetAllTextInSlide method, and
            // then return the array of strings it returns. 
            return GetAllTextInSlide(presentationDocument, slideIndex);
        }
    }

The second overloaded method takes the presentation document passed in and gets a slide part to pass to the third overloaded method. It returns to the first overloaded method the array of strings that the third overloaded method returns to it, each of which represents a paragraph of text in the specified slide.

    public static string[] GetAllTextInSlide(PresentationDocument presentationDocument, int slideIndex)
    {
        // Verify that the presentation document exists.
        if (presentationDocument == null)
        {
            throw new ArgumentNullException("presentationDocument");
        }

        // Verify that the slide index is not out of range.
        if (slideIndex < 0)
        {
            throw new ArgumentOutOfRangeException("slideIndex");
        }

        // Get the presentation part of the presentation document.
        PresentationPart presentationPart = presentationDocument.PresentationPart;

        // Verify that the presentation part and presentation exist.
        if (presentationPart != null && presentationPart.Presentation != null)
        {
            // Get the Presentation object from the presentation part.
            Presentation presentation = presentationPart.Presentation;

            // Verify that the slide ID list exists.
            if (presentation.SlideIdList != null)
            {
                // Get the collection of slide IDs from the slide ID list.
                var slideIds = presentation.SlideIdList.ChildElements;

                // If the slide ID is in range...
                if (slideIndex < slideIds.Count)
                {
                    // Get the relationship ID of the slide.
                    string slidePartRelationshipId = (slideIds[slideIndex] as SlideId).RelationshipId;

                    // Get the specified slide part from the relationship ID.
                    SlidePart slidePart = (SlidePart)presentationPart.GetPartById(slidePartRelationshipId);

                    // Pass the slide part to the next method, and
                    // then return the array of strings that method
                    // returns to the previous method.
                    return GetAllTextInSlide(slidePart);
                }
            }
        }
        // Else, return null.
        return null;
    }

The following code segment shows the third overloaded method, which takes takes the slide part passed in, and returns to the second overloaded method a string array of text paragraphs. It starts by verifying that the slide part passed in exists, and then it creates a linked list of strings. It iterates through the paragraphs in the slide passed in, and using a StringBuilder object to concatenate all the lines of text in a paragraph, it assigns each paragraph to a string in the linked list. It then returns to the second overloaded method an array of strings that represents all the text in the specified slide in the presentation.

    public static string[] GetAllTextInSlide(SlidePart slidePart)
    {
        // Verify that the slide part exists.
        if (slidePart == null)
        {
            throw new ArgumentNullException("slidePart");
        }

        // Create a new linked list of strings.
        LinkedList<string> texts = new LinkedList<string>();

        // If the slide exists...
        if (slidePart.Slide != null)
        {
            // Iterate through all the paragraphs in the slide.
            foreach (var paragraph in slidePart.Slide.Descendants<DocumentFormat.OpenXml.Drawing.Paragraph>())
            {
                // Create a new string builder.                    
                StringBuilder paragraphText = new StringBuilder();

                // Iterate through the lines of the paragraph.
                foreach (var text in paragraph.Descendants<DocumentFormat.OpenXml.Drawing.Text>())
                {
                    // Append each line to the previous lines.
                    paragraphText.Append(text.Text);
                }

                if (paragraphText.Length > 0)
                {
                    // Add each paragraph to the linked list.
                    texts.AddLast(paragraphText.ToString());
                }
            }
        }

        if (texts.Count > 0)
        {
            // Return an array of strings.
            return texts.ToArray();
        }
        else
        {
            return null;
        }
    }

Sample Code

Following is the complete sample code that you can use to get all the text in a specific slide in a presentation file. For example, you can use the following foreach loop in your program to get the array of strings returned by the method GetAllTextInSlide, which represents the text in the second slide of the presentation file "Myppt8.pptx."

    foreach (string s in GetAllTextInSlide(@"C:\Users\Public\Documents\Myppt8.pptx", 1))
        Console.WriteLine(s);

Following is the complete sample code in both C# and Visual Basic.


using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Presentation;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

GetAllTextInSlide(args[0], int.Parse(args[1]));

// Get all the text in a slide.
static string[]? GetAllTextInSlide(string presentationFile, int slideIndex)
{
    // Open the presentation as read-only.
    using (PresentationDocument presentationDocument = PresentationDocument.Open(presentationFile, false))
    {
        // Pass the presentation and the slide index
        // to the next GetAllTextInSlide method, and
        // then return the array of strings it returns. 
        return GetAllTextInSlideFromPresentation(presentationDocument, slideIndex);
    }
}
static string[]? GetAllTextInSlideFromPresentation(PresentationDocument presentationDocument, int slideIndex)
{
    // Verify that the slide index is not out of range.
    if (slideIndex < 0)
    {
        throw new ArgumentOutOfRangeException("slideIndex");
    }

    // Get the presentation part of the presentation document.
    PresentationPart? presentationPart = presentationDocument.PresentationPart;

    // Verify that the presentation part and presentation exist.
    if (presentationPart is not null && presentationPart.Presentation is not null)
    {
        // Get the Presentation object from the presentation part.
        Presentation presentation = presentationPart.Presentation;

        // Verify that the slide ID list exists.
        if (presentation.SlideIdList is not null)
        {
            // Get the collection of slide IDs from the slide ID list.
            DocumentFormat.OpenXml.OpenXmlElementList slideIds = presentation.SlideIdList.ChildElements;

            // If the slide ID is in range...
            if (slideIndex < slideIds.Count)
            {
                // Get the relationship ID of the slide.
                string? slidePartRelationshipId = ((SlideId)slideIds[slideIndex]).RelationshipId;

                if (slidePartRelationshipId is null)
                {
                    return null;
                }

                // Get the specified slide part from the relationship ID.
                SlidePart slidePart = (SlidePart)presentationPart.GetPartById(slidePartRelationshipId);

                // Pass the slide part to the next method, and
                // then return the array of strings that method
                // returns to the previous method.
                return GetAllTextInSlideFromPart(slidePart);
            }
        }
    }

    // Else, return null.
    return null;
}
static string[] GetAllTextInSlideFromPart(SlidePart slidePart)
{
    // Verify that the slide part exists.
    if (slidePart is null)
    {
        throw new ArgumentNullException("slidePart");
    }

    // Create a new linked list of strings.
    LinkedList<string> texts = new LinkedList<string>();

    // If the slide exists...
    if (slidePart.Slide is not null)
    {
        // Iterate through all the paragraphs in the slide.
        foreach (DocumentFormat.OpenXml.Drawing.Paragraph paragraph in
            slidePart.Slide.Descendants<DocumentFormat.OpenXml.Drawing.Paragraph>())
        {
            // Create a new string builder.                    
            StringBuilder paragraphText = new StringBuilder();

            // Iterate through the lines of the paragraph.
            foreach (DocumentFormat.OpenXml.Drawing.Text text in
                paragraph.Descendants<DocumentFormat.OpenXml.Drawing.Text>())
            {
                // Append each line to the previous lines.
                paragraphText.Append(text.Text);
            }

            if (paragraphText.Length > 0)
            {
                // Add each paragraph to the linked list.
                texts.AddLast(paragraphText.ToString());
            }
        }
    }

    // Return an array of strings.
    return texts.ToArray();
}

See also