How to: Get All the Text in a Slide in a Presentation

Last modified: October 14, 2010

Applies to: Excel 2010 | Office 2010 | PowerPoint 2010 | Word 2010

In this article
Getting a PresentationDocument Object
Basic Presentation Document Structure
How the Sample Code Works
Sample Code

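This topic shows how to use the classes in the Open XML SDK 2.0 for Microsoft Office to programmatically get all the text in a slide in a presentation.

The following using directives (Imports statements in Visual Basic) are required to compile the code in this topic.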

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using DocumentFormat.OpenXml.Presentation;
using DocumentFormat.OpenXml.Packaging;
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text
Imports DocumentFormat.OpenXml.Presentation
Imports DocumentFormat.OpenXml.Packaging

Getting a PresentationDocument Object

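In the Open XML SDK, the PresentationDocument class represents a presentation document package. To work with a presentation document, first create an instance of the PresentationDocument class, and then work with that instance. To create the class instance from the document, call the PresentationDocument.Open(String, Boolean) method, which takes a file path and, as its second parameter, a Boolean value that specifies whether the document is editable. Set this parameter to true to open the document for read/write access, or to false for read-only access, as shown in the following using statement. In this code, the presentationFile parameter is a string that represents the path of the file from which to open the document.

// Open the presentation as read-only.
using (PresentationDocument presentationDocument = PresentationDocument.Open(presentationFile, false))
{
    // Insert other code here.
}
' Open the presentation as read-only.
Using presentationDocument As PresentationDocument = PresentationDocument.Open(presentationFile, False)
    ' Insert other code here.
End Using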

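The using statement provides a recommended alternative to the typical .Open, .Save, .Close sequence. It ensures that the Dispose method (the method the Open XML SDK uses to clean up resources) is called automatically when the closing brace is reached. The block that follows the using statement establishes a scope for the object that is created or named in the using statement, in this case presentationDocument.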
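
As a rough illustration of what the using statement does on your behalf, the following sketch spells out an approximately equivalent try/finally form. The class name UsingExpansionSketch, the method name CountSlideParts, and its presentationFile parameter are hypothetical and exist only for this example.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;

public static class UsingExpansionSketch
{
    // Hypothetical helper: returns the number of slide parts in a presentation.
    public static int CountSlideParts(string presentationFile)
    {
        // Open the presentation as read-only.
        PresentationDocument presentationDocument =
            PresentationDocument.Open(presentationFile, false);
        try
        {
            PresentationPart presentationPart = presentationDocument.PresentationPart;
            return presentationPart == null ? 0 : presentationPart.SlideParts.Count();
        }
        finally
        {
            // A using statement generates this cleanup call automatically,
            // whether the block exits normally or through an exception.
            if (presentationDocument != null)
            {
                presentationDocument.Dispose();
            }
        }
    }
}
```

In practice the using form is shorter and harder to get wrong, which is why the rest of this topic uses it.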

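Basic Presentation Document Structure

The basic document structure of a PresentationML document consists of the main part, which contains the presentation definition. The following text from the ISO/IEC 29500 specification introduces the overall form of a PresentationML package.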

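A PresentationML package's main part starts with a presentation root element. That element contains a presentation, which in turn refers to a slide list, a slide master list, a notes master list, and a handout master list. The slide list refers to all of the slides in the presentation; the slide master list refers to all of the slide masters used in the presentation; the notes master contains information about the formatting of notes pages; and the handout master describes how a handout looks.

A handout is a printed set of slides that can be provided to an audience for future reference.

As well as text and graphics, each slide can contain comments and notes, can have a layout, and can be part of one or more custom presentations. (A comment is an annotation intended for the person maintaining the presentation slide deck. A note is a reminder or piece of text intended for the presenter or the audience.)

Other features that a PresentationML document can include are the following: animation, audio, video, and transitions between slides.

A PresentationML document is not stored as one large body in a single part. Instead, the elements that implement certain groupings of functionality are stored in separate parts. For example, all comments in a document are stored in one comment part, while each slide has its own part.

© ISO/IEC29500: 2008.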

The following XML code segment represents a presentation that contains two slides, denoted by the IDs 267 and 256.

<p:presentation xmlns:p="…" … > 
   <p:sldMasterIdLst>
      <p:sldMasterId
         xmlns:rel="http://…/relationships" rel:id="rId1"/>
   </p:sldMasterIdLst>
   <p:notesMasterIdLst>
      <p:notesMasterId
         xmlns:rel="http://…/relationships" rel:id="rId4"/>
   </p:notesMasterIdLst>
   <p:handoutMasterIdLst>
      <p:handoutMasterId
         xmlns:rel="http://…/relationships" rel:id="rId5"/>
   </p:handoutMasterIdLst>
   <p:sldIdLst>
      <p:sldId id="267"
         xmlns:rel="http://…/relationships" rel:id="rId2"/>
      <p:sldId id="256"
         xmlns:rel="http://…/relationships" rel:id="rId3"/>
   </p:sldIdLst>
   <p:sldSz cx="9144000" cy="6858000"/>
   <p:notesSz cx="6858000" cy="9144000"/>
</p:presentation>
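
To make the role of the slide ID list concrete, the following sketch reads the sldId entries from presentation markup by using only System.Xml.Linq, without the Open XML SDK. The class name SlideIdListSketch and the method name ReadSlideIds are hypothetical. The two namespace URIs are the standard PresentationML and relationships namespaces and are supplied here as an assumption, because the fragment above elides them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SlideIdListSketch
{
    // Assumed namespaces (the fragment in this topic elides them).
    static readonly XNamespace P =
        "http://schemas.openxmlformats.org/presentationml/2006/main";
    static readonly XNamespace R =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Hypothetical helper: returns (slide ID, relationship ID) pairs
    // in the order in which they appear in the slide ID list.
    public static List<KeyValuePair<string, string>> ReadSlideIds(string presentationXml)
    {
        XDocument document = XDocument.Parse(presentationXml);
        return document.Root
            .Elements(P + "sldIdLst")
            .Elements(P + "sldId")
            .Select(e => new KeyValuePair<string, string>(
                (string)e.Attribute("id"),
                (string)e.Attribute(R + "id")))
            .ToList();
    }
}
```

For markup shaped like the fragment above, with the namespaces filled in as assumed, the method would return the pairs (267, rId2) and (256, rId3). The order of the entries, not the numeric ID, determines the order of the slides.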

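By using the Open XML SDK 2.0, you can create document structure and content by using strongly typed classes that correspond to PresentationML elements. You can find these classes in the DocumentFormat.OpenXml.Presentation namespace. The following table lists the names of the classes that correspond to the sld, sldLayout, sldMaster, and notesMaster elements.

PresentationML element    Open XML SDK 2.0 class    Description
sld                       Slide                     Presentation slide. It is the root element of SlidePart.
sldLayout                 SlideLayout               Slide layout. It is the root element of SlideLayoutPart.
sldMaster                 SlideMaster               Slide master. It is the root element of SlideMasterPart.
notesMaster               NotesMaster               Notes master (or handout master). It is the root element of NotesMasterPart.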

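How the Sample Code Works

The sample code consists of three overloads of the GetAllTextInSlide method. In the following code segment, the first overload opens the source presentation that contains the slide whose text you want, and passes the presentation to the second overload, which gets the slide part. The first overload returns the array of strings that the second overload returns to it, where each string represents one paragraph of text in the specified slide.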

// Get all the text in a slide.
public static string[] GetAllTextInSlide(string presentationFile, int slideIndex)
{
    // Open the presentation as read-only.
    using (PresentationDocument presentationDocument = PresentationDocument.Open(presentationFile, false))
    {
        // Pass the presentation and the slide index
        // to the next GetAllTextInSlide method, and
        // then return the array of strings it returns. 
        return GetAllTextInSlide(presentationDocument, slideIndex);
    }
}
' Get all the text in a slide.
Public Shared Function GetAllTextInSlide(ByVal presentationFile As String, ByVal slideIndex As Integer) As String()
    ' Open the presentation as read-only.
    Using presentationDocument As PresentationDocument = PresentationDocument.Open(presentationFile, False)
        ' Pass the presentation and the slide index
        ' to the next GetAllTextInSlide method, and
        ' then return the array of strings it returns. 
        Return GetAllTextInSlide(presentationDocument, slideIndex)
    End Using
End Function

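The second overload takes the presentation document passed in and gets the slide part to pass to the third overload. It returns to the first overload the array of strings that the third overload returns to it, where each string represents one paragraph of text in the specified slide. If the slide index is past the last slide, or the presentation has no slide ID list, it returns null.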

public static string[] GetAllTextInSlide(PresentationDocument presentationDocument, int slideIndex)
{
    // Verify that the presentation document exists.
    if (presentationDocument == null)
    {
        throw new ArgumentNullException("presentationDocument");
    }

    // Verify that the slide index is not out of range.
    if (slideIndex < 0)
    {
        throw new ArgumentOutOfRangeException("slideIndex");
    }

    // Get the presentation part of the presentation document.
    PresentationPart presentationPart = presentationDocument.PresentationPart;

    // Verify that the presentation part and presentation exist.
    if (presentationPart != null && presentationPart.Presentation != null)
    {
        // Get the Presentation object from the presentation part.
        Presentation presentation = presentationPart.Presentation;

        // Verify that the slide ID list exists.
        if (presentation.SlideIdList != null)
        {
            // Get the collection of slide IDs from the slide ID list.
            var slideIds = presentation.SlideIdList.ChildElements;

            // If the slide ID is in range...
            if (slideIndex < slideIds.Count)
            {
                // Get the relationship ID of the slide.
                string slidePartRelationshipId = (slideIds[slideIndex] as SlideId).RelationshipId;

                // Get the specified slide part from the relationship ID.
                SlidePart slidePart = (SlidePart)presentationPart.GetPartById(slidePartRelationshipId);

                // Pass the slide part to the next method, and
                // then return the array of strings that method
                // returns to the previous method.
                return GetAllTextInSlide(slidePart);
            }
        }
    }
    // Else, return null.
    return null;
}
Public Shared Function GetAllTextInSlide(ByVal presentationDocument As PresentationDocument, ByVal slideIndex As Integer) As String()
    ' Verify that the presentation document exists.
    If presentationDocument Is Nothing Then
        Throw New ArgumentNullException("presentationDocument")
    End If

    ' Verify that the slide index is not out of range.
    If slideIndex < 0 Then
        Throw New ArgumentOutOfRangeException("slideIndex")
    End If

    ' Get the presentation part of the presentation document.
    Dim presentationPart As PresentationPart = presentationDocument.PresentationPart

    ' Verify that the presentation part and presentation exist.
    If presentationPart IsNot Nothing AndAlso presentationPart.Presentation IsNot Nothing Then
        ' Get the Presentation object from the presentation part.
        Dim presentation As Presentation = presentationPart.Presentation

        ' Verify that the slide ID list exists.
        If presentation.SlideIdList IsNot Nothing Then
            ' Get the collection of slide IDs from the slide ID list.
            Dim slideIds = presentation.SlideIdList.ChildElements

            ' If the slide ID is in range...
            If slideIndex < slideIds.Count Then
                ' Get the relationship ID of the slide.
                Dim slidePartRelationshipId As String = (TryCast(slideIds(slideIndex), SlideId)).RelationshipId

                ' Get the specified slide part from the relationship ID.
                Dim slidePart As SlidePart = CType(presentationPart.GetPartById(slidePartRelationshipId), SlidePart)

                ' Pass the slide part to the next method, and
                ' then return the array of strings that method
                ' returns to the previous method.
                Return GetAllTextInSlide(slidePart)
            End If
        End If
    End If

    ' Else, return null.
    Return Nothing
End Function
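
The second overload above casts the child element with "as" and then reads RelationshipId without checking the result. In a well-formed document the slide ID list should contain only SlideId elements, so that is normally safe. As a more defensive variation, the following sketch filters by type before indexing. The class name SlideLookupSketch and the method name GetSlidePartByIndex are hypothetical and not part of the sample.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Presentation;

public static class SlideLookupSketch
{
    // Hypothetical helper: returns the slide part at the zero-based index,
    // or null when the presentation has no such slide.
    public static SlidePart GetSlidePartByIndex(
        PresentationDocument presentationDocument, int slideIndex)
    {
        if (presentationDocument == null)
        {
            throw new ArgumentNullException("presentationDocument");
        }
        if (slideIndex < 0)
        {
            throw new ArgumentOutOfRangeException("slideIndex");
        }

        PresentationPart presentationPart = presentationDocument.PresentationPart;
        if (presentationPart == null
            || presentationPart.Presentation == null
            || presentationPart.Presentation.SlideIdList == null)
        {
            return null;
        }

        // Elements<SlideId>() yields only SlideId children, in presentation order.
        SlideId slideId = presentationPart.Presentation.SlideIdList
            .Elements<SlideId>()
            .ElementAtOrDefault(slideIndex);
        if (slideId == null)
        {
            return null;
        }

        // Resolve the relationship ID to a part; "as" yields null
        // if the relationship points to some other kind of part.
        string relationshipId = slideId.RelationshipId;
        return presentationPart.GetPartById(relationshipId) as SlidePart;
    }
}
```

A caller could pass the returned part to the third GetAllTextInSlide overload after checking it for null.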

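The following code segment shows the third overload, which takes the slide part passed in and returns to the second overload an array of strings, one for each paragraph of text. It first verifies that the slide part passed in exists, and then creates a linked list of strings. It iterates through the paragraphs in the slide and, by using a StringBuilder object, concatenates all the lines of text in a paragraph, and then adds each non-empty paragraph to the linked list as a string. It then returns to the second overload an array of strings that represents all the text in the specified slide, or null if the slide contains no text.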

public static string[] GetAllTextInSlide(SlidePart slidePart)
{
    // Verify that the slide part exists.
    if (slidePart == null)
    {
        throw new ArgumentNullException("slidePart");
    }

    // Create a new linked list of strings.
    LinkedList<string> texts = new LinkedList<string>();

    // If the slide exists...
    if (slidePart.Slide != null)
    {
        // Iterate through all the paragraphs in the slide.
        foreach (var paragraph in slidePart.Slide.Descendants<DocumentFormat.OpenXml.Drawing.Paragraph>())
        {
            // Create a new string builder.                    
            StringBuilder paragraphText = new StringBuilder();

            // Iterate through the lines of the paragraph.
            foreach (var text in paragraph.Descendants<DocumentFormat.OpenXml.Drawing.Text>())
            {
                // Append each line to the previous lines.
                paragraphText.Append(text.Text);
            }

            if (paragraphText.Length > 0)
            {
                // Add each paragraph to the linked list.
                texts.AddLast(paragraphText.ToString());
            }
        }
    }

    if (texts.Count > 0)
    {
        // Return an array of strings.
        return texts.ToArray();
    }
    else
    {
        return null;
    }
}
Public Shared Function GetAllTextInSlide(ByVal slidePart As SlidePart) As String()
    ' Verify that the slide part exists.
    If slidePart Is Nothing Then
        Throw New ArgumentNullException("slidePart")
    End If

    ' Create a new linked list of strings.
    Dim texts As New LinkedList(Of String)()

    ' If the slide exists...
    If slidePart.Slide IsNot Nothing Then
        ' Iterate through all the paragraphs in the slide.
        For Each paragraph In slidePart.Slide.Descendants(Of DocumentFormat.OpenXml.Drawing.Paragraph)()
            ' Create a new string builder.                    
            Dim paragraphText As New StringBuilder()

            ' Iterate through the lines of the paragraph.
            For Each text In paragraph.Descendants(Of DocumentFormat.OpenXml.Drawing.Text)()
                ' Append each line to the previous lines.
                paragraphText.Append(text.Text)
            Next text

            If paragraphText.Length > 0 Then
                ' Add each paragraph to the linked list.
                texts.AddLast(paragraphText.ToString())
            End If
        Next paragraph
    End If

    If texts.Count > 0 Then
        ' Return an array of strings.
        Return texts.ToArray()
    Else
        Return Nothing
    End If
End Function
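
The paragraph-by-paragraph loop in the third overload can also be written as a LINQ query. The following sketch is one possible variation, not the sample's own method. It returns an empty array instead of null when the slide has no text, so that callers can iterate over the result without a null check. The class name SlideTextSketch and the method name GetAllTextOrEmpty are hypothetical.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using A = DocumentFormat.OpenXml.Drawing;

public static class SlideTextSketch
{
    // Hypothetical variant: returns one string per non-empty paragraph,
    // and an empty array (never null) when the slide has no text.
    public static string[] GetAllTextOrEmpty(SlidePart slidePart)
    {
        if (slidePart == null)
        {
            throw new ArgumentNullException("slidePart");
        }
        if (slidePart.Slide == null)
        {
            return new string[0];
        }

        return slidePart.Slide
            .Descendants<A.Paragraph>()
            // Concatenate the text elements found inside each paragraph.
            .Select(p => string.Concat(
                p.Descendants<A.Text>().Select(t => t.Text).ToArray()))
            // Skip paragraphs that contain no text at all.
            .Where(text => text.Length > 0)
            .ToArray();
    }
}
```

Whether to return null or an empty array is a design choice. The sample returns null, so code that calls the sample's overloads still has to handle that case.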

Sample Code

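The following is the complete sample code that you can use to get all the text in a specific slide in a presentation file. For example, you can use the following foreach loop in your program to get the array of strings returned by the GetAllTextInSlide method, which represents the text in the second slide of the presentation file "Myppt8.pptx". The slide index is zero-based, so the value 1 selects the second slide. Because the method returns null when the index is past the last slide or the slide contains no text, the loop substitutes an empty array in that case.

foreach (string s in GetAllTextInSlide(@"C:\Users\Public\Documents\Myppt8.pptx", 1) ?? new string[0])
    Console.WriteLine(s);
For Each s As String In If(GetAllTextInSlide("C:\Users\Public\Documents\Myppt8.pptx", 1), New String() {})
    Console.WriteLine(s)
Next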

The following is the complete sample code in both C# and Visual Basic.

// Get all the text in a slide.
public static string[] GetAllTextInSlide(string presentationFile, int slideIndex)
{
    // Open the presentation as read-only.
    using (PresentationDocument presentationDocument = PresentationDocument.Open(presentationFile, false))
    {
        // Pass the presentation and the slide index
        // to the next GetAllTextInSlide method, and
        // then return the array of strings it returns. 
        return GetAllTextInSlide(presentationDocument, slideIndex);
    }
}
public static string[] GetAllTextInSlide(PresentationDocument presentationDocument, int slideIndex)
{
    // Verify that the presentation document exists.
    if (presentationDocument == null)
    {
        throw new ArgumentNullException("presentationDocument");
    }

    // Verify that the slide index is not out of range.
    if (slideIndex < 0)
    {
        throw new ArgumentOutOfRangeException("slideIndex");
    }

    // Get the presentation part of the presentation document.
    PresentationPart presentationPart = presentationDocument.PresentationPart;

    // Verify that the presentation part and presentation exist.
    if (presentationPart != null && presentationPart.Presentation != null)
    {
        // Get the Presentation object from the presentation part.
        Presentation presentation = presentationPart.Presentation;

        // Verify that the slide ID list exists.
        if (presentation.SlideIdList != null)
        {
            // Get the collection of slide IDs from the slide ID list.
            DocumentFormat.OpenXml.OpenXmlElementList slideIds = 
                presentation.SlideIdList.ChildElements;

            // If the slide ID is in range...
            if (slideIndex < slideIds.Count)
            {
                // Get the relationship ID of the slide.
                string slidePartRelationshipId = (slideIds[slideIndex] as SlideId).RelationshipId;

                // Get the specified slide part from the relationship ID.
                SlidePart slidePart = 
                    (SlidePart)presentationPart.GetPartById(slidePartRelationshipId);

                // Pass the slide part to the next method, and
                // then return the array of strings that method
                // returns to the previous method.
                return GetAllTextInSlide(slidePart);
            }
        }
    }

    // Else, return null.
    return null;
}
public static string[] GetAllTextInSlide(SlidePart slidePart)
{
    // Verify that the slide part exists.
    if (slidePart == null)
    {
        throw new ArgumentNullException("slidePart");
    }

    // Create a new linked list of strings.
    LinkedList<string> texts = new LinkedList<string>();

    // If the slide exists...
    if (slidePart.Slide != null)
    {
        // Iterate through all the paragraphs in the slide.
        foreach (DocumentFormat.OpenXml.Drawing.Paragraph paragraph in 
            slidePart.Slide.Descendants<DocumentFormat.OpenXml.Drawing.Paragraph>())
        {
            // Create a new string builder.                    
            StringBuilder paragraphText = new StringBuilder();

            // Iterate through the lines of the paragraph.
            foreach (DocumentFormat.OpenXml.Drawing.Text text in 
                paragraph.Descendants<DocumentFormat.OpenXml.Drawing.Text>())
            {
                // Append each line to the previous lines.
                paragraphText.Append(text.Text);
            }

            if (paragraphText.Length > 0)
            {
                // Add each paragraph to the linked list.
                texts.AddLast(paragraphText.ToString());
            }
        }
    }

    if (texts.Count > 0)
    {
        // Return an array of strings.
        return texts.ToArray();
    }
    else
    {
        return null;
    }
}
' Get all the text in a slide.
Public Function GetAllTextInSlide(ByVal presentationFile As String, ByVal slideIndex As Integer) As String()
    ' Open the presentation as read-only.
    Using presentationDocument As PresentationDocument = PresentationDocument.Open(presentationFile, False)
        ' Pass the presentation and the slide index
        ' to the next GetAllTextInSlide method, and
        ' then return the array of strings it returns. 
        Return GetAllTextInSlide(presentationDocument, slideIndex)
    End Using
End Function
Public Function GetAllTextInSlide(ByVal presentationDocument As PresentationDocument, ByVal slideIndex As Integer) As String()
    ' Verify that the presentation document exists.
    If presentationDocument Is Nothing Then
        Throw New ArgumentNullException("presentationDocument")
    End If

    ' Verify that the slide index is not out of range.
    If slideIndex < 0 Then
        Throw New ArgumentOutOfRangeException("slideIndex")
    End If

    ' Get the presentation part of the presentation document.
    Dim presentationPart As PresentationPart = presentationDocument.PresentationPart

    ' Verify that the presentation part and presentation exist.
    If presentationPart IsNot Nothing AndAlso presentationPart.Presentation IsNot Nothing Then
        ' Get the Presentation object from the presentation part.
        Dim presentation As Presentation = presentationPart.Presentation

        ' Verify that the slide ID list exists.
        If presentation.SlideIdList IsNot Nothing Then
            ' Get the collection of slide IDs from the slide ID list.
            Dim slideIds = presentation.SlideIdList.ChildElements

            ' If the slide ID is in range...
            If slideIndex < slideIds.Count Then
                ' Get the relationship ID of the slide.
                Dim slidePartRelationshipId As String = (TryCast(slideIds(slideIndex), SlideId)).RelationshipId

                ' Get the specified slide part from the relationship ID.
                Dim slidePart As SlidePart = CType(presentationPart.GetPartById(slidePartRelationshipId), SlidePart)

                ' Pass the slide part to the next method, and
                ' then return the array of strings that method
                ' returns to the previous method.
                Return GetAllTextInSlide(slidePart)
            End If
        End If
    End If

    ' Else, return null.
    Return Nothing
End Function
Public Function GetAllTextInSlide(ByVal slidePart As SlidePart) As String()
    ' Verify that the slide part exists.
    If slidePart Is Nothing Then
        Throw New ArgumentNullException("slidePart")
    End If

    ' Create a new linked list of strings.
    Dim texts As New LinkedList(Of String)()

    ' If the slide exists...
    If slidePart.Slide IsNot Nothing Then
        ' Iterate through all the paragraphs in the slide.
        For Each paragraph In slidePart.Slide.Descendants(Of DocumentFormat.OpenXml.Drawing.Paragraph)()
            ' Create a new string builder.                    
            Dim paragraphText As New StringBuilder()

            ' Iterate through the lines of the paragraph.
            For Each Text In paragraph.Descendants(Of DocumentFormat.OpenXml.Drawing.Text)()
                ' Append each line to the previous lines.
                paragraphText.Append(Text.Text)
            Next Text

            If paragraphText.Length > 0 Then
                ' Add each paragraph to the linked list.
                texts.AddLast(paragraphText.ToString())
            End If
        Next paragraph
    End If

    If texts.Count > 0 Then
        ' Return an array of strings.
        Return texts.ToArray()
    Else
        Return Nothing
    End If
End Function

See Also

Reference

Class Library Reference