How to: Get All the Text in All Slides in a Presentation

Last modified: October 14, 2010

Applies to: Excel 2010 | Office 2010 | PowerPoint 2010 | Word 2010

In this article
Getting a PresentationDocument Object
Basic Presentation Document Structure
How the Sample Code Works
Sample Code

This topic shows how to use the classes in the Open XML SDK 2.0 for Microsoft Office to programmatically get all of the text in all of the slides in a presentation.

The following assembly directives are required to compile the code in this topic.

using System;
using System.Linq;
using System.Collections.Generic;
using DocumentFormat.OpenXml.Presentation;
using A = DocumentFormat.OpenXml.Drawing;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml;
using System.Text;

Imports System
Imports System.Linq
Imports System.Collections.Generic
Imports A = DocumentFormat.OpenXml.Drawing
Imports DocumentFormat.OpenXml.Presentation
Imports DocumentFormat.OpenXml.Packaging
Imports DocumentFormat.OpenXml
Imports System.Text

Getting a PresentationDocument Object

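In the Open XML SDK, the PresentationDocument class represents a presentation document package. To work with a presentation document, first create an instance of the PresentationDocument class, and then work with that instance. To create the class instance from the document, call the PresentationDocument.Open(String, Boolean) method, which takes a file path and, as its second parameter, a Boolean value that specifies whether the document is editable. Set this parameter to true to open the document for read/write access, or to false for read-only access, as shown in the following using statement. In this code, the presentationFile parameter is a string that represents the path of the file from which you want to open the document.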

// Open the presentation as read-only.
using (PresentationDocument presentationDocument = PresentationDocument.Open(presentationFile, false))
{
    // Insert other code here.
}

Using presentationDocument As PresentationDocument = PresentationDocument.Open(presentationFile, False)
    ' Insert other code here.
End Using

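The using statement provides a recommended alternative to the typical .Open, .Save, .Close sequence. It ensures that the Dispose method (the method the Open XML SDK uses to clean up resources) is called automatically when the closing brace is reached. The block that follows the using statement establishes a scope for the object that is created or named in the using statement, in this case presentationDocument.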
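
As a rough illustration of what the using statement does on your behalf, the following sketch opens the presentation and releases it with an explicit try/finally block. The class name UsingEquivalentSketch and the method name CountSlidesWithoutUsing are hypothetical names chosen for this illustration; they are not part of the sample in this topic.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;

public static class UsingEquivalentSketch
{
    // Hypothetical helper: performs by hand the cleanup that a using
    // statement would perform automatically.
    public static int CountSlidesWithoutUsing(string presentationFile)
    {
        PresentationDocument presentationDocument = null;
        try
        {
            // Open the presentation as read-only.
            presentationDocument = PresentationDocument.Open(presentationFile, false);
            PresentationPart part = presentationDocument.PresentationPart;
            return part == null ? 0 : part.SlideParts.Count();
        }
        finally
        {
            // Runs whether or not an exception was thrown, just as the
            // closing brace of a using block would.
            if (presentationDocument != null)
            {
                presentationDocument.Dispose();
            }
        }
    }
}
```

The using form is shorter and harder to get wrong, which is why the rest of this topic uses it.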

Basic Presentation Document Structure

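The basic document structure of a PresentationML document consists of a number of parts, among which the main part is the one that contains the presentation definition. The following text from the ISO/IEC 29500 specification introduces the overall form of a PresentationML package.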

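The main part of a PresentationML package starts with a presentation root element. That element contains a presentation, which, in turn, refers to a slide list, a slide master list, a notes master list, and a handout master list. The slide list refers to all of the slides in the presentation; the slide master list refers to all of the slide masters used in the presentation; the notes master contains information about the formatting of notes pages; and the handout master describes how a handout looks.

A handout is a printed set of slides that can be provided to an audience for future reference.

As well as text and graphics, each slide can contain comments and notes, can have a layout, and can be part of one or more custom presentations. (A comment is an annotation intended for the person maintaining the presentation slide deck. A note is a reminder or piece of text intended for the presenter or the audience.)

A PresentationML document can also include the following features: animation, audio, video, and transitions between slides.

A PresentationML document is not stored as one large body in a single part. Instead, the elements that implement certain groupings of functionality are stored in separate parts. For example, all comments in a document are stored in one comment part, while each slide has its own part.

© ISO/IEC29500: 2008.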

The following XML code segment represents a presentation that contains two slides, identified by the IDs 267 and 256.

<p:presentation xmlns:p="…" … > 
   <p:sldMasterIdLst>
      <p:sldMasterId
         xmlns:rel="http://…/relationships" rel:id="rId1"/>
   </p:sldMasterIdLst>
   <p:notesMasterIdLst>
      <p:notesMasterId
         xmlns:rel="http://…/relationships" rel:id="rId4"/>
   </p:notesMasterIdLst>
   <p:handoutMasterIdLst>
      <p:handoutMasterId
         xmlns:rel="http://…/relationships" rel:id="rId5"/>
   </p:handoutMasterIdLst>
   <p:sldIdLst>
      <p:sldId id="267"
         xmlns:rel="http://…/relationships" rel:id="rId2"/>
      <p:sldId id="256"
         xmlns:rel="http://…/relationships" rel:id="rId3"/>
   </p:sldIdLst>
   <p:sldSz cx="9144000" cy="6858000"/>
   <p:notesSz cx="6858000" cy="9144000"/>
</p:presentation>
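
To relate the sldIdLst markup above to the SDK object model, the following sketch lists the ID and relationship ID of each slide entry in presentation order. It is a minimal illustration rather than part of the sample in this topic; the class name SlideIdSketch and the method name DescribeSlideIds are hypothetical.

```csharp
using System.Collections.Generic;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Presentation;

public static class SlideIdSketch
{
    // Hypothetical helper: returns one line of text per <p:sldId> element.
    public static List<string> DescribeSlideIds(PresentationDocument doc)
    {
        var lines = new List<string>();
        PresentationPart part = doc.PresentationPart;
        if (part == null || part.Presentation == null || part.Presentation.SlideIdList == null)
        {
            // Nothing to list: the package has no slide ID list.
            return lines;
        }

        // Elements<SlideId>() walks the children of <p:sldIdLst> in document order.
        foreach (SlideId slideId in part.Presentation.SlideIdList.Elements<SlideId>())
        {
            string id = slideId.Id == null ? "?" : slideId.Id.Value.ToString();
            string rel = slideId.RelationshipId == null ? "?" : slideId.RelationshipId.Value;
            lines.Add("id=" + id + ", relationship=" + rel);
        }
        return lines;
    }
}
```

For a presentation with the slide list shown above, this would produce one entry for ID 267 followed by one for ID 256.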

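Using the Open XML SDK 2.0, you can create document structure and content by using strongly typed classes that correspond to PresentationML elements. You can find these classes in the DocumentFormat.OpenXml.Presentation namespace. The following table lists the class names that correspond to the sld, sldLayout, sldMaster, and notesMaster elements.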

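| PresentationML element | Open XML SDK 2.0 class | Description |
|---|---|---|
| sld | Slide | Presentation slide. It is the root element of SlidePart. |
| sldLayout | SlideLayout | Slide layout. It is the root element of SlideLayoutPart. |
| sldMaster | SlideMaster | Slide master. It is the root element of SlideMasterPart. |
| notesMaster | NotesMaster | Notes master (or handout master). It is the root element of NotesMasterPart. |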
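
The part and root-element pairs in the table can be reached from an open document roughly as follows. This is a sketch under stated assumptions: the class name RootElementSketch and the method name DescribeFirstSlideRoots are hypothetical, and the code inspects whichever slide part the package enumerates first, which is not necessarily the first slide in presentation order.

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Presentation;

public static class RootElementSketch
{
    // Hypothetical helper: reports the element name of each root, or "absent".
    public static string DescribeFirstSlideRoots(PresentationDocument doc)
    {
        PresentationPart presentationPart = doc.PresentationPart;
        SlidePart slidePart = presentationPart == null
            ? null
            : presentationPart.SlideParts.FirstOrDefault();
        if (slidePart == null)
        {
            return "no slides";
        }

        // Each part exposes its root element through a strongly typed property.
        Slide slide = slidePart.Slide;
        SlideLayoutPart layoutPart = slidePart.SlideLayoutPart;
        SlideLayout layout = layoutPart == null ? null : layoutPart.SlideLayout;
        SlideMasterPart masterPart = layoutPart == null ? null : layoutPart.SlideMasterPart;
        SlideMaster master = masterPart == null ? null : masterPart.SlideMaster;
        NotesMasterPart notesMasterPart = presentationPart.NotesMasterPart;
        NotesMaster notesMaster = notesMasterPart == null ? null : notesMasterPart.NotesMaster;

        return "sld: " + NameOf(slide)
            + ", sldLayout: " + NameOf(layout)
            + ", sldMaster: " + NameOf(master)
            + ", notesMaster: " + NameOf(notesMaster);
    }

    // Returns the XML local name of the element, or "absent" when it is missing.
    private static string NameOf(OpenXmlElement element)
    {
        return element == null ? "absent" : element.LocalName;
    }
}
```

The null checks matter because a layout, master, or notes master part is optional from the SDK's point of view, even though presentations saved by PowerPoint normally contain them.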

How the Sample Code Works

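The sample code first counts the slides in the presentation by using two overloads of the CountSlides method. The first overload takes a string parameter and the second takes a PresentationDocument parameter. In the first CountSlides method, the sample code opens the presentation document in a using statement. It then passes the PresentationDocument object to the second CountSlides method, which returns an integer that represents the number of slides in the presentation.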

// Pass the presentation to the next CountSlides method
// and return the slide count.
return CountSlides(presentationDocument);

' Pass the presentation to the next CountSlides method
' and return the slide count.
Return CountSlides(presentationDocument)

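In the second CountSlides method, the code checks whether the PresentationDocument object that was passed in is null, and throws an ArgumentNullException if it is. Otherwise, it gets the PresentationPart object from the PresentationDocument object. If the presentation part is not null, the code calls the Count method on the SlideParts collection (a LINQ extension method, which is why the System.Linq directive is needed) and returns the result as slidesCount.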

// Check for a null document object.
if (presentationDocument == null)
{
    throw new ArgumentNullException("presentationDocument");
}

int slidesCount = 0;

// Get the presentation part of document.
PresentationPart presentationPart = presentationDocument.PresentationPart;

// Get the slide count from the SlideParts.
if (presentationPart != null)
{
    slidesCount = presentationPart.SlideParts.Count();
}
// Return the slide count to the previous method.
return slidesCount;

' Check for a null document object.
If presentationDocument Is Nothing Then
    Throw New ArgumentNullException("presentationDocument")
End If

Dim slidesCount As Integer = 0

' Get the presentation part of document.
Dim presentationPart As PresentationPart = presentationDocument.PresentationPart

' Get the slide count from the SlideParts.
If presentationPart IsNot Nothing Then
    slidesCount = presentationPart.SlideParts.Count()
End If
' Return the slide count to the previous method.
Return slidesCount

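After counting the slides, the code calls the GetSlideIdAndText method once per slide to get the text of each one. For the slide at the zero-based position index in the slide ID list, the method first gets the relationship ID and then uses that relationship ID to get the slide part.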

// Get the relationship ID of the slide at the given index.
PresentationPart part = ppt.PresentationPart;
OpenXmlElementList slideIds = part.Presentation.SlideIdList.ChildElements;

string relId = (slideIds[index] as SlideId).RelationshipId;

// Get the slide part from the relationship ID.
SlidePart slide = (SlidePart) part.GetPartById(relId);

' Get the relationship ID of the slide at the given index.
Dim part As PresentationPart = ppt.PresentationPart
Dim slideIds As OpenXmlElementList = part.Presentation.SlideIdList.ChildElements

Dim relId As String = TryCast(slideIds(index), SlideId).RelationshipId

' Get the slide part from the relationship ID.
Dim slide As SlidePart = DirectCast(part.GetPartById(relId), SlidePart)
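
The lookup above assumes that index is in range and that the presentation has a slide ID list. A more defensive variant might look like the following sketch, which returns null instead of throwing in those cases. The class name SlideLookupSketch and the method name GetSlidePartOrNull are hypothetical and are not part of the sample in this topic.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Presentation;

public static class SlideLookupSketch
{
    // Hypothetical helper: returns the slide part at the zero-based index in
    // presentation order, or null when there is no such slide.
    public static SlidePart GetSlidePartOrNull(PresentationDocument ppt, int index)
    {
        PresentationPart part = ppt.PresentationPart;
        if (part == null || part.Presentation == null || part.Presentation.SlideIdList == null)
        {
            return null;
        }

        // ElementAtOrDefault returns null for a negative or too-large index.
        SlideId slideId = part.Presentation.SlideIdList
            .Elements<SlideId>()
            .ElementAtOrDefault(index);
        if (slideId == null || slideId.RelationshipId == null)
        {
            return null;
        }

        // Resolve the relationship ID to the part that holds the slide.
        return part.GetPartById(slideId.RelationshipId.Value) as SlidePart;
    }
}
```

Whether returning null or throwing is preferable depends on the caller; the sample in this topic avoids the problem by calling CountSlides first and only passing valid indexes.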

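The code then declares a StringBuilder object to store the inner text of the slide. Next, it iterates over all of the A.Text elements in the slide and appends the text of each one to the StringBuilder object.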

// Build a StringBuilder object.
StringBuilder paragraphText = new StringBuilder();

// Get the inner text of the slide:
IEnumerable<A.Text> texts = slide.Slide.Descendants<A.Text>();
foreach (A.Text text in texts)
{
    paragraphText.Append(text.Text);
}
sldText = paragraphText.ToString();

' Build a StringBuilder object.
Dim paragraphText As New StringBuilder()

' Get the inner text of the slide:
Dim texts As IEnumerable(Of A.Text) = slide.Slide.Descendants(Of A.Text)()
For Each text As A.Text In texts
    paragraphText.Append(text.Text)
Next
sldText = paragraphText.ToString()
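
Because the loop above appends every text run directly, text from separate paragraphs and shapes runs together with no separator. If that matters, one possible variation is to group the runs by paragraph and write each paragraph on its own line, as in the following sketch. The class name ParagraphTextSketch and the method name GetSlideTextByParagraph are hypothetical.

```csharp
using System.Linq;
using System.Text;
using DocumentFormat.OpenXml.Presentation;
using A = DocumentFormat.OpenXml.Drawing;

public static class ParagraphTextSketch
{
    // Hypothetical helper: returns the slide text with one line per
    // non-empty paragraph.
    public static string GetSlideTextByParagraph(Slide slide)
    {
        StringBuilder slideText = new StringBuilder();
        foreach (A.Paragraph paragraph in slide.Descendants<A.Paragraph>())
        {
            // Join the runs of a single paragraph without a separator.
            string paragraphText = string.Concat(
                paragraph.Descendants<A.Text>().Select(t => t.Text));
            if (paragraphText.Length > 0)
            {
                slideText.AppendLine(paragraphText);
            }
        }
        return slideText.ToString();
    }
}
```

Given a SlidePart named slide, as in the code above, the call would be GetSlideTextByParagraph(slide.Slide).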

Sample Code

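The following code gets all of the text in all of the slides in a specific presentation file. For example, you can enter the name of the presentation file from the keyboard, and then use a for loop in your program that calls the GetSlideIdAndText method once per slide index and receives the text of that slide through the sldText output parameter, as shown in the following example.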

Console.Write("Please enter a presentation file name without extension: ");
string fileName = Console.ReadLine();
string file = @"C:\Users\Public\Documents\" + fileName + ".pptx";
int numberOfSlides = CountSlides(file);
System.Console.WriteLine("Number of slides = {0}", numberOfSlides);
string slideText;
for (int i = 0; i < numberOfSlides; i++)
{
    GetSlideIdAndText(out slideText, file, i);
    System.Console.WriteLine("Slide #{0} contains: {1}", i + 1, slideText);
}
System.Console.ReadKey();

Console.Write("Please enter a presentation file name without extension: ")
Dim fileName As String = System.Console.ReadLine()
Dim file As String = "C:\Users\Public\Documents\" + fileName + ".pptx"
Dim numberOfSlides As Integer = CountSlides(file)
System.Console.WriteLine("Number of slides = {0}", numberOfSlides)
Dim slideText As String = Nothing
For i As Integer = 0 To numberOfSlides - 1
    GetSlideIdAndText(slideText, file, i)
    System.Console.WriteLine("Slide #{0} contains: {1}", i + 1, slideText)
Next
System.Console.ReadKey()

The following is the complete sample code in both C# and Visual Basic.

public static int CountSlides(string presentationFile)
{
    // Open the presentation as read-only.
    using (PresentationDocument presentationDocument = PresentationDocument.Open(presentationFile, false))
    {
        // Pass the presentation to the next CountSlides method
        // and return the slide count.
        return CountSlides(presentationDocument);
    }
}

// Count the slides in the presentation.
public static int CountSlides(PresentationDocument presentationDocument)
{
    // Check for a null document object.
    if (presentationDocument == null)
    {
        throw new ArgumentNullException("presentationDocument");
    }

    int slidesCount = 0;

    // Get the presentation part of document.
    PresentationPart presentationPart = presentationDocument.PresentationPart;
    // Get the slide count from the SlideParts.
    if (presentationPart != null)
    {
        slidesCount = presentationPart.SlideParts.Count();
    }
    // Return the slide count to the previous method.
    return slidesCount;
}

public static void GetSlideIdAndText(out string sldText, string docName, int index)
{
    using (PresentationDocument ppt = PresentationDocument.Open(docName, false))
    {
        // Get the relationship ID of the slide at the given index.
        PresentationPart part = ppt.PresentationPart;
        OpenXmlElementList slideIds = part.Presentation.SlideIdList.ChildElements;

        string relId = (slideIds[index] as SlideId).RelationshipId;

        // Get the slide part from the relationship ID.
        SlidePart slide = (SlidePart) part.GetPartById(relId);

        // Build a StringBuilder object.
        StringBuilder paragraphText = new StringBuilder();

        // Get the inner text of the slide:
        IEnumerable<A.Text> texts = slide.Slide.Descendants<A.Text>();
        foreach (A.Text text in texts)
        {
            paragraphText.Append(text.Text);
        }
        sldText = paragraphText.ToString();
    }
}

Public Function CountSlides(ByVal presentationFile As String) As Integer
    ' Open the presentation as read-only.
    Using presentationDocument__1 As PresentationDocument = PresentationDocument.Open(presentationFile, False)
        ' Pass the presentation to the next CountSlides method
        ' and return the slide count.
        Return CountSlides(presentationDocument__1)
    End Using
End Function

' Count the slides in the presentation.
Public Function CountSlides(ByVal presentationDocument As PresentationDocument) As Integer
    ' Check for a null document object.
    If presentationDocument Is Nothing Then
        Throw New ArgumentNullException("presentationDocument")
    End If

    Dim slidesCount As Integer = 0

    ' Get the presentation part of document.
    Dim presentationPart As PresentationPart = presentationDocument.PresentationPart
    ' Get the slide count from the SlideParts.
    If presentationPart IsNot Nothing Then
        slidesCount = presentationPart.SlideParts.Count()
    End If
    ' Return the slide count to the previous method.
    Return slidesCount
End Function

Public Sub GetSlideIdAndText(ByRef sldText As String, ByVal docName As String, ByVal index As Integer)
    Using ppt As PresentationDocument = PresentationDocument.Open(docName, False)
        ' Get the relationship ID of the slide at the given index.
        Dim part As PresentationPart = ppt.PresentationPart
        Dim slideIds As OpenXmlElementList = part.Presentation.SlideIdList.ChildElements

        Dim relId As String = TryCast(slideIds(index), SlideId).RelationshipId

        ' Get the slide part from the relationship ID.
        Dim slide As SlidePart = DirectCast(part.GetPartById(relId), SlidePart)
        ' Build a StringBuilder object.
        Dim paragraphText As New StringBuilder()

        ' Get the inner text of the slide:
        Dim texts As IEnumerable(Of A.Text) = slide.Slide.Descendants(Of A.Text)()
        For Each text As A.Text In texts
            paragraphText.Append(text.Text)
        Next
        sldText = paragraphText.ToString()
    End Using
End Sub
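
The sample reopens the presentation file once for the count and once more for every slide. For larger files, it may be preferable to open the document a single time and collect the text of every slide in presentation order. The following sketch shows one way to do that; the class name SlideTextSketch and the method name GetAllSlideTexts are hypothetical and are not part of the sample above.

```csharp
using System.Collections.Generic;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Presentation;
using A = DocumentFormat.OpenXml.Drawing;

public static class SlideTextSketch
{
    // Hypothetical helper: opens the file once, read-only, and returns
    // the text of each slide in presentation order.
    public static List<string> GetAllSlideTexts(string presentationFile)
    {
        using (PresentationDocument ppt = PresentationDocument.Open(presentationFile, false))
        {
            return GetAllSlideTexts(ppt);
        }
    }

    public static List<string> GetAllSlideTexts(PresentationDocument ppt)
    {
        List<string> slideTexts = new List<string>();
        PresentationPart part = ppt.PresentationPart;
        if (part == null || part.Presentation == null || part.Presentation.SlideIdList == null)
        {
            return slideTexts;
        }

        // The slide ID list defines the order in which slides are shown.
        foreach (SlideId slideId in part.Presentation.SlideIdList.Elements<SlideId>())
        {
            SlidePart slide = (SlidePart)part.GetPartById(slideId.RelationshipId);

            // Concatenate the text runs of this slide, as the sample does.
            StringBuilder paragraphText = new StringBuilder();
            foreach (A.Text text in slide.Slide.Descendants<A.Text>())
            {
                paragraphText.Append(text.Text);
            }
            slideTexts.Add(paragraphText.ToString());
        }
        return slideTexts;
    }
}
```

A caller can then loop over the returned list with foreach and print each entry, instead of calling CountSlides and GetSlideIdAndText separately.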

See also

Reference

Class Library Reference