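How to: Get All the Text in All Slides in a Presentation
Last modified: October 14, 2010
Applies to: Excel 2010 | Office 2010 | PowerPoint 2010 | Word 2010
In this article
Getting a PresentationDocument Object
Basic Presentation Document Structure
How the Sample Code Works
Sample Code
This topic shows how to use the classes in the Open XML SDK 2.0 for Microsoft Office to programmatically get all the text in all the slides of a presentation.
The following using directives (Imports statements in Visual Basic) are required to compile the code in this topic.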
using System;
using System.Linq;
using System.Collections.Generic;
using DocumentFormat.OpenXml.Presentation;
using A = DocumentFormat.OpenXml.Drawing;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml;
using System.Text;
Imports System
Imports System.Linq
Imports System.Collections.Generic
Imports A = DocumentFormat.OpenXml.Drawing
Imports DocumentFormat.OpenXml.Presentation
Imports DocumentFormat.OpenXml.Packaging
Imports DocumentFormat.OpenXml
Imports System.Text
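Getting a PresentationDocument Object
In the Open XML SDK, the PresentationDocument class represents a presentation document package. To work with a presentation document, first create an instance of the PresentationDocument class, and then work with that instance. To create the class instance from the document, call the PresentationDocument.Open(String, Boolean) method, which takes a file path and, as its second parameter, a Boolean value that specifies whether the document is editable. Set this parameter to true to open the document for read/write access, or to false for read-only access, as shown in the following using statement. In this code, the presentationFile parameter is a string that represents the path of the file from which to open the document.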
// Open the presentation as read-only.
using (PresentationDocument presentationDocument = PresentationDocument.Open(presentationFile, false))
{
// Insert other code here.
}
Using presentationDocument As PresentationDocument = PresentationDocument.Open(presentationFile, False)
' Insert other code here.
End Using
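The using statement provides a recommended alternative to the typical .Open, .Save, .Close sequence. It ensures that the Dispose method (the internal method that the Open XML SDK uses to clean up resources) is called automatically when the closing brace is reached. The block that follows the using statement establishes a scope for the object that is created or named in the using statement, in this case presentationDocument.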
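As a rough illustration of what the using statement does on the caller's behalf, the following sketch spells out the equivalent try/finally pattern by hand. The class name UsingExpansionSketch and the method HasPresentationPart are hypothetical and are not part of the sample in this topic.

```csharp
using DocumentFormat.OpenXml.Packaging;

public static class UsingExpansionSketch
{
    // Hypothetical helper: reports whether the package has a main presentation part.
    public static bool HasPresentationPart(string presentationFile)
    {
        // Open the presentation as read-only, without a using statement.
        PresentationDocument presentationDocument =
            PresentationDocument.Open(presentationFile, false);
        try
        {
            return presentationDocument.PresentationPart != null;
        }
        finally
        {
            // Runs even if the try block throws, which is what the
            // using statement guarantees at its closing brace.
            if (presentationDocument != null)
            {
                presentationDocument.Dispose();
            }
        }
    }
}
```

In practice the using form is shorter and harder to get wrong, so the expanded form is shown only to make the cleanup step visible.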
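Basic Presentation Document Structure
The basic document structure of a PresentationML document consists of a number of parts, among which the main part is the one that contains the presentation definition. The following text from the ISO/IEC 29500 specification introduces the overall form of a PresentationML package.
A PresentationML package's main part starts with a presentation root element. That element contains a presentation, which, in turn, refers to a slide list, a slide master list, a notes master list, and a handout master list. The slide list refers to all of the slides in the presentation; the slide master list refers to all of the slide masters used in the presentation; the notes master contains information about the formatting of notes pages; and the handout master describes how a handout looks.
A handout is a printed set of slides that can be provided to an audience for future reference.
As well as text and graphics, each slide can contain comments and notes, can have a layout, and can be part of one or more custom presentations. (A comment is an annotation intended for the person maintaining the presentation slide deck. A note is a reminder or piece of text intended for the presenter or the audience.)
Other features that a PresentationML document can include are the following: animation, audio, video, and transitions between slides.
A PresentationML document is not stored as one large body in a single part. Instead, the elements that implement certain groupings of functionality are stored in separate parts. For example, all comments in a document are stored in one comment part, while each slide has its own part.
© ISO/IEC29500: 2008.
The following XML code segment represents a presentation that contains two slides, denoted by the IDs 267 and 256.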
<p:presentation xmlns:p="…" … >
<p:sldMasterIdLst>
<p:sldMasterId
xmlns:rel="http://…/relationships" rel:id="rId1"/>
</p:sldMasterIdLst>
<p:notesMasterIdLst>
<p:notesMasterId
xmlns:rel="http://…/relationships" rel:id="rId4"/>
</p:notesMasterIdLst>
<p:handoutMasterIdLst>
<p:handoutMasterId
xmlns:rel="http://…/relationships" rel:id="rId5"/>
</p:handoutMasterIdLst>
<p:sldIdLst>
<p:sldId id="267"
xmlns:rel="http://…/relationships" rel:id="rId2"/>
<p:sldId id="256"
xmlns:rel="http://…/relationships" rel:id="rId3"/>
</p:sldIdLst>
<p:sldSz cx="9144000" cy="6858000"/>
<p:notesSz cx="6858000" cy="9144000"/>
</p:presentation>
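Using the Open XML SDK 2.0, you can create document structure and content by using strongly typed classes that correspond to PresentationML elements. You can find these classes in the DocumentFormat.OpenXml.Presentation namespace. The following table lists the class names of the classes that correspond to the sld, sldLayout, sldMaster, and notesMaster elements.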
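PresentationML Element | Open XML SDK 2.0 Class | Description
---|---|---
sld | Slide | Presentation slide. It is the root element of SlidePart.
sldLayout | SlideLayout | Slide layout. It is the root element of SlideLayoutPart.
sldMaster | SlideMaster | Slide master. It is the root element of SlideMasterPart.
notesMaster | NotesMaster | Notes master (or handout master). It is the root element of NotesMasterPart.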
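To relate the sldIdLst element in the XML segment to the SDK classes, the following sketch walks the slide ID list in presentation order and prints each entry's id and relationship ID. The names SlideIdLister and ListSlideIds are hypothetical, and the code assumes a package whose slide ID list is well formed.

```csharp
using System;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Presentation;

public static class SlideIdLister
{
    // Hypothetical helper: prints one line per p:sldId entry.
    public static void ListSlideIds(string presentationFile)
    {
        // Open the presentation as read-only.
        using (PresentationDocument doc = PresentationDocument.Open(presentationFile, false))
        {
            PresentationPart part = doc.PresentationPart;
            if (part == null || part.Presentation == null ||
                part.Presentation.SlideIdList == null)
            {
                return; // Nothing to list.
            }

            // Elements<SlideId>() yields the entries in document order,
            // which is the order of the slides in the presentation.
            foreach (SlideId slideId in part.Presentation.SlideIdList.Elements<SlideId>())
            {
                Console.WriteLine("id={0}, relationship id={1}",
                    slideId.Id, slideId.RelationshipId);
            }
        }
    }
}
```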
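How the Sample Code Works
The sample code first counts the slides in the presentation by using two overloads of the CountSlides method. The first overload takes a string parameter, and the second takes a PresentationDocument parameter. In the first CountSlides method, the sample code opens the presentation document in a using statement. It then passes the PresentationDocument object to the second CountSlides method, which returns an integer that represents the number of slides in the presentation.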
// Pass the presentation to the next CountSlides method
// and return the slide count.
return CountSlides(presentationDocument);
' Pass the presentation to the next CountSlides method
' and return the slide count.
Return CountSlides(presentationDocument)
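In the second CountSlides method, the code verifies that the PresentationDocument object passed in is not null, and if it is not, gets the PresentationPart object from the PresentationDocument object. By calling the Count extension method (from System.Linq) on the SlideParts collection, the code gets and returns slidesCount.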
// Check for a null document object.
if (presentationDocument == null)
{
throw new ArgumentNullException("presentationDocument");
}
int slidesCount = 0;
// Get the presentation part of document.
PresentationPart presentationPart = presentationDocument.PresentationPart;
// Get the slide count from the SlideParts.
if (presentationPart != null)
{
slidesCount = presentationPart.SlideParts.Count();
}
// Return the slide count to the previous method.
return slidesCount;
' Check for a null document object.
If presentationDocument Is Nothing Then
Throw New ArgumentNullException("presentationDocument")
End If
Dim slidesCount As Integer = 0
' Get the presentation part of document.
Dim presentationPart As PresentationPart = presentationDocument.PresentationPart
' Get the slide count from the SlideParts.
If presentationPart IsNot Nothing Then
slidesCount = presentationPart.SlideParts.Count()
End If
' Return the slide count to the previous method.
Return slidesCount
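After counting the slides, the code uses the GetSlideIdAndText method to get the content of each slide. It first gets the relationship ID of the slide at the specified index, and then gets the slide part from that relationship ID.
// Get the relationship ID of the slide at the given index.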
PresentationPart part = ppt.PresentationPart;
OpenXmlElementList slideIds = part.Presentation.SlideIdList.ChildElements;
string relId = (slideIds[index] as SlideId).RelationshipId;
// Get the slide part from the relationship ID.
SlidePart slide = (SlidePart) part.GetPartById(relId);
' Get the relationship ID of the slide at the given index.
Dim part As PresentationPart = ppt.PresentationPart
Dim slideIds As OpenXmlElementList = part.Presentation.SlideIdList.ChildElements
Dim relId As String = TryCast(slideIds(index), SlideId).RelationshipId
' Get the slide part from the relationship ID.
Dim slide As SlidePart = DirectCast(part.GetPartById(relId), SlidePart)
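The code then declares a StringBuilder object to hold the inner text of the slide. Next, it iterates through all the text elements in the slide and appends the text of each one to the StringBuilder object.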
// Build a StringBuilder object.
StringBuilder paragraphText = new StringBuilder();
// Get the inner text of the slide:
IEnumerable<A.Text> texts = slide.Slide.Descendants<A.Text>();
foreach (A.Text text in texts)
{
paragraphText.Append(text.Text);
}
sldText = paragraphText.ToString();
' Build a StringBuilder object.
Dim paragraphText As New StringBuilder()
' Get the inner text of the slide:
Dim texts As IEnumerable(Of A.Text) = slide.Slide.Descendants(Of A.Text)()
For Each text As A.Text In texts
paragraphText.Append(text.Text)
Next
sldText = paragraphText.ToString()
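Because the loop above appends every text run without a separator, text from separate paragraphs runs together in the result. One possible variation, sketched here under the assumption that one string per paragraph is wanted, groups the runs by their enclosing paragraph element. The names SlideTextHelper and GetParagraphTexts are hypothetical and are not part of the sample in this topic.

```csharp
using System.Collections.Generic;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using A = DocumentFormat.OpenXml.Drawing;

public static class SlideTextHelper
{
    // Hypothetical helper: returns one string per non-empty paragraph in the slide.
    public static List<string> GetParagraphTexts(SlidePart slidePart)
    {
        List<string> paragraphs = new List<string>();
        if (slidePart == null || slidePart.Slide == null)
        {
            return paragraphs;
        }

        foreach (A.Paragraph paragraph in slidePart.Slide.Descendants<A.Paragraph>())
        {
            // Concatenate the text runs that belong to this paragraph only.
            StringBuilder paragraphText = new StringBuilder();
            foreach (A.Text text in paragraph.Descendants<A.Text>())
            {
                paragraphText.Append(text.Text);
            }

            // Skip paragraphs that contain no text.
            if (paragraphText.Length > 0)
            {
                paragraphs.Add(paragraphText.ToString());
            }
        }
        return paragraphs;
    }
}
```

A caller could join the returned strings with Environment.NewLine to get readable output for a slide.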
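Sample Code
The following code gets all the text in all the slides of a specific presentation file. For example, you can read the name of the presentation file from the keyboard, call CountSlides to get the number of slides, and then use a for loop to call GetSlideIdAndText for each slide index. The method returns the text of the slide through its sldText parameter, as shown in the following example.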
Console.Write("Please enter a presentation file name without extension: ");
string fileName = Console.ReadLine();
string file = @"C:\Users\Public\Documents\" + fileName + ".pptx";
int numberOfSlides = CountSlides(file);
System.Console.WriteLine("Number of slides = {0}", numberOfSlides);
string slideText;
for (int i = 0; i < numberOfSlides; i++)
{
GetSlideIdAndText(out slideText, file, i);
System.Console.WriteLine("Slide #{0} contains: {1}", i + 1, slideText);
}
System.Console.ReadKey();
Console.Write("Please enter a presentation file name without extension: ")
Dim fileName As String = System.Console.ReadLine()
Dim file As String = "C:\Users\Public\Documents\" + fileName + ".pptx"
Dim numberOfSlides As Integer = CountSlides(file)
System.Console.WriteLine("Number of slides = {0}", numberOfSlides)
Dim slideText As String = Nothing
For i As Integer = 0 To numberOfSlides - 1
GetSlideIdAndText(slideText, file, i)
System.Console.WriteLine("Slide #{0} contains: {1}", i + 1, slideText)
Next
System.Console.ReadKey()
The following is the complete sample code in both C# and Visual Basic.
public static int CountSlides(string presentationFile)
{
// Open the presentation as read-only.
using (PresentationDocument presentationDocument = PresentationDocument.Open(presentationFile, false))
{
// Pass the presentation to the next CountSlides method
// and return the slide count.
return CountSlides(presentationDocument);
}
}
// Count the slides in the presentation.
public static int CountSlides(PresentationDocument presentationDocument)
{
// Check for a null document object.
if (presentationDocument == null)
{
throw new ArgumentNullException("presentationDocument");
}
int slidesCount = 0;
// Get the presentation part of document.
PresentationPart presentationPart = presentationDocument.PresentationPart;
// Get the slide count from the SlideParts.
if (presentationPart != null)
{
slidesCount = presentationPart.SlideParts.Count();
}
// Return the slide count to the previous method.
return slidesCount;
}
public static void GetSlideIdAndText(out string sldText, string docName, int index)
{
using (PresentationDocument ppt = PresentationDocument.Open(docName, false))
{
// Get the relationship ID of the slide at the given index.
PresentationPart part = ppt.PresentationPart;
OpenXmlElementList slideIds = part.Presentation.SlideIdList.ChildElements;
string relId = (slideIds[index] as SlideId).RelationshipId;
// Get the slide part from the relationship ID.
SlidePart slide = (SlidePart) part.GetPartById(relId);
// Build a StringBuilder object.
StringBuilder paragraphText = new StringBuilder();
// Get the inner text of the slide:
IEnumerable<A.Text> texts = slide.Slide.Descendants<A.Text>();
foreach (A.Text text in texts)
{
paragraphText.Append(text.Text);
}
sldText = paragraphText.ToString();
}
}
Public Function CountSlides(ByVal presentationFile As String) As Integer
' Open the presentation as read-only.
Using presentationDocument__1 As PresentationDocument = PresentationDocument.Open(presentationFile, False)
' Pass the presentation to the next CountSlides method
' and return the slide count.
Return CountSlides(presentationDocument__1)
End Using
End Function
' Count the slides in the presentation.
Public Function CountSlides(ByVal presentationDocument As PresentationDocument) As Integer
' Check for a null document object.
If presentationDocument Is Nothing Then
Throw New ArgumentNullException("presentationDocument")
End If
Dim slidesCount As Integer = 0
' Get the presentation part of document.
Dim presentationPart As PresentationPart = presentationDocument.PresentationPart
' Get the slide count from the SlideParts.
If presentationPart IsNot Nothing Then
slidesCount = presentationPart.SlideParts.Count()
End If
' Return the slide count to the previous method.
Return slidesCount
End Function
Public Sub GetSlideIdAndText(ByRef sldText As String, ByVal docName As String, ByVal index As Integer)
Using ppt As PresentationDocument = PresentationDocument.Open(docName, False)
' Get the relationship ID of the slide at the given index.
Dim part As PresentationPart = ppt.PresentationPart
Dim slideIds As OpenXmlElementList = part.Presentation.SlideIdList.ChildElements
Dim relId As String = TryCast(slideIds(index), SlideId).RelationshipId
' Get the slide part from the relationship ID.
Dim slide As SlidePart = DirectCast(part.GetPartById(relId), SlidePart)
' Build a StringBuilder object.
Dim paragraphText As New StringBuilder()
' Get the inner text of the slide:
Dim texts As IEnumerable(Of A.Text) = slide.Slide.Descendants(Of A.Text)()
For Each text As A.Text In texts
paragraphText.Append(text.Text)
Next
sldText = paragraphText.ToString()
End Using
End Sub
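The sample opens the file once to count the slides and once more for every call to GetSlideIdAndText. Where that overhead matters, a possible alternative is to open the package a single time and collect the text of every slide in presentation order. The following is a minimal sketch of that idea; the names PresentationTextReader and GetAllSlideText are hypothetical, and the code assumes that every entry in the slide ID list carries a valid relationship ID.

```csharp
using System.Collections.Generic;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Presentation;
using A = DocumentFormat.OpenXml.Drawing;

public static class PresentationTextReader
{
    // Hypothetical helper: returns the text of each slide, in presentation order.
    public static List<string> GetAllSlideText(string presentationFile)
    {
        List<string> result = new List<string>();

        // Open the presentation as read-only, one time only.
        using (PresentationDocument doc = PresentationDocument.Open(presentationFile, false))
        {
            PresentationPart part = doc.PresentationPart;
            if (part == null || part.Presentation == null ||
                part.Presentation.SlideIdList == null)
            {
                return result; // No slides.
            }

            foreach (SlideId slideId in part.Presentation.SlideIdList.Elements<SlideId>())
            {
                // Assumption: the relationship ID is present and points to a slide part.
                SlidePart slidePart = (SlidePart)part.GetPartById(slideId.RelationshipId.Value);

                // Same extraction as in GetSlideIdAndText.
                StringBuilder slideText = new StringBuilder();
                foreach (A.Text text in slidePart.Slide.Descendants<A.Text>())
                {
                    slideText.Append(text.Text);
                }
                result.Add(slideText.ToString());
            }
        }
        return result;
    }
}
```

The count of the returned list can then stand in for the separate CountSlides call, since the list holds one entry per slide ID.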