Manipulating Office document contents – On Steroids!
Yes, it’s been a long time. Sorry about that, but let’s not spend time on reasons; they’re not going to help anyone.
Here is the new story: how many times have you tried manipulating document contents (for example, adding or removing something) and found that the performance wasn’t what you expected? I’ve been there. Part of the reason is that, generally, when things work great, I don’t get to see them.
While working on an issue, I came across a way to manipulate Office documents with great performance: OpenXML. Yes, I know what you are thinking, because I thought the same thing: “How on earth can you use OpenXML to manipulate a loaded document? You can’t even open it with the OpenXML SDK!” The answer lies (though not explicitly) in one of my previous posts, where I talked about FlatOPC. I am using the same thing for document manipulation. The core idea is:
- Get a “System.IO.Packaging.Package” stream for the document
- Open it using the OpenXML SDK (yes, the OpenXML SDK can open a memory stream)
- Convert it to FlatOPC
- Manipulate whatever you want
- Use InsertXML to insert it back into the document
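FlatOPC is the whole package flattened into one XML document: a pkg:package root with one pkg:part per package part, where XML parts are wrapped in pkg:xmlData and binary parts are base64 text in pkg:binaryData. The sketch below illustrates the last three steps on such a document using plain LINQ to XML. It is not the author’s library: the FlatOpcSketch class, its BuildSample and GetPartXml methods, and the hand-built one-part package are hypothetical, written only to show the shape you end up manipulating.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for experimenting with FlatOPC as plain XML (not the author's library).
public static class FlatOpcSketch
{
    // Namespaces used by the FlatOPC wrapper elements and by WordprocessingML content.
    public static readonly XNamespace Pkg = "http://schemas.microsoft.com/office/2006/xmlPackage";
    public static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the root element stored in the pkg:xmlData of the named part,
    // or null when there is no XML part with that name.
    public static XElement GetPartXml(XDocument flatOpc, string partName)
    {
        return flatOpc.Root
            .Elements(Pkg + "part")
            .Where(p => (string)p.Attribute(Pkg + "name") == partName)
            .Select(p => p.Element(Pkg + "xmlData"))
            .Where(data => data != null)
            .Select(data => data.Elements().FirstOrDefault())
            .FirstOrDefault();
    }

    // Builds a tiny one-part FlatOPC document. Real output from Word
    // (Range.WordOpenXML) also carries relationship parts and other parts.
    public static XDocument BuildSample(string text)
    {
        return new XDocument(
            new XElement(Pkg + "package",
                new XAttribute(XNamespace.Xmlns + "pkg", Pkg.NamespaceName),
                new XElement(Pkg + "part",
                    new XAttribute(Pkg + "name", "/word/document.xml"),
                    new XAttribute(Pkg + "contentType",
                        "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"),
                    new XElement(Pkg + "xmlData",
                        new XElement(W + "document",
                            new XAttribute(XNamespace.Xmlns + "w", W.NamespaceName),
                            new XElement(W + "body",
                                new XElement(W + "p",
                                    new XElement(W + "r",
                                        new XElement(W + "t", text)))))))));
    }
}
```

For example, `FlatOpcSketch.GetPartXml(flat, "/word/document.xml")` returns the `w:document` element; edits made to it in place show up in `flat.ToString()`, which is the kind of string you would hand to `InsertXML`.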
That’s the idea; how you use it is left to your imagination. I have already built a reusable library that achieves the same results without you having to bother about what’s going on under the hood, but it still needs a good plugin system. You’ll get it, for sure.
Below is one example of what you can achieve with this. In this example, I am removing all the “Editors” (the range permissions stored as w:permStart/w:permEnd markers) from the document, because having a lot of editors can mean a lot of network calls.
private void button1_Click(object sender, RibbonControlEventArgs e)
{
    // wdApp is the add-in's Word Application object; missing stands in for optional COM arguments
    object missing = System.Type.Missing;
    wdApp.ScreenUpdating = false;
    try
    {
        wdApp.ActiveDocument.Content.Select();
        string openxml;

        // Get a stream for the range. This is the System.IO.Packaging.Package stream
        Stream packageStream = OpcHelper.GetPackageStreamFromRange(wdApp.Selection.Range);

        // Use the Open XML SDK to process it
        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(packageStream, true))
        {
            // Convert to Flat OPC using this in-memory package
            XDocument xDoc = OpcHelper.OpcToFlatOpc(wordDoc.Package);

            // The namespace URI must match the document exactly: http, not https
            XmlNamespaceManager xnm = new XmlNamespaceManager(new NameTable());
            xnm.AddNamespace("w", "http://schemas.openxmlformats.org/wordprocessingml/2006/main");

            // Editors (range permissions) are stored as w:permStart / w:permEnd markers
            xDoc.XPathSelectElements("//w:permStart", xnm).ToList().ForEach(a => a.Remove());
            xDoc.XPathSelectElements("//w:permEnd", xnm).ToList().ForEach(a => a.Remove());

            openxml = xDoc.ToString();
        }

        // Insert this Flat OPC XML over the whole document
        wdApp.ActiveDocument.Select();

        // Remove protection (empty password) only if the document is actually protected
        if (wdApp.ActiveDocument.ProtectionType != Microsoft.Office.Interop.Word.WdProtectionType.wdNoProtection)
        {
            object nullstring = "";
            wdApp.ActiveDocument.Unprotect(ref nullstring);
        }

        wdApp.Selection.Range.InsertXML(openxml, ref missing);
    }
    finally
    {
        // Restore screen updating even if something above throws
        wdApp.ScreenUpdating = true;
    }
}
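The XPath queries in the handler only find anything if the `w` prefix is bound to exactly the namespace URI used inside the document, `http://schemas.openxmlformats.org/wordprocessingml/2006/main`; a near miss such as an `https` URI silently matches nothing and the document comes back unchanged. The sketch below uses a hypothetical `PermissionStripper` class (not part of the original add-in) to put the XPath approach next to a LINQ to XML variant that needs no namespace manager. Both materialize the matches with `ToList()` before removing them, so the tree is not modified while it is being enumerated.

```csharp
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper: strips range-permission markers (w:permStart / w:permEnd).
public static class PermissionStripper
{
    public const string WordMl = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // XPath variant, mirroring the add-in: "w" is bound to whatever URI is passed in,
    // so a wrong URI returns 0 matches instead of raising an error.
    public static int RemoveWithXPath(XDocument doc, string namespaceUri)
    {
        var nsm = new XmlNamespaceManager(new NameTable());
        nsm.AddNamespace("w", namespaceUri);
        var hits = doc.XPathSelectElements("//w:permStart | //w:permEnd", nsm).ToList();
        hits.ForEach(el => el.Remove());
        return hits.Count;
    }

    // LINQ to XML variant: element names carry their namespace, no prefix binding needed.
    public static int RemoveWithLinq(XDocument doc)
    {
        XNamespace w = WordMl;
        var hits = doc.Descendants()
            .Where(el => el.Name == w + "permStart" || el.Name == w + "permEnd")
            .ToList();
        hits.ForEach(el => el.Remove());
        return hits.Count;
    }
}
```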
Now, I am sure you are looking for an explanation of some of the things here, and I will surely give one, but in the next post. This one is for you to figure out what’s happening. But don’t worry, I am not going to leave you in the dark. Below is the OpcHelper class being used:
using System;
using System.IO;
using System.IO.Packaging;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

namespace WordAddIn2
{
    public static class OpcHelper
    {
        // Flat OPC and package-relationship namespaces (http, not https)
        static readonly XNamespace pkg = "http://schemas.microsoft.com/office/2006/xmlPackage";
        static readonly XNamespace rel = "http://schemas.openxmlformats.org/package/2006/relationships";
        const string RelsContentType = "application/vnd.openxmlformats-package.relationships+xml";

        /// <summary>
        /// Returns the part contents as a Flat OPC pkg:part element
        /// </summary>
        /// <param name="part">System.IO.Packaging.PackagePart</param>
        static XElement GetContentsAsXml(PackagePart part)
        {
            if (part.ContentType.EndsWith("xml"))
            {
                using (Stream partStream = part.GetStream())
                using (StreamReader streamReader = new StreamReader(partStream))
                {
                    return new XElement(pkg + "part",
                        new XAttribute(pkg + "name", part.Uri),
                        new XAttribute(pkg + "contentType", part.ContentType),
                        new XElement(pkg + "xmlData", XElement.Parse(streamReader.ReadToEnd())));
                }
            }

            using (Stream partStream = part.GetStream())
            using (BinaryReader binaryReader = new BinaryReader(partStream))
            {
                byte[] byteArray = binaryReader.ReadBytes((int)partStream.Length);
                // Base64, chunked into lines of 76 characters
                string base64String = Convert.ToBase64String(byteArray, Base64FormattingOptions.InsertLineBreaks);

                return new XElement(pkg + "part",
                    new XAttribute(pkg + "name", part.Uri),
                    new XAttribute(pkg + "contentType", part.ContentType),
                    new XAttribute(pkg + "compression", "store"),
                    new XElement(pkg + "binaryData", base64String));
            }
        }

        /// <summary>
        /// Converts a package to a Flat OPC XDocument
        /// </summary>
        /// <param name="package">System.IO.Packaging.Package</param>
        public static XDocument OpcToFlatOpc(Package package)
        {
            return new XDocument(
                new XDeclaration("1.0", "UTF-8", "yes"),
                new XProcessingInstruction("mso-application", "progid=\"Word.Document\""),
                new XElement(pkg + "package",
                    new XAttribute(XNamespace.Xmlns + "pkg", pkg.NamespaceName),
                    package.GetParts().Select(part => GetContentsAsXml(part))));
        }

        /// <summary>
        /// Returns a System.IO.Packaging.Package stream for the given range,
        /// rebuilt from the range's Flat OPC (Range.WordOpenXML)
        /// </summary>
        /// <param name="range">Range in a Word document</param>
        public static Stream GetPackageStreamFromRange(Microsoft.Office.Interop.Word.Range range)
        {
            XDocument doc = XDocument.Parse(range.WordOpenXML);
            MemoryStream memStream = new MemoryStream();
            using (Package inMemoryPackage = Package.Open(memStream, FileMode.Create))
            {
                // Add all parts (but not relationships)
                foreach (XElement xmlPart in doc.Root.Elements()
                    .Where(p => (string)p.Attribute(pkg + "contentType") != RelsContentType))
                {
                    string name = (string)xmlPart.Attribute(pkg + "name");
                    string contentType = (string)xmlPart.Attribute(pkg + "contentType");
                    PackagePart part = inMemoryPackage.CreatePart(
                        new Uri(name, UriKind.Relative), contentType, CompressionOption.SuperFast);

                    using (Stream partStream = part.GetStream(FileMode.Create))
                    {
                        if (contentType.EndsWith("xml"))
                        {
                            using (XmlWriter xmlWriter = XmlWriter.Create(partStream))
                                xmlPart.Element(pkg + "xmlData").Elements().First().WriteTo(xmlWriter);
                        }
                        else
                        {
                            // Drop the line breaks from the chunked base64 before decoding
                            char[] base64Chars = ((string)xmlPart.Element(pkg + "binaryData"))
                                .Where(c => c != '\r' && c != '\n').ToArray();
                            byte[] bytes = Convert.FromBase64CharArray(base64Chars, 0, base64Chars.Length);
                            partStream.Write(bytes, 0, bytes.Length);
                        }
                    }
                }

                // Add relationships once all parts exist
                foreach (XElement xmlPart in doc.Root.Elements()
                    .Where(p => (string)p.Attribute(pkg + "contentType") == RelsContentType))
                {
                    string name = (string)xmlPart.Attribute(pkg + "name");
                    if (name == "/_rels/.rels")
                    {
                        // Package-level relationships
                        AddRelationships(xmlPart, inMemoryPackage.CreateRelationship);
                    }
                    else
                    {
                        // Part-level relationships: "/word/_rels/document.xml.rels" belongs to "/word/document.xml"
                        string directory = name.Substring(0, name.IndexOf("/_rels"));
                        string relsFilename = name.Substring(name.LastIndexOf('/'));
                        string filename = relsFilename.Substring(0, relsFilename.IndexOf(".rels"));
                        PackagePart fromPart = inMemoryPackage.GetPart(
                            new Uri(directory + filename, UriKind.Relative));
                        AddRelationships(xmlPart, fromPart.CreateRelationship);
                    }
                }
            }

            // The package does not own memStream, so it is still open; rewind it for the reader
            memStream.Position = 0;
            return memStream;
        }

        // Recreates every <Relationship> of a Flat OPC .rels part via the given CreateRelationship method
        static void AddRelationships(XElement relsPart,
            Func<Uri, TargetMode, string, string, PackageRelationship> createRelationship)
        {
            foreach (XElement xmlRel in relsPart.Descendants(rel + "Relationship"))
            {
                string id = (string)xmlRel.Attribute("Id");
                string type = (string)xmlRel.Attribute("Type");
                string target = (string)xmlRel.Attribute("Target");
                bool external = (string)xmlRel.Attribute("TargetMode") == "External";
                createRelationship(
                    new Uri(target, external ? UriKind.Absolute : UriKind.Relative),
                    external ? TargetMode.External : TargetMode.Internal,
                    type, id);
            }
        }
    }
}
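OpcHelper does two bits of bookkeeping when moving between FlatOPC and a real package: non-XML parts travel as base64 text in `pkg:binaryData`, and every part-level `.rels` part has to be tied back to the part it describes. The hypothetical `FlatOpcBookkeeping` class below isolates both so they can be checked without Word running. It is a sketch under two assumptions: the usual OPC naming convention (`/dir/_rels/name.rels` describes `/dir/name`), and `Base64FormattingOptions.InsertLineBreaks` as a stand-in for hand-rolled chunking into 76-character lines.

```csharp
using System;
using System.Linq;

// Hypothetical helpers isolating two pieces of OpcHelper-style bookkeeping.
public static class FlatOpcBookkeeping
{
    // Non-XML parts travel as base64 text in pkg:binaryData;
    // InsertLineBreaks breaks the text into lines of 76 characters.
    public static string EncodeBinaryPart(byte[] bytes)
    {
        return Convert.ToBase64String(bytes, Base64FormattingOptions.InsertLineBreaks);
    }

    // Strips CR/LF before decoding, mirroring GetPackageStreamFromRange
    // (the Convert.FromBase64* methods would also skip white space on their own).
    public static byte[] DecodeBinaryPart(string base64InLines)
    {
        char[] chars = base64InLines.Where(c => c != '\r' && c != '\n').ToArray();
        return Convert.FromBase64CharArray(chars, 0, chars.Length);
    }

    // Maps a relationship part name to its source part:
    // "/word/_rels/document.xml.rels" -> "/word/document.xml", "/_rels/.rels" -> "/" (the package).
    public static string SourcePartForRels(string relsPartName)
    {
        const string relsDir = "/_rels/";
        int dirIndex = relsPartName.LastIndexOf(relsDir, StringComparison.Ordinal);
        if (dirIndex < 0 || !relsPartName.EndsWith(".rels", StringComparison.Ordinal))
            throw new ArgumentException("Not a relationship part name: " + relsPartName);

        string directory = relsPartName.Substring(0, dirIndex);
        string relsFile = relsPartName.Substring(dirIndex + relsDir.Length);
        return directory + "/" + relsFile.Substring(0, relsFile.Length - ".rels".Length);
    }
}
```

Splitting these out is only for illustration; inside OpcHelper they stay inline, next to the package calls that use them.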
Stay tuned for the next set of entries, where I’ll attempt to explain some of the things we’ve used here, and we’ll have a full-fledged library (one that supports add-ins, and to which you’ll be able to contribute).