Open XML links for 01-11-2007

I've been pretty busy since the holidays, and consequently haven't gotten around to posting links to all of the great Open XML information I've come across lately. So without further delay, here are some of my favorite Open XML blog posts and projects I've run into since the first of the year ...

Open source Open XML projects. I met Jason Harrop at the Office 2.0 conference in San Francisco last summer, and he showed me a very interesting Word add-in he was working on. Since then, Jason and a colleague, Jo, have started a series of open-source Open XML projects. I'll soon have more to say about some of the interesting work Jason and Jo are doing, but for now I wanted to point out a cool trick they mentioned on their blog that many developers aren't aware of: the ActiveDocument.WordOpenXML property of the Word object model, which returns the entire document as a single string of Open XML markup. On a related note, there are two other methods that developers may want to use for similar purposes: ActiveDocument.ExportFragment and Range.ExportFragment. Those work like SaveCopyAs in Excel, and can save to any supported format.
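To give a sense of what comes back: WordOpenXML returns the document in a "flat" package format, where each part of the ZIP package (/word/document.xml, /word/styles.xml and so on) appears as a pkg:part element inside one XML string. The Python sketch below assumes you have already captured that string (for example by writing it to a file from a VBA macro; that step isn't shown) and that XML parts are wrapped in pkg:xmlData elements. The helpers list_parts and part_xml are hypothetical names, not part of any Word API.

```python
import xml.etree.ElementTree as ET

# Namespace of the pkg: elements in Word's flat (single-string) package format.
PKG_NS = "http://schemas.microsoft.com/office/2006/xmlPackage"
PKG = f"{{{PKG_NS}}}"


def list_parts(flat_opc: str) -> dict:
    """Map each part name (e.g. '/word/document.xml') to its content type."""
    root = ET.fromstring(flat_opc)
    return {
        part.get(f"{PKG}name"): part.get(f"{PKG}contentType")
        for part in root.iter(f"{PKG}part")
    }


def part_xml(flat_opc: str, name: str):
    """Return the root element of the named XML part, or None if the part
    is missing or is stored as base64 binary data instead of XML."""
    root = ET.fromstring(flat_opc)
    for part in root.iter(f"{PKG}part"):
        if part.get(f"{PKG}name") != name:
            continue
        xml_data = part.find(f"{PKG}xmlData")
        if xml_data is not None and len(xml_data) > 0:
            return xml_data[0]  # the part's own root element
    return None


# Usage, assuming flat_xml holds the string read back from WordOpenXML:
# body = part_xml(flat_xml, "/word/document.xml")
```

Because everything arrives as one string, nothing needs to be unzipped; the trade-off is that binary parts such as images arrive base64-encoded in pkg:binaryData, which this sketch simply skips.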

What's up, DOCX? Guy Creese's blog has some information about a free report available from the Burton Group entitled "What’s Up, .DOC? ODF, OOXML, and the Revolutionary Implications of XML in Productivity Applications." As Guy explains, "we'll probably ruffle some vendor feathers on this one, but we've tried hard to look into this objectively and in some detail (the report is 37 pages long)."

DOCX to HTML in ASP.NET. Maarten Balliauw, the driving force behind the PHPExcel API, is also a talented C# developer, and he has posted a useful article about how to preview DOCX files as HTML. His example uses LINQ to XML to build an ASP.NET HttpHandler that transforms WordprocessingML into HTML. It's a simple example, but a great starting point for anyone who wants to post DOCX files on a web site, which is a very common need these days. Here's another approach, which starts from the DocX2Html.xsl stylesheet that ships with SharePoint.
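Maarten's handler is written in C# with LINQ to XML; the same basic idea can be sketched in a few lines of Python with the standard zipfile and ElementTree modules. This is not his code, just a minimal illustration under stated assumptions: the main document part lives at the conventional word/document.xml path (a robust reader would look it up through _rels/.rels), and only paragraphs, run text and bold are handled. Tables, lists, images and styles are ignored, and docx_to_html is a hypothetical name.

```python
import html
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}tag form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _is_on(toggle) -> bool:
    """True if a toggle element like <w:b/> is present and not switched off."""
    return toggle is not None and toggle.get(f"{W}val") not in ("0", "false", "off")


def docx_to_html(path: str) -> str:
    """Very small DOCX-to-HTML preview: paragraphs, run text and bold only."""
    with zipfile.ZipFile(path) as package:
        # Assumes the conventional location of the main document part.
        root = ET.fromstring(package.read("word/document.xml"))

    lines = ["<html><body>"]
    for para in root.iter(f"{W}p"):
        pieces = []
        for run in para.iter(f"{W}r"):
            # Concatenate the run's w:t text and escape it for HTML.
            text = html.escape("".join(t.text or "" for t in run.iter(f"{W}t")))
            if not text:
                continue
            rpr = run.find(f"{W}rPr")
            if rpr is not None and _is_on(rpr.find(f"{W}b")):
                text = f"<b>{text}</b>"
            pieces.append(text)
        lines.append(f"<p>{''.join(pieces)}</p>")
    lines.append("</body></html>")
    return "\n".join(lines)
```

In an HTTP handler the returned string would simply be written to the response; keeping the conversion in a plain function makes it easy to test outside a web server.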

Taming the complexity of style inheritance. Style inheritance in WordprocessingML can be a complex topic, because the formatting of a given run of text may come from many sources: direct formatting, run properties, paragraph properties, list styles, table properties, and so on. James Newton-King has posted some thoughts on how to manage that complexity by taking advantage of the fact that Open XML consistently uses property elements (rPr, pPr, etc.) to store styling and formatting information.
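A simplified Python sketch of the layering idea: collect the rPr element from each source (document defaults, the paragraph style and its basedOn ancestors, the character style and its ancestors, then direct formatting) and merge them child element by child element, with later layers overriding earlier ones. This is not taken from James's post, and it is deliberately simplified: real Word behavior also involves toggle properties such as bold and italic, numbering and table styles, and theme fonts, none of which are handled here. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _local(name: str) -> str:
    """Strip the '{namespace}' part from an ElementTree tag or attribute name."""
    return name.split("}", 1)[-1]


def style_chain(styles_root, style_id):
    """Return the style elements for style_id and its basedOn ancestors,
    most basic ancestor first."""
    by_id = {s.get(f"{W}styleId"): s for s in styles_root.iter(f"{W}style")}
    chain, seen = [], set()
    while style_id is not None and style_id in by_id and style_id not in seen:
        seen.add(style_id)  # guard against basedOn cycles
        style = by_id[style_id]
        chain.append(style)
        based_on = style.find(f"{W}basedOn")
        style_id = based_on.get(f"{W}val") if based_on is not None else None
    return chain[::-1]


def merge_props(*layers):
    """Merge property containers such as rPr; a child element in a later
    layer replaces the same child element from an earlier layer."""
    merged = {}
    for props in layers:
        if props is None:
            continue
        for child in props:
            merged[_local(child.tag)] = {_local(k): v for k, v in child.attrib.items()}
    return merged


def effective_run_props(styles_root, para_style=None, char_style=None, direct_rpr=None):
    """Simplified effective run formatting for one run of text."""
    layers = [styles_root.find(f"{W}docDefaults/{W}rPrDefault/{W}rPr")]
    layers += [s.find(f"{W}rPr") for s in style_chain(styles_root, para_style)]
    layers += [s.find(f"{W}rPr") for s in style_chain(styles_root, char_style)]
    layers.append(direct_rpr)  # direct formatting is applied last, so it wins
    return merge_props(*layers)
```

The point of the design is the one James's post relies on: because every source stores its formatting in the same kind of property element, a single generic merge can combine them.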

Don't want macros? Remove them. Vineela Kavoori of Sonata Software has posted an article on the OpenXMLDeveloper site, entitled Removing macro from WordProcessingML document using Java, that demonstrates how to remove macros from a DOCM file and turn it into a DOCX. The sample uses no special libraries or tools, just the standard zip functionality in Java's java.util.zip package.
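The same idea can be sketched in Python with only the standard zipfile and ElementTree modules. This is not the Java sample, and it assumes a typical DOCM layout: the VBA project lives in word/vbaProject.bin (plus word/vbaData.xml and word/_rels/vbaProject.bin.rels if present), the main document part's content type in [Content_Types].xml must change from the macro-enabled type to the regular DOCX type, and the vbaProject relationship must be removed from word/_rels/document.xml.rels. Other macro-related parts, if a document has any, aren't handled, and strip_macros is a hypothetical name.

```python
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
MACRO_MAIN = "application/vnd.ms-word.document.macroEnabled.main+xml"
DOCX_MAIN = ("application/vnd.openxmlformats-officedocument"
             ".wordprocessingml.document.main+xml")
# Parts that hold the VBA project in a typical DOCM (assumed layout).
VBA_PARTS = {"word/vbaProject.bin", "word/vbaData.xml",
             "word/_rels/vbaProject.bin.rels"}
DOC_RELS = "word/_rels/document.xml.rels"


def _serialize(root, default_ns: str) -> bytes:
    # Write default_ns as the default namespace instead of an "ns0:" prefix.
    # register_namespace changes module-global state, so set it on every call.
    ET.register_namespace("", default_ns)
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)


def _fix_content_types(data: bytes) -> bytes:
    root = ET.fromstring(data)
    for el in list(root):
        if el.tag != f"{{{CT_NS}}}Override":
            continue
        if el.get("PartName", "").lstrip("/") in VBA_PARTS:
            root.remove(el)  # no override for a part we are dropping
        elif el.get("ContentType") == MACRO_MAIN:
            el.set("ContentType", DOCX_MAIN)  # macro-enabled -> plain DOCX
    return _serialize(root, CT_NS)


def _drop_vba_relationship(data: bytes) -> bytes:
    root = ET.fromstring(data)
    for rel in list(root):
        if rel.get("Type", "").endswith("/vbaProject"):
            root.remove(rel)
    return _serialize(root, REL_NS)


def strip_macros(src: str, dst: str) -> None:
    """Copy the DOCM at src to dst (give it a .docx name) without its VBA project."""
    with zipfile.ZipFile(src) as zin:
        with zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
            for name in zin.namelist():
                if name in VBA_PARTS:
                    continue  # drop the macro parts entirely
                data = zin.read(name)
                if name == "[Content_Types].xml":
                    data = _fix_content_types(data)
                elif name == DOC_RELS:
                    data = _drop_vba_relationship(data)
                zout.writestr(name, data)
```

Like the Java sample, this works purely at the package level: every other part is copied byte for byte, and only the two XML parts that point at the macros are rewritten.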

Wouter's pretty developer tab. If you're doing Open XML development work with Word 2007 and you haven't installed Wouter Van Vugt's Word add-ins, you're probably working too hard. Install the Databinding toolkit and the Word Source Viewer, and your Developer tab will be as useful (and look as good) as Wouter's.

The design goals of XML. The Open XML standards process has resulted in some interesting debate about the design of XML schemas. Rick Jelliffe's post on Design Goals of XML helps put that debate in its proper historical perspective.