Ah, laddie, too bad the wheels have fallen off of the Word wagon ... (for you).
The general concept for your problem is "document corruption". It was a major problem in the older "DOC" / 2003 format and was supposed to be fixed by the change to the DOCX XML format. Unfortunately, that change created new types of corruption. Fortunately, or
not, you are not suffering from the most common form of corruption, which is diagnosed by an XML "End Tag" error.
There are some real XML experts on this forum. Look for the "XML End Tag" discussions. Even though your problem is off topic, they will probably be willing to look at your file and possibly help you.
The first thing I can suggest (after you have worked through the troubleshooting link) is to try the competition, OpenOffice or LibreOffice. They include a better XML validation and correction process at file open, and they have been known to fix problems
that Word can't handle.
Sorry, I don't have a specific fix for your problem.
Here is my collection of "corruption" fixing tips. Maybe you can find a useful technique in one of them:
Fixing Corrupt MS Word Documents
The most important thing to do is STOP USING THIS COMPUTER! Your best hope is to recover a deleted version of the file. Every time you or Windows (it does a LOT of writes without your knowledge) write a file to the HD, you are reducing the chances of recovering
a deleted file.
Quick learning about “X” file structures - XML – Editing XML
Pt1: Breaking Into Your Office Open XML Format Documents (2007)
http://msevents.microsoft.com/CUI/WebCastEventDetails.aspx?culture=en-US&EventID=1032319980&CountryCode=US
But what is different about these new formats, and what can the Open XML Formats do for advanced Microsoft Office users? Join this session for a guided tour through the XML behind your documents, along with an introduction to editing
the XML directly so you can get more from your documents than ever before. Learn about the files that make up a Microsoft Office Word 2007, Excel 2007, or PowerPoint 2007 document and get timesaving tips and troubleshooting tricks for editing, sharing, and
reusing content without even opening Word, Excel, or PowerPoint.
Note: This is the first in a series of three sessions for advanced users who want to learn to customize 2007 Office release documents. The second session in this series introduces Microsoft Visual Basic for Applications (VBA). The
third session brings concepts covered in the first two webcasts together. We show you the basics of customizing the Ribbon, including how to add your own VBA macros to built-in or custom Ribbon tabs.
Pt2: Using Visual Basic for Applications (VBA) Every Day Is Easier Than You Think (Level 300)
https://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032326222&culture=en-us
Ever spend time repeating the same actions in the Microsoft Office programs time and again, sure that there must be a better way? Well, there probably is. Contrary to what you might think, Microsoft Visual Basic for Applications (VBA)
can be easy to use. Once you know the basics of working with VBA, you may be amazed at what you can do and how much time you can save on things you do every day, from complex document tasks to daily updates. In this webcast, we focus primarily on tasks that
can be performed in Microsoft Office Word 2007. We introduce you to some core essentials of VBA that you can apply in the 2007 Microsoft Office system programs Word, Excel, and PowerPoint and provide tips and tricks for using VBA as a daily productivity tool.
Note: This is the second in a series of three sessions for advanced users who want to learn how to customize 2007 Microsoft Office release documents. The first session (not a prerequisite for this session) introduced the Office Open XML Formats and the basics
of how to edit the XML behind your documents. The third session brings concepts covered in the first two webcasts together. We show you the basics of customizing the Ribbon user interface, including how to add your own VBA macros to built-in or custom Ribbon
tabs.
Pt3: Customizing the Ribbon Using Office Open XML (Level 300)
https://msevents.microsoft.com/CUI/EventDetail.aspx?EventID=1032326224&culture=en-us
Are you frustrated by the fact that you can no longer create custom toolbars or menus without some programming knowledge? Do you think that using XML to customize the Ribbon in the 2007 Microsoft Office system programs Word, Excel,
and PowerPoint sounds too complicated for you? Think again. Once you know the basics of editing the XML behind 2007 Office release documents, customizing the Ribbon is one of the easiest tasks you can do. In this session, we look at the basics of Ribbon customization,
including how to create your own custom tab or add commands to a built-in tab. We also show you how to add your own Microsoft Visual Basic for Applications (VBA) macros to the Ribbon. Note: This session is intended for advanced Microsoft Office users with
some prior introduction to Microsoft Office Open XML Formats and a basic understanding of VBA.
Modify corrupt file so the error message provides a usable line number in the XML
http://answers.microsoft.com/en-us/office/forum/office_2010-word/unspecified-error-worddocumentxml-line2-column-0/21971fa0-df44-4ba6-ac42-7d4b5cd4174f?page=30&msgId=f35f2924-811e-4170-9da5-6846d36a47b4
socrtwo replied
The error message “Unspecified Error, word/document.xml, Line:2, Column: 0” is pretty useless. You can edit the XML to add line breaks at each XML tag. Then when you open the file, it will hopefully give you a more useful line number to start from (as long as it is not the end of the document).
By opening the word\document.xml file in Notepad++ and using the Replace window, you can put each XML tag on its own line and get a line number for your error, instead of “line 2, column 0”. Apparently, putting the tags each on their own line
and then rezipping the document.xml file with its fellow XML subfiles does no harm, and it obviously helps a great deal in pinpointing the problem.
However, one should note that there is a trick to putting each tag on its own line in Notepad++: you have to move the radio button in the Search Mode section of the Replace window from “Normal” to “Extended” and use “\n” to indicate a new line
to the program. See step number 5.
So here are the steps for recovering manually from Unspecified Error Line 2, Column 0. Credit for this algorithm should go to ???
- Notepad++. Many expert users agree Notepad++ is the best software for this kind of thing. I have turned on word wrap from the View menu, so that is why the text fills the screen
instead of the normal case, where line 2 runs off the page to the right almost ad infinitum.
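For anyone who would rather script the Notepad++ replacement than do it by hand, here is a minimal Python sketch. It assumes the replacement is simply "><" to ">\n<" (the exact search string is my assumption, since the original step list is not shown here), and the function names are made up for illustration. It copies every part of the DOCX unchanged except word/document.xml.

```python
import zipfile


def one_tag_per_line(xml_text):
    """Put a newline between adjacent tags, i.e. '><' becomes '>\\n<'."""
    return xml_text.replace("><", ">\n<")


def rewrite_document_xml(src_docx, dst_docx):
    """Copy src_docx to dst_docx, reformatting only word/document.xml.

    Both arguments may be file paths or binary file-like objects.
    """
    with zipfile.ZipFile(src_docx) as src, \
            zipfile.ZipFile(dst_docx, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                text = data.decode("utf-8")
                data = one_tag_per_line(text).encode("utf-8")
            # All other parts are written back byte for byte.
            dst.writestr(item.filename, data)
```

Typical use would be rewrite_document_xml("broken.docx", "broken_lines.docx") (hypothetical file names), then opening the new file in Word to get a line number other than 2. Always work on a copy, never on the only version of the document.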
Note: Here are 4 patterns of bad XML which will lead to “Unspecified error - Location: Part: /word/document.xml, Line: 2, Column: 0”.
- Something wrong inside the <m:oMath> and </m:oMath> tags.
Removing the tags and everything in between sometimes cures the problem. This problem is apparently caused by editing equations after they are initially created and saved. The workaround: if you have to fix an error in an existing equation, it is safer to
simply recreate the whole equation.
- Something wrong inside the <w:tbl> and </w:tbl> tags.
Removing the tags and everything in between sometimes cures the problem.
- Something wrong inside the <w:fldChar w:fldCharType="begin"/> and <w:fldChar w:fldCharType="end"/> tags. These tags mark the start and end of a field such as the Table of Contents.
Removing the tags and the content in between sometimes allows these files to open. That is a bit drastic, but it is what chrima recommends here: http://help.wugnet.com/office/open-docx-file-content-error-ftopict1081932.html.
It could come down to some tags that shouldn’t exist in the TOC. You might experiment with removing odd-looking tags within the TOC to see if you can save it, although recreating it may be better. This blog post suggests that an easy way to get rid of the TOC is
to open the corrupt Word file in Office Docs/Office365/Office Online, which apparently have lower standards as to what counts as corrupt in this regard. You can then remove the TOC there, save the file, and download it again to recreate the TOC
in your desktop version of Word: http://problemswiththecontent.blogspot.com/2011/03/file-cannot-be-opened-because-there-are.html.
NOTE: Here are additional non-XML causes for the problem
My freeware is here: http://hostedfiles.wherehaveibeen.info/savvy_corrupt_DOCX_setup_2.0.4_without_adware.exe
I’m working on an update of the program and will be posting here soon.
Finally, as Jeeped repeatedly points out, many of these issues are prevented from creating future problems by installing Service
Packs and/or Hotfixes.
· http://support.microsoft.com/kb/970942
- Hotfix for Word 2007. Fixes issue to prevent Unspecified errors for future files.
· http://support.microsoft.com/kb/2817583
- Hotfix for Word 2010. Fixes issue to prevent Unspecified errors with Line 2, Column 0 indicated as the error location. Works for future files.
· http://support.microsoft.com/kb/2528942
- The Mr. Fixit application will directly fix existing Word files that won’t open and show the error: “The name in the end tag of the element must match the element type in the start tag.” Office 2013 and Office 2010 Service Pack 1 resolve the issue
for new files.
· http://www.microsoft.com/en-us/download/office-service-packs.aspx
- As mentioned Service Pack 1 for Office 2010 and 2013 fixes math end tag not matching errors for future files.
**Unspecified Error, word/document.xml, Line:2, Column: 0** – end tag
<snip>
This last DOCX Unspecified error had to do with the <m:num> XML tag, and I’ve had a few of those lately. Often, the <m:num></m:num> pair contains absolutely no content and can be removed without any damage to the DOCX at all. In
this case, it was wrapping the letters BH, which were the numerator in a formula. The malformed section looked like this:
<m:num>
  <m:r>
    <w:rPr>
      <w:rFonts w:ascii
      …. "Cambria math"/>
    </w:rPr>
    <m:t>
      BH
    </m:t>
  </m:r>
  <m:ctrlPr>
    <w:rPr>
      <w:rFonts w:ascii="Cambria math" … "cambria math"/>
      <w:i/>
    </w:rPr>
  </m:ctrlPr>
</m:num>
Now that XML code sequence passes syntax muster but Word still chokes on it. This can be proven by extracting the
word\document.xml and opening it in Excel (or some other XML editor).
I first tried to delete the lower <m:ctrlPr></m:ctrlPr> section, as it seemed to contribute nothing to the content of the DOCX, but that still did not allow Word to open the document. Failing that, I noted other surrounding text so I could
locate the problem area within the document at a later time (e.g. within Word) and snipped out the entire <m:num></m:num> block. This allowed Word to open the DOCX.
I searched for the surrounding content I had noted previously and recorded the page number. I zoomed to
10% and took a screenshot of the first four pages and wrote my reply.
</snip>
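The "passes syntax muster" check mentioned in the quoted post (extracting word\document.xml and opening it in an XML editor) can also be done with a few lines of Python. This is only a sketch with made-up function names. It tests XML well-formedness and nothing more, so a file can pass this check and still be rejected by Word, exactly as described above.

```python
import zipfile
import xml.etree.ElementTree as ET


def check_well_formed(xml_bytes):
    """Return None if the XML parses, else (line, column, message)."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


def check_docx_part(docx, part="word/document.xml"):
    """Run the well-formedness check on one part inside a DOCX."""
    with zipfile.ZipFile(docx) as zf:
        return check_well_formed(zf.read(part))
```

If the result is None, the XML is well formed and the problem is in what the tags mean to Word rather than in the tag syntax. Otherwise the line and column tell you where the parser gave up.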
Here is another cause for this error:
<snip>
here’s what caused my problem: Where you found the error, I had a checkbox (legacy form field) inside of a plain text content control field. It didn’t cause any problems when left alone, but when I altered only that field (i.e., checked the box), saved
the doc, and reopened it, presto, there was the error. I guess Word doesn’t like form fields inside of form fields...which is maybe an obvious thing... I’d done it based on a suggestion I found online, because putting the checkbox inside the plain text content
control field allowed the user to actually tab to the checkbox -- Word won’t tab to it otherwise (apparently due to the mix of legacy fields and 2007 fields). Oh well, I guess I can live without that :-) I took the checkbox out of the plain text field and
now the form works fine.
</snip>
XML Editing in Visual Studio Express 2010 with 7-Zip (Both Free)
http://www.indezine.com/products/powerpoint/learn/themes/xml-editing-visual-studio-express.html
This article shows how to install 7-Zip and the Visual Basic part of Visual Studio Express 2010.
Then it shows how to use these 2 tools to open X-format files (without renaming them) and edit the XML files in the zip container.
I have not tested this tool (yet), but it appears that the VB editor can be set to automatically indent the XML, which I personally find makes it
MUCH easier to read.
XML Notepad 2007
https://www.microsoft.com/download/en/details.aspx?displaylang=en&id=7973
Free MS XML editor
Thanks to your advice I managed to restore my document and it saved me a ton of headache. So I thought I should write some words about the method I used to fix xml errors in document.xml.
I opened the docx in 7-zip and unzipped document.xml. The problem was that document.xml had ~90k tags so this was far too much to read through manually to find the error.
What I also saw is that Word stores document.xml without Windows line endings, so the whole document was on one line (I saw this when I opened it in Notepad and Visual Studio).
I then opened document.xml with Microsoft Xml Notepad and saved it again. Xml Notepad is configured to normalize the line endings when saving the xml file. I zipped document.xml back into the docx-file and opened it in Word. This time, instead of getting
Line:2,column:0, I got something like Line:75377,column:0.
I then jumped to the erroneous line, deleted it and some other suspicious tags around it, and zipped document.xml back into the docx. I opened the document in Word again, got an error on a different line, and repeated the procedure. After three attempts I had
managed to clean my document of errors and it opened correctly in Word.
Word 2007 will often show the location of the error when Word 2010 or 2013 will not, but not everyone has access to multiple Office versions.
Recover data from a damaged Office file with the help of 7-Zip
http://www.techrepublic.com/blog/itdojo/recover-data-from-a-damaged-office-file-with-the-help-of-7-zip/3993
This tip is a variation on renaming DOCX to ZIP and using Windows explorer to view the contents of the ZIP “folder”.
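If all you want is the words back, the same "DOCX is a ZIP" idea can be scripted. The sketch below is my own illustration, not the method from the linked article. It pulls the text runs out of word/document.xml with a regular expression instead of an XML parser, so it still works when the tags are mismatched. Formatting, tables, images and equations are lost.

```python
import re
import zipfile
from html import unescape

# Matches <w:t> or <w:t xml:space="preserve"> but not <w:tab/> or <w:tbl>.
TEXT_RUN = re.compile(r"<w:t(?:\s[^>]*)?>(.*?)</w:t>", re.S)


def extract_text(document_xml):
    """Return the plain text of document.xml, one line per paragraph."""
    paragraphs = []
    for chunk in document_xml.split("</w:p>"):
        runs = TEXT_RUN.findall(chunk)
        if runs:
            paragraphs.append(unescape("".join(runs)))
    return "\n".join(paragraphs)


def salvage_text(docx):
    """Read word/document.xml from a DOCX (path or file object)."""
    with zipfile.ZipFile(docx) as zf:
        raw = zf.read("word/document.xml")
    return extract_text(raw.decode("utf-8", errors="replace"))
```

Calling salvage_text("broken.docx") (hypothetical file name) returns a string you can paste into a new document. This only helps when the ZIP container itself is still readable.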
You receive an “end tag” error when you open a DOCX file in Word 2007, 2010 or 2013 - KB 2528942
http://support.microsoft.com/kb/2528942/en-us
Office 2013 / 2010 SP1 fix this problem for new files.
The KB also has a Fix IT for corrupt files
End Tag - unspecified error Location: Part: /word/document.xml, Line: 2, Column: 0
http://support.microsoft.com/kb/2817583
There are 2 “end tag” type errors. They have different causes and different fixes.
This one can be identified by the “Line: 2, Column: 0” in the error message. This problem can be fixed by applying the hotfix in KB2817583.
The other one lists a LARGE column number, e.g. “Line: 2, Column: 100001”. This type of error requires manual editing to place the missing end tag into the file, as described below.
One person reported being able to fix this sort of problem simply by doing a File / Open and Save in LibreOffice.
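If you have many files to try, the LibreOffice round trip can be attempted from a script using LibreOffice's command-line converter. This is a sketch under assumptions: "soffice" must be on your PATH, the function names are invented for illustration, and I have not verified that a headless conversion applies the same repairs as an interactive File / Open and Save.

```python
import subprocess


def build_resave_command(docx_path, out_dir, soffice="soffice"):
    """Build the LibreOffice command line that re-saves a file as DOCX."""
    return [
        soffice,
        "--headless",            # no user interface
        "--convert-to", "docx",  # target format
        "--outdir", out_dir,     # where the re-saved copy is written
        docx_path,
    ]


def resave_with_libreoffice(docx_path, out_dir, soffice="soffice"):
    """Run the conversion; raises CalledProcessError if soffice fails."""
    return subprocess.run(
        build_resave_command(docx_path, out_dir, soffice), check=True
    )
```

Write the output to a different folder than the original so the damaged file is never overwritten.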
Can’t open Word File because of end tag/start tag mismatch error... XML Tag - XML Error – Fix It tool - “The name in the end tag of the element must match the element type in the start tag”
This error is caused when Word either “forgets” to write an XML tag or writes tags in the wrong order.
Tony Jollans was the first person I heard of with a home-made tool to fix the problem. Now MS has released a Fix It for one specific variation of the problem.
If the tools don’t fix your problem, the file will have to be fixed manually by repairing the tag order.
The Fix It article notes that the document is still in a fragile state. You have to do some additional fixing to avoid repeats of the problem.
https://blogs.technet.com/b/wordonenotesupport/archive/2011/03/24/error-when-opening-a-word-2007-or-2010-document.aspx
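For the manual route, it helps to know which end tag is the first one that does not match. Here is a rough Python sketch of that idea using a stack of open tags. It is my own illustration, not the Fix It tool or Tony's tool, and it assumes no attribute value contains a "<" or ">" character.

```python
import re

# Group 1: "/" for an end tag. Group 2: tag name. Group 3: "/" if self-closing.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w:.-]*)[^<>]*?(/?)>")


def first_tag_mismatch(xml):
    """Return (offset, expected, found) for the first bad end tag.

    Returns None when every start tag is closed in the right order.
    'found' is None when the text ends with a tag still open.
    """
    stack = []
    for match in TAG_RE.finditer(xml):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue                # <w:b/> opens and closes itself
        if not closing:
            stack.append(name)      # start tag: remember it
            continue
        expected = stack.pop() if stack else None
        if expected != name:
            return match.start(), expected, name
    if stack:
        return len(xml), stack[-1], None
    return None
```

The offset is a character position in the text you passed in. The "expected" name is the tag that was still open, so it is usually the end tag that has to be inserted just before that position.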
Document Recovery
http://www.wordarticles.com/Shorts/Corruption/Formats.php
This page has the most readable description of Word file structures, DOC and DOCX, I have seen so far
The logical structure of a Word 97‑2003 format document is one of a series of elements arranged in a hierarchy, much like a mini file system. As an example, here is the structure of a simple Word 97‑2003 (.doc) format document:
MyDocument.doc
1Table
*CompObj
Word Document
*SummaryInformation
*DocumentSummaryInformation
The physical structure of the complete file bears little relation to the logical structure; it is, again, of a proprietary design, a compound, or structured storage, file. Briefly, and loosely, the separate logical elements of the file are broken
up into blocks; these blocks are treated as individual units, which units are then organised without regard for their logical arrangement, and catalogued, catalogue and organisation detail being held alongside the blocks themselves, to enable recombination
into logical components when necessary.
Just to give you a flavour, here are some views of three small parts of such a document, viewed in a hex editor:
Views of a Word 97-2003 format Document
The logical structure of a Word 2007 format document is one of a series of elements arranged in a hierarchy, much like a mini file system. As an example, here is the structure of a simple Word 2007 (.docx) format:
MyDocument.docx
    _rels
        .rels
    docProps
        app.xml
        core.xml
    word
        _rels
            document.xml.rels
        theme
            theme1.xml
        document.xml
        settings.xml
        fontTable.xml
        webSettings.xml
        styles.xml
    [Content_Types].xml
As briefly as before, the [Content_Types] file and the _rels folders, along with the subordinate files therein, contain information about the logical structure, and the two files in the docProps folder contain much the same as the two Information
files in the old format. The document.xml element within the word folder holds the bulk of the document content and the other files within that same folder hold formatting details.
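You can look at this structure for your own file without unzipping anything. The Python sketch below lists the parts of a DOCX and reports whether three parts named in the listing above are present. Treating those three as the minimum set is my simplification, and the function names are made up for illustration.

```python
import zipfile

# Parts taken from the structure listing above (simplified minimum set).
EXPECTED_PARTS = ("[Content_Types].xml", "_rels/.rels", "word/document.xml")


def list_parts(docx):
    """Return (name, uncompressed size) for every part in the package."""
    with zipfile.ZipFile(docx) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]


def missing_parts(docx):
    """Return the expected parts that are not in the package."""
    with zipfile.ZipFile(docx) as zf:
        names = set(zf.namelist())
    return [part for part in EXPECTED_PARTS if part not in names]
```

If zipfile cannot open the file at all, the damage is in the ZIP container rather than in the XML, and the tag-editing tips in this post will not help.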
So, you might say, the internal structure of a document has changed a little, so what? There are, however, other changes that make a bigger difference. The first is that, although both logical formats are conceptually similar, they are wrapped
up in completely different ways to make a single file. Instead of the proprietary physical structure used for Word 97‑2003 format documents, a fairly standard, and open, Zip Archive format is used for Word 2007 format documents. The second change is that instead
of using obscure binary codes, everything in Word 2007 format documents, well almost everything, is held in XML format.
All data held as XML? In a standard Zip Package? It should be much easier to work with, then? Judge for yourself; here are some views of parts of a Word 2007 format document taken from a hex editor:
Views of a Word 2007 format Document
I found that the 13th <mc:AlternateContent><mc:Choice Requires="wps"> actually ended with
</mc:Choice></w:r>
... instead of this,
</mc:Choice></mc:AlternateContent></w:r>
... so I fixed it.
What you have found was, indeed, the error; this is a new type of corruption that is starting to appear - Microsoft have told me nothing has changed but something, somewhere, certainly appears to have done - and well done for fixing it - especially
in Notepad; the more people who can do this the better.
The correct construct looks like this (apologies in advance for the formatting - I don’t see any tools in this forum):
<mc:AlternateContent>
  <mc:Choice Requires="wps">
    Word 2010 xml
  </mc:Choice>
  <mc:Fallback>
    Word 2007 xml
  </mc:Fallback>
</mc:AlternateContent>
When Word 2010 reads the xml it reads the “wps” Choice section - the Word 2010 graphics format code - and ignores the rest. Earlier versions of Word will read whatever is in the Fallback section (which should be a Word 2007 format graphic) and ignore
the Choice.
Requires="wps" does not actually _mean_ Word 2010, but in this case it implies it. The contents of the two lots of xml are, typically, a <drawing> structure for 2010, and a <pict> structure for 2007.
When there is no fallback, the document will open in earlier versions, but there will simply be nothing in place of the image. I can not reconstruct the required pict structure - although it would be fun to try - and I don’t think Yves can either,
so we add a basic Fallback construct containing a text placeholder, so that anyone opening the document in Word 2007 is notified that something is missing.
Ideally the document owner will fix the problem once they can open the document but, unless they are running multiple versions of Word - which isn’t common for normal users - it will be very hard for them to know whether they have done so.
Enjoy, Tony
www.WordArticles.com
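The missing </mc:AlternateContent> described in the exchange above can be hunted down with a small script instead of counting to the 13th block by eye. This Python sketch is my own illustration, not Tony's tool. It flags every </mc:Choice> that is not directly followed by another <mc:Choice>, an <mc:Fallback>, or the closing </mc:AlternateContent>.

```python
import re

CHOICE_END = re.compile(r"</mc:Choice>")
# What may legitimately follow a </mc:Choice> end tag.
NEXT_OK = re.compile(
    r"\s*<(?:mc:Fallback[\s>/]|/mc:AlternateContent>|mc:Choice[\s>])"
)


def suspicious_choice_ends(xml):
    """Return the 1-based positions of </mc:Choice> tags that look wrong."""
    bad = []
    for number, match in enumerate(CHOICE_END.finditer(xml), start=1):
        if not NEXT_OK.match(xml, match.end()):
            bad.append(number)
    return bad
```

A result such as [13] would point at the 13th </mc:Choice> in the file. Whether to add only the end tag or also a Fallback block is the judgment call discussed above.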
FreeFileViewer – reads 100+ text, Office, audio, video format file types – Can open some XML tag error files
http://www.freefileviewer.com/formats.html
DOCX disaster recovery: How I rescued my wife from XM-HELL
http://www.theregister.co.uk/2014/06/09/docx_disaster_recovery/
Something strange in your closing tag, who you gonna call?
By Trevor Pott,
9 Jun 2014
Sysadmin blog What do you do when a critical Word document won’t open? Even in today’s world of versioned documents, it is entirely possible for
corruption to squeak in and go unnoticed, wrecking your entire version history.
But all is not lost. My wife had this happen to her; here’s how we solved it.
Real world example
In my case, Word wouldn’t open an important file, dying instead with the error “the name in the end tag of the element must match the element type in the start tag”. Translated from Microsoftese: “The word processor that created this document
made an XML boo-boo, and Word is going to refuse to read this document now.”
The most common kind of XML boo-boos that word processors will make are either saving tags out of order (the most famous example being Microsoft Word’s
oMath tags error) or opening a tag but not closing it. Today’s issue was the latter. The wife was using an
old version of LibreOffice Writer (v4.1) and had made several changes to hyperlinks in one area of the document. Writer got confused somehow, opened a hyperlink tag, but didn’t actually put in any information as to where it was hyperlinking to, and didn’t
close the tag.
What should be noted here is that Writer and Word behave very differently with this broken file. Writer will open the file, but simply stop processing the document around where the XML stops making sense. Word will vomit that error and die.
Both are useful.