How to fix/process a bad XML file

ac-lap 71 Reputation points
2023-01-03T15:39:30.113+00:00

My application needs to parse XML files provided by users, so I have no control over their quality.

I have an XML file that is malformed; a snippet is shared below. Notice the missing namespace prefix in xmlns:=

```xml
<?xml version="1.0" encoding="utf-8" ?>
<ncx version="2005-1" xml:lang="por" xmlns:="http://www.daisy.org/z3986/2005/ncx/">
</ncx>
```

When I call XDocument.Load() in C#, it throws the following exception, and rightly so:

System.Xml.XmlException: 'Name cannot begin with the '=' character, hexadecimal value 0x3D. Line 2, position 44.'  

Is there a way to correct such minor errors in the XML? I searched but didn't find a good solution, and I want to avoid using string search/replace to fix them.

Developer technologies C#

2 answers

  1. Michael Taylor 60,161 Reputation points
    2023-01-03T16:55:06.513+00:00

    XML parsers, by and large, require well-formed XML. Because XML is so sensitive to formatting, there is no easy way to do this, and pretty much every solution will require you to manipulate the file as a string first and then try to read it in as XML. Of course, you could try to parse the XML first and only fall back to a fix-up when parsing fails, but that only makes sense if errors are rare.

    If a given XML provider consistently produces the same errors, you might need to set up custom error handling for those very specific scenarios. Handling every possible error is unrealistic (even heuristically). Converting xmlns:= to xmlns= makes sense, for example, but that is just one possible issue.

    Ultimately, I agree with AgaveJoe that you should contact the vendor and have them fix their XML. In some cases, though, this isn't possible or simply takes too long.
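    The "parse first, repair only on failure" idea and the provider-specific handling described above could be combined roughly as follows. This is a minimal sketch, assuming the whole file fits in memory as a string; the `TolerantXml` class, its `Load`/`ApplyFixes` methods and the regex rules are hypothetical illustrations. The bare-ampersand rule is only an example of a second known defect and does not come from the question. It is still string manipulation, just confined to a short list of known issues.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;
    using System.Xml;
    using System.Xml.Linq;

    // Usage with the snippet from the question; for a file, the text could come from File.ReadAllText(path).
    string xml = "<?xml version=\"1.0\" encoding=\"utf-8\" ?>\n" +
                 "<ncx version=\"2005-1\" xml:lang=\"por\" xmlns:=\"http://www.daisy.org/z3986/2005/ncx/\">\n</ncx>";
    XDocument doc = TolerantXml.Load(xml);
    Console.WriteLine(doc.Root?.Name.NamespaceName); // prints http://www.daisy.org/z3986/2005/ncx/

    public static class TolerantXml
    {
        // Hypothetical fix-up table: each rule targets one specific defect seen from a provider.
        // These rules are illustrative assumptions, not a general-purpose XML repair.
        private static readonly List<(string Name, Regex Pattern, string Replacement)> Rules = new()
        {
            // An empty namespace prefix ("xmlns:=") is treated as a default namespace declaration.
            ("empty xmlns prefix", new Regex(@"\bxmlns:="), "xmlns="),

            // A bare '&' that does not start an entity or character reference becomes "&amp;".
            // Caveat: this also rewrites '&' inside comments and CDATA sections.
            ("bare ampersand", new Regex(@"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)"), "&amp;"),
        };

        // Try a strict parse first; only if the parser rejects the input, apply the known fix-ups and parse again.
        public static XDocument Load(string xml)
        {
            try
            {
                return XDocument.Parse(xml);
            }
            catch (XmlException ex)
            {
                // XmlException reports where the parser gave up, which is useful for logging.
                Console.Error.WriteLine($"Malformed XML at line {ex.LineNumber}, position {ex.LinePosition}; applying known fix-ups.");
                // If the repaired text is still malformed, this second Parse throws and the error propagates.
                return XDocument.Parse(ApplyFixes(xml));
            }
        }

        // Apply every rule whose pattern matches, logging which repairs were made.
        public static string ApplyFixes(string xml)
        {
            foreach (var (name, pattern, replacement) in Rules)
            {
                if (pattern.IsMatch(xml))
                {
                    Console.Error.WriteLine($"Applying fix-up: {name}");
                    xml = pattern.Replace(xml, replacement);
                }
            }
            return xml;
        }
    }
    ```

    Keeping each fix-up as a separately named rule makes it easy to log which repairs were applied and to add rules only for defects actually observed from a given provider, while well-formed files never go through the repair step at all.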

  2. Lex Li (Microsoft) 6,037 Reputation points Microsoft Employee
    2023-01-05T08:13:43.283+00:00

    If you use a prebuilt XML parser (like the one built into .NET), it will reject any content that is not well-formed XML as defined by the standard.

    However, if you learn the technique, you can write your own fault-tolerant XML parser. Quite a few parser generators and frameworks are available, such as ANTLR.
