Invalid high surrogate character (0xxxxx)

This is my first blog post, so why not start it with an exception! Some background: the application collects data from machines. Think of it using the remote registry to read a key from the registry. It then builds a giant XML document and tries to return it as a string, and that's where the problem is. One of our testers uncovered this weird exception. All we are doing in the code is:

StringBuilder sbWrite = new StringBuilder();
StringWriter strWrite = new StringWriter(sbWrite);
XmlTextWriter xmlWrite = new XmlTextWriter(strWrite);
xmlDoc.WriteTo(xmlWrite);
xmlWrite.Flush();
xmlWrite.Close();
strWrite.Flush();
strWrite.Close();
Console.WriteLine(sbWrite.ToString());

The error occurs in the xmlDoc.WriteTo() call. Here is the entire stack trace; of course, it only includes the exception part.

Invalid high surrogate character (0xxxxx). A high surrogate character must have a value from range (0xxxxx - 0xxxxx).
Server stack trace:
   at System.Xml.XmlTextEncoder.Write(String text)
   at System.Xml.XmlTextWriter.WriteString(String text)
   at System.Xml.XmlText.WriteTo(XmlWriter w)
   at System.Xml.XmlElement.WriteContentTo(XmlWriter w)
   at System.Xml.XmlElement.WriteTo(XmlWriter w)
   at System.Xml.XmlElement.WriteContentTo(XmlWriter w)
   at System.Xml.XmlElement.WriteTo(XmlWriter w)
   at System.Xml.XmlElement.WriteContentTo(XmlWriter w)
   at System.Xml.XmlElement.WriteTo(XmlWriter w)
   at System.Xml.XmlElement.WriteContentTo(XmlWriter w)
   at System.Xml.XmlElement.WriteTo(XmlWriter w)
   at System.Xml.XmlElement.WriteContentTo(XmlWriter w)
   at System.Xml.XmlElement.WriteTo(XmlWriter w)
   at System.Xml.XmlElement.WriteContentTo(XmlWriter w)
   at System.Xml.XmlElement.WriteTo(XmlWriter w)
   at System.Xml.XmlElement.WriteContentTo(XmlWriter w)
   at System.Xml.XmlElement.WriteTo(XmlWriter w)
   at System.Xml.XmlElement.WriteContentTo(XmlWriter w)
   at System.Xml.XmlElement.WriteTo(XmlWriter w)
   at System.Xml.XmlElement.WriteContentTo(XmlWriter w)
   at System.Xml.XmlElement.WriteTo(XmlWriter w)
   at System.Xml.XmlElement.WriteContentTo(XmlWriter w)
   at System.Xml.XmlElement.WriteTo(XmlWriter w)
   at System.Xml.XmlElement.WriteContentTo(XmlWriter w)
   at System.Xml.XmlElement.WriteTo(XmlWriter w)
   at System.Xml.XmlDocument.WriteContentTo(XmlWriter xw)
   at System.Xml.XmlDocument.WriteTo(XmlWriter w)

The problem is clearly with the data in the registry. Let's dig deeper. The stack trace shows XmlDocument.WriteTo walking down the element tree until XmlText.WriteTo calls XmlTextWriter.WriteString, which hands the text to XmlTextEncoder.Write, and that is where the write fails. In UTF-16, a character outside the Basic Multilingual Plane is stored as a high surrogate followed by a low surrogate. XmlTextEncoder checks these pairs and throws when it finds a surrogate that does not belong to a valid pair, which is what the registry value contains. If we can simply strip or replace the invalid surrogate characters, that fixes the issue.
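To make the failure easier to reproduce without a registry, here is a minimal sketch. The class name SurrogateRepro, the helper TrySerialize, and the sample string are all hypothetical stand-ins for the real data, and the exact exception type and message may differ between framework versions.

```csharp
using System;
using System.IO;
using System.Xml;

static class SurrogateRepro
{
    // Serializes a one-element document whose text is textValue.
    // Returns null on success, or the exception type and message on failure.
    public static string TrySerialize(string textValue)
    {
        XmlDocument xmlDoc = new XmlDocument();
        XmlElement root = xmlDoc.CreateElement("root");
        root.AppendChild(xmlDoc.CreateTextNode(textValue));
        xmlDoc.AppendChild(root);

        try
        {
            using (StringWriter strWrite = new StringWriter())
            using (XmlTextWriter xmlWrite = new XmlTextWriter(strWrite))
            {
                xmlDoc.WriteTo(xmlWrite);
            }
            return null;
        }
        catch (Exception ex)
        {
            return ex.GetType().Name + ": " + ex.Message;
        }
    }

    static void Main()
    {
        // Assumed sample data: U+D83D is a high surrogate, and here it is
        // followed by '!' instead of a low surrogate.
        string badValue = "value\uD83D!";

        Console.WriteLine(TrySerialize("plain text") ?? "plain text: serialized fine");
        Console.WriteLine(TrySerialize(badValue) ?? "bad value: serialized fine");
    }
}
```

The DOM accepts the malformed string without complaint; the check only happens when the writer encodes the text, which matches the stack trace above.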

Interestingly, there is a way to do this: EncoderReplacementFallback. The catch is that it only works with an Encoding object, and XmlDocument does not take an Encoding; it accepts the encoding only as a string. So the simple fix is to derive a custom class from XmlTextWriter, override WriteString, and run the text through an encoding that uses EncoderReplacementFallback before passing it on. The following code does the trick.

using System.IO;
using System.Text;
using System.Xml;

public class CustomXmlWriter : XmlTextWriter
{
    // UTF-8 encoding whose fallbacks replace anything that cannot be
    // encoded or decoded (such as an unpaired surrogate) with an empty string.
    private readonly Encoding utfencoder = Encoding.GetEncoding(
        "UTF-8", new EncoderReplacementFallback(""), new DecoderReplacementFallback(""));

    public CustomXmlWriter(TextWriter writer) : base(writer) { }
    public CustomXmlWriter(Stream stream, Encoding encoding) : base(stream, encoding) { }
    public CustomXmlWriter(string file, Encoding encoding) : base(file, encoding) { }

    public override void WriteString(string text)
    {
        if (text != null)
        {
            // Round-trip through UTF-8 so the fallback drops the invalid characters.
            byte[] bytText = utfencoder.GetBytes(text);
            text = utfencoder.GetString(bytText);
        }
        base.WriteString(text);
    }
}

Now all we have to do is use the class above instead of XmlTextWriter to get the XML.

StringBuilder sbWrite = new StringBuilder();
StringWriter strWrite = new StringWriter(sbWrite);
CustomXmlWriter xmlWrite = new CustomXmlWriter(strWrite);
xmlDoc.WriteTo(xmlWrite);
xmlWrite.Flush();
xmlWrite.Close();
strWrite.Flush();
strWrite.Close();
Console.WriteLine(sbWrite.ToString());

As you can see, by using the EncoderReplacementFallback class we replace the invalid surrogate characters with nothing. You can replace them with ? or something else of your choice by passing a different replacement string to the fallback.
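To illustrate the choice of replacement, here is a small sketch of the same round-trip with the replacement string passed in as a parameter. The class name SurrogateSanitizer, the method Sanitize, and the sample string are hypothetical and only there for illustration.

```csharp
using System;
using System.Text;

static class SurrogateSanitizer
{
    // Round-trips text through UTF-8, substituting `replacement` for any
    // character sequence the encoder or decoder cannot handle.
    public static string Sanitize(string text, string replacement)
    {
        Encoding encoding = Encoding.GetEncoding(
            "UTF-8",
            new EncoderReplacementFallback(replacement),
            new DecoderReplacementFallback(replacement));
        return encoding.GetString(encoding.GetBytes(text));
    }

    static void Main()
    {
        // Assumed sample data: a high surrogate (U+D83D) with no low surrogate after it.
        string bad = "a\uD83Db";

        Console.WriteLine(Sanitize(bad, ""));   // invalid character dropped
        Console.WriteLine(Sanitize(bad, "?"));  // invalid character shown as ?
    }
}
```

Dropping the character keeps the output clean, while a visible marker such as ? makes it obvious to whoever reads the XML that something in the source data was malformed.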

Comments

  • Anonymous
    November 24, 2008
    I just ran into this same issue and your solution worked perfectly.  Thank you!