How to validate an XML DataSet when an invalid hexadecimal character is encountered

BeUnique 2,112 Reputation points
2023-09-21T09:49:42.04+00:00

I am processing an XML document. When I read the XML into a DataSet, I get the error below.

System.Xml.XmlException: ''', hexadecimal value 0x02, is an invalid character. Line 1, position 989.'

How can I avoid this? How should I validate the input and handle this error? I am stuck on the project and cannot proceed because of it. It is very urgent.

.NET

1 answer

  1. Jiale Xue - MSFT 44,231 Reputation points Microsoft Vendor
    2023-09-22T03:30:00.71+00:00

    Hi @BeUnique, welcome to Microsoft Q&A.

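    XML 1.0 does not allow most control characters, and 0x02 is one of them, so the parser throws as soon as it reaches that character. The fix is to read the document as a string first and remove anything that is not a legal XML character before handing it to the DataSet. One way to do this is a regular expression that strips all illegal characters.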

    As follows:

    using System;
    using System.Data;
    using System.IO;
    using System.Text.RegularExpressions;
    using System.Windows.Forms;
    using System.Xml;
    
    namespace _9_22_x
    {
        public partial class Form1 : Form
        {
            public Form1()
            {
                InitializeComponent();
            }
    
            private void button1_Click(object sender, EventArgs e)
            {
                string xmlData = @"<?xml version='1.0'?>
    <!-- This file represents a fragment of a book store inventory database -->
    <bookstore>
      <book genre=""autobiography"" publicationdate=""1981"" ISBN=""1-861003-11-0"">
        <title>The Autobiography of Benjamin Franklin</title>
        <author>
          <first-name>Benjamin</first-name>
          <last-name>Franklin</last-name>
        </author>
        <price>8.99</price>
      </book>
      <book genre=""novel"" publicationdate=""1967"" ISBN=""0-201-63361-2"">
        <title>The Confidence Man</title>
        <author>
          <first-name>Herman</first-name>
          <last-name>Melville</last-name>
        </author>
        <price>11.99</price>
      </book>
      <book genre=""philosophy"" publicationdate=""1991"" ISBN=""1-861001-57-6"">
        <title>The Gorgias</title>
        <author>
          <name>Plato</name>
        </author>
        <price>9.99</price>
      </book>
    </bookstore>";
                // Strip characters that are not legal in XML 1.0 before parsing.
                string cleanedXml = RemoveUnrecognizedCharacters(xmlData);
                DataSet dataSet = new DataSet();
                try
                {
                    dataSet.ReadXml(new StringReader(cleanedXml));
                    dataGridView1.DataSource = dataSet.Tables[0];
                    listBox1.DataSource = dataSet.Tables[0];
                }
                catch (XmlException ex)
                {
                    Console.WriteLine("XML parsing error: " + ex.Message);
                }
            }
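            // Removes every character that XML 1.0 does not allow: control
            // characters such as 0x02, U+FFFE/U+FFFF, and unpaired surrogates.
            // .NET's \x escape takes exactly two hex digits, so \uXXXX is used here.
            static string RemoveUnrecognizedCharacters(string input)
            {
                return Regex.Replace(
                    input,
                    @"[\u0000-\u0008\u000B\u000C\u000E-\u001F\uFFFE\uFFFF]" +
                    @"|[\uD800-\uDBFF](?![\uDC00-\uDFFF])" +
                    @"|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]",
                    string.Empty);
            }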
        }
    }
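
    If you would rather not maintain a regular expression, the same clean-up can be sketched with the character checks built into `System.Xml.XmlConvert` (`IsXmlChar` and `IsXmlSurrogatePair`, available since .NET Framework 4.0). The class name `XmlSanitizer`, the method name `RemoveInvalidXmlChars`, and the sample string are made up for illustration. Treat this as a minimal sketch, not a drop-in replacement:

    ```csharp
    using System;
    using System.Text;
    using System.Xml;

    // Hypothetical helper: the class and method names are illustrative only.
    static class XmlSanitizer
    {
        // Returns a copy of input without the characters XML 1.0 forbids.
        public static string RemoveInvalidXmlChars(string input)
        {
            if (string.IsNullOrEmpty(input)) return input;

            var sb = new StringBuilder(input.Length);
            for (int i = 0; i < input.Length; i++)
            {
                char c = input[i];
                if (XmlConvert.IsXmlChar(c))
                {
                    // Ordinary legal character: keep it.
                    sb.Append(c);
                }
                else if (i + 1 < input.Length &&
                         XmlConvert.IsXmlSurrogatePair(input[i + 1], c))
                {
                    // Valid high/low surrogate pair (a character above U+FFFF):
                    // keep both halves and skip the low surrogate.
                    sb.Append(c).Append(input[i + 1]);
                    i++;
                }
                // Anything else (for example 0x02) is dropped.
            }
            return sb.ToString();
        }
    }

    static class Program
    {
        static void Main()
        {
            // Assumed sample input with a 0x02 control character in the text.
            string dirty = "<root><name>Ben\u0002jamin</name></root>";
            string clean = XmlSanitizer.RemoveInvalidXmlChars(dirty);
            Console.WriteLine(clean); // prints <root><name>Benjamin</name></root>
        }
    }
    ```

    The cleaned string can then be passed to `dataSet.ReadXml(new StringReader(clean))` exactly as in the code above. Keep in mind that both approaches silently discard data, so if the dropped characters matter, log them or fix the system that produces the XML.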
    

    Best Regards,

    Jiale



