"An invalid character was found" errors when upgrading sp_xml_preparedocument from SQL Server 2000 SP3 to SP4 or SQL Server 2005

Recently, I received several customer reports, that sp_xml_preparedocument started rasing the following errors after upgrading their XML-based application from SQL Server 2000 SP3a (or earlier) to either SP4 or SQL Server 2005:

Msg 6602, Level 16, State 2, Procedure sp_xml_preparedocument, Line 1
The error description is 'An invalid character was found in text content.'.
Msg 6607, Level 16, State 3, Procedure sp_xml_removedocument, Line 1
sp_xml_removedocument: The value supplied for parameter number 1 is invalid.

This error is now raised because of a stricter error-discovery during parsing. When we moved from MSXML 2.6 to MSXMLSQL for SQL Server 2000 SP4 and SQL Server 2005, we fixed a couple of bugs in the parser that could lead to data corruption (invalid data being parsed). As a consequence, you are now receiving this error code instead of having invalid characters accepted.

The consequence is that one has to be more explicit with setting the encoding. This will also help mitigate against involuntary data corruption (see below for an example). There are two ways to fix an application (besides moving to an XML datatype and the nodes() method):

  1. Make sure that the XML document when passed in a TEXT or (VAR)CHAR argument is compatible with the default code page of the database (or the string type) by either
    1. setting the encoding property in the XML declaration (e.g., when the code page is ISO-Latin1), or
    2. by making sure that the code page of the database (or string type) can preserve all UTF-8 code points that will ever be passed in through the string.
  2. Change the type of the argument to NTEXT (or N(VAR)CHAR) and pass in the XML in UTF-16 encoding (an XML declaration is optional).

Please continue for the technical details...