Share via


Using Unicode with XML Data

SQL Server stores XML data using the UTF-16 encoding scheme. Because UTF-16 data is variable-width, it is processed according to a byte-oriented protocol. This means that UTF-16 data can be treated in a way that is independent of the byte ordering on different computers (little endian versus big endian). Therefore, UTF-16 is well-suited for traversing different computers that use different encodings and byte-ordering systems. Because XML data is typically shared widely across networks, it makes sense to maintain the default UTF-16 storage of XML data in your database, and when you export XML data to clients.

If you must specify a different encoding, you can use FOR XML requests and specify the following:

  • The Output Encoding property of an XML-formatted data stream Response object in Active Server Pages (ASP).

    For example, the following ASP code tells the browser to display an incoming XML data stream in UCS-2:

    <% cmdXML.Properties("Output Encoding") = "UCS-2" %>
    
  • An output encoding in a URL when you make an HTTP request.

    The following example specifies UCS-2 as the output encoding of the XML document returned by this request:

    http://IISServer/nwind?sql=SELECT+*+FROM+Customers+FOR+XML+AUTO&outputencoding=UCS-2
    
  • An output encoding in an XML template or style sheet.

    The following example specifies UCS-2 as the output encoding in the header of this XML template document:

    <?xml version ='1.0' encoding='UCS-2'?>
     <root xmlns:sql='urn:schemas-microsoft-com:xml-sql'
           sql:xsl='MyXSL.xsl'>
       <sql:query>
          SELECT FirstName, LastName FROM Employees FOR XML AUTO  
       </sql:query>
    </root> 
    

Note that if an encoding is specified directly in an XSL style sheet, it will override whatever encoding is specified in the template. However, both are overridden by the Output Encoding property specified on the ASP page.

When you insert data into a SQL Server database by using OPENXML, you should specify Unicode data types anywhere in the rowset where multilingual data might appear. This will minimize character corruption.

See Also

Reference