Import Unicode character

Spunny 326 Reputation points
2022-08-17T21:49:32.237+00:00

Hi,
I get data in csv file. Some of the columns have unicode data like 'Württemberg'.
I need to import this into sql server table column. I am using openrowset and xml definition file. But it is not getting imported or character are coming out differently like 'W++rttemberg'
My csv file is comma separated and double quoted. it is not fixed length

How should I get or import exactly.
I can't follow what is said in below link because mine is not fixed width.
https://dba.stackexchange.com/questions/303337/is-it-possible-to-use-openrowset-to-import-fixed-width-utf8-encoded-files/303341#303341

My xml definition file looks like this:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="Filler1" xsi:type="CharTerm" TERMINATOR='"' />
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR='","' />
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR='","' />
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR='","' />
<FIELD ID="4" xsi:type="CharTerm" TERMINATOR='","' />
<FIELD ID="5" xsi:type="CharTerm" TERMINATOR='","' />
<FIELD ID="6" xsi:type="CharTerm" TERMINATOR='","' />
<FIELD ID="7" xsi:type="CharTerm" TERMINATOR='","' />
<FIELD ID="8" xsi:type="CharTerm" TERMINATOR='","' />
<FIELD ID="9" xsi:type="CharTerm" TERMINATOR='","' />
<FIELD ID="10" xsi:type="CharTerm" TERMINATOR='\n' />
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="C" xsi:type="SQLVARYCHAR" LENGTH="25" />
<COLUMN SOURCE="5" NAME="R" xsi:type="SQLVARYCHAR" LENGTH="10" />
<COLUMN SOURCE="8" NAME="Rname" xsi:type="SQLVARYCHAR" LENGTH="250" />
<COLUMN SOURCE="9" NAME="lan" xsi:type="SQLVARYCHAR" LENGTH="10" />
</ROW>
</BCPFORMAT>

SQL query looks like this:
declare @sqlCmd nvarchar(4000)
SET @sqlCmd = 'SELECT C, R, Rname, lan FROM OPENROWSET(BULK ''' + @usRegionsFileName + ''',
FORMATFILE=''' + @regionsDefinitionFileName + ''',
FIRSTROW=2) AS T
WHEN I do
SELECT @sqlcmd, I get
SELECT C, R, Rname, lan FROM OPENROWSET(BULK '\servername\foldername\s.csv',
FORMATFILE='\servername\foldername\sdef.xml', FIRSTROW=2)
AS T

This results in getting W++rttemberg instead of Württemberg
Please let me know how to handle.

Thank You

Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,601 questions
{count} votes

4 answers

Sort by: Most helpful
  1. Viorel 114.7K Reputation points
    2022-08-17T22:59:31.797+00:00

    Try specifying the codepage:

    SELECT . . . FIRSTROW=2, CODEPAGE = '65001') AS T


  2. Yitzhak Khabinsky 25,731 Reputation points
    2022-08-17T23:04:22.207+00:00

    Hi @Spunny ,

    I hope your target table columns defined as NVARCHAR(...) data type.

    Also, you need to modify a section in your XML format file as follows:

    <ROW>  
    	<COLUMN SOURCE="1" NAME="C" xsi:type="SQLNVARCHAR" LENGTH="25"/>  
    	<COLUMN SOURCE="5" NAME="R" xsi:type="SQLNVARCHAR" LENGTH="10"/>  
    	<COLUMN SOURCE="8" NAME="Rname" xsi:type="SQLNVARCHAR" LENGTH="250"/>  
    	<COLUMN SOURCE="9" NAME="lan" xsi:type="SQLNVARCHAR" LENGTH="10"/>  
    </ROW>  
    

  3. Spunny 326 Reputation points
    2022-08-18T18:03:59.933+00:00

    I had to take both above responses and combined. But still didn't work.
    For CODEPAGE, I had to give value of 'ACP' as suggested in --https://dba.stackexchange.com/questions/159564/not-displaying-special-characters-imported-from-a-file link.
    Once I used, ACP it worked.