Not able to load Chinese special characters from Excel file to SQL table (Sink)

ADF_Coder 0 Reputation points
2024-02-28T07:37:09.7466667+00:00

Hi, We are trying to load a Excel file(.xlsx) file data with Chinese Special Characters in Azure Data Factory to Sql Server tables and it fails to render and load Chinese characters. DB Collation is "SQL_Latin1_General_CP1_CI_AS"

We notice that the Japanese special characters are substituted with ‘?’ in the database after loading. There are no options for UTF-8/UTF-16 for excel. Can you please provide a resolution or fix for this in ADF.

Azure SQL Database
Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,656 questions
Microsoft 365 and Office | Excel | For business | Windows
0 comments No comments
{count} votes

2 answers

Sort by: Most helpful
  1. Harishga 6,005 Reputation points Microsoft External Staff
    2024-02-28T13:41:57.7033333+00:00

    Hi @Vrushabh Malbari

    Welcome to Microsoft Q&A platform and thanks for posting your question here.  

    The issue you are facing is due to the fact that the DB Collation is set to "SQL_Latin1_General_CP1_CI_AS", which does not support Chinese characters. To resolve this issue, you need to change the collation of the database to one that supports Chinese characters, such as "Chinese_PRC_CI_AS" or "Chinese_Taiwan_Stroke_CI_AS".  

    As you mentioned that there are no options for UTF-8/UTF-16 for Excel, you can still save the Excel file as a CSV file and specify the encoding as UTF-8. To do this, follow these steps:

    •  Open the Excel file and select "Save As".
    • Choose "CSV (Comma delimited)" as the file type and click "Save".
    • In the "Save As" dialog box, click on the "Tools" dropdown and select "Web Options".
    • In the "Web Options" dialog box, select the "Encoding" tab and choose "Unicode (UTF-8)" as the encoding.
    • Click "OK" to close the dialog box and save the file.

    Once you have saved the Excel file as a CSV file with UTF-8 encoding, you can use Azure Data Factory to load the data into SQL Server tables. Make sure to specify the correct encoding when configuring the data source in Azure Data Factory.

    Reference
    https://learn.microsoft.com/en-us/sql/relational-databases/collations/collation-and-unicode-support?view=sql-server-ver15
    https://support.microsoft.com/en-us/office/import-or-export-text-txt-or-csv-files-5250ac4c-663c-47ce-937b-339e391393ba

    I hope this helps! Let me know if you have any further questions or concerns.

    0 comments No comments

  2. Amira Bedhiafi 34,491 Reputation points Volunteer Moderator
    2024-02-28T14:15:19.5133333+00:00

    In your case, the database collation is "SQL_Latin1_General_CP1_CI_AS", which primarily supports the Latin alphabet and may not correctly handle Chinese characters or other Unicode characters. You may try the "Chinese_PRC_CI_AS" collation for chinese characters. Another point, the NVARCHAR data type supports Unicode data and should be used for columns that will store Chinese characters. If your table uses VARCHAR, it may not correctly store Unicode characters, leading to the substitution of "?" for unsupported characters.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.