Not able to load Japanese special characters in csv files using Data Flow task in ADF.

Pratyay Bhattacharjee 20 Reputation points
2023-04-28T17:49:59.5033333+00:00

Hi,

We are trying to load a CSV file containing Japanese special characters into SQL Server tables using a data flow task in Azure Data Factory, but the Japanese characters are not rendered or loaded correctly.

We notice that the Japanese special characters are replaced with ‘?’ in the database after loading. We tried changing the encoding to other types (such as Default (UTF-8), UTF-16LE, and UTF-8 without BOM), but it did not help.

But when we use a Data Copy task in ADF to load the data with Japanese special characters, it loads perfectly fine.

However, we need to keep the data flow task in the pipeline because the data requires some transformations that the data copy task does not support. We also do not want a different pipeline design just for this special character issue: replacing the data flow task with something else could create code maintenance problems, and we want a standard, consistent design that keeps the code reusable.

So, this seems to be an Azure product bug: the data flow task in ADF is not able to render and load specific special characters, even with UTF-8 encoding.

Can you please provide a resolution or fix for this using the Data Flow task in ADF?

Thanks,

Pratyay

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. BhargavaGunnam-MSFT 28,271 Reputation points Microsoft Employee
    2023-05-01T21:29:50.8433333+00:00

    Hello Pratyay Bhattacharjee,

    Welcome to the MS Q&A platform.

    Could you please send a sample CSV file (with any sensitive data removed) so that I can reproduce the issue on my end?

    Meanwhile, please try the steps below:

    • Open the CSV file in a text editor like Notepad++ and check the encoding of the file. Make sure that it is saved in UTF-8 encoding.
    • In the Data Flow task, add a Derived Column transformation and use the following expression to convert the column with Japanese special characters to Unicode:

    replace(columnName, '', N'Unicode equivalent')

    Replace '' with the Japanese special characters.

    • After the Derived Column transformation, add a Sink transformation to load the data into the SQL Server table.
    • In the Sink transformation, ensure the encoding is set to UTF-8.
    • Save and run the pipeline to load the data into the SQL Server table.
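The first step above (confirming the file's encoding) can also be done with a script instead of a text editor. The following is a minimal sketch, not part of ADF: the helper names `sniff_encoding`, `has_japanese`, and `simulate_non_unicode_store` are hypothetical, and the `cp1252` code page is only an assumed example of a non-Unicode target. The last helper shows how converting Japanese text to a non-Unicode code page yields '?' for each character, which resembles the symptom described in the question; whether that is what happens inside this particular data flow is not established by the thread.

```python
import codecs


def sniff_encoding(raw: bytes) -> str:
    """Guess the encoding from a BOM, otherwise test for valid UTF-8."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not UTF-8 (for example, a Shift-JIS export); needs manual inspection.
        return "unknown"


def has_japanese(text: str) -> bool:
    """True if the text contains hiragana, katakana, or common CJK ideographs."""
    return any(
        "\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff"
        for ch in text
    )


def simulate_non_unicode_store(text: str, codepage: str = "cp1252") -> str:
    """Round-trip text through a non-Unicode code page (assumed example).

    Characters that the code page cannot represent come back as '?'.
    """
    return text.encode(codepage, errors="replace").decode(codepage)


if __name__ == "__main__":
    sample = "id,city\n1,東京\n".encode("utf-8")  # stand-in for the CSV file bytes
    encoding = sniff_encoding(sample)
    print("encoding:", encoding)
    if encoding != "unknown":
        text = sample.decode(encoding)
        print("contains Japanese:", has_japanese(text))
        print("after non-Unicode round trip:", simulate_non_unicode_store(text))
```

To check a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to `sniff_encoding`. If the file is valid UTF-8 and contains Japanese text but '?' still appears in the table, the loss is happening after the file is read, and the sink side of the data flow is the next place to look.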

    Please try and let me know if you see any issues.

    I hope this helps.

