How to handle special characters in my input csv file which is loading data into synapse table

Vivek Komarla Bhaskar 956 Reputation points
2023-11-17T09:56:55.58+00:00

Using Azure data factory data flows, when I attempt to load user-related information that includes special characters and weird usernames into the Synapse table, I am not able to add the data as it is in files. Instead, it replaces with '?' Or Interpreting it differently. Here is an example from my file and the data that is loaded into my synapse table.

Below is the data in my input .csv file:

User.txt

Screenshot 2023-11-17 at 09.41.35

Below is the data loaded on my synapse table:

Screenshot 2023-11-17 at 09.41.45

Below are my settings for mapping data flow for my .CSV source file:

Screenshot 2023-11-17 at 09.44.52

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,624 questions
{count} votes

Accepted answer
  1. KranthiPakala-MSFT 46,642 Reputation points Microsoft Employee Moderator
    2023-11-28T21:30:11.64+00:00

    @Vivek Komarla Bhaskar I'm glad that you were able to resolve your issue and thank you for posting your solution so that others experiencing the same thing can easily reference this! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others (Opens in new window or tab)", I'll repost your solution in case you'd like to accept the answer (Opens in new window or tab).

    Issue:

    • While trying to load source data which includes special characters and weird usernames into the Synapse table using ADF Mapping data flows, user not able to add the data as it is in files. Instead, it replaces with '?' Or interpreting it differently.

    Below is the source data:
    Screenshot 2023-11-17 at 09.41.35

    Below is how data loaded into Synapse table:

    Screenshot 2023-11-17 at 09.41.45

    Solution:

    • Issue is sorted now by changing the datatype for the Username column on table from VARCHAR to NVARCHAR. Varchar stores Non-unicode or English character data types, and it can contain a maximum of 8000 characters. It only supports ASCII values. Nvarchar stores Unicode or Non-English character data types, and it can contain a maximum of 4000 characters. It supports ASCII values as well as special characters.

    If I missed anything please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.

    I hope this helps others!

    Thank you again for your time and patience throughout this issue.

    0 comments No comments

2 additional answers

Sort by: Most helpful
  1. Amira Bedhiafi 33,071 Reputation points Volunteer Moderator
    2023-11-18T10:54:01.4033333+00:00

    Your issue is related to encoding. When special characters are replaced with a '?' or interpreted differently, it typically indicates that the encoding setting in ADF doesn't match the encoding of your input file.

    Common encodings that support a wide range of characters include UTF-8, UTF-16, and Windows-1252.

    Once you've identified the encoding of your CSV file, you should set the same encoding in the ADF data flow's source settings. If your CSV file is encoded in UTF-8, you should set the encoding to UTF-8 in ADF as well.

    Also try to check your DB collation.

    https://stackoverflow.com/questions/3710374/get-encoding-of-a-file-in-windows

    https://stackoverflow.com/questions/66255548/check-the-csv-file-encoding-in-data-factory


  2. Subashri Vasudevan 11,226 Reputation points
    2023-11-18T11:19:28.42+00:00

    Hello Vivek Komarla Bhaskar

    I just tried to replicate your issue. But unfortunately, it works fine for me.

    1. i downloaded the .txt file attached and uploaded to my ADLS gen2 container.
    2. created a data flow and pointed to the file in ADLS, settings below -

    Screenshot 2023-11-18 at 4.41.48 PM

    1. From my data flow, i just copied records to my AZ SQL DB table. (no transformations in between)
    2. see the data loaded to my table below

    Screenshot 2023-11-18 at 4.44.33 PM

    Note: My collation in DB is SQL_Latin1_General_CP1_CI_AS. You can also verify this using below query.

    select databasepropertyex(DB_NAME(),'Collation') as collationame
    
    
    

    Similar thread that discusses about Japanese characters: https://learn.microsoft.com/en-us/answers/questions/1265978/not-able-to-load-japanese-special-characters-in-cs

    Please try once again and let us know.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.