Azure Data Factory (ADF) Escape Character Issue When Quoting

AzureHero 41 Reputation points
2020-10-03T19:40:10.38+00:00

I've got a pipeline built in Azure Data Factory that pulls data from SQL and stores it as CSV. My dataset for the CSV has the Escape Character set to a backslash (\) and the Quote Character set to a double quote ("). The problem arises when a value is just a backslash, or ends with a backslash, which happens throughout my data across dozens of tables and columns.

When this happens, the backslash at the end of the string (example: "this is my test\") can't be interpreted by other systems loading the file. The quote after the backslash is ignored because it is being escaped, which throws off the columns. I have tried other escape characters such as "^", but "^" also appears in many values across many table columns, so the same issue can occur.

It seems this is an issue with ADF: whenever the escape character appears in data at the end of a value (or as the only value), it produces output that fails to parse. How can I handle this without having to address it on the source side? Why doesn't Data Factory convert escape characters in data to a double backslash? I would expect a backslash in a string value to be written as "\\" even when quoting, but it isn't; that only happens when quoting is not used.
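
To illustrate the failure mode outside of ADF, here is a minimal sketch using Python's csv module, configured like my dataset (quote = ", escape = \); it stands in for any downstream reader that honors the backslash escape:

    import csv
    import io

    # The kind of line ADF writes when a quoted value ends in a backslash:
    # three fields, the first ending in "\" right before its closing quote.
    line = '"this is my test\\","sometest","x"\n'

    # A downstream reader configured like the dataset (quote = ", escape = \)
    reader = csv.reader(io.StringIO(line), quotechar='"', escapechar='\\')
    print(next(reader))
    # The backslash escapes the closing quote, so the first two fields merge:
    # ['this is my test",sometest"', 'x'] instead of three separate fields.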

Thanks for the help!


3 answers

  1. HimanshuSinha-msft 19,376 Reputation points Microsoft Employee
    2020-10-05T22:16:41.167+00:00

    Hello,
    Thanks for the question and also for using this forum.

    I tried to repro the issue on my side. This is what I did: created a source table and inserted the dummy data you suggested, then created a pipeline with a copy activity with a CSV as the sink. I attached one more copy activity, the idea being to copy the records from the sink file into a destination SQL table. I was expecting it to fail as you mentioned, but unfortunately I was unable to repro it; the pipeline succeeded just fine.

    DROP TABLE TEST10052020   
    CREATE TABLE TEST10052020   
    (  
    COL VARCHAR(MAX),COL1 VARCHAR(MAX),COL2 VARCHAR(MAX)  
    )  
      
    DROP TABLE TESTDestination  
    CREATE TABLE TESTDestination   
    (  
    COL VARCHAR(MAX),COL1 VARCHAR(MAX),COL2 VARCHAR(MAX)  
    )  
      
      
    insert into TEST10052020 values ('"this is my test\"','sometest','"\"')  
      
    select COL,COL1,COL2 from TEST10052020   
    select COL,COL1,COL2 from TESTDestination  
    

    [Attached animation: 30281-escapecharaterissue.gif, showing the pipeline run succeeding]

    When you say it "can't be interpreted by other systems to load", what kind of system are you referring to? At least with this test, I think Data Factory is handling this pretty well.

    One more suggestion: maybe you can explore the power of T-SQL (since I am not aware of the data size and other parameters, it may not work out for you). When you configure the source, you can opt for a T-SQL query and clean the data, something like...

    SELECT REPLACE(COL, '\', ' ')  
          ,COL1  
          ,REPLACE(COL2, '\', ' ')  
    FROM TEST10052020  
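
    Note that replacing the backslash with a space changes the data. A variant that preserves it, assuming the downstream reader honors the backslash escape, is to double the backslash instead, e.g. REPLACE(COL, '\', '\\'). A quick sketch (in Python, outside of ADF) of why that round-trips:

    import csv
    import io

    original = 'this is my test\\'             # value ending in one backslash
    doubled = original.replace('\\', '\\\\')   # what the cleaned source would emit

    # Write the doubled value quoted; backslashes pass through as-is
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_ALL).writerow([doubled, 'sometest'])

    # An escape-aware reader collapses \\ back to \, recovering the original
    row = next(csv.reader(io.StringIO(buf.getvalue()),
                          quotechar='"', escapechar='\\'))
    assert row[0] == original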
    

    Thanks & stay safe
    Himanshu
    Please consider clicking "Accept Answer" and "Up-vote" on the post that helps you, as it can be beneficial to other community members


  2. AzureHero 41 Reputation points
    2020-10-06T00:54:29.327+00:00

    Thank you for the response, @HimanshuSinha-msft. Your example works because you wrapped the values in quotes within the SQL table column. If you strip the quotes and instead do:

     insert into TEST10052020 values ('this is my test\','sometest','\')  
    

    You will see that the output causes issues in third-party systems trying to read the data. The value '\' is written in the file as "\". But because backslash is the escape character, a third-party reader treats the closing quote as escaped, and parsing fails. I tried the exact same data with the output file changed to JSON, and the value was correctly written within the field.
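
    For comparison, a minimal sketch (not the ADF code path) of why a JSON sink is safe here: JSON defines a single unambiguous escape for the backslash, so a backslash-only value survives a round trip:

    import json

    value = '\\'                       # a field containing only a backslash
    encoded = json.dumps({"col": value})
    print(encoded)                     # {"col": "\\"} - escaped, unambiguous
    assert json.loads(encoded)["col"] == value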

    I also tested converting the output DelimitedText file to JSON, and it doesn't work correctly. I have two copy activities: the first with a source of SQL and a sink of DelimitedText, and the second with a source of DelimitedText (pointing at the same file created by the first activity) and a sink of JSON. The second output file in JSON has incorrect data because the DelimitedText file is not read correctly.


  3. Pablo Puente 1 Reputation point
    2022-12-06T13:31:49.173+00:00

    Hey, thanks all for your comments. I had the same issue and will try AzureHero's solution going from ADLS Gen1 to Gen2. I changed the escape character: the approach I took was to look through the tables connected to that database, and at the ERP system used to populate them, for specific characters that were never used there. From those I picked the one that looked most promising and used it as the escape character. The solution is not in production yet, but it has survived all tests performed so far.
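
    If it helps anyone taking this route, here is a rough sketch of the check (plain Python with made-up sample data; `rows` stands in for whatever you pull from your tables) for which candidate escape characters never occur in the data:

    # Hypothetical sample standing in for data pulled from the exported tables
    rows = [
        ('this is my test\\', 'sometest', '\\'),
        ('another^value', 'x', 'y'),
    ]

    candidates = ['\\', '^', '|', '~', '\x1f']   # \x1f = ASCII unit separator

    used = {ch for row in rows for field in row for ch in field}
    safe = [c for c in candidates if c not in used]
    print(safe)   # ['|', '~', '\x1f'] - any of these would be a safe escape char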