Bug in ADF CSV Escaping
I have exported some data from SQL Server into a CSV file (copyData)
The actual text is:
leading text \"Jane Doe\" trailing text
It comes into the CSV file as:
"leading text \"Jane Doe\" trailing text"
Which fails on input parsing because the field now looks like:
"leading text \"Jane Doe\" trailing text"
It should be escaped as:
"leading text \\"Jane Doe\\" trailing text"
Azure Data Factory
-
Saurabh Sharma 23,811 Reputation points • Microsoft Employee
2020-12-05T00:54:55.257+00:00 @Alex Peake Thanks for using Microsoft Q&A.
Are you importing this CSV to a blob. Also, can you please let me know when you say "fails on input parsing" in which activity it fails. It would be nice if you can provide a screenshot.
-
Alex Peake 1 Reputation point
2020-12-05T22:06:40.003+00:00 I extract the data, using ADF, to a csv Blob. It is when I import the csv Blob into another system that it fails (for obvious reasons - \" gets interpreted as literal \ followed by ", where the " acts as a field terminator, but is just in the middle of the text string) . The system, BTW, is Snowflake, where we use COPY INTO
-
Saurabh Sharma 23,811 Reputation points • Microsoft Employee
2020-12-07T20:27:54.723+00:00 @Alex Peake Have you tried using setting the source dataset properties, *Escape character** and Quote character to No Escape character and No quote character. It will make the string come as it is available in SQL table.
Result:
Please let me know if this works for your scenario.
-
Alex Peake 1 Reputation point
2020-12-08T00:41:21.673+00:00 Let's walk through the actual example (I added a comma to show the issues), assuming correct export as you suggest :
leading text \"Jane Doe\" trailing, text
Now on import into the next database, what do you do? Escape, quote, or not? If you do not escape and we set no quote character, then the first a comma in the field looks like end of field, which it is not. If we quote, then the first quote looks like end of field. If we escape and quote, the field arrives without the escape character, which should be in the real data.
['1', 'leading text "Jane Doe" trailing, text', '2']
['1', 'leading text \Jane Doe\" trailing', ' text"', '2']
['1', '"leading text \"Jane Doe\" trailing', ' text"', '2']
-
Saurabh Sharma 23,811 Reputation points • Microsoft Employee
2020-12-11T02:01:17.647+00:00 @Alex Peake
Sorry if I am not fully understanding you scenario but while moving data from the imported csv to destination if the I use no quote character or no escape, then also I am getting the same data as in the source. Please see the below gif which I have created where I am first importing SQL data with the \" characters into a Blob container then using the same blob to write to the final SQL table. You can see the source and destination tables has the same data.
-
Alex Peake 1 Reputation point
2020-12-11T14:16:55.197+00:00 Sorry that I have not explained clearly. I have a single field in SQL Server, which I will delineate with [...], to show what data is actually in the database:
[leading text \"Jane Doe\" trailing, text]
When ADF exports it, the correct way to escape this is:
"leading text \\"Jane Doe\\" trailing, text"
What comes out is:
"leading text \"Jane Doe\" trailing, text"
which when imported, comes in as two fields, the first being incorrect, the second now constitutes too many fields:
[leading text \"Jane Doe\" trailing][text] with your suggestion,
or in my original, with quote = " and escape as \
The issue is ADF exporting from SQL Server incorrectly escapes.
[leading text "^^^ <- error: expected comma following quote
-
Saurabh Sharma 23,811 Reputation points • Microsoft Employee
2020-12-15T01:30:55.263+00:00 @Alex Peake Thanks for sharing the details. To my knowledge the escaping is as expected, however if you like someone to look into this into your environment and data, please create a support ticket. In case if you have any limitations creating a support ticket please let me know.
-
Alex Peake 1 Reputation point
2020-12-18T20:34:27.377+00:00 Thanks for the work that you have put into this. The difficulty is that the final sink is Snowflake, and the files are very large, with all sorts of stuff in the fields.
I can get it to work if I follow your instructions up to the last, and then manually run copy into in Snowflake, with no escape character. However, if I use the ADF Snowflake copy into, I find that I cannot set the Escape character. ADF automatically sets it to \ and so this fails. What would you suggest?
-
MartinJaffer-MSFT 26,086 Reputation points
2020-12-21T18:18:24.603+00:00 Hi @Alex Peake , quick question, were you able to see the private message Saurab sent you?
-
MartinJaffer-MSFT 26,086 Reputation points
2020-12-24T00:53:22.173+00:00 @Alex Peake if you found your own solution could you please share it here with the community?
-
Alex Peake 1 Reputation point
2020-12-24T01:00:43.28+00:00 No solution found. I do not believe that I received a private message from Saurab. Only the above messages.
Sign in to comment