Why is my CSV file not being parsed correctly by Azure Data Factory?

Daniel Despain 20 Reputation points
2024-10-22T04:05:10.41+00:00

I am trying to load a CSV file from an external provider into a MySQL database using Azure Data Factory. Some of the rows parse fine, but Azure Data Factory fails to correctly parse any row in which a quoted field contains "" followed at any point by a comma, like "This is 25"" wide, and 500' long"

For example, this snippet:

0.00,"EZ Load Gray End Cap Laminating Roll Film, Gloss, 25"" x 500', 1.5 mil, 2 Rolls",CHN

Should be parsed into:

  • 0.00
  • EZ Load Gray End Cap Laminating Roll Film, Gloss, 25" x 500', 1.5 mil, 2 Rolls
  • CHN

But that's not what's happening. It's getting parsed into:

  • 0.00
  • EZ Load Gray End Cap Laminating Roll Film, Gloss, 25" x 500',
  • 1.5 mil
  • 2 Rolls
  • CHN

I assume it has something to do with the "": after it, the parser seems to treat the next , as a field delimiter even though that comma is still inside the larger quoted string. It is able to parse other chunks like:

0.00,"A, b, and c","column 3"

The problem only seems to happen when the doubled double quote ("") and one or more commas are inside a larger quoted string.

How can I resolve this issue?

I have the Quote character set to "Double quote" and the delimiter set to comma, but there doesn't seem to be anywhere to tell Azure Data Factory how to handle escaped quotes like ""
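
For reference, the sample row is well-formed under the common CSV convention (as described in RFC 4180) in which a doubled quote inside a quoted field stands for one literal quote. A minimal check using Python's standard `csv` module, whose reader applies that convention by default. It runs outside Azure Data Factory and only confirms what the expected fields are:

```python
import csv

# Sample row from the question, with the doubled quote ("") inside a quoted field.
row = '''0.00,"EZ Load Gray End Cap Laminating Roll Film, Gloss, 25"" x 500', 1.5 mil, 2 Rolls",CHN'''

# csv.reader defaults to delimiter=',', quotechar='"', doublequote=True,
# so "" inside a quoted field is read as a single literal ".
fields = next(csv.reader([row]))

for field in fields:
    print(field)
```

This prints the three fields listed above as the expected result, which suggests the file itself is fine and the issue lies in the parser settings.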

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. Chandra Boorla 2,990 Reputation points Microsoft Vendor
    2024-10-23T00:37:40.95+00:00

    Hi @Daniel Despain

    I'm glad that you were able to resolve your issue, and thank you for posting your solution so that others experiencing the same thing can easily reference it! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others", I'll repost your solution in case you'd like to accept the answer.

    Solution:

    I am using the Copy Data action. I had looked at that article, but somehow missed that it was setting the ESCAPE character. I thought it was setting the QUOTE character, which I already had set to double quote.

    Setting the ESCAPE character seems to have solved my issue.
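
A sketch of the delimited-text settings this corresponds to, written as a Python dictionary that serializes to a JSON fragment. It assumes the escape character was set to the double quote, the same as the quote character, so that "" is read as one literal quote; the post does not state the exact value used. `columnDelimiter`, `quoteChar`, and `escapeChar` are the DelimitedText format property names; the rest of the dataset definition (linked service, location, schema) is omitted here.

```python
import json

# Assumed settings: the escape character matches the quote character,
# so a doubled quote ("") inside a quoted field is read as one literal quote.
delimited_text_settings = {
    "columnDelimiter": ",",
    "quoteChar": "\"",
    "escapeChar": "\"",
}

# Only a fragment of the dataset's typeProperties; other required
# dataset properties are intentionally left out of this sketch.
fragment = json.dumps({"typeProperties": delimited_text_settings}, indent=2)
print(fragment)
```

In the Copy Data UI these map to the Column delimiter, Quote character, and Escape character fields on the source dataset.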

    If I missed anything please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.

    Hope this helps. If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful". If you have any further queries, do let us know.
