Share via

Azure Data Factory: Comma causing unwanted column splitting when copying from Excel to CSV

chrisw 41 Reputation points
2020-10-28T16:06:11.623+00:00

This might seem like the kind of question that has been asked many times, but I can’t seem to find an answer to my specific problem on this forum or elsewhere. I hope that I’m not missing something obvious.

I am using an ADF (v2) copy activity to convert an Excel file to a (new) CSV file. Some of the column headers in the Excel source file have a single comma. Example: “foo, bar”. When inspecting the output CSV file with a text editor, the column headers are not quoted even though “Quote All Text” is selected on the Sink tab of the copy activity. Data in all other rows are quoted as expected. When I open the CSV file in the Excel UI, each column containing a comma in the header is split into two fields. For example, the (single) column “foo, bar” from the Excel file appears as two separate columns in the CSV: “foo” and “bar”, which is undesired.

The only potential solution I can think of is to uncheck “First Row as Header” in the Excel dataset so that the copy activity assigns “Prop_0”, “Prop_1” as the headers in the CSV and then create a second CSV from the first one, with “Skip Line Count” set to 1 in the Source settings on the copy activity. I haven’t tested this solution, but it seems inefficient and potentially costly to scale.

Does anyone know of a better solution?

Thanks in advance for any advice.

Azure Data Factory
Azure Data Factory

An Azure service for ingesting, preparing, and transforming data at scale.


Answer accepted by question author

  1. MartinJaffer-MSFT 26,161 Reputation points
    2020-10-30T00:44:21.963+00:00

    @chrisw there is one more solution, not sure why I didn't think of it before.

    In Both the source Excel and sink CSV datasets, disable "First row as header". In the copy activity, leave the mapping empty. Leave the CSV dataset schema empty.

    Since the first row is not treated as header in either side, the first row gets treated as data. Since the "Quote all text" works on data, the first row with its comma gets quoted correctly.

    Before you worry about there not being headers, recall that visually, there is nothing to distinguish header from text. Headers are an artifact of intent in reading/writing data.

    36038-image.png
    36166-image.png
    36211-image.png
    36050-image.png

    Was this answer helpful?

    1 person found this answer helpful.

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.