CSV Delimited Text More Columns Than Defined - Data Factory

Sundar Sriram Garimella 136 Reputation points
2022-07-06T18:16:37.147+00:00

My Source is a CSV (Comma Delimited) file, it is having some columns where there are values separated by comma

eg: 218208-image.png

While loading the file the pipeline in Data Factory is failing whenever there is a column mismatch since it is taking each value is a new column and throwing an error with column mismatch in the definition.

ErrorCode=DelimitedTextMoreColumnsThanDefined,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error found when processing 'Csv/Tsv Format Text' source 'ABC2022-05-02.csv' with row number 2: found more columns than expected column count 30.,Source=Microsoft.DataTransfer.Common,'

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
11,624 questions
0 comments No comments
{count} votes

Accepted answer
  1. KranthiPakala-MSFT 46,642 Reputation points Microsoft Employee Moderator
    2022-07-06T22:47:58.967+00:00

    Hello @Sundar Sriram Garimella ,

    Thanks for the question and using MS Q&A platform.

    As per my understanding since your source data contains a comma as column delimiter as well as few of the column values has comma in them the copy activity is treating them as seperate columns and, hence throwing the above error. Please correct if I misunderstood your problem.

    As this issue is purely related to data on your source, I don't think there is any out of box solution available in ADF. But below are few workarounds you could try:

    1. Workaround 1: You could configure or request the source team that is generating the comma separated files to double quote either all column field values or the only the columns with field values that contain comma(,) as shown below. 218392-image.png

    Then you could configure your source data set quoteCharacter to Double quote(") as shown below:

    218364-image.png

    This is to differentiate between comma in column separation and comma in field value.

    Alternatively, you could check whether you could change the delimiting character at the source.

    1. Wordaround 2: You could choose to skip rows that are incompatible (i.e. rows that has columns with field values having comma) with the below configuration under the setting section. 218373-image.png This setting skips all the rows that are incompatible, and you would also get a log stating which row in the file has been skipped. The copy activity is carried out successfully for the rest of the rows. And as needed you can fix those specific rows as discussed in workaround 1 and reprocess them. But this would be a manual effort if you would want to fix and reprocess them.

    Hope this will help. Please let us know if any further queries.


    • Please don't forget to click on 130616-image.png or upvote 130671-image.png button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators
    1 person found this answer helpful.

0 additional answers

Sort by: Most helpful

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.