Additional columns created in CSV files while loading to synapse

kshitij jain 1 Reputation point
2021-06-18T09:57:14.083+00:00

HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopExecutionException: Not enough columns in this line.

My CSV file contain columns in which there are multiple values in a cell separated by comma, when I am trying to load this in synapse, it is throwing this error. I am sharing the external file format settings below:

CREATE EXTERNAL FILE FORMAT ArcdummyFileFormat
WITH
(
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS
(
FIELD_TERMINATOR = ',',
STRING_DELIMITER = '"',
FIRST_ROW = 2,
USE_TYPE_DEFAULT = True,
Encoding = 'UTF8'
)
)

What changes should i make in the file format or is there any other way to resolve this issue ?

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,342 questions
{count} votes

1 answer

Sort by: Most helpful
  1. MartinJaffer-MSFT 26,011 Reputation points
    2021-06-18T20:19:09.067+00:00

    Hello @kshitijjain-0490 and welcome to Microsoft Q&A.

    If I understand you correctly, your data looks like:

    ColumnX, ColumnY, ColumnZ
    DataX, DataYpart1,DataYpart2,DataYpart3, DataZ
    

    If this is the case, it is not a valid CSV. You should try with a different delimiter such as | or ; or tab. Then it would look like:

    ColumnX | ColumnY | ColumnZ
    DataX | DataYpart1,DataYpart2,DataYpart3 | DataZ
    

    Does this make sense?

    0 comments No comments