Polybase - Load CSV file that contains text column with commas surrounded by quotes

satish umapathy 26 Reputation points
2020-10-06T13:29:07.16+00:00

I'm trying to load a CSV file using Polybase into Synapse.

Here is a sample file

product_id,product_name,aisle_id,department_id
1,Chocolate Sandwich Cookies,61,19
2,All-Seasons Salt,104,13
3,Robust Golden Unsweetened Oolong Tea,94,7
4,Smart Ones Classic Favorites Mini Rigatoni With Vodka Cream Sauce,38,1
5,"Three Cheese Ziti, Marinara with Meatballs",38,1
6,Green Chile Anytime Sauce,5,13

Because of rows like in bold above , I'm unable to load the data. When there is a comma in the column value it is surrounded by quotes. But Polybase is not able to handle this. It interprets the comma as a column delimiter and throws an error.

The error I get is HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopSqlException: Error converting data type NVARCHAR to INT.

Below is the code I'm using.

CREATE EXTERNAL FILE FORMAT skipHeader_CSV
WITH (FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS(
FIELD_TERMINATOR = ',',
FIRST_ROW = 2,
USE_TYPE_DEFAULT = True,
Encoding = 'UTF8'
)
)

CREATE EXTERNAL TABLE dbo.EXT_products (
[product_id] INT NOT NULL,
[product_name] VARCHAR(200) NULL,
[aisle_id] INT NULL,
[department_id] INT NULL
)
WITH (
LOCATION = '/Instacart/products.csv',
DATA_SOURCE = sampledwsa,
FILE_FORMAT = skipHeader_CSV
)

Thank you.

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,935 questions
0 comments No comments
{count} votes

Accepted answer
  1. HarithaMaddi-MSFT 10,136 Reputation points
    2020-10-07T09:54:03.947+00:00

    Hi @satish umapathy ,

    Welcome to Microsoft Q&A Platform. Thanks for posting the query.

    Please use the below external file format along with "STRING_DELIMITER" option that will ensure polybase to understand data inside quotes as an single value and the data is successfully loaded as shown in below snap.

    CREATE EXTERNAL FILE FORMAT skipHeader_CSV  
    WITH (FORMAT_TYPE = DELIMITEDTEXT,  
    FORMAT_OPTIONS(  
    FIELD_TERMINATOR = ',',  
    STRING_DELIMITER = '"',  
    FIRST_ROW = 2,  
    USE_TYPE_DEFAULT = True,  
    Encoding = 'UTF8'  
    )  
    )  
    

    30439-image.png

    Hope this helps! Please let us know for further queries and we will be glad to assist.

    1 person found this answer helpful.

2 additional answers

Sort by: Most helpful
  1. Garth Colasurdo 1 Reputation point
    2021-03-10T18:56:22.077+00:00

    We are running across this error as well. We have fields that have comma separated information. Generally the Accepted Answer is correct.

    However we still have issues. To get around this we have tried using Pipe ('|') as the field terminator rather than comma (',') AND double quotes for the string delimiter. But, polybase still finds the commas in the field and turns them into terminators. We even tired switching the file extension to .txt. Is there something else we are missing?

    76377-pipe2csv.png

    0 comments No comments

  2. Garth Colasurdo 1 Reputation point
    2021-03-10T19:49:11.857+00:00

    To answer my own question: https://stackoverflow.com/questions/32727369/escaping-quotes-in-polybase-with-string-delimiter

    If you use Pipe terminators, don't use anything for the string. Weird, but it worked.

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.