I'm trying to load a CSV file using Polybase into Synapse.
Here is a sample file
product_id,product_name,aisle_id,department_id
1,Chocolate Sandwich Cookies,61,19
2,All-Seasons Salt,104,13
3,Robust Golden Unsweetened Oolong Tea,94,7
4,Smart Ones Classic Favorites Mini Rigatoni With Vodka Cream Sauce,38,1
5,"Three Cheese Ziti, Marinara with Meatballs",38,1
6,Green Chile Anytime Sauce,5,13
Because of rows like in bold above , I'm unable to load the data. When there is a comma in the column value it is surrounded by quotes. But Polybase is not able to handle this. It interprets the comma as a column delimiter and throws an error.
The error I get is HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopSqlException: Error converting data type NVARCHAR to INT.
Below is the code I'm using.
CREATE EXTERNAL FILE FORMAT skipHeader_CSV
WITH (FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS(
FIELD_TERMINATOR = ',',
FIRST_ROW = 2,
USE_TYPE_DEFAULT = True,
Encoding = 'UTF8'
)
)
CREATE EXTERNAL TABLE dbo.EXT_products (
[product_id] INT NOT NULL,
[product_name] VARCHAR(200) NULL,
[aisle_id] INT NULL,
[department_id] INT NULL
)
WITH (
LOCATION = '/Instacart/products.csv',
DATA_SOURCE = sampledwsa,
FILE_FORMAT = skipHeader_CSV
)
Thank you.