SYNAPSE - ERROR HdfsBridge::recordReaderFillBuffer - External Table - Parquet file format

Vinny Paluch 31 Reputation points
2021-01-21T15:22:19.193+00:00

I have Parquet files created using Databricks in the following format:
[screenshot: Parquet schema as shown in Databricks]

I created an external file format for Parquet files and an external table with the following specs:
[screenshot: CREATE EXTERNAL FILE FORMAT and CREATE EXTERNAL TABLE statements]
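
For reference, the statements were essentially of this shape (a minimal sketch; the file format, data source, and table names below are placeholders rather than my exact definitions):

    CREATE EXTERNAL FILE FORMAT ParquetFileFormat
    WITH (
        FORMAT_TYPE = PARQUET,
        DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
    );

    CREATE EXTERNAL TABLE dbo.MyExternalTable (
        drawno int,
        Sessionrundate datetime2
        -- ...remaining columns, matching the Parquet schema...
    )
    WITH (
        LOCATION = '/output/',
        DATA_SOURCE = MyDataSource,       -- existing external data source (placeholder)
        FILE_FORMAT = ParquetFileFormat
    );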

When I try to read any data I'm getting the following error:
[screenshot: error "HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer"]

I even tried changing all columns to varchar(500), but no success.
[screenshot: external table definition with all columns changed to varchar(500)]

I have no clue what's going on.
If I try CSV files it works just fine.

Azure Synapse Analytics

Accepted answer
  1. HimanshuSinha-msft 19,376 Reputation points Microsoft Employee
    2021-01-21T23:30:30.317+00:00

    Hello @Vinny Paluch,
    Thanks for the ask and for using the forum.
    I am confident that the data types are not mapped correctly. Not sure if you have gone through this doc: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/design-elt-data-loading#define-the-tables

    On a quick look I see that the drawno and Sessionrundate columns are probably not mapped to the correct data types.
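
    Each external table column has to use the T-SQL type that corresponds to the type Spark wrote into the Parquet file. A rough sketch of the usual mappings (the actual Spark types of your columns are an assumption on my part):

        -- Common Spark (Databricks) to dedicated SQL pool type mappings:
        -- IntegerType()   -> int
        -- LongType()      -> bigint
        -- DoubleType()    -> float
        -- StringType()    -> varchar(n)
        -- TimestampType() -> datetime2
        -- DateType()      -> date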

    Thanks
    Himanshu


4 additional answers

  1. Fosse83 1 Reputation point
    2021-03-18T09:17:38.25+00:00

    I have a follow-up on Synapse and Parquet error messages:

    I am getting this error when creating an external table in my dedicated pool, or when trying to use bulk copy. The preview works fine in Synapse Studio:

    HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ArrayIndexOutOfBoundsException: arraycopy: length -1 is negative


  2. Vinny Paluch 31 Reputation points
    2021-03-18T11:31:10.82+00:00

    I realized that one of the columns was using an incorrect data type: it should have been DoubleType() instead of IntegerType(). That fixed my issue. I don't know why the message wasn't clearer about the error.
    The column is a calculated one with no cast in the Databricks notebook. I believe that in some cases the value range went above integer or produced decimal values. The preview was presumably reading the Parquet file with the integer definition, and the exception was raised when processing reached the double values. I solved it by forcing a cast to double and recreating the Parquet files.
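
    Roughly, the fix on the Databricks side looked like this (a sketch in Spark SQL; the table, column, and path names are made up for illustration):

        -- Recreate the Parquet output with the calculated column forced to DOUBLE
        CREATE TABLE results_fixed
        USING PARQUET
        LOCATION 'abfss://container@account.dfs.core.windows.net/results_fixed/'
        AS
        SELECT
            drawno,
            CAST(calc_value AS DOUBLE) AS calc_value  -- was being inferred as integer
        FROM results_raw;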


  3. Fosse83 1 Reputation point
    2021-03-18T12:11:07.603+00:00

    Any clues on where to find a place to interpret the error messages?


  4. Nish Nannapaneni 0 Reputation points
    2023-11-13T20:34:01.2766667+00:00

    I used COPY INTO and I was getting the same error. It worked for me after I set the FILE_TYPE = 'PARQUET' and AUTO_CREATE_TABLE = 'ON'.
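
    For reference, the statement was along these lines (a sketch; the target table, storage path, and credential are placeholders):

        COPY INTO dbo.MyTable
        FROM 'https://myaccount.blob.core.windows.net/mycontainer/output/*.parquet'
        WITH (
            FILE_TYPE = 'PARQUET',
            AUTO_CREATE_TABLE = 'ON',  -- lets Synapse derive column types from the Parquet metadata
            CREDENTIAL = (IDENTITY = 'Managed Identity')
        );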
