SYNAPSE - ERROR HdfsBridge::recordReaderFillBuffer - External Table - Parquet file format

Vinny Paluch 31 Reputation points
2021-01-21T15:22:19.193+00:00

I have Parquet files created using Databricks in the following format:
[screenshot: Parquet schema as shown in Databricks]

I created an external file format for Parquet files and an external table with the following specs:
[screenshot: CREATE EXTERNAL FILE FORMAT and CREATE EXTERNAL TABLE statements]
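
For reference, the statements were essentially of this shape (a minimal sketch; the file format, data source, and table names below are placeholders rather than my exact definitions):

    CREATE EXTERNAL FILE FORMAT ParquetFileFormat
    WITH (
        FORMAT_TYPE = PARQUET,
        DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
    );

    CREATE EXTERNAL TABLE dbo.MyExternalTable (
        drawno int,
        Sessionrundate datetime2
        -- ...remaining columns, matching the Parquet schema...
    )
    WITH (
        LOCATION = '/output/',
        DATA_SOURCE = MyDataSource,       -- existing external data source (placeholder)
        FILE_FORMAT = ParquetFileFormat
    );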

When I try to read any data I'm getting the following error:
[screenshot: error "HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer"]

I even tried changing all columns to varchar(500), but no success.
[screenshot: external table definition with all columns changed to varchar(500)]

I have no clue what's going on.
If I try CSV files it works just fine.

Azure Synapse Analytics

Accepted answer
  1. HimanshuSinha-msft 19,376 Reputation points Microsoft Employee
    2021-01-21T23:30:30.317+00:00

    Hello @Vinny Paluch,
    Thanks for the ask and for using the forum.
    I am confident that the data types are not mapped correctly. Not sure if you have gone through this doc: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/design-elt-data-loading#define-the-tables

    On a quick look I see that the drawno and Sessionrundate columns are probably not mapped to the correct data types.
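
    Each external table column has to use the T-SQL type that corresponds to the type Spark wrote into the Parquet file. A rough sketch of the usual mappings (the actual Spark types of your columns are an assumption on my part):

        -- Common Spark (Databricks) to dedicated SQL pool type mappings:
        -- IntegerType()   -> int
        -- LongType()      -> bigint
        -- DoubleType()    -> float
        -- StringType()    -> varchar(n)
        -- TimestampType() -> datetime2
        -- DateType()      -> date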

    Thanks
    Himanshu


4 additional answers

  1. Fosse83 1 Reputation point
    2021-03-18T09:17:38.25+00:00

    I have a follow-up on Synapse and Parquet error messages:

    I am getting this error when creating an external table in my dedicated pool, or when trying to use bulk copy. The preview works fine in Synapse Studio:

    HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: ArrayIndexOutOfBoundsException: arraycopy: length -1 is negative


  2. Vinny Paluch 31 Reputation points
    2021-03-18T11:31:10.82+00:00

    I realized that one of the columns was using an incorrect data type: it should have been DoubleType() instead of IntegerType(). That fixed my issue. I don't know why the message wasn't clearer about the error.
    The column is a calculated one with no cast in the Databricks notebook. I believe that in some cases the value range went above integer or produced decimal values. The preview was presumably reading the Parquet file with the integer definition, and the exception was raised when processing reached the double values. I solved it by forcing a cast to double and recreating the Parquet files.
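
    Roughly, the fix on the Databricks side looked like this (a sketch in Spark SQL; the table, column, and path names are made up for illustration):

        -- Recreate the Parquet output with the calculated column forced to DOUBLE
        CREATE TABLE results_fixed
        USING PARQUET
        LOCATION 'abfss://container@account.dfs.core.windows.net/results_fixed/'
        AS
        SELECT
            drawno,
            CAST(calc_value AS DOUBLE) AS calc_value  -- was being inferred as integer
        FROM results_raw;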


  3. Fosse83 1 Reputation point
    2021-03-18T12:11:07.603+00:00

    Any clues on where to find a place to interpret the error messages?


  4. Nish Nannapaneni 0 Reputation points
    2023-11-13T20:34:01.2766667+00:00

    I used COPY INTO and I was getting the same error. It worked for me after I set the FILE_TYPE = 'PARQUET' and AUTO_CREATE_TABLE = 'ON'.
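
    For reference, the statement was along these lines (a sketch; the target table, storage path, and credential are placeholders):

        COPY INTO dbo.MyTable
        FROM 'https://myaccount.blob.core.windows.net/mycontainer/output/*.parquet'
        WITH (
            FILE_TYPE = 'PARQUET',
            AUTO_CREATE_TABLE = 'ON',  -- lets Synapse derive column types from the Parquet metadata
            CREDENTIAL = (IDENTITY = 'Managed Identity')
        );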
