Error when converting CSV to Parquet file from blob storage to ADLS Gen2 on Synapse

Djedjiga Chikhi 6 Reputation points
2022-10-26T15:33:20.093+00:00

Hi, we are trying to copy CSV files from Blob storage to ADLS Gen2 (as Parquet), and we are getting the error message below. We tried verifying the files one by one. For some folders the behavior is intermittent: the pipeline fails, then succeeds when we run it again, then fails again on the next run. Do you have any idea what could cause this? Thank you.
"errorCode": "2200",
"message": "ErrorCode=ParquetJavaInvocationException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: org.apache.parquet.schema.InvalidSchemaException:Cannot write a schema with an empty group: message adms_schema {\n}\n\ntotal entry:11\r\norg.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:27)\r\norg.apache.parquet.schema.TypeUtil$1.visit(TypeUtil.java:37)\r\norg.apache.parquet.schema.MessageType.accept(MessageType.java:58)\r\norg.apache.parquet.schema.TypeUtil.checkValidWriteSchema(TypeUtil.java:23)\r\norg.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:228)\r\norg.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:273)\r\norg.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:222)\r\norg.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:188)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetWriterBuilderBridge.build(ParquetWriterBuilderBridge.java:174)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetWriterBridge.open(ParquetWriterBridge.java:13)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetFileBridge.createWriter(ParquetFileBridge.java:27)\r\n.,Source=Microsoft.DataTransfer.Richfile.ParquetTransferPlugin,''Type=Microsoft.DataTransfer.Richfile.JniExt.JavaBridgeException,Message=,Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,'",
"failureType": "UserError",
"target": "Copy ATM Electronic Journal Deposits CSV",
"details": []

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

2 answers

  1. Djedjiga Chikhi 6 Reputation points
    2022-10-31T18:54:47.673+00:00

    Yes, the issue was resolved. The issue was caused by the empty files. Thank you!
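
    A minimal sketch of how such files could be flagged before the copy runs. It assumes the CSV contents have already been read into memory as text; the helper names `has_columns` and `find_empty_files` are hypothetical and not part of Synapse or Azure Data Factory. A file with no header fields gives the Parquet writer no columns, which is consistent with the "empty group" schema error above.

```python
import csv
import io


def has_columns(csv_text):
    """Return True if the CSV text has at least one non-blank header field."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)  # None when the text contains no rows at all
    return header is not None and any(field.strip() for field in header)


def find_empty_files(files):
    """Given a mapping of file name -> CSV text, return names with no columns.

    `files` is a plain dict here (an assumption for illustration); in practice
    the text would come from whatever storage client is in use.
    """
    return sorted(name for name, text in files.items() if not has_columns(text))


if __name__ == "__main__":
    sample = {
        "deposits_01.csv": "id,amount\n1,20.00\n",
        "deposits_02.csv": "",      # zero-byte file
        "deposits_03.csv": "\n",    # only a blank line
    }
    print(find_empty_files(sample))
```

    Files reported by a check like this could be skipped or removed from the source folder before the copy activity runs.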


  2. KranthiPakala-MSFT 46,642 Reputation points Microsoft Employee Moderator
    2022-10-27T00:16:18.783+00:00

    Hello @Djedjiga Chikhi ,

    Thanks for the question and for using the Microsoft Q&A platform.

    Based on your details, the issue appears to be intermittent. Are you using a self-hosted integration runtime (SHIR) on either the source or the sink connector? If so, it might be related to insufficient memory on your SHIR machine. Please refer to the troubleshooting guide below, follow its recommendations, and let us know if that helps.

    Doc: Error code: ParquetJavaInvocationException - Troubleshoot the Parquet format connector in Azure Data Factory and Azure Synapse


    Hope this helps. Let us know how it goes.

    Thank you

