How to fix HybridDeliveryException ("An error occurred when invoking java, message: java.io.IOException") while merging two Parquet files in ADF

abby_17 0 Reputation points
2025-04-01T09:11:06.8466667+00:00

We have written a pipeline that copies data from a database and merges it into an existing Parquet file.

While doing so, we get the following error from the activity that merges the Parquet files. This activity uses the Copy activity of ADF.

```
ErrorCode=ParquetJavaInvocationException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.io.IOException:S
total entry:15
com.microsoft.datatransfer.bridge.io.parquet.IoBridge.inputStreamRead(Native Method)
com.microsoft.datatransfer.bridge.io.parquet.BridgeInputFileStream.fillBuffer(BridgeInputFileStream.java:88)
com.microsoft.datatransfer.bridge.io.parquet.BridgeInputFileStream.read(BridgeInputFileStream.java:42)
java.io.DataInputStream.read(DataInputStream.java:149)
org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:102)
org.apache.parquet.io.DelegatingSeekableInputStream.readFullyHeapBuffer(DelegatingSeekableInputStream.java:127)
org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:91)
org.apache.parquet.hadoop.ParquetFileReader$ConsecutivePartList.readAll(ParquetFileReader.java:1850)
org.apache.parquet.hadoop.ParquetFileReader.internalReadRowGroup(ParquetFileReader.java:990)
org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:940)
org.apache.parquet.hadoop.ParquetFileReader.readNextFilteredRowGroup(ParquetFileReader.java:1082)
org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:130)
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:230)
org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:132)
com.microsoft.datatransfer.bridge.parquet.ParquetBatchReaderBridge.nextBuffer(ParquetBatchReaderBridge.java:168)
.,Source=Microsoft.DataTransfer.Richfile.ParquetTransferPlugin,''Type=Microsoft.DataTransfer.Richfile.JniExt.JavaBridgeException,Message=,Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,'
```

Here is the Copy activity JSON:

```json
{
"source": {
    "type": "ParquetSource",
    "storeSettings": {
        "type": "AzureBlobFSReadSettings",
        "recursive": false,
        "wildcardFolderPath": "policy-priced-peril-characteristic-commissions",
        "wildcardFileName": "policy_priced_peril_characteristic_commissions?*.parquet",
        "enablePartitionDiscovery": false
    },
    "formatSettings": {
        "type": "ParquetReadSettings"
    }
},
"sink": {
    "type": "ParquetSink",
    "storeSettings": {
        "type": "AzureBlobFSWriteSettings",
        "copyBehavior": "MergeFiles"
    },
    "formatSettings": {
        "type": "ParquetWriteSettings"
    }
},
"enableStaging": false,
"parallelCopies": 2,
"validateDataConsistency": true,
"logSettings": {
    "enableCopyActivityLog": true,
    "copyActivityLogSettings": {
        "logLevel": "Warning",
        "enableReliableLogging": false
    },
    "logLocationSettings": {
        "linkedServiceName": {
            "referenceName": "AzureDataLakeStorage2",
            "type": "LinkedServiceReference"
        },
        "path": "adf-logs"
    }
},
"dataIntegrationUnits": 4,
"translator": {
    "type": "TabularTranslator",
    "typeConversion": true,
    "typeConversionSettings": {
        "allowDataTruncation": true,
        "treatBooleanAsNumber": false
    }
}
}
```

We are using **AutoResolveIntegrationRuntime**.

Here is the output from this activity:

```json
{
	"dataRead": 275450924,
	"dataWritten": 0,
	"filesRead": 2,
	"filesWritten": 0,
	"sourcePeakConnections": 2,
	"sinkPeakConnections": 1,
	"rowsRead": 6853649,
	"rowsCopied": 6853649,
	"copyDuration": 184,
	"throughput": 2459.383,
	"logFilePath": "adf-logs/copyactivity-logs/MergeParquetFiles_copy1/d22650f7-9158-4720-8543-a735165a7caa/",
	"errors": [
		{
			"Code": 21000,
			"Message": "ErrorCode=ParquetJavaInvocationException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.io.IOException:S\ntotal entry:15\r\ncom.microsoft.datatransfer.bridge.io.parquet.IoBridge.inputStreamRead(Native Method)\r\ncom.microsoft.datatransfer.bridge.io.parquet.BridgeInputFileStream.fillBuffer(BridgeInputFileStream.java:88)\r\ncom.microsoft.datatransfer.bridge.io.parquet.BridgeInputFileStream.read(BridgeInputFileStream.java:42)\r\njava.io.DataInputStream.read(DataInputStream.java:149)\r\norg.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:102)\r\norg.apache.parquet.io.DelegatingSeekableInputStream.readFullyHeapBuffer(DelegatingSeekableInputStream.java:127)\r\norg.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:91)\r\norg.apache.parquet.hadoop.ParquetFileReader$ConsecutivePartList.readAll(ParquetFileReader.java:1850)\r\norg.apache.parquet.hadoop.ParquetFileReader.internalReadRowGroup(ParquetFileReader.java:990)\r\norg.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:940)\r\norg.apache.parquet.hadoop.ParquetFileReader.readNextFilteredRowGroup(ParquetFileReader.java:1082)\r\norg.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:130)\r\norg.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:230)\r\norg.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:132)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetBatchReaderBridge.nextBuffer(ParquetBatchReaderBridge.java:168)\r\n.,Source=Microsoft.DataTransfer.Richfile.ParquetTransferPlugin,''Type=Microsoft.DataTransfer.Richfile.JniExt.JavaBridgeException,Message=,Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,'",
			"EventType": 0,
			"Category": 5,
			"Data": {},
			"MsgId": null,
			"ExceptionType": null,
			"Source": null,
			"StackTrace": null,
			"InnerEventInfos": []
		}
	],
	"effectiveIntegrationRuntime": "AutoResolveIntegrationRuntime (West Europe)",
	"usedDataIntegrationUnits": 4,
	"billingReference": {
		"activityType": "DataMovement",
		"billableDuration": [
			{
				"meterType": "ManagedVNetIR",
				"duration": 0.26666666666666666,
				"unit": "DIUHours"
			}
		],
		"totalBillableDuration": [
			{
				"meterType": "AzureIR",
				"duration": 0.26666666666666666,
				"unit": "DIUHours"
			}
		]
	},
	"usedParallelCopies": 2,
	"executionDetails": [
		{
			"source": {
				"type": "AzureBlobFS",
				"region": "West Europe"
			},
			"sink": {
				"type": "AzureBlobFS",
				"region": "West Europe"
			},
			"status": "Failed",
			"start": "3/28/2025, 10:04:33 PM",
			"duration": 184,
			"usedDataIntegrationUnits": 4,
			"usedParallelCopies": 2,
			"profile": {
				"queue": {
					"status": "Completed",
					"duration": 70
				},
				"transfer": {
					"status": "Completed",
					"duration": 112,
					"details": {
						"listingSource": {
							"type": "AzureBlobFS",
							"workingDuration": 0
						},
						"readingFromSource": {
							"type": "AzureBlobFS",
							"workingDuration": 7
						},
						"writingToSink": {
							"type": "AzureBlobFS",
							"workingDuration": 0
						}
					}
				}
			},
			"detailedDurations": {
				"queuingDuration": 70,
				"transferDuration": 112
			}
		}
	],
	"dataConsistencyVerification": {
		"VerificationResult": "Verified"
	},
	"durationInQueue": {
		"integrationRuntimeQueue": 0
	}
}
```

Any help or feedback to understand and debug this error would be appreciated.


2 answers

  1. Venkat Reddy Navari 3,470 Reputation points Microsoft External Staff Moderator
    2025-04-01T11:57:53.9433333+00:00

    Hi @abby_17,

    The HybridDeliveryException in Azure Data Factory (ADF) while merging Parquet files usually happens due to file compatibility or configuration issues. Try these steps to fix it:

    1. Check the Parquet Files: Download the files from your storage (policy-priced-peril-characteristic-commissions). Use this Python script to check if they open correctly and compare their schemas:
         import pyarrow.parquet as pq  
         for file in ['file1.parquet', 'file2.parquet']:  
           table = pq.read_table(file)  
           print(f"Schema for {file}: {table.schema}")
      
      If one file doesn’t open or schemas don’t match, that could be the issue.
    2. Test with One File: Change "wildcardFileName" in your JSON to match just one file (e.g., policy_priced_peril_characteristic_commissions1.parquet) and run the pipeline. If it works, the second file or the merge operation might be causing the problem.
    3. Adjust Copy Activity Settings (see the JSON sketch after this list)
      • Reduce "parallelCopies" from 2 to 1 (for better stability).
      • Increase "dataIntegrationUnits" from 4 to 8 (for more processing power).
    4. Check the Logs for Errors
      • Look at the logs in adf-logs/copyactivity-logs/MergeParquetFiles_copy1/[execution-id]/. These logs may point to the exact issue.
    5. Try Merging the Files Manually
      • Run this Python script to merge the files:
             import pyarrow.parquet as pq  
             import pyarrow as pa  
             files = ['file1.parquet', 'file2.parquet']  
             tables = [pq.read_table(f) for f in files]  
             merged = pa.concat_tables(tables)  
             pq.write_table(merged, 'output.parquet')
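
    For steps 2 and 3, here is a minimal sketch of what the adjusted Copy activity properties might look like (the single file name is only an illustrative value from step 2; point it at one of your actual blobs and keep the remaining properties from your original JSON):

        {
            "source": {
                "type": "ParquetSource",
                "storeSettings": {
                    "type": "AzureBlobFSReadSettings",
                    "recursive": false,
                    "wildcardFolderPath": "policy-priced-peril-characteristic-commissions",
                    "wildcardFileName": "policy_priced_peril_characteristic_commissions1.parquet",
                    "enablePartitionDiscovery": false
                },
                "formatSettings": { "type": "ParquetReadSettings" }
            },
            "sink": {
                "type": "ParquetSink",
                "storeSettings": { "type": "AzureBlobFSWriteSettings", "copyBehavior": "MergeFiles" },
                "formatSettings": { "type": "ParquetWriteSettings" }
            },
            "parallelCopies": 1,
            "dataIntegrationUnits": 8
        }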
        

    For more details, check the Microsoft troubleshooting guide: Azure Data Factory Parquet Connector Troubleshooting

    I hope this information helps. Please do let us know if you have any further queries.

    Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.


  2. abby_17 0 Reputation points
    2025-04-22T07:56:58.9833333+00:00

    Hey,

    I discussed this topic with MS support, and the only resolution (without any known root cause) was to set retry to 3 on the merge/copy activity; a sketch of that setting is shown below.

    Hoping this helps others too.
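
    For reference, the retry is configured on the Copy activity's policy block in the pipeline JSON. A minimal sketch, assuming the activity name from the log path above and example values for the other policy settings:

        {
            "name": "MergeParquetFiles_copy1",
            "type": "Copy",
            "policy": {
                "timeout": "0.12:00:00",
                "retry": 3,
                "retryIntervalInSeconds": 30,
                "secureOutput": false,
                "secureInput": false
            }
        }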

