We have written a pipeline to copy data from a database and merge the data into an existing parquet file. While doing so, we found the following error in the activity that merges the parquet files; this step uses the ADF Copy activity.
```
ErrorCode=ParquetJavaInvocationException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.io.IOException:S
total entry:15
com.microsoft.datatransfer.bridge.io.parquet.IoBridge.inputStreamRead(Native Method)
com.microsoft.datatransfer.bridge.io.parquet.BridgeInputFileStream.fillBuffer(BridgeInputFileStream.java:88)
com.microsoft.datatransfer.bridge.io.parquet.BridgeInputFileStream.read(BridgeInputFileStream.java:42)
java.io.DataInputStream.read(DataInputStream.java:149)
org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:102)
org.apache.parquet.io.DelegatingSeekableInputStream.readFullyHeapBuffer(DelegatingSeekableInputStream.java:127)
org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:91)
org.apache.parquet.hadoop.ParquetFileReader$ConsecutivePartList.readAll(ParquetFileReader.java:1850)
org.apache.parquet.hadoop.ParquetFileReader.internalReadRowGroup(ParquetFileReader.java:990)
org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:940)
org.apache.parquet.hadoop.ParquetFileReader.readNextFilteredRowGroup(ParquetFileReader.java:1082)
org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:130)
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:230)
org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:132)
com.microsoft.datatransfer.bridge.parquet.ParquetBatchReaderBridge.nextBuffer(ParquetBatchReaderBridge.java:168)
.,Source=Microsoft.DataTransfer.Richfile.ParquetTransferPlugin,''Type=Microsoft.DataTransfer.Richfile.JniExt.JavaBridgeException,Message=,Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,'
```
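From the stack trace, the read seems to fail while a source row group is being loaded (`ParquetFileReader.readNextRowGroup`). For reference, here is a minimal sketch of how the matched source files could be read outside of ADF to rule out a corrupt or truncated blob, assuming pyarrow and locally downloaded copies of the files (the paths below are placeholders):

```python
# Sketch only: validate the source parquet files outside of ADF.
# Assumptions: pyarrow is installed and the blobs matched by the wildcard
# have been downloaded locally; the file names below are placeholders.
import pyarrow.parquet as pq

for path in ["file_1.parquet", "file_2.parquet"]:
    pf = pq.ParquetFile(path)
    meta = pf.metadata
    print(path, "rows:", meta.num_rows, "row groups:", meta.num_row_groups)
    # Reading every row group forces the full byte range to be read,
    # similar to the row-group read that fails in the stack trace above.
    for i in range(meta.num_row_groups):
        pf.read_row_group(i)
    print(path, "read OK")
```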
Here is the JSON definition of the copy activity:
```json
{
"source": {
"type": "ParquetSource",
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"recursive": false,
"wildcardFolderPath": "policy-priced-peril-characteristic-commissions",
"wildcardFileName": "policy_priced_peril_characteristic_commissions?*.parquet",
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "ParquetReadSettings"
}
},
"sink": {
"type": "ParquetSink",
"storeSettings": {
"type": "AzureBlobFSWriteSettings",
"copyBehavior": "MergeFiles"
},
"formatSettings": {
"type": "ParquetWriteSettings"
}
},
"enableStaging": false,
"parallelCopies": 2,
"validateDataConsistency": true,
"logSettings": {
"enableCopyActivityLog": true,
"copyActivityLogSettings": {
"logLevel": "Warning",
"enableReliableLogging": false
},
"logLocationSettings": {
"linkedServiceName": {
"referenceName": "AzureDataLakeStorage2",
"type": "LinkedServiceReference"
},
"path": "adf-logs"
}
},
"dataIntegrationUnits": 4,
"translator": {
"type": "TabularTranslator",
"typeConversion": true,
"typeConversionSettings": {
"allowDataTruncation": true,
"treatBooleanAsNumber": false
}
}
}
```
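For context, here is a rough illustration of how we expect the source wildcard to match blob names, assuming ADF's `*` and `?` wildcards behave like standard glob patterns (the example blob names below are made up):

```python
# Sketch only: approximate the ADF wildcard match with fnmatch.
# Assumption: '*' matches any run of characters and '?' matches exactly one,
# as in standard glob patterns. The candidate blob names are invented.
import fnmatch

pattern = "policy_priced_peril_characteristic_commissions?*.parquet"
candidates = [
    "policy_priced_peril_characteristic_commissions_1.parquet",     # matches
    "policy_priced_peril_characteristic_commissions_2025.parquet",  # matches
    "policy_priced_peril_characteristic_commissions.parquet",       # no match: '?' needs one char
]
for name in candidates:
    print(name, "->", fnmatch.fnmatch(name, pattern))
```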
We are using **AutoResolveIntegrationRuntime**.
Here is the output from this activity:
```json
{
"dataRead": 275450924,
"dataWritten": 0,
"filesRead": 2,
"filesWritten": 0,
"sourcePeakConnections": 2,
"sinkPeakConnections": 1,
"rowsRead": 6853649,
"rowsCopied": 6853649,
"copyDuration": 184,
"throughput": 2459.383,
"logFilePath": "adf-logs/copyactivity-logs/MergeParquetFiles_copy1/d22650f7-9158-4720-8543-a735165a7caa/",
"errors": [
{
"Code": 21000,
"Message": "ErrorCode=ParquetJavaInvocationException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.io.IOException:S\ntotal entry:15\r\ncom.microsoft.datatransfer.bridge.io.parquet.IoBridge.inputStreamRead(Native Method)\r\ncom.microsoft.datatransfer.bridge.io.parquet.BridgeInputFileStream.fillBuffer(BridgeInputFileStream.java:88)\r\ncom.microsoft.datatransfer.bridge.io.parquet.BridgeInputFileStream.read(BridgeInputFileStream.java:42)\r\njava.io.DataInputStream.read(DataInputStream.java:149)\r\norg.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:102)\r\norg.apache.parquet.io.DelegatingSeekableInputStream.readFullyHeapBuffer(DelegatingSeekableInputStream.java:127)\r\norg.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:91)\r\norg.apache.parquet.hadoop.ParquetFileReader$ConsecutivePartList.readAll(ParquetFileReader.java:1850)\r\norg.apache.parquet.hadoop.ParquetFileReader.internalReadRowGroup(ParquetFileReader.java:990)\r\norg.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:940)\r\norg.apache.parquet.hadoop.ParquetFileReader.readNextFilteredRowGroup(ParquetFileReader.java:1082)\r\norg.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:130)\r\norg.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:230)\r\norg.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:132)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetBatchReaderBridge.nextBuffer(ParquetBatchReaderBridge.java:168)\r\n.,Source=Microsoft.DataTransfer.Richfile.ParquetTransferPlugin,''Type=Microsoft.DataTransfer.Richfile.JniExt.JavaBridgeException,Message=,Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,'",
"EventType": 0,
"Category": 5,
"Data": {},
"MsgId": null,
"ExceptionType": null,
"Source": null,
"StackTrace": null,
"InnerEventInfos": []
}
],
"effectiveIntegrationRuntime": "AutoResolveIntegrationRuntime (West Europe)",
"usedDataIntegrationUnits": 4,
"billingReference": {
"activityType": "DataMovement",
"billableDuration": [
{
"meterType": "ManagedVNetIR",
"duration": 0.26666666666666666,
"unit": "DIUHours"
}
],
"totalBillableDuration": [
{
"meterType": "AzureIR",
"duration": 0.26666666666666666,
"unit": "DIUHours"
}
]
},
"usedParallelCopies": 2,
"executionDetails": [
{
"source": {
"type": "AzureBlobFS",
"region": "West Europe"
},
"sink": {
"type": "AzureBlobFS",
"region": "West Europe"
},
"status": "Failed",
"start": "3/28/2025, 10:04:33 PM",
"duration": 184,
"usedDataIntegrationUnits": 4,
"usedParallelCopies": 2,
"profile": {
"queue": {
"status": "Completed",
"duration": 70
},
"transfer": {
"status": "Completed",
"duration": 112,
"details": {
"listingSource": {
"type": "AzureBlobFS",
"workingDuration": 0
},
"readingFromSource": {
"type": "AzureBlobFS",
"workingDuration": 7
},
"writingToSink": {
"type": "AzureBlobFS",
"workingDuration": 0
}
}
}
},
"detailedDurations": {
"queuingDuration": 70,
"transferDuration": 112
}
}
],
"dataConsistencyVerification": {
"VerificationResult": "Verified"
},
"durationInQueue": {
"integrationRuntimeQueue": 0
}
}
```
Any help or feedback that would help us understand and debug this error would be appreciated.