Hi @Janice Chi
Does ADF provide native support for row-level failure logging and redirecting bad records during Copy activity?
Yes, Azure Data Factory (ADF) provides fault tolerance support in the Copy activity, which allows you to capture row-level errors during data movement. You can enable this by configuring the "Fault Tolerance" settings in the Copy activity.
Specifically, you can:
- Enable "Skip incompatible rows" so that rows failing due to schema mismatches, data type conversion errors, nullability violations, etc. are skipped instead of failing the whole activity.
- Enable logging so that the skipped rows are captured, together with the error details, in a separate log file.
- Redirect these bad records to a designated error folder in your ADLS Gen2 account.
https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-fault-tolerance
How can I re-ingest only the failed/bad records later?
When fault tolerance is enabled, ADF writes the bad records (along with error details) in text/CSV format to the error folder you configure (e.g., in ADLS Gen2). These files include:
- The row that failed.
- A column describing the error code and reason (e.g., data type conversion failure).
You can design another ADF pipeline to:
- Read from this error folder.
- Optionally cleanse/fix the data.
- Re-ingest only the bad records into the destination (or quarantine DB/table).
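If you prefer to cleanse the skipped rows outside ADF before re-ingesting them (for example, in a small notebook or script), here is a minimal Python sketch. It assumes, purely for illustration, that the error files have been downloaded locally to a ./badrecords folder and that each record holds the original row followed by an error-description column, as described above; the specific "fix" applied is hypothetical.

# Minimal sketch: read the skipped-row CSV files downloaded from the ADLS
# error folder, drop the error-description column, apply a simple fix, and
# write the cleansed rows to a single file for re-ingestion.
# Assumptions (hypothetical): files sit locally under ./badrecords/, and the
# last column of each record holds the ADF error code/reason.
from pathlib import Path
import csv

def cleanse_bad_records(in_dir: str, out_file: str) -> None:
    cleansed = []
    for csv_path in Path(in_dir).glob("*.csv"):
        with csv_path.open(newline="", encoding="utf-8") as src:
            for record in csv.reader(src):
                *row, _error_reason = record            # strip the error column
                row = [value.strip() for value in row]  # example fix: trim stray whitespace
                cleansed.append(row)
    with open(out_file, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(cleansed)

cleanse_bad_records("./badrecords", "./fixed_records.csv")

The resulting fixed_records.csv can then be picked up by a second Copy activity (or any other loader) to re-ingest only these rows into the destination or a quarantine DB/table.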
What configuration steps do we need in the Copy activity?
In your Copy Activity settings in ADF:
- On the Settings tab, set "Fault tolerance" to skip (and log) incompatible rows.
- Enable logging and provide a folder path in ADLS for the error/log files (e.g., adls-container/errorlogs/db2-badrecords/).
- Enable retry settings to handle transient failures.
Example JSON snippet from the pipeline definition (under the Copy activity's typeProperties; the linked service reference name is a placeholder):
"enableSkipIncompatibleRow": true,
"logSettings": {
    "enableCopyActivityLog": true,
    "copyActivityLogSettings": {
        "logLevel": "Warning",
        "enableReliableLogging": false
    },
    "logLocationSettings": {
        "linkedServiceName": {
            "referenceName": "ADLSGen2LinkedService",
            "type": "LinkedServiceReference"
        },
        "path": "mycontainer/errorlogs/db2-badrecords"
    }
}
Will enabling fault tolerance degrade performance?
Yes, enabling row-level fault tolerance introduces additional overhead as ADF must:
- Check each row for schema conformance.
- Log any invalid rows separately.
Performance impact is usually acceptable for moderate data volumes but should be tested and monitored for large datasets.
I hope this information helps. Please do let us know if you have any further queries.
Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.