Hi @Janice Chi
Does ADF provide native support for row-level failure logging and redirecting bad records during Copy activity?
Yes, Azure Data Factory (ADF) provides fault tolerance support in the Copy activity, which allows you to capture row-level errors during data movement. You can enable this by configuring the "Fault Tolerance" settings in the Copy activity.
Specifically, you can:
- Enable "Skip incompatible rows" so that rows failing due to schema mismatches, data type conversion errors, nullability violations, etc. are skipped instead of failing the whole activity.
- Enable logging so that the skipped rows are captured, together with the error details, in a separate log file.
- Redirect these bad records to a designated error folder in your ADLS Gen2 account.
https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-fault-tolerance
How can I re-ingest only the failed/bad records later?
When fault tolerance is enabled, ADF writes the bad records (along with error details) in text/CSV format to the error folder you configure (e.g., in ADLS Gen2). These files include:
- The row that failed.
- A column describing the error code and reason (e.g., data type conversion failure).
You can design another ADF pipeline to:
- Read from this error folder.
- Optionally cleanse/fix the data.
- Re-ingest only the bad records into the destination (or quarantine DB/table).
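If you prefer to cleanse the skipped rows outside ADF before re-ingesting them (for example, in a small notebook or script), here is a minimal Python sketch. It assumes, purely for illustration, that the error files have been downloaded locally to a ./badrecords folder and that each record holds the original row followed by an error-description column, as described above; the specific "fix" applied is hypothetical.

# Minimal sketch: read the skipped-row CSV files downloaded from the ADLS
# error folder, drop the error-description column, apply a simple fix, and
# write the cleansed rows to a single file for re-ingestion.
# Assumptions (hypothetical): files sit locally under ./badrecords/, and the
# last column of each record holds the ADF error code/reason.
from pathlib import Path
import csv

def cleanse_bad_records(in_dir: str, out_file: str) -> None:
    cleansed = []
    for csv_path in Path(in_dir).glob("*.csv"):
        with csv_path.open(newline="", encoding="utf-8") as src:
            for record in csv.reader(src):
                *row, _error_reason = record            # strip the error column
                row = [value.strip() for value in row]  # example fix: trim stray whitespace
                cleansed.append(row)
    with open(out_file, "w", newline="", encoding="utf-8") as dst:
        csv.writer(dst).writerows(cleansed)

cleanse_bad_records("./badrecords", "./fixed_records.csv")

The resulting fixed_records.csv can then be picked up by a second Copy activity (or any other loader) to re-ingest only these rows into the destination or a quarantine DB/table.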
What configuration steps do we need in the Copy activity?
In your Copy Activity settings in ADF:
- On the Settings tab, set "Fault tolerance" to skip (and log) incompatible rows.
- Enable logging and provide a folder path in ADLS for the error/log files (e.g., adls-container/errorlogs/db2-badrecords/).
- Enable retry settings to handle transient failures.
Example JSON snippet from the pipeline definition (under the Copy activity's typeProperties; the linked service reference name is a placeholder):
"enableSkipIncompatibleRow": true,
"logSettings": {
    "enableCopyActivityLog": true,
    "copyActivityLogSettings": {
        "logLevel": "Warning",
        "enableReliableLogging": false
    },
    "logLocationSettings": {
        "linkedServiceName": {
            "referenceName": "ADLSGen2LinkedService",
            "type": "LinkedServiceReference"
        },
        "path": "mycontainer/errorlogs/db2-badrecords"
    }
}
Will enabling fault tolerance degrade performance?
Yes, enabling row-level fault tolerance introduces additional overhead as ADF must:
- Check each row for schema conformance.
- Log any invalid rows separately.
Performance impact is usually acceptable for moderate data volumes but should be tested and monitored for large datasets.
I hope this information helps. Please do let us know if you have any further queries.
Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.