Handling Files and Data with Azure Data Factory

Relay 160 Reputation points
2025-06-20T13:25:53.2166667+00:00

I have created a data pipeline in ADF that processes data from ADLS Gen2 (pushed by an SAP system) and, after transformation, persists it to Azure SQL.

Please help me with the design of the following:

  1. How does ADF ensure a file has already been processed or not? I want to avoid reprocessing the same file or missing new files.
  2. How to handle a corrupt file and move it to a quarantine folder.
  3. How to make sure error records are moved to an error table in SQL.

Thanks a lot for the help.

Accepted answer
  1. J N S S Kasyap 3,625 Reputation points Microsoft External Staff Moderator
    2025-06-20T14:29:06.74+00:00

    Hi @Relay

    How does ADF know whether a file has already been processed, so that the same file is not reprocessed and new files are not missed?

     Use a control table in Azure SQL (e.g., ProcessedFiles) to track processed files. 

     Steps: 

    • Use the Get Metadata activity to list files and get their lastModified timestamp. 
    • Use the Lookup activity to check if the file already exists in the control table. 
    • Use an If Condition to process only new or updated files. 
    • After successful processing, insert a record into ProcessedFiles with status 'Processed' (see the sketch below).
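
    A minimal sketch of what such a control table and the Lookup query could look like (the table and column names here are illustrative assumptions, not something ADF mandates; in a Lookup activity the parameter values would typically be injected via pipeline expressions or passed to a stored procedure):

    ```sql
    -- Hypothetical control table for tracking processed files.
    CREATE TABLE dbo.ProcessedFiles
    (
        FileName        NVARCHAR(400) NOT NULL,
        LastModifiedUtc DATETIME2     NOT NULL,
        Status          NVARCHAR(50)  NOT NULL,   -- e.g. 'Processed', 'Corrupt'
        FailureReason   NVARCHAR(MAX) NULL,
        ProcessedOnUtc  DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME(),
        CONSTRAINT PK_ProcessedFiles PRIMARY KEY (FileName, LastModifiedUtc)
    );

    -- Query the Lookup activity could run for each file returned by Get Metadata;
    -- the If Condition processes the file only when AlreadyProcessed = 0.
    SELECT COUNT(*) AS AlreadyProcessed
    FROM dbo.ProcessedFiles
    WHERE FileName = @FileName
      AND LastModifiedUtc = @LastModified
      AND Status = 'Processed';

    -- After a successful load, record the file so it is skipped on the next run.
    INSERT INTO dbo.ProcessedFiles (FileName, LastModifiedUtc, Status)
    VALUES (@FileName, @LastModified, 'Processed');
    ```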

    How to handle a corrupt file and move it to a quarantine folder?

     Validate the file contents using Data Flow or custom validation logic. 

     If validation fails: 

    • Use a Copy Data activity to move the file from the input folder to a quarantine/ folder in ADLS Gen2. 
    • Optionally, log this file in the control table with status 'Corrupt' and include the failure reason (see the sketch below).
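
    Assuming the illustrative ProcessedFiles table sketched above (with its FailureReason column), logging a quarantined file could be as simple as:

    ```sql
    -- Hypothetical logging step, run (e.g. via a Stored Procedure or Script activity)
    -- after the corrupt file has been copied to the quarantine/ folder.
    INSERT INTO dbo.ProcessedFiles (FileName, LastModifiedUtc, Status, FailureReason)
    VALUES (@FileName, @LastModified, 'Corrupt', @ValidationError);
    ```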

    How to ensure error records are moved to an error table in Azure SQL? 

    To handle error records in Azure Data Factory, you can use Mapping Data Flows with error row handling enabled. Configure your Data Flow to redirect faulty or malformed rows to a separate sink, instead of failing the entire load.

    These error records can then be written to an ErrorRecords table in Azure SQL Database, capturing useful metadata such as the source file name, the full row content (as a string or JSON), the specific error message encountered, and a timestamp.

    This setup ensures robust error tracking without interrupting the pipeline for valid records. 
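
    A possible shape for that table, capturing the metadata mentioned above (the table and column names are illustrative assumptions, not a fixed ADF schema):

    ```sql
    -- Hypothetical ErrorRecords sink table for rows redirected by the data flow's error handling.
    CREATE TABLE dbo.ErrorRecords
    (
        ErrorId        BIGINT IDENTITY(1,1) PRIMARY KEY,
        SourceFileName NVARCHAR(500) NULL,
        RowContent     NVARCHAR(MAX) NULL,   -- the offending row as a string or JSON
        ErrorMessage   NVARCHAR(MAX) NULL,
        LoadedOnUtc    DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
    );
    ```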

    I hope this information helps. Please do let us know if you have any further queries.


    If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful", and let us know if you have any further questions.

    Your feedback is valuable and can assist others in the community facing similar issues.

    Thank you.

     


1 additional answer

  1. Nandan Hegde 36,146 Reputation points MVP Volunteer Moderator
    2025-06-20T14:19:44.4266667+00:00
    1. Assuming you are using a Copy activity to copy the data from the file into some sink: if the Copy activity succeeds, it means ADF has processed the file. As a safety measure, you can archive or delete the file once it is processed to ensure it is not processed again.
    2. Is there a need to process the correct data and only ignore erroneous rows? Then you can enable fault tolerance in the Copy activity: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-fault-tolerance. This also logs the erroneous rows to a file; you can then copy those into a table if need be (a staging-table sketch follows below).
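
    If you do load that skipped-row log into SQL, a simple staging table along these lines would do as the sink for a second Copy activity (the table name and columns are illustrative assumptions; map them to whatever the fault-tolerance log file actually contains):

    ```sql
    -- Hypothetical staging table for the skipped-row log produced by copy activity fault tolerance.
    CREATE TABLE dbo.CopySkippedRows
    (
        LogFileName  NVARCHAR(500) NULL,   -- path of the log file in the storage container
        RowContent   NVARCHAR(MAX) NULL,   -- the skipped row as written to the log
        ErrorMessage NVARCHAR(MAX) NULL,
        LoadedOnUtc  DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
    );
    ```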
