Handling Files and Data with Azure Data Factory

Relay 160 Reputation points
2025-06-20T13:25:53.2166667+00:00

I have created a data pipeline in ADF that processes data from ADLS Gen2 (pushed by an SAP system) and, after transformation, persists it to Azure SQL.

Please help me with the design of the following:

  1. How does ADF ensure a file has already been processed or not? I want to avoid reprocessing the same file or missing new files.
  2. How to handle a corrupt file and move it to a quarantine folder.
  3. How to make sure error records are moved to an error table in SQL.

Thanks a lot for the help.

Accepted answer
  1. J N S S Kasyap 3,625 Reputation points Microsoft External Staff Moderator
    2025-06-20T14:29:06.74+00:00

    Hi @Relay

    How does ADF know whether a file has already been processed, so that the same file is not reprocessed and new files are not missed?

     Use a control table in Azure SQL (e.g., ProcessedFiles) to track processed files. 

     Steps: 

    • Use the Get Metadata activity to list files and get their lastModified timestamp. 
    • Use the Lookup activity to check if the file already exists in the control table. 
    • Use an If Condition to process only new or updated files. 
    • After successful processing, insert a record into ProcessedFiles with status 'Processed' (see the sketch below).
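
    A minimal sketch of what such a control table and the Lookup query could look like (the table and column names here are illustrative assumptions, not something ADF mandates; in a Lookup activity the parameter values would typically be injected via pipeline expressions or passed to a stored procedure):

    ```sql
    -- Hypothetical control table for tracking processed files.
    CREATE TABLE dbo.ProcessedFiles
    (
        FileName        NVARCHAR(400) NOT NULL,
        LastModifiedUtc DATETIME2     NOT NULL,
        Status          NVARCHAR(50)  NOT NULL,   -- e.g. 'Processed', 'Corrupt'
        FailureReason   NVARCHAR(MAX) NULL,
        ProcessedOnUtc  DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME(),
        CONSTRAINT PK_ProcessedFiles PRIMARY KEY (FileName, LastModifiedUtc)
    );

    -- Query the Lookup activity could run for each file returned by Get Metadata;
    -- the If Condition processes the file only when AlreadyProcessed = 0.
    SELECT COUNT(*) AS AlreadyProcessed
    FROM dbo.ProcessedFiles
    WHERE FileName = @FileName
      AND LastModifiedUtc = @LastModified
      AND Status = 'Processed';

    -- After a successful load, record the file so it is skipped on the next run.
    INSERT INTO dbo.ProcessedFiles (FileName, LastModifiedUtc, Status)
    VALUES (@FileName, @LastModified, 'Processed');
    ```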

    How to handle a corrupt file and move it to a quarantine folder?

     Validate the file contents using Data Flow or custom validation logic. 

     If validation fails: 

    • Use a Copy Data activity to move the file from the input folder to a quarantine/ folder in ADLS Gen2. 
    • Optionally, log this file in the control table with status 'Corrupt' and include the failure reason (see the sketch below).
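
    Assuming the illustrative ProcessedFiles table sketched above (with its FailureReason column), logging a quarantined file could be as simple as:

    ```sql
    -- Hypothetical logging step, run (e.g. via a Stored Procedure or Script activity)
    -- after the corrupt file has been copied to the quarantine/ folder.
    INSERT INTO dbo.ProcessedFiles (FileName, LastModifiedUtc, Status, FailureReason)
    VALUES (@FileName, @LastModified, 'Corrupt', @ValidationError);
    ```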

    How to ensure error records are moved to an error table in Azure SQL? 

    To handle error records in Azure Data Factory, you can use Mapping Data Flows with error row handling enabled. Configure your Data Flow to redirect faulty or malformed rows to a separate sink, instead of failing the entire load.

    These error records can then be written to an ErrorRecords table in Azure SQL Database, capturing useful metadata such as the source file name, the full row content (as a string or JSON), the specific error message encountered, and a timestamp.

    This setup ensures robust error tracking without interrupting the pipeline for valid records. 
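
    A possible shape for that table, capturing the metadata mentioned above (the table and column names are illustrative assumptions, not a fixed ADF schema):

    ```sql
    -- Hypothetical ErrorRecords sink table for rows redirected by the data flow's error handling.
    CREATE TABLE dbo.ErrorRecords
    (
        ErrorId        BIGINT IDENTITY(1,1) PRIMARY KEY,
        SourceFileName NVARCHAR(500) NULL,
        RowContent     NVARCHAR(MAX) NULL,   -- the offending row as a string or JSON
        ErrorMessage   NVARCHAR(MAX) NULL,
        LoadedOnUtc    DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
    );
    ```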

    I hope this information helps. Please do let us know if you have any further queries.


    If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful", and let us know if you have any further questions.

    Your feedback is valuable and can assist others in the community facing similar issues.

    Thank you.

     


1 additional answer

  1. Nandan Hegde 36,146 Reputation points MVP Volunteer Moderator
    2025-06-20T14:19:44.4266667+00:00
    1. Assuming you are using a Copy activity to copy the data from the file into some sink: if the Copy activity succeeds, it means ADF has processed the file. As a safety measure, you can archive or delete the file once it is processed to ensure it is not processed again.
    2. Is there a need to process the correct data and only ignore erroneous rows? Then you can enable fault tolerance in the Copy activity: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-fault-tolerance. This also logs the erroneous rows to a file; you can then copy those into a table if need be (a staging-table sketch follows below).
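
    If you do load that skipped-row log into SQL, a simple staging table along these lines would do as the sink for a second Copy activity (the table name and columns are illustrative assumptions; map them to whatever the fault-tolerance log file actually contains):

    ```sql
    -- Hypothetical staging table for the skipped-row log produced by copy activity fault tolerance.
    CREATE TABLE dbo.CopySkippedRows
    (
        LogFileName  NVARCHAR(500) NULL,   -- path of the log file in the storage container
        RowContent   NVARCHAR(MAX) NULL,   -- the skipped row as written to the log
        ErrorMessage NVARCHAR(MAX) NULL,
        LoadedOnUtc  DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
    );
    ```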
