Hi @Relay
How can ADF track whether a file has already been processed, so it neither reprocesses old files nor misses new ones?
Use a control table in Azure SQL (e.g., ProcessedFiles) to track processed files.
Steps:
- Use the Get Metadata activity to list files and get their lastModified timestamp.
- Use the Lookup activity to check whether the file already exists in the control table.
- Use an If Condition to process only new or updated files.
- After successful processing, insert a record into ProcessedFiles with status 'Processed'.
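The Lookup/If Condition logic above can be sketched in Python, with an in-memory SQLite database standing in for the Azure SQL control table. The table and column names (ProcessedFiles, FileName, LastModified, Status) are illustrative assumptions, not names ADF imposes.

```python
import sqlite3

# SQLite stands in for the Azure SQL control table (schema is illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ProcessedFiles (
        FileName     TEXT PRIMARY KEY,
        LastModified TEXT NOT NULL,
        Status       TEXT NOT NULL
    )
""")

def should_process(file_name: str, last_modified: str) -> bool:
    """Mirror the Lookup + If Condition: process only new or updated files."""
    row = conn.execute(
        "SELECT LastModified FROM ProcessedFiles WHERE FileName = ?",
        (file_name,),
    ).fetchone()
    # New file (no row) or a newer lastModified timestamp -> process it.
    return row is None or row[0] < last_modified

def mark_processed(file_name: str, last_modified: str) -> None:
    """After successful processing, record the file with status 'Processed'."""
    conn.execute(
        "INSERT OR REPLACE INTO ProcessedFiles VALUES (?, ?, 'Processed')",
        (file_name, last_modified),
    )
    conn.commit()
```

A first-time file is processed, an unchanged file is skipped, and a file with a newer lastModified timestamp is reprocessed.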
How to handle a corrupt file and move it to a quarantine folder?
Validate the file contents using a Data Flow or custom validation logic. If validation fails:
- Use a Copy Data activity to move the file from the input folder to a quarantine/ folder in ADLS Gen2.
- Optionally, log the file in the control table with status 'Corrupt' and include the failure reason.
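A minimal sketch of the validate-then-quarantine step, using the local filesystem in place of ADLS Gen2. The validation rule (a fixed CSV header) and the schema in EXPECTED_COLUMNS are assumptions for illustration; real validation would live in your Data Flow or custom logic.

```python
import csv
import shutil
from pathlib import Path

# Assumption: files are CSVs with this illustrative header.
EXPECTED_COLUMNS = ["id", "amount", "date"]

def validate(file_path: Path) -> bool:
    """Stand-in for the Data Flow validation: check the CSV header row."""
    try:
        with file_path.open(newline="") as f:
            header = next(csv.reader(f))
        return header == EXPECTED_COLUMNS
    except (OSError, StopIteration):
        # Unreadable or empty file counts as corrupt.
        return False

def quarantine_if_corrupt(file_path: Path, quarantine_dir: Path) -> bool:
    """Move a file that fails validation into the quarantine/ folder.

    Returns True if the file was quarantined, False if it passed.
    """
    if validate(file_path):
        return False
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(file_path), str(quarantine_dir / file_path.name))
    return True
```

In ADF itself the "move" is a Copy Data activity (with the source file deleted after copy); here `shutil.move` plays that role locally.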
How to ensure error records are moved to an error table in Azure SQL?
To handle error records in Azure Data Factory, you can use Mapping Data Flows with error row handling enabled. Configure your Data Flow to redirect faulty or malformed rows to a separate sink, instead of failing the entire load.
These error records can then be written to an ErrorRecords table in Azure SQL Database, capturing useful metadata such as the source file name, the full row content (as a string or JSON), the specific error message encountered, and a timestamp.
This setup ensures robust error tracking without interrupting the pipeline for valid records.
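The error-row redirection can be sketched as follows, again with SQLite standing in for the ErrorRecords table in Azure SQL. The schema, the sample validation rule (amount must parse as a number), and the function name are all illustrative assumptions; in ADF the redirection is configured in the Data Flow sink, not hand-written.

```python
import json
import sqlite3
from datetime import datetime, timezone

# SQLite stands in for the Azure SQL ErrorRecords table (schema illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ErrorRecords (
        SourceFile   TEXT,
        RowContent   TEXT,
        ErrorMessage TEXT,
        LoadedAt     TEXT
    )
""")

def load_rows(source_file: str, rows: list[dict]) -> list[dict]:
    """Send valid rows to the main sink; redirect faulty rows to ErrorRecords.

    Each error record captures the source file name, the full row as JSON,
    the error message, and a timestamp -- the metadata described above.
    """
    good = []
    for row in rows:
        try:
            # Assumption: 'amount' must be present and numeric.
            row["amount"] = float(row["amount"])
            good.append(row)
        except (KeyError, ValueError) as exc:
            conn.execute(
                "INSERT INTO ErrorRecords VALUES (?, ?, ?, ?)",
                (source_file, json.dumps(row), repr(exc),
                 datetime.now(timezone.utc).isoformat()),
            )
    conn.commit()
    return good
```

Valid rows flow through untouched, so one malformed row never fails the whole load.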
I hope this information helps. If this answers your query, please click Accept Answer and Yes for "Was this answer helpful", as your feedback is valuable and can assist others in the community facing similar issues. If you have any further queries, do let us know.
Thank you.