Hi Gużewski, Jacek
To make sure temporary files and folders are deleted in Azure Data Factory (ADF), you need to adjust several settings and activities in your pipeline or data flow. Add a Delete activity to the pipeline and configure the pipeline and data flow settings. Set up error handling so that cleanup still runs when an earlier step fails. Make sure the service principal or managed identity running the pipeline has permission to delete files in your storage account. With these pieces in place, you get a dependable cleanup process that removes temporary files and folders instead of letting them pile up over time.
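A minimal sketch of the Delete-activity-plus-error-handling idea, written in Python so it can be checked: it builds an ADF pipeline definition as a dict in which a Delete activity depends on the main activity with the "Completed" condition, so cleanup runs whether that step succeeds or fails. The activity names, the dataset name "TempStagingFolder" and the assumption that the temp area is on ADLS Gen2 (hence "AzureBlobFSReadSettings") are hypothetical. Compare the result with the JSON that ADF Studio generates for your own pipeline before using it.

```python
import json

# Hypothetical names; replace them with the activities/datasets in your factory.
MAIN_ACTIVITY = "CopyToStaging"
CLEANUP_ACTIVITY = "CleanupTempFiles"


def build_pipeline(name: str, temp_dataset: str) -> dict:
    """Return a sketch of an ADF pipeline definition with a cleanup step."""
    main_activity = {
        "name": MAIN_ACTIVITY,
        "type": "Copy",
        # Placeholder: copy the real source/sink settings from your pipeline.
        "typeProperties": {},
    }
    cleanup_activity = {
        "name": CLEANUP_ACTIVITY,
        "type": "Delete",
        # "Completed" = run after the main activity whether it succeeded or failed.
        "dependsOn": [
            {"activity": MAIN_ACTIVITY, "dependencyConditions": ["Completed"]}
        ],
        "typeProperties": {
            # Dataset pointing at the temporary folder to clean up (assumed name).
            "dataset": {"referenceName": temp_dataset, "type": "DatasetReference"},
            # Assumes an ADLS Gen2 store; delete the folder contents recursively.
            "storeSettings": {"type": "AzureBlobFSReadSettings", "recursive": True},
            "enableLogging": False,
        },
    }
    return {
        "name": name,
        "properties": {"activities": [main_activity, cleanup_activity]},
    }


if __name__ == "__main__":
    print(json.dumps(build_pipeline("CopyWithCleanup", "TempStagingFolder"), indent=2))
```

Using "Completed" rather than separate success and failure branches keeps a single cleanup activity. If the whole pipeline run is canceled, later activities do not run, so a periodic cleanup job is still a useful safety net.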
Reference:
https://azure.microsoft.com/en-us/blog/clean-up-files-by-built-in-delete-activity-in-azure-data-factory/
I hope this information helps you. Let me know if you have any further questions or concerns.
ADF pipeline/data flow creates temporary GUID folders and files and does not delete them
Gużewski, Jacek
Hi,
I have a problem: from time to time, my processes create temporary GUID folders and/or files inside them but, in the end, do not delete them.
What is causing this?
Accepted answer
Harishga (Microsoft Vendor), 2024-06-11 06:40 UTC
1 additional answer
Amira Bedhiafi, 2024-06-10 14:02 UTC
Common causes of leftover temporary GUID folders and files:
- If an ADF pipeline fails or is canceled, it might leave behind temporary files and folders. These are often created as part of data staging or intermediate steps.
- Some ADF activities, such as Copy activity or Data Flow, create temporary files and may not clean them up if they are not configured correctly.
- When pipelines run in parallel, temporary files might be left behind if one of the parallel tasks fails or is interrupted.
- If the service principal or managed identity running the pipeline does not have delete permissions on ADLS, cleanup steps might fail silently.
- Running Data Flows in debug mode can create temporary files that are not automatically cleaned up.
How to solve it?
- Add an activity at the end of your pipeline to delete temporary files and folders. Use the Delete activity in ADF to clean up after your main processing steps.
- Implement error handling and retry logic. Connect the cleanup step through the On Failure or On Completion dependency conditions (the conditional paths between activities) so that cleanup runs even if an earlier activity fails.
- Ensure that the identity running the ADF pipelines has sufficient permissions to delete files in ADLS. Verify its RBAC role assignments (for example, Storage Blob Data Contributor) and, for ADLS Gen2, the ACLs on the affected folders.
- Set up logging and monitoring for your pipelines to track where and when temporary files are created. Use Azure Monitor and Log Analytics to get insights into pipeline executions and failures.
- Implement a scheduled cleanup job that runs periodically to delete old temporary files and folders. This can be done using ADF, Logic Apps, or Azure Functions.
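The core of such a scheduled cleanup job is deciding which folders are safe to delete. Below is a minimal Python sketch of that selection step, assuming the leftovers are folders whose last path segment is a GUID. `PathInfo`, `is_guid`, `stale_guid_folders` and the `staging/` paths are hypothetical. The listing would come from your storage SDK or an ADF Get Metadata activity, and the actual deletion is left to whichever tool runs the job.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass
class PathInfo:
    # Hypothetical shape of one listing entry; fill it from your storage listing.
    name: str               # full path, e.g. "staging/<guid>"
    is_directory: bool
    last_modified: datetime  # timezone-aware


def is_guid(segment: str) -> bool:
    """Return True if a path segment parses as a GUID/UUID."""
    try:
        uuid.UUID(segment)
        return True
    except ValueError:
        return False


def stale_guid_folders(
    paths: Iterable[PathInfo], max_age: timedelta, now: Optional[datetime] = None
) -> List[str]:
    """Return GUID-named folders last modified before now - max_age, sorted."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - max_age
    return sorted(
        p.name
        for p in paths
        if p.is_directory
        # Only the last path segment must look like a GUID.
        and is_guid(p.name.rstrip("/").rsplit("/", 1)[-1])
        and p.last_modified < cutoff
    )
```

Choose `max_age` longer than your longest pipeline run so that temporary folders belonging to runs still in progress are not removed. A first dry run that only prints the selected paths is a cheap way to check the filter before deleting anything.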