ADF pipeline/dataflow creates temporary GUID folders and files and does not delete them

Gużewski, Jacek 95 Reputation points
2024-06-10T11:16:40.6933333+00:00

Hi,

I have a problem: from time to time, my processes create temporary GUID-named folders and/or files inside them but never delete them at the end.

What is causing this?

Azure Data Factory

Accepted answer
  1. Harishga 5,270 Reputation points Microsoft Vendor
    2024-06-11T06:40:50.1633333+00:00

    Hi Gużewski, Jacek
    To make sure that temporary files and folders are deleted in Azure Data Factory, you need to configure several parts of your pipeline or data flow: add a Delete activity to the pipeline, review the pipeline and data flow settings, set up error handling so that cleanup still runs when a step fails, and make sure the service principal or managed identity running the pipeline has permission to delete files in your storage account. With these in place, you get a dependable cleanup process that removes temporary files and folders and keeps them from piling up over time.
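A minimal sketch of the Delete-activity approach, assuming an ADLS Gen2 dataset (the hypothetical `TempStagingFolder`) that points at the folder where the temporary GUID folders appear. The Python below only builds the pipeline JSON; the pipeline, activity, data flow, and dataset names are placeholders, and `AzureBlobFSReadSettings` assumes ADLS Gen2 (other stores use a different `storeSettings` type). The `Completed` dependency condition makes the cleanup run whether the data flow succeeds or fails.

```python
import json


def build_cleanup_pipeline(pipeline_name: str, dataflow_name: str, temp_dataset: str) -> dict:
    """Return an ADF pipeline definition: run a data flow, then delete the temp folder.

    All names are hypothetical placeholders; adjust them to your factory.
    """
    main_activity = {
        "name": "RunDataFlow",
        "type": "ExecuteDataFlow",
        "typeProperties": {
            "dataflow": {"referenceName": dataflow_name, "type": "DataFlowReference"}
        },
    }
    cleanup_activity = {
        "name": "DeleteTempFiles",
        "type": "Delete",
        # "Completed" (Upon Completion) runs cleanup after RunDataFlow succeeds OR fails.
        "dependsOn": [{"activity": "RunDataFlow", "dependencyConditions": ["Completed"]}],
        "typeProperties": {
            # Dataset assumed to point at the staging folder that holds the GUID folders.
            "dataset": {"referenceName": temp_dataset, "type": "DatasetReference"},
            # Assumes ADLS Gen2; recursive=True also removes files inside subfolders.
            "storeSettings": {"type": "AzureBlobFSReadSettings", "recursive": True},
            "enableLogging": False,
        },
    }
    return {
        "name": pipeline_name,
        "properties": {"activities": [main_activity, cleanup_activity]},
    }


if __name__ == "__main__":
    pipeline = build_cleanup_pipeline("CopyWithCleanup", "MyDataFlow", "TempStagingFolder")
    print(json.dumps(pipeline, indent=2))
```

The printed JSON can be compared with, or pasted into, the pipeline's JSON (code) view; check the `storeSettings` type against the Delete activity documentation for your storage type before relying on it.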
    Reference:
    https://azure.microsoft.com/en-us/blog/clean-up-files-by-built-in-delete-activity-in-azure-data-factory/
    I hope this information helps you. Let me know if you have any further questions or concerns.


1 additional answer

  1. Amira Bedhiafi 18,186 Reputation points
    2024-06-10T14:02:15+00:00

    Common causes of leftover temporary GUID folders and files:

    • If an ADF pipeline fails or is canceled, it might leave behind temporary files and folders. These are often created as part of data staging or intermediate steps.
    • Some activities in ADF, like Copy Activity or Data Flow, create temporary files and might not have proper cleanup mechanisms in place if not configured correctly.
    • When pipelines run in parallel, temporary files might be left if one of the parallel tasks fails or is interrupted.
    • If the service principal or managed identity running the pipeline does not have delete permissions on ADLS, cleanup steps might fail silently.
    • Running Data Flows in debug mode can create temporary files that are not automatically cleaned up.

    How to solve it?

    • Add an activity at the end of your pipeline to delete temporary files and folders. Use the Delete activity in ADF to clean up after your main processing steps.
    • Implement error handling and retry logic. Use the Upon Failure and Upon Completion dependency conditions so that cleanup runs even if an earlier activity in the pipeline fails.
    • Ensure that the identity running the ADF pipelines has sufficient permissions to delete files in ADLS. Verify the RBAC roles and access policies.
    • Set up logging and monitoring for your pipelines to track where and when temporary files are created. Use Azure Monitor and Log Analytics to get insights into pipeline executions and failures.
    • Implement a scheduled cleanup job that runs periodically to delete old temporary files and folders. This can be done using ADF, Logic Apps, or Azure Functions.
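The scheduled-cleanup idea from the last bullet can be sketched as a pure selection function. This is a hypothetical helper, not an ADF or Azure SDK API: it assumes the temporary folders use the standard 8-4-4-4-12 GUID format, that a folder's last-modified time is a fair proxy for when the run that created it ended, and that `PathEntry` is filled from your own storage listing (for example, from the `name`, `is_directory`, and `last_modified` fields of the paths returned by `FileSystemClient.get_paths()` in the `azure-storage-file-datalake` package). Pick a `max_age` longer than your longest pipeline run so that folders still in use are not deleted.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Standard 8-4-4-4-12 hexadecimal GUID, e.g. 3f2504e0-4f89-11d3-9a0c-0305e82c3301
GUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


@dataclass
class PathEntry:
    """One listed path; hypothetical stand-in for a storage listing result."""
    name: str            # full path relative to the container, e.g. "staging/<guid>"
    is_directory: bool
    last_modified: datetime


def is_guid_name(segment: str) -> bool:
    """True if a single path segment is a GUID."""
    return GUID_RE.fullmatch(segment) is not None


def select_stale_guid_folders(entries, now: datetime, max_age: timedelta) -> list[str]:
    """Return top-level GUID-named folders last modified before now - max_age."""
    cutoff = now - max_age
    selected: list[str] = []
    # Sorting by name puts each parent before its children.
    for entry in sorted(entries, key=lambda e: e.name):
        if not entry.is_directory or entry.last_modified >= cutoff:
            continue  # skip files and folders that may still be in use
        leaf = entry.name.rstrip("/").rsplit("/", 1)[-1]
        if not is_guid_name(leaf):
            continue  # only GUID-named folders are treated as temporary
        # A recursive delete of an already-selected parent also removes this folder.
        if any(entry.name.startswith(parent + "/") for parent in selected):
            continue
        selected.append(entry.name)
    return selected


if __name__ == "__main__":
    now = datetime(2024, 6, 10, tzinfo=timezone.utc)
    g1 = "3f2504e0-4f89-11d3-9a0c-0305e82c3301"
    g2 = "9b2e1c4d-0a1b-4c2d-8e3f-123456789abc"
    listing = [
        PathEntry(f"staging/{g1}", True, now - timedelta(days=3)),       # stale GUID folder
        PathEntry(f"staging/{g1}/{g2}", True, now - timedelta(days=3)),  # nested, covered by parent
        PathEntry(f"staging/{g2}", True, now - timedelta(hours=2)),      # too recent
        PathEntry("staging/reports", True, now - timedelta(days=30)),    # not a GUID name
    ]
    print(select_stale_guid_folders(listing, now, timedelta(days=1)))
```

With this sample listing only `staging/3f2504e0-4f89-11d3-9a0c-0305e82c3301` is selected. Returning only top-level folders keeps the delete list short, since a recursive delete of each one also removes everything nested inside it.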