Need Guidance on Deleting Folders with Azure Data Factory in ADLS

Pravalika-randstad 240 Reputation points
2023-11-05T18:40:32.5433333+00:00

I have a pipeline designed to remove certain folders from ADLS. My folder structure is organized as follows:

/raw/MainFolder/SubfolderA/20230430/File.parquet

/raw/MainFolder/SubfolderA/20230415/File.parquet

/raw/MainFolder/SubfolderA/20230410/File.parquet

/raw/MainFolder/SubfolderB/20230430/File.parquet

/raw/MainFolder/SubfolderB/20230420/File.parquet

/raw/MainFolder/SubfolderB/20230405/File.parquet

The current pipeline deletes the ‘File.parquet’ within the date-specific subfolders. However, I need it to delete the entire folder with the date ‘20230430’. The pipeline currently deletes the parquet files successfully but encounters an error when attempting to delete an empty folder. I’m only providing the folder path, not the parquet file name, to my pipeline, and I’ve enabled recursion in the delete activity.

Error: “Failed to execute the delete activity with data source ‘AzureBlobStorage’ and the error message ‘The required Blob is missing. Folder path: raw/MainFolder/SubfolderA/20230430/’.”

How can I configure the delete activity to delete the entire folder and not just the files within it? I’m relatively new to Azure Data Factory and would greatly appreciate your guidance.

Azure Data Factory

Accepted answer
  1. Smaran Thoomu 18,385 Reputation points Microsoft Vendor
    2023-11-06T12:16:13.95+00:00

    Hi @Pravalika-randstad ,

    Thank you for reaching out to us with your query.   

    To delete an entire folder with Azure Data Factory, you can try using the following approach:

    • This is my folder structure:
    raw
        MainFolder
            SubfolderA
                20230425
                    //files
                20230427
                    //files
                20230429
                    //files
                20230523
                    //files
            SubfolderB
                20230425
                    //files
                20230427
                    //files
                20230429
                    //files
                20230523
                    //files
    
    • Assuming you want to delete the date folders that are more than 7 days old, first build an array of recent dates with a ForEach over @range(0,7). This expression returns the array [0,1,2,3,4,5,6].
    • Inside the ForEach, use an Append variable activity to add each date, in yyyyMMdd format, to an array variable (daysarr in the JSON below) with this expression: @formatDateTime(subtractFromTime(utcNow(),item(),'Day'),'yyyyMMdd')
    • The result is an array holding the dates of the last 7 days, today included.
    • The rest of the pipeline flow is as follows:
    • Use a Get Metadata activity first to get the list of subfolders (SubfolderA, SubfolderB), and pass its childItems array to a ForEach.
    • Inside that ForEach, use another Get Metadata activity (with @item().name in the path) to get the list of date folders.
    • Apply a Filter activity to these child items, keeping only the date folders whose names are not in the dates array.
    • The Filter output contains the date folders that fall outside the last 7 days. ForEach activities cannot be nested, so iterate over this array in a child pipeline: use an Execute Pipeline activity and pass it the current subfolder path and the filtered child items array.
    • In the child pipeline, iterate over the child items and run a Delete activity on each one.
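
The date array and the Filter condition described in the steps above can be sketched outside Data Factory. This is only an illustration in Python, not part of the pipeline: the function names and the fixed "now" value are assumptions chosen to make the output reproducible, and the folder names come from the sample structure shown earlier.

```python
from datetime import datetime, timedelta, timezone

def recent_dates(now, days=7):
    """Mirror ForEach over @range(0,7) plus
    formatDateTime(subtractFromTime(utcNow(), item(), 'Day'), 'yyyyMMdd')."""
    return [(now - timedelta(days=offset)).strftime("%Y%m%d") for offset in range(days)]

def folders_to_delete(child_items, keep):
    """Mirror the Filter condition @not(contains(variables('daysarr'), item().name))."""
    return [item for item in child_items if item["name"] not in keep]

# Fixed 'now' (assumption) so the result does not depend on the day this runs.
now = datetime(2023, 5, 2, tzinfo=timezone.utc)
daysarr = recent_dates(now)

# Shape of Get Metadata childItems: a list of {"name", "type"} objects.
child_items = [{"name": n, "type": "Folder"}
               for n in ["20230425", "20230427", "20230429", "20230523"]]

stale = [item["name"] for item in folders_to_delete(child_items, daysarr)]
print(daysarr)  # ['20230502', '20230501', '20230430', '20230429', '20230428', '20230427', '20230426']
print(stale)    # ['20230425', '20230523']
```

Note that the condition selects every folder whose name is not in the 7-day array, so a future-dated folder such as 20230523, or a folder whose name is not a date at all, would also be selected for deletion.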

    Use a dataset with a parameter for the folder path (named folderpath in the pipeline JSON below).

    My Parent pipeline JSON:

    {
        "name": "parent",
        "properties": {
            "activities": [
                {
                    "name": "get subfolders",
                    "type": "GetMetadata",
                    "dependsOn": [
                        {
                            "activity": "ForEach1",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ],
                    "policy": {
                        "timeout": "0.12:00:00",
                        "retry": 0,
                        "retryIntervalInSeconds": 30,
                        "secureOutput": false,
                        "secureInput": false
                    },
                    "userProperties": [],
                    "typeProperties": {
                        "dataset": {
                            "referenceName": "sourcecsv",
                            "type": "DatasetReference",
                            "parameters": {
                                "folderpath": "MainFolder"
                            }
                        },
                        "fieldList": [
                            "childItems"
                        ],
                        "storeSettings": {
                            "type": "AzureBlobFSReadSettings",
                            "enablePartitionDiscovery": false
                        },
                        "formatSettings": {
                            "type": "DelimitedTextReadSettings"
                        }
                    }
                },
                {
                    "name": "iterate subfolders",
                    "type": "ForEach",
                    "dependsOn": [
                        {
                            "activity": "get subfolders",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ],
                    "userProperties": [],
                    "typeProperties": {
                        "items": {
                            "value": "@activity('get subfolders').output.childItems",
                            "type": "Expression"
                        },
                        "isSequential": true,
                        "activities": [
                            {
                                "name": "get date folders",
                                "type": "GetMetadata",
                                "dependsOn": [],
                                "policy": {
                                    "timeout": "0.12:00:00",
                                    "retry": 0,
                                    "retryIntervalInSeconds": 30,
                                    "secureOutput": false,
                                    "secureInput": false
                                },
                                "userProperties": [],
                                "typeProperties": {
                                    "dataset": {
                                        "referenceName": "sourcecsv",
                                        "type": "DatasetReference",
                                        "parameters": {
                                            "folderpath": {
                                                "value": "@concat('MainFolder/',item().name)",
                                                "type": "Expression"
                                            }
                                        }
                                    },
                                    "fieldList": [
                                        "childItems"
                                    ],
                                    "storeSettings": {
                                        "type": "AzureBlobFSReadSettings",
                                        "enablePartitionDiscovery": false
                                    },
                                    "formatSettings": {
                                        "type": "DelimitedTextReadSettings"
                                    }
                                }
                            },
                            {
                                "name": "Execute Pipeline1",
                                "type": "ExecutePipeline",
                                "dependsOn": [
                                    {
                                        "activity": "Filter1",
                                        "dependencyConditions": [
                                            "Succeeded"
                                        ]
                                    }
                                ],
                                "userProperties": [],
                                "typeProperties": {
                                    "pipeline": {
                                        "referenceName": "child",
                                        "type": "PipelineReference"
                                    },
                                    "waitOnCompletion": true,
                                    "parameters": {
                                        "date_folder": {
                                            "value": "@activity('Filter1').output.value",
                                            "type": "Expression"
                                        },
                                        "path": {
                                            "value": "@concat('MainFolder/',item().name)",
                                            "type": "Expression"
                                        }
                                    }
                                }
                            },
                            {
                                "name": "Filter1",
                                "type": "Filter",
                                "dependsOn": [
                                    {
                                        "activity": "get date folders",
                                        "dependencyConditions": [
                                            "Succeeded"
                                        ]
                                    }
                                ],
                                "userProperties": [],
                                "typeProperties": {
                                    "items": {
                                        "value": "@activity('get date folders').output.childItems",
                                        "type": "Expression"
                                    },
                                    "condition": {
                                        "value": "@not(contains(variables('daysarr'),item().name))",
                                        "type": "Expression"
                                    }
                                }
                            }
                        ]
                    }
                },
                {
                    "name": "ForEach1",
                    "type": "ForEach",
                    "dependsOn": [],
                    "userProperties": [],
                    "typeProperties": {
                        "items": {
                            "value": "@range(0,7)",
                            "type": "Expression"
                        },
                        "isSequential": true,
                        "activities": [
                            {
                                "name": "Append variable1",
                                "type": "AppendVariable",
                                "dependsOn": [],
                                "userProperties": [],
                                "typeProperties": {
                                    "variableName": "daysarr",
                                    "value": {
                                        "value": "@formatDateTime(subtractFromTime(utcNow(),item(),'Day'),'yyyyMMdd')",
                                        "type": "Expression"
                                    }
                                }
                            }
                        ]
                    }
                }
            ],
            "variables": {
                "counter": {
                    "type": "String"
                },
                "daysarr": {
                    "type": "Array"
                },
                "temp": {
                    "type": "String"
                },
                "new": {
                    "type": "Array"
                }
            },
            "annotations": [],
            "lastPublishTime": "2023-05-02T07:27:09Z"
        },
        "type": "Microsoft.DataFactory/factories/pipelines"
    }
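
A note for anyone tracing the JSON above: the activities appear in the order they were saved, while the run order is determined by each activity's dependsOn list. The sketch below is a small Python illustration, not something Data Factory runs. It copies the dependsOn entries from the parent pipeline and sorts them to show the effective order; the dictionary and function names are made up for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Activity name -> activities listed in its "dependsOn" (copied from the parent JSON).
top_level = {
    "ForEach1": [],
    "get subfolders": ["ForEach1"],
    "iterate subfolders": ["get subfolders"],
}

# Activities nested inside "iterate subfolders".
inner = {
    "get date folders": [],
    "Filter1": ["get date folders"],
    "Execute Pipeline1": ["Filter1"],
}

def run_order(depends_on):
    """Return the activities in an order that respects every dependsOn entry."""
    return list(TopologicalSorter(depends_on).static_order())

top_order = run_order(top_level)
inner_order = run_order(inner)
print(" -> ".join(top_order))    # ForEach1 -> get subfolders -> iterate subfolders
print(" -> ".join(inner_order))  # get date folders -> Filter1 -> Execute Pipeline1
```

So the dates array is fully built by ForEach1 before any folder listing starts, and inside the loop the Filter runs before the child pipeline is called.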
    
    
    

    Child Pipeline JSON:

    {
        "name": "child",
        "properties": {
            "activities": [
                {
                    "name": "ForEach1",
                    "type": "ForEach",
                    "dependsOn": [],
                    "userProperties": [],
                    "typeProperties": {
                        "items": {
                            "value": "@pipeline().parameters.date_folder",
                            "type": "Expression"
                        },
                        "isSequential": true,
                        "activities": [
                            {
                                "name": "Delete1",
                                "type": "Delete",
                                "dependsOn": [],
                                "policy": {
                                    "timeout": "0.12:00:00",
                                    "retry": 0,
                                    "retryIntervalInSeconds": 30,
                                    "secureOutput": false,
                                    "secureInput": false
                                },
                                "userProperties": [],
                                "typeProperties": {
                                    "dataset": {
                                        "referenceName": "sourcecsv",
                                        "type": "DatasetReference",
                                        "parameters": {
                                            "folderpath": {
                                                "value": "@concat(pipeline().parameters.path,'/',item().name)",
                                                "type": "Expression"
                                            }
                                        }
                                    },
                                    "enableLogging": false,
                                    "storeSettings": {
                                        "type": "AzureBlobFSReadSettings",
                                        "recursive": true,
                                        "enablePartitionDiscovery": false
                                    }
                                }
                            }
                        ]
                    }
                }
            ],
            "parameters": {
                "date_folder": {
                    "type": "array"
                },
                "path": {
                    "type": "string"
                }
            },
            "annotations": []
        }
    }
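
To make the parameter hand-off concrete, here is a hedged Python sketch of the two concat expressions: the one in Execute Pipeline1 that builds path, and the one in Delete1 that builds the dataset's folderpath. The function names are hypothetical. It is also an assumption that the sourcecsv dataset points at the raw file system, since the dataset definition itself is not shown.

```python
def child_parameters(subfolder, filtered_items):
    """What 'Execute Pipeline1' passes to the child pipeline:
    path = concat('MainFolder/', item().name), date_folder = Filter1 output."""
    return {"path": "MainFolder/" + subfolder, "date_folder": filtered_items}

def delete_folder_paths(params):
    """What 'Delete1' receives on each iteration:
    concat(pipeline().parameters.path, '/', item().name)."""
    return [params["path"] + "/" + item["name"] for item in params["date_folder"]]

params = child_parameters("SubfolderA", [{"name": "20230425", "type": "Folder"}])
targets = delete_folder_paths(params)
print(targets)  # ['MainFolder/SubfolderA/20230425']
```

Each target is a folder path with no file name, and the Delete activity is configured with recursive set to true. Note also that the JSON uses AzureBlobFS store settings (the ADLS Gen2 connector), whereas the error in the question names AzureBlobStorage as the data source; whether that difference explains the original error is not confirmed in this thread.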
    

    After the pipeline ran, the date folders more than 7 days old had been deleted.

    Hope this helps. Do let us know if you have any further queries.



