Rename subfolder files using ADF

Thulasi Arumugam 20 Reputation points
2023-03-01T20:02:40.61+00:00
I have a blob container with a parent folder that contains multiple subfolders, and each subfolder contains files. I would like to get only the file names (not the subfolder names) and rename the files using ADF.

Parent folder -> subfolder -> multiple subfolders -> multiple files.pdf
Parent folder -> subfolder -> multiple subfolders -> multiple files.pdf
Parent folder -> subfolder -> multiple subfolders -> multiple files

I just need to rename the PDF files, for example to file1_paper.pdf.

Any suggestions would be much appreciated. Thanks!
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. KranthiPakala-MSFT 46,642 Reputation points Microsoft Employee Moderator
    2023-03-10T08:49:11.1433333+00:00

    @Thulasi Arumugam,

    Thanks for your patience. Since you want to preserve the folder hierarchy and only rename the files inside those folders, the most efficient approach in terms of cost and performance is to use a Copy activity with CopyBehavior = PreserveHierarchy to copy the files to the destination. Once the copy completes, use a Custom activity, an Azure Function, or an Azure Databricks notebook to run custom code that traverses the folders and subfolders and renames the files as needed.
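As a rough illustration of the renaming step (not code from this answer), the sketch below plans new names for a flat list of blob names while leaving the folder part of each name untouched. The function name `plan_renames`, the `_paper` suffix default, and the sample paths are assumptions made for the example. Listing the blobs and applying the renames would be done with whichever storage SDK the Custom activity, Azure Function, or notebook uses.

```python
import posixpath


def plan_renames(blob_names, suffix="_paper", extension=".pdf"):
    """Map each matching blob name to a new name in the same folder.

    Blob names are flat strings such as 'parent/sub1/sub2/file1.pdf';
    only the last path segment (the file name) is changed.
    """
    plan = {}
    for name in blob_names:
        folder, filename = posixpath.split(name)
        stem, ext = posixpath.splitext(filename)
        if ext.lower() != extension:
            continue  # skip blobs that are not PDFs
        if stem.endswith(suffix):
            continue  # already renamed, so a second run changes nothing
        plan[name] = posixpath.join(folder, stem + suffix + ext)
    return plan


if __name__ == "__main__":
    # Hypothetical blob names, as a recursive listing of the container might return them.
    names = [
        "parent/sub1/a/file1.pdf",
        "parent/sub2/b/file2.pdf",
        "parent/sub2/b/notes.txt",
    ]
    for old, new in plan_renames(names).items():
        print(old, "->", new)
```

Because the folder part of each name is carried over unchanged, the hierarchy produced by the Copy activity stays intact and only the file names differ.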

    I recommend custom code for the renaming because doing it in a regular ADF pipeline requires a total of four pipelines. Each pipeline contains at least a GetMetadata activity and a ForEach activity, and inside each ForEach there is another GetMetadata activity followed by an Execute Pipeline activity. The flow looks like this:


    1. ParentPipeline1 -> GetYearFolderNames -> ForEachYearFolder -> GetMonthFolderNames -> ExecuteMonthLevelPipeline
    2. ChildPipeline1 -> ForEachMonthFolder -> GetDateNamesForEachMonth -> ExecuteDateLevelPipeline
    3. ChildPipeline2 -> ForEachDateFolder -> GetFileNamesForEachDateFolder -> ExecuteFinalFileCopylevel
    4. Final_ChildPipeline3 -> ForEachFileName -> CopyEachFileWithRenamedFileNamePreservingHierarchy

    That is the high-level flow. I tried it on a sample with a few folders and files and found that performance was worse than expected, because the overall flow involves many activity executions. With this requirement I would not choose this approach, since it is neither cost-efficient nor performant. The better option is to copy the files as-is with the Copy activity, preserving the folder hierarchy, and then use custom code to rename the files in those folders.
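For a sense of what such custom code might look like, here is a minimal sketch that renames blobs by copying each one to its new name and then deleting the original, since a blob in a flat-namespace container is typically renamed that way. It assumes `container_client` behaves like `azure.storage.blob.ContainerClient` from the azure-storage-blob package, that the credentials in use allow a same-account server-side copy, and that `plan` maps old blob names to new ones (for example `{"parent/sub/a/file1.pdf": "parent/sub/a/file1_paper.pdf"}`). The function name `rename_blobs` and its parameters are hypothetical.

```python
import time


def rename_blobs(container_client, plan, poll_seconds=1.0):
    """Rename blobs by copying each one to its new name, then deleting the original.

    `container_client` is assumed to behave like azure.storage.blob.ContainerClient.
    `plan` maps old blob names to new blob names within the same container.
    """
    renamed = []
    for old_name, new_name in plan.items():
        source = container_client.get_blob_client(old_name)
        target = container_client.get_blob_client(new_name)
        target.start_copy_from_url(source.url)
        # A server-side copy can complete asynchronously, so wait before deleting.
        status = target.get_blob_properties().copy.status
        while status == "pending":
            time.sleep(poll_seconds)
            status = target.get_blob_properties().copy.status
        if status != "success":
            raise RuntimeError(f"Copy of {old_name} ended with status {status}")
        source.delete_blob()
        renamed.append((old_name, new_name))
    return renamed
```

The original is deleted only after the copy reports success, so a failed run leaves the source files in place and can be retried.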

    Hope this info helps.




1 additional answer

  1. Amira Bedhiafi 33,071 Reputation points Volunteer Moderator
    2023-03-02T15:33:17.4533333+00:00

    Start by creating a dataset (Binary format) that points to your blob container and covers the full paths of the files in your folder hierarchy.

    Then create a pipeline with a "Get Metadata" activity to get the list of files in the blob container. Set the metadata path to the parent folder that contains the subfolders and files.

    Use a "ForEach" activity to loop through the list of files retrieved in the previous step.

    Inside the "ForEach" activity, add an "If Condition" that filters for PDF files by checking the file extension with the following expression:

    @endsWith(item().name, '.pdf')
    

    Finally, use a "Copy Data" activity to copy the PDF files to the destination.

    Set the source dataset to the one you created in the first step, and set the sink dataset to a new dataset that points to the folder where you want to copy your files.

    For the destination file name (set on the sink dataset), you will need an expression such as the following, where guid() generates a unique value:

    @concat('file', guid(), '_test.pdf')
    
