How to read files from subfolders in Azure Data Factory

ankit kumar 101 Reputation points
2020-10-05T14:27:38.72+00:00

I have a blob container with a parent folder and multiple subfolders, each containing files. I would like to get only the file names (not the subfolder names) and rename the files.

I see that I can use the Get Metadata activity, but how can I make it dynamic when the subfolders are not all nested to the same depth? What I mean is that I have a structure something like this:

ParentFolder --> SubFolder1 --> Test.csv
ParentFolder --> SubFolder2 --> Test2.json
ParentFolder --> Subfolder2 --> SubFolder3 --> Test4.dat
ParentFolder --> Subfolder4 --> Subfolder5 --> Test3.txt

Using ADF, I need to rename Test.csv, Test2.json, Test4.dat, and Test3.txt. Is it possible to read only the files inside a parent folder, considering that I will have multiple parent folders, each with a different structure?

Thanks,
zzz

Azure Data Factory

2 answers

  1. HarithaMaddi-MSFT 10,146 Reputation points
    2020-10-06T13:10:46.247+00:00

    Hi @ankit kumar ,

    Welcome to Microsoft Q&A Platform. Thanks for posting the query.

    This is complicated to achieve in Data Factory when the folder structure is dynamic, and there is no activity that directly renames a file. The GIF below shows a workaround that loops through the folders and separates the files from the folders in each one. The files can then be passed to a child pipeline, which uses a data flow to copy each file under a new name and delete the source file using the option shown in the screenshot below. Because a ForEach activity cannot be nested inside another ForEach, parent and child pipelines are used here.

    Parent Pipeline:
    30441-parentpipelineadf.gif

    Dataflow option in child pipeline:
    30451-image.png
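As a rough illustration of the logic only (not of the pipeline itself), the "separate files from folders, then copy under a new name and delete the source" idea can be sketched in Python. Everything here is an assumption made for the example: `CHILD_ITEMS` is a hypothetical in-memory stand-in for the `childItems` output of Get Metadata, the folder names are normalized from the question, and `split_children`, `rename_plan`, and the `renamed_` prefix are made-up names.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for what Get Metadata (field list: Child items)
# would return for each folder path. Layout loosely mirrors the question.
CHILD_ITEMS: Dict[str, List[dict]] = {
    "ParentFolder": [
        {"name": "SubFolder1", "type": "Folder"},
        {"name": "SubFolder2", "type": "Folder"},
        {"name": "SubFolder4", "type": "Folder"},
    ],
    "ParentFolder/SubFolder1": [{"name": "Test.csv", "type": "File"}],
    "ParentFolder/SubFolder2": [
        {"name": "Test2.json", "type": "File"},
        {"name": "SubFolder3", "type": "Folder"},
    ],
    "ParentFolder/SubFolder2/SubFolder3": [{"name": "Test4.dat", "type": "File"}],
    "ParentFolder/SubFolder4": [{"name": "SubFolder5", "type": "Folder"}],
    "ParentFolder/SubFolder4/SubFolder5": [{"name": "Test3.txt", "type": "File"}],
}


def split_children(folder: str) -> Tuple[List[str], List[str]]:
    """Separate one folder's child items into file paths and subfolder paths."""
    items = CHILD_ITEMS.get(folder, [])
    files = [f"{folder}/{i['name']}" for i in items if i["type"] == "File"]
    folders = [f"{folder}/{i['name']}" for i in items if i["type"] == "Folder"]
    return files, folders


def rename_plan(file_path: str, prefix: str = "renamed_") -> dict:
    """Describe a rename as 'copy to the new name, then delete the source'."""
    folder, _, name = file_path.rpartition("/")
    return {
        "copy_from": file_path,
        "copy_to": f"{folder}/{prefix}{name}",
        "delete": file_path,
    }
```

In this sketch the files returned by `split_children` are what would go to the child pipeline, and the folders are what the parent pipeline would still need to visit.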

    I recommend upvoting the feedback items below, which relate to this requirement. Please also post any new ideas in the feedback forum; the Data Factory product team monitors it closely, and ideas posted there may be implemented in future releases.

    4041042-rename-blobs-without-needing-to-copy-them
    39756814-add-foreach-loop-nested-capability

    Hope this helps! Please let us know if this does not meet the requirement and we will be glad to assist further.

    2 people found this answer helpful.

  2. Alexandre Martins Neiva 1 Reputation point
    2022-10-12T11:56:29.74+00:00

    Hi @HarithaMaddi-MSFT

    I have the same problem. I have a blob storage account and need to read files from the nested folders created by Event Hubs Capture. My solution is below and it is working. Is it correct? Could it be improved?

    One problem I found: when an error occurs inside the Until activity, the loop does not stop.

    249754-image.png

    Create variables: filename string, pos string, posAux string, pathFolder array, pathFolderAux array, container string

    249668-image.png

    Pipeline:

    Initialize the container variable with the root path.

    1) Append variable Container (pathFolder): @variables('container')

    249666-image.png

    2) Create an Until activity
    UntilFolder with condition: @greaterOrEquals(int(variables('pos')), length(variables('pathFolder')))

    249731-image.png

      2.1) Set variable Container (container): @variables('pathFolder')[int(variables('pos'))]  
      2.2) Set variable PosAux (posAux): @variables('pos')  
      2.3) Set variable Pos (pos): @string(add(int(variables('posAux')), 1))  
      2.4) Get Metadata File: directory = @variables('container'), filename = *.avro and Field list = Child items  
    

    249687-image.png

      2.5) FilterFile  
             2.5.1) If Condition File: Expression = @greater(length(activity('FilterFilename').output.value), 0)  
                       2.5.1.1) If true: read file  
    
      2.6) FilterFolder: items = @activity('Get Metadata File').output.childItems and Condition = @equals(item().type, 'Folder')  
             2.6.1) Set variable PathFolderAux (pathFolderAux): @variables('pathFolder')  
             2.6.2) Set variable PathFolder (pathFolder):   
    

    @union(variables('pathFolderAux'),
      split(
        replace(
          replace(
            replace(
              replace(
                replace(
                  replace(
                    replace(
                      string(activity('FilterFolder').output.value),
                      ',"type":"Folder"', ''),
                    '"name":', concat(variables('container'), '/')),
                  '"', ''),
                '{', ''),
              '}', ''),
            '[', ''),
          ']', ''),
        ',')
    )
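
To make it easier to see what the nested replace calls produce, here is a small Python sketch of the same string manipulation. It assumes that `string()` on the filter output gives compact JSON with `name` before `type` (mimicked here with `json.dumps`); `folder_paths` and `folder_paths_direct` are made-up names for the example, not ADF functions.

```python
import json
from typing import List


def folder_paths(filter_output: List[dict], container: str) -> List[str]:
    """Mimic the replace/split chain: turn folder items into 'container/name' paths."""
    # Assumption: compact JSON like [{"name":"a","type":"Folder"},...]
    text = json.dumps(filter_output, separators=(",", ":"))
    text = text.replace(',"type":"Folder"', "")      # drop the type field
    text = text.replace('"name":', container + "/")  # prefix each name with the parent path
    for ch in '"{}[]':                               # strip the remaining JSON punctuation
        text = text.replace(ch, "")
    return text.split(",")


def folder_paths_direct(filter_output: List[dict], container: str) -> List[str]:
    """Same result built from the items directly, without string surgery."""
    return [f"{container}/{item['name']}" for item in filter_output]
```

Two things show up in this Python version that may be worth checking in the pipeline as well: a folder name containing a comma, quote, brace, or bracket breaks the string approach, and an empty filter output gives `['']` rather than an empty list.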

    249724-image.png
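
Put together, steps 1 to 2.6 behave like a queue-based folder walk. The Python sketch below is only a model of that control flow, under stated assumptions: `get_child_items` is a hypothetical callback standing in for the Get Metadata activity, `DEMO_TREE` is made-up sample data, and plain list appends are used where the pipeline uses `union` (which also removes duplicates).

```python
from typing import Callable, Dict, List

# Made-up sample layout for the example; not taken from a real capture container.
DEMO_TREE: Dict[str, List[dict]] = {
    "capture": [{"name": "ns", "type": "Folder"}],
    "capture/ns": [
        {"name": "0.avro", "type": "File"},
        {"name": "p1", "type": "Folder"},
    ],
    "capture/ns/p1": [
        {"name": "1.avro", "type": "File"},
        {"name": "notes.txt", "type": "File"},
    ],
}


def list_avro_files(root: str, get_child_items: Callable[[str], List[dict]]) -> List[str]:
    """Walk every folder under root and collect the .avro file paths."""
    path_folder = [root]  # step 1: pathFolder starts with the root path
    pos = 0               # pos variable (a string in the pipeline, an int here)
    found: List[str] = []
    while pos < len(path_folder):               # Until ends once pos >= length(pathFolder)
        container = path_folder[pos]            # step 2.1
        pos += 1                                # steps 2.2 and 2.3
        children = get_child_items(container)   # step 2.4
        for child in children:
            path = f"{container}/{child['name']}"
            if child["type"] == "File" and child["name"].endswith(".avro"):
                found.append(path)              # step 2.5: files to read
            elif child["type"] == "Folder":
                path_folder.append(path)        # step 2.6: folders still to visit
    return found
```

For example, `list_avro_files("capture", lambda p: DEMO_TREE.get(p, []))` visits `capture`, then `capture/ns`, then `capture/ns/p1`, because each newly found folder is appended to the end of the list that the loop is reading from.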
