Access all files from folder and subfolders in blob storage by pipeline/data-flow

Gerald Rupp 130 Reputation points
2023-02-20T16:51:57.5066667+00:00

Hi everybody,

I have a folder in blob storage that contains many subfolders with json files. Each of these json files has the attribute "Type". The Type attribute can be "A", "B" or "C". I want to filter the json files by these types and store them in the blob storage folders "Type_A", "Type_B" or "Type_C".

My problem is accessing all the json files, from the folder itself and(!) from the subfolders, and running a data flow to filter them.

I tried to use Get Metadata, but my subfolders have several layers: year - month - day - json_files.

I tried to implement ForEach, but I cannot nest a ForEach activity inside another ForEach activity.

Does anybody have another idea?
Thanks a lot for your help.

Kind regards,
Gerald

Accepted answer
  1. KranthiPakala-MSFT 46,642 Reputation points Microsoft Employee Moderator
    2023-02-22T06:47:10.87+00:00

    Hi @Gerald Rupp ,

    Welcome to Microsoft Q&A forum and thanks for reaching out here.

    As per my understanding, you have json files located in your storage account in the below folder structure, and a sample file looks like below. Please correct me if I'm wrong.

    Folder Structure:

    Container
        Root
            Year
                Month
                    Date
                        File1.json
                        ....
                        ....
                        FileN.json

    Assuming your sample file looks like below:

    [{
        "Type": "A",
        "Data": {
            "Attribute11" : "Value11",
            "Attribute21" : "Value21"
        }
    }, 
    {
        "Type": "B",
        "Data": {
            "Attribute31" : "Value31",
            "Attribute41" : "Value41"
        }
    }, 
    {
        "Type": "C",
        "Data": {
            "Attribute51" : "Value51",
            "Attribute61" : "Value61"
        }
    }
    ]
    

    And you would like to separate the data based on the Type attribute (which could be A, B or C), create a folder per type, and save the matching data in that folder. Please correct me if I'm wrong anywhere.

    To achieve the above requirement, the best way is to use a Mapping data flow, as it reduces the complexity and also lets you transform the data as per your custom requirements.

    Steps to be followed: In the dataset configuration, just provide the container name and leave the directory and file name empty, as you will configure them in the data flow using a wildcard path.

    (Screenshot: source dataset configuration with only the container name provided)

    Then, in the Mapping data flow source settings, configure the source options as below: use wildcard paths, provide the Partition root path, and select the document form of your source.

    (Screenshot: data flow source options with wildcard paths, partition root path and document form)
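    For reference, the data flow script behind such a source (visible via the Script button on the data flow canvas) looks roughly like the sketch below. The wildcard pattern Root/*/*/*/*.json, the partition root path Root and the stream name jsonSource are placeholders for your own layout, and the exact property names can vary with your dataset type, so please compare with the script ADF generates for you:

    source(allowSchemaDrift: true,
        validateSchema: false,
        documentForm: 'arrayOfDocuments',
        wildcardPaths:['Root/*/*/*/*.json'],
        partitionRootPath: 'Root') ~> jsonSource
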

    Next, add a conditional split transformation, which will split the data from all your source files based on the Type attribute so that it can be written to the respective folders in your storage account.

    (Screenshot: conditional split transformation with one output stream per Type value)
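    In data flow script terms, a conditional split on the Type attribute could look like the sketch below (the stream names TypeA, TypeB and TypeC are illustrative; rows with Type = 'C' fall through to the last, default stream):

    jsonSource split(Type == 'A',
        Type == 'B',
        disjoint: false) ~> splitByType@(TypeA, TypeB, TypeC)
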

    Then add a Select transformation on each stream to keep only the columns that you would like to copy to each sink folder, as shown above.
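    A matching Select and sink for one of the branches could look roughly like below (repeat per branch). This is only a sketch: the folders Type_A, Type_B and Type_C are configured on the respective sink datasets (or in the sink settings if you use inline datasets), and the column list here simply reflects the two columns in the sample file above:

    TypeA select(mapColumn(
            Type,
            Data
        ),
        skipDuplicateMapInputs: true,
        skipDuplicateMapOutputs: true) ~> selectTypeA
    selectTypeA sink(allowSchemaDrift: true,
        validateSchema: false,
        skipDuplicateMapInputs: true,
        skipDuplicateMapOutputs: true) ~> sinkTypeA
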

    Hope this helps. Kindly let us know if your requirement is different from my understanding. If that's the case, please share a few additional details about the requirement and we would be happy to assist accordingly.


    Please don’t forget to Accept Answer and Yes for "was this answer helpful" wherever the information provided helps you, this can be beneficial to other community members.

