Remove objects from the output of a web or filter activity in ADF

Julius 40 Reputation points
2023-03-17T18:39:01.68+00:00

Hello everyone,

I have an ADF pipeline which uses a web activity to list object data from an API endpoint. The output is then filtered and gives a result following this schema:

{
    "ItemsCount": 886,
    "FilteredItemsCount": 8,
    "Value": [
        {
            "path": "FolderName1/FolderName2/EndFolderName3/FileName1",
            "path_type": "object"
        },
        {
            "path": "FolderName1/FolderName2/EndFolderName3/FileName2",
            "path_type": "object"
        },
        {     
            "path": "FolderName1/FolderName2/EndFolderName4/FileName3",
            "path_type": "object"
        },
        {
            "path": "FolderName1/FolderName2/EndFolderName4/FileName4",
            "path_type": "object"          
        },
        {
            "path": "FolderName1/FolderName2/EndFolderName5/FileName5",
            "path_type": "object"
        },
        {
            "path": "FolderName1/FolderName2/EndFolderName5/FileName6",
            "path_type": "object"
        },
        {
            "path": "FolderName1/FolderName2/EndFolderName6/FileName7",
            "path_type": "object"
        },
        {
            "path": "FolderName1/FolderName2/EndFolderName6/FileName8",
            "path_type": "object"
        }
    ]
}

As you can see, there are multiple "Endfolders", each containing multiple objects.

How can I adjust this output so that I have the same schema but only have a single object of each "Endfolder"?

To be clearer, the target schema should look like this:

{
    "ItemsCount": 886,
    "FilteredItemsCount": 8,
    "Value": [
        {
            "path": "FolderName1/FolderName2/EndFolderName3/FileName1",
            "path_type": "object"
        },
        {     
            "path": "FolderName1/FolderName2/EndFolderName4/FileName3",
            "path_type": "object"
        },                   
        {
            "path": "FolderName1/FolderName2/EndFolderName5/FileName5",
            "path_type": "object"
        },
        {
            "path": "FolderName1/FolderName2/EndFolderName6/FileName7",
            "path_type": "object"
        }
    ]
}

The goal is for every "Endfolder" to appear only once in the output, because the output feeds a loop activity, and running the loop several times over the same "Endfolder" causes bugs. It doesn't matter which file in an "Endfolder" is kept, as long as there is exactly one. Also, the number of folder layers and the number of objects inside an "Endfolder" may vary.

How can I do this with Azure Data Factory?

Azure Data Factory

Accepted answer
  1. KranthiPakala-MSFT 46,442 Reputation points Microsoft Employee
    2023-03-21T01:38:03.3133333+00:00

    Hi @Julius ,

    Welcome to Microsoft Q&A forum and thanks for reaching out here.

    My assumption is that your folder and subfolder hierarchy remains consistent. If so, you can create an array variable, populate it with an Append variable activity, and use it in a condition check to filter the list down to distinct paths.

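    First, use a ForEach activity to loop through each object (run it sequentially, as in the sample pipeline, so that each check sees the earlier appends). Inside the loop, add an If Condition activity that checks whether the EndFolder of the current path item is already present in the array variable, using the condition expression below: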

    @contains(string(variables('varDistinctPath')),split(item().path, '/')[2])
    

    If it doesn't exist, append that path value to the array variable, and repeat this check for every iteration. Once the loop completes, the variable holds a list of unique paths, as below:

    {
        "name": "var_listOfPaths",
        "value": [
            "FolderName1/FolderName2/EndFolderName3/FileName1",
            "FolderName1/FolderName2/EndFolderName4/FileName3",
            "FolderName1/FolderName2/EndFolderName5/FileName5",
            "FolderName1/FolderName2/EndFolderName6/FileName7"
        ]
    }
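
    As a rough illustration of the loop logic above (not ADF code), the ForEach, If Condition, and Append variable steps can be sketched in Python. The function name `collect_distinct_paths` and the parameter `folder_index` are hypothetical; `folder_index=2` stands in for the `[2]` in the condition expression, and `json.dumps` only approximates what `string()` does to the array variable.

```python
import json


def collect_distinct_paths(items, folder_index=2):
    """Keep the first path seen for each end folder.

    Mirrors the ForEach -> If Condition -> Append variable steps.
    folder_index=2 corresponds to split(item().path, '/')[2].
    """
    distinct_paths = []  # stands in for the array variable varDistinctPath
    for item in items:  # processed one at a time, like isSequential: true
        end_folder = item["path"].split("/")[folder_index]
        # The array is converted to text first, so this is a substring
        # search over the serialized array, not an exact element match.
        already_seen = end_folder in json.dumps(distinct_paths)
        if not already_seen:
            distinct_paths.append(item["path"])
    return distinct_paths


items = [
    {"path": "FolderName1/FolderName2/EndFolderName3/FileName1", "path_type": "object"},
    {"path": "FolderName1/FolderName2/EndFolderName3/FileName2", "path_type": "object"},
    {"path": "FolderName1/FolderName2/EndFolderName4/FileName3", "path_type": "object"},
    {"path": "FolderName1/FolderName2/EndFolderName4/FileName4", "path_type": "object"},
]
print(collect_distinct_paths(items))
```

    Because the check is a substring search, a folder whose name is contained in an earlier path (for example a hypothetical `End3` processed after `End30`) would be treated as already seen, so this approach is safest when folder names cannot overlap like that.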
    

    Then, outside the ForEach activity, add a Filter activity. Set its Items to the same source list, and write a filter condition that checks whether each item's path is present in the array variable; only the matching items are kept. This produces the final filtered list below:

    Filter condition:

    @contains(variables('varDistinctPath'), item().path)
    
    

    Final filtered list:

    {
        "ItemsCount": 8,
        "FilteredItemsCount": 4,
        "Value": [
            {
                "path": "FolderName1/FolderName2/EndFolderName3/FileName1",
                "path_type": "object"
            },
            {
                "path": "FolderName1/FolderName2/EndFolderName4/FileName3",
                "path_type": "object"
            },
            {
                "path": "FolderName1/FolderName2/EndFolderName5/FileName5",
                "path_type": "object"
            },
            {
                "path": "FolderName1/FolderName2/EndFolderName6/FileName7",
                "path_type": "object"
            }
        ]
    }
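
    The Filter step can be sketched the same way in Python, purely as an illustration. The function name `filter_by_paths` is hypothetical, and the output shape simply imitates the `ItemsCount` / `FilteredItemsCount` / `Value` fields shown above. Unlike the If Condition check, here `contains` is applied to the array itself, so it is an exact match on the whole path.

```python
def filter_by_paths(items, distinct_paths):
    """Keep only the items whose full path is an element of distinct_paths."""
    kept = [item for item in items if item["path"] in distinct_paths]
    return {
        "ItemsCount": len(items),          # number of items fed into the filter
        "FilteredItemsCount": len(kept),   # number of items that passed
        "Value": kept,
    }


source_items = [
    {"path": "FolderName1/FolderName2/EndFolderName3/FileName1", "path_type": "object"},
    {"path": "FolderName1/FolderName2/EndFolderName3/FileName2", "path_type": "object"},
    {"path": "FolderName1/FolderName2/EndFolderName4/FileName3", "path_type": "object"},
    {"path": "FolderName1/FolderName2/EndFolderName4/FileName4", "path_type": "object"},
]
distinct = [
    "FolderName1/FolderName2/EndFolderName3/FileName1",
    "FolderName1/FolderName2/EndFolderName4/FileName3",
]
result = filter_by_paths(source_items, distinct)
print(result["FilteredItemsCount"], [i["path"] for i in result["Value"]])
```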
    

    Below is what the sample pipeline flow looks like, followed by the JSON payload, which you can copy, paste, and try out yourself.

    [Image: sample pipeline flow]

    {
        "name": "pl_FilterWebOutputToDistinctFilePaths",
        "properties": {
            "description": "https://learn.microsoft.com/en-us/answers/questions/1190817/remove-objects-from-the-output-of-a-web-or-filter",
            "activities": [
                {
                    "name": "ForEachPath",
                    "type": "ForEach",
                    "dependsOn": [],
                    "userProperties": [],
                    "typeProperties": {
                        "items": {
                            "value": "@pipeline().parameters.webResponseObject.Value",
                            "type": "Expression"
                        },
                        "isSequential": true,
                        "activities": [
                            {
                                "name": "If Condition1",
                                "type": "IfCondition",
                                "dependsOn": [],
                                "userProperties": [],
                                "typeProperties": {
                                    "expression": {
                                        "value": "@contains(string(variables('varDistinctPath')),split(item().path, '/')[2])",
                                        "type": "Expression"
                                    },
                                    "ifFalseActivities": [
                                        {
                                            "name": "Append variable1",
                                            "type": "AppendVariable",
                                            "dependsOn": [],
                                            "userProperties": [],
                                            "typeProperties": {
                                                "variableName": "varDistinctPath",
                                                "value": {
                                                    "value": "@item().path",
                                                    "type": "Expression"
                                                }
                                            }
                                        }
                                    ]
                                }
                            }
                        ]
                    }
                },
                {
                    "name": "FilterDistinctPaths",
                    "type": "Filter",
                    "dependsOn": [
                        {
                            "activity": "ForEachPath",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ],
                    "userProperties": [],
                    "typeProperties": {
                        "items": {
                            "value": "@pipeline().parameters.webResponseObject.Value",
                            "type": "Expression"
                        },
                        "condition": {
                            "value": "@contains(variables('varDistinctPath'), item().path)",
                            "type": "Expression"
                        }
                    }
                },
                {
                    "name": "set_listOfPaths",
                    "type": "SetVariable",
                    "dependsOn": [
                        {
                            "activity": "ForEachPath",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ],
                    "userProperties": [],
                    "typeProperties": {
                        "variableName": "var_listOfPaths",
                        "value": {
                            "value": "@variables('varDistinctPath')",
                            "type": "Expression"
                        }
                    }
                }
            ],
            "parameters": {
                "webResponseObject": {
                    "type": "object",
                    "defaultValue": {
                        "ItemsCount": 886,
                        "FilteredItemsCount": 8,
                        "Value": [
                            {
                                "path": "FolderName1/FolderName2/EndFolderName3/FileName1",
                                "path_type": "object"
                            },
                            {
                                "path": "FolderName1/FolderName2/EndFolderName3/FileName2",
                                "path_type": "object"
                            },
                            {
                                "path": "FolderName1/FolderName2/EndFolderName4/FileName3",
                                "path_type": "object"
                            },
                            {
                                "path": "FolderName1/FolderName2/EndFolderName4/FileName4",
                                "path_type": "object"
                            },
                            {
                                "path": "FolderName1/FolderName2/EndFolderName5/FileName5",
                                "path_type": "object"
                            },
                            {
                                "path": "FolderName1/FolderName2/EndFolderName5/FileName6",
                                "path_type": "object"
                            },
                            {
                                "path": "FolderName1/FolderName2/EndFolderName6/FileName7",
                                "path_type": "object"
                            },
                            {
                                "path": "FolderName1/FolderName2/EndFolderName6/FileName8",
                                "path_type": "object"
                            }
                        ]
                    }
                }
            },
            "variables": {
                "varDistinctPath": {
                    "type": "Array"
                },
                "var_listOfPaths": {
                    "type": "Array"
                }
            },
            "annotations": []
        }
    }
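
    The question mentions that the number of folder layers may vary, which the fixed index `[2]` does not cover. One possible way to handle that, sketched here in Python under the assumption that the "Endfolder" is always everything before the last `/`, is to key on the full parent folder instead of a fixed position. The function name `keep_one_per_folder` is hypothetical, and this is not part of the pipeline above; it would still need to be translated into ADF expressions.

```python
def keep_one_per_folder(items):
    """Keep the first item for each parent folder, whatever its depth."""
    seen_folders = set()
    kept = []
    for item in items:
        # Everything before the last '/' is treated as the folder key.
        folder, _sep, _file_name = item["path"].rpartition("/")
        if folder not in seen_folders:
            seen_folders.add(folder)
            kept.append(item)
    return kept


mixed_depth = [
    {"path": "FolderName1/EndFolderName3/FileName1", "path_type": "object"},
    {"path": "FolderName1/EndFolderName3/FileName2", "path_type": "object"},
    {"path": "FolderName1/FolderName2/FolderNameX/EndFolderName4/FileName3", "path_type": "object"},
    {"path": "FolderName1/FolderName2/FolderNameX/EndFolderName4/FileName4", "path_type": "object"},
]
print([item["path"] for item in keep_one_per_folder(mixed_depth)])
```

    Comparing whole folder keys for equality also avoids the substring overlap that a text search over the serialized array can run into.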
    

    Hope this helps.


    Please don’t forget to Accept Answer and Yes for "was this answer helpful" wherever the information provided helps you, this can be beneficial to other community members.
