ADF Filter doesn't seem to filter out values.

Abhijit Shrikhande 377 Reputation points
2023-08-16T18:47:39.1766667+00:00

I have a pipeline where I am copying data from csv files into a SQL server.
The first step in the pipeline is "Get MetaData" which is listing all the files and folders in the azure storage container. I want to process all CSV files and ignore other items in the folder.

In my scenario, I had 3 files and 1 folder in the Azure storage container. When I look at the output, I see that 4 items have been processed. My understanding is that only 3 items should be processed, since the filter should exclude the folder. This is my filter syntax.

{
                "name": "FilterFiles",
                "description": "Process only files",
                "type": "Filter",
                "dependsOn": [
                    {
                        "activity": "Get Metadata1",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "userProperties": [],
                "typeProperties": {
                    "items": {
                        "value": "@activity('Get Metadata1').output.childItems",
                        "type": "Expression"
                    },
                    "condition": {
                        "value": "@equals(item().type,'File')",
                        "type": "Expression"
                    }
                }
            }

This is the metadata activity

  {
                "name": "Get Metadata1",
                "description": "Gets a list of all files including folders from the Uploaded Folder in azure storage",
                "type": "GetMetadata",
                "dependsOn": [],
                "policy": {
                    "timeout": "0.12:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "dataset": {
                        "referenceName": "CSVFileListUploadFolder",
                        "type": "DatasetReference"
                    },
                    "fieldList": [
                        "childItems"
                    ],
                    "storeSettings": {
                        "type": "AzureBlobStorageReadSettings",
                        "enablePartitionDiscovery": false
                    },
                    "formatSettings": {
                        "type": "BinaryReadSettings"
                    }
                }
            }

I notice that whether or not I keep the filter, the pipeline still attempts to process 4 items, that is, 1 folder and 3 files. How do I prevent the folder from being processed?


Azure Data Factory

1 answer

  1. KranthiPakala-MSFT 46,632 Reputation points Microsoft Employee
    2023-08-23T20:28:27.3833333+00:00

    @Abhijit Shrikhande Welcome to the Microsoft Q&A forum, and thanks for reaching out.

    To keep only files and skip folders, you can use any of the expressions below as the condition in your Filter activity:

    @equals(item().type, 'File')
    

    OR

    @not(equals(item().type, 'Folder'))
    

    OR

    @not(contains(item().type, 'Folder'))
    

    OR

    @contains(item().type, 'File')
    

    I just tested all these expressions, and everything is working as expected.
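
    Since the condition itself behaves as expected, one possible cause (an assumption, because the downstream activity is not shown in the question) is that the next activity still iterates over the Get Metadata output instead of the Filter output. A minimal sketch of a ForEach that consumes the filtered list follows. The name "ForEachFile" is hypothetical, and the empty "activities" array is only a placeholder for your existing copy step, so this fragment will not validate as-is:

    ```json
    {
        "name": "ForEachFile",
        "description": "Hypothetical ForEach: iterates over the Filter output, not the Get Metadata output",
        "type": "ForEach",
        "dependsOn": [
            {
                "activity": "FilterFiles",
                "dependencyConditions": [
                    "Succeeded"
                ]
            }
        ],
        "userProperties": [],
        "typeProperties": {
            "items": {
                "value": "@activity('FilterFiles').output.value",
                "type": "Expression"
            },
            "activities": []
        }
    }
    ```

    If "items" were instead set to @activity('Get Metadata1').output.childItems, the loop would see all 4 entries regardless of what the Filter activity does.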

    Hope this info helps. If you are still blocked and these expressions don't work for you, kindly share your Get Metadata output JSON payload so that we can investigate further.
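
    When checking the run output, it may also help to look at the Filter activity's own output. As a rough illustration, for a container with 3 files and 1 folder it would typically look something like the following. The item names here are made up for the example and are not taken from your container:

    ```json
    {
        "ItemsCount": 4,
        "FilteredItemsCount": 3,
        "Value": [
            { "name": "example1.csv", "type": "File" },
            { "name": "example2.csv", "type": "File" },
            { "name": "example3.csv", "type": "File" }
        ]
    }
    ```

    Here "ItemsCount" is the number of items that went into the filter, so seeing 4 there does not by itself mean the folder passed through; "FilteredItemsCount" and "Value" show what remains.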

    Here is an old thread related to same requirement: Azure Data Factory - Use Filter Activity to filter .txt files from Folder


    Please don’t forget to Accept Answer and Yes for "was this answer helpful" wherever the information provided helps you, this can be beneficial to other community members.

