Multiple filter conditions in a pipeline

CoffeeCanuck 26 Reputation points
2021-05-22T14:51:29.387+00:00

Hi, is it possible to add multiple filter conditions in a pipeline? For instance, I want to list a Data Lake folder with a Get Metadata activity first, and then filter the items on a specific word in the file name, a date, and file type = 'csv':
@activity('GetFileList').output.childItems with condition @equals(item().type, 'File')
I can use the @and() function to combine startswith()/endswith() checks on the file name and file type, but this seems clumsy. Is there a better way to filter for file types? The intention is to use a copy activity to move files of a specific type to their own folder. Any help would be appreciated.
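Roughly, the Filter activity I'm trying looks like this (just a sketch; @and() only takes two arguments, so more conditions would need nesting):

    Items:     @activity('GetFileList').output.childItems
    Condition: @and(equals(item().type, 'File'), endswith(item().name, '.csv'))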
Thank you.

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. KranthiPakala-MSFT 46,442 Reputation points Microsoft Employee
    2021-05-24T22:35:22.043+00:00

    Hi @CoffeeCanuck ,

    Welcome to Microsoft Q&A forum and thanks for reaching out.

    As per my understanding, your requirement is to move specific types of files to their respective folders in the sink. If that is the case, here are two options.

    Option 1:
    The simplest way is to create a pipeline with multiple parallel copy activities, using a binary dataset on both the source and sink side. Then use a wildcard file name path in the source to filter a particular type of file from a folder in ADLS Gen2. With this implementation you will need one copy activity per file type (*.csv, *.txt, and so on), as in the screenshot and sketch below.

    (screenshot: pipeline with parallel copy activities, one per file type)
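    For illustration, the source side of one such copy activity would look roughly like this in JSON (the folder name is a placeholder, assuming an ADLS Gen2 binary source):

        "source": {
            "type": "BinarySource",
            "storeSettings": {
                "type": "AzureBlobFSReadSettings",
                "wildcardFolderPath": "incoming",
                "wildcardFileName": "*.csv"
            }
        }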

    Option 2:
    If you want to make this more dynamic with a single copy activity, an alternative is to keep a reference SQL table or reference file with two columns (FileType, SinkFilepath) and look up the values from it.

    Sample table as below (illustrative values):

        FileType | SinkFilepath
        *.csv    | /output/csv/
        *.txt    | /output/txt/

    Then use a Lookup activity to read the reference table/file and retrieve the file type and sink file path values. Pass the lookup output to a subsequent ForEach activity that contains a Copy activity, and map the lookup output values to the copy activity's source wildcard file name and sink file path settings, roughly as sketched below.
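    As a rough sketch of the wiring (the activity name 'LookupFileTypes' is just a placeholder; the column names match the sample table above):

        ForEach Items:                  @activity('LookupFileTypes').output.value
        Copy source wildcard file name: @item().FileType
        Copy sink folder path:          @item().SinkFilepath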

    Note:

    1. Please make sure to uncheck the Recursive option under the copy activity source settings so that only files under the specified folder are moved to the sink.
    2. Make sure you select a binary dataset in order to delete the files from the source after copying to the sink location (which effectively moves the files), as in the JSON sketch after this list.
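    In JSON terms, those two notes translate to the copy activity source settings looking roughly like this (assuming the same ADLS Gen2 binary source as above):

        "storeSettings": {
            "type": "AzureBlobFSReadSettings",
            "recursive": false,
            "deleteFilesAfterCompletion": true
        }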

    The reason for suggesting the above solutions is that you can avoid extra activities like the Filter and ForEach activities in your pipeline, which saves cost as well. If you would like to use a Filter activity to filter on type = 'File' and file name ending with '.csv', you would need one Filter activity per file type (.csv/.txt/.json), each followed by a ForEach activity containing a copy activity.

    Hope this info helps. Do let us know if you have further queries.

    ----------

    Please don’t forget to Accept Answer and Up-Vote wherever the information provided helps you, as this can be beneficial to other community members.

