How do I copy a large number of files that match conditional logic checks against file name values without involving ForEach activity in ADF?

Roy 1 Reputation point
2021-08-13T15:29:07.237+00:00

I have 4,000 files, averaging 30 KB each, landing in a folder on our on-premises file system each day. I want to apply conditional logic (several and/or conditions) to details in the file names so that only files matching the conditions are moved into another folder.

I have tried chaining a Get Metadata activity, which gets all files in the source folder, to a Filter activity, which applies the conditional logic, and then to a ForEach activity with an embedded Copy activity. This works, but it takes hours to process the files. When I run the pipeline in debug, the output window appears to list each copied file as a separate line item. I have increased the ForEach batch count to 50, but it hasn't improved things.

Is there a way to link the Filter activity directly to the Copy activity without using a ForEach activity, i.e. pass the collection from the Filter straight into the Copy activity's source?

Alternatively, some of our other pipelines just use a Copy activity pointing to a source folder, with its fileFilter setting configured as a simple wildcard pattern combining * and ?, which is extremely fast. In this scenario, however, my conditional logic is more complex: I need to compare attributes in each file's name with values to decide whether the file should be moved. The fileFilter setting allows dynamic content, so I could remove the Filter activity completely, point the Copy activity at the source folder, and put the conditional logic in the fileFilter's dynamic content. But how would I get a reference to the file name to do the conditional checks?

Azure Data Factory

An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. ShaikMaheer-MSFT 38,636 Reputation points Microsoft Employee Moderator
    2021-08-16T07:25:09.51+00:00

    Hi @Roy,

    Thank you for posting your query on the Microsoft Q&A platform.

    The Copy activity supports wildcard file paths, which filter the files in your path directly. However, this feature supports only the documented set of wildcard filters; see the Copy activity documentation for details.

    In your case, you would like custom filter logic to be invoked directly in the Copy activity, which unfortunately is not supported at this time.

    I would encourage you to create a feedback item for this feature. The product team actively monitors all feedback and considers it for future releases.

    Please note: if you would like to store your Filter activity output in a file, you can try saving the Filter activity output to a variable and then, inside the Copy activity, using the additional columns feature on the source. This adds an additional column to your sink file containing the value passed. Please try this and see whether it helps in your case.

    Hope this helps. Please let us know if you have any further queries. Thank you.

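To illustrate the limitation described in the answer, here is a minimal Python sketch (not ADF code) contrasting what a `*`/`?` wildcard can select with the kind of and/or logic the question describes. The file-name layout `<REGION>_<TYPE>_<YYYYMMDD>_<SEQ>.csv`, the sample names, and the two helper functions are hypothetical, assumed purely for illustration. Python's `fnmatch` stands in for the Copy activity's wildcard matching and is not guaranteed to behave identically.

```python
from fnmatch import fnmatchcase

# Hypothetical file-name layout, assumed only for this illustration:
#   <REGION>_<TYPE>_<YYYYMMDD>_<SEQ>.csv   e.g. EU_INV_20210813_0042.csv


def matches_wildcard(name: str, pattern: str) -> bool:
    """Case-sensitive `*` / `?` matching, similar in spirit to a wildcard file filter."""
    return fnmatchcase(name, pattern)


def matches_conditions(name: str) -> bool:
    """Example and/or logic over attributes parsed from the file name."""
    stem, _dot, ext = name.rpartition(".")
    parts = stem.split("_")
    if ext != "csv" or len(parts) != 4:
        return False
    region, doc_type, _date, seq = parts
    if not seq.isdigit():
        return False
    # (region is EU or UK AND type is INV) OR (type is CRN AND sequence >= 100)
    return (region in {"EU", "UK"} and doc_type == "INV") or (
        doc_type == "CRN" and int(seq) >= 100
    )


names = [
    "EU_INV_20210813_0042.csv",
    "US_INV_20210813_0043.csv",
    "US_CRN_20210813_0150.csv",
    "US_CRN_20210813_0007.csv",
]

# A single wildcard can only say "any two characters, then _INV_, then anything".
wildcard_hits = [n for n in names if matches_wildcard(n, "??_INV_*.csv")]

# The conditional check combines several attributes of each name.
condition_hits = [n for n in names if matches_conditions(n)]

print(wildcard_hits)
print(condition_hits)
```

Under these assumptions, the wildcard selects both INV files regardless of region, while the conditional check keeps one INV file and one CRN file. That combination is hard to express as a single `*`/`?` pattern, which is why rules of this kind currently need a separate filtering step before the copy.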
