How to identify and copy the most recently added files in Azure Data Factory when there are multiple sub-folders?

AHMAD, SYED [Deloitte] 21 Reputation points
2021-05-02T18:44:21.393+00:00

The folders ADF reads from are structured something like this:

Parentfolder/Subfolder1/Subfolder12/Subfolder13/File1

Parentfolder/Subfolder2/Subfolder22/Subfolder23/File2

Parentfolder/Subfolder3/Subfolder32/Subfolder33/File3

The goal is to create a pipeline that can identify the file most recently added anywhere under the Parentfolder, copy only that file, and move it to the sink. This may require multiple nested pipelines and ForEach loops, but I have not been able to reach a solution.
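The selection logic the pipeline needs can be sketched outside ADF first. This is a minimal local Python illustration (not ADF code) of "newest file under a parent folder, across all subfolders", assuming modification time is an acceptable proxy for "most recently added":

```python
import os

def latest_file(parent):
    """Return the path of the most recently modified file anywhere under parent,
    or None if the tree contains no files."""
    candidates = []
    for root, _dirs, files in os.walk(parent):
        for name in files:
            path = os.path.join(root, name)
            candidates.append((os.path.getmtime(path), path))
    if not candidates:
        return None
    # max() on (mtime, path) tuples picks the file with the newest timestamp
    return max(candidates)[1]
```

In ADF the equivalent comparison is done against each item's `Last Modified` metadata rather than a local timestamp.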

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  KranthiPakala-MSFT 33,141 Reputation points · Microsoft Employee
    2021-05-03T18:52:55.317+00:00

    Hi @AHMAD, SYED [Deloitte] ,

    Welcome to Microsoft Q&A forum and thanks for reaching out.

    To copy the last modified file from a single folder, you can follow the steps described in this thread: ADF: copy last modified blob

    If you have multiple subfolders under a parent folder and want to copy the latest file from each subfolder, use a parent pipeline: a Get Metadata activity retrieves the list of subfolders and passes its output to a ForEach activity, which iterates over the subfolder names. Inside the ForEach, an Execute Pipeline activity runs the copy-last-modified pipeline for each subfolder.
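    The parent pipeline described above can be sketched as ADF pipeline JSON. This is an illustrative fragment only; the dataset name (`ParentFolderDataset`), child pipeline name (`CopyLastModifiedPipeline`), and parameter name (`subfolderName`) are placeholders you would replace with your own:

    ```json
    {
      "name": "CopyLatestFromSubfolders",
      "properties": {
        "activities": [
          {
            "name": "GetSubfolders",
            "type": "GetMetadata",
            "typeProperties": {
              "dataset": { "referenceName": "ParentFolderDataset", "type": "DatasetReference" },
              "fieldList": [ "childItems" ]
            }
          },
          {
            "name": "ForEachSubfolder",
            "type": "ForEach",
            "dependsOn": [
              { "activity": "GetSubfolders", "dependencyConditions": [ "Succeeded" ] }
            ],
            "typeProperties": {
              "items": {
                "value": "@activity('GetSubfolders').output.childItems",
                "type": "Expression"
              },
              "activities": [
                {
                  "name": "RunCopyLastModified",
                  "type": "ExecutePipeline",
                  "typeProperties": {
                    "pipeline": { "referenceName": "CopyLastModifiedPipeline", "type": "PipelineReference" },
                    "parameters": { "subfolderName": "@item().name" }
                  }
                }
              ]
            }
          }
        ]
      }
    }
    ```

    The child pipeline receives each subfolder name as a parameter and applies the last-modified filter from the thread linked above.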

    The GIF below demonstrates copying only the last modified file from a folder:

    [Animated GIF: 93236-getlastmodifiedfile.gif — copy last modified file from a folder]

    Hope this helps. Do let us know if you have further questions.


0 additional answers
