Latest files based on last modified date of files in Datalake using Azure data factory

pankaj chaturvedi 86 Reputation points
2022-08-24T12:31:52.847+00:00

Hi Team,

I am unable to extract the latest files based on the last modified date of the files in Data Lake storage, and the provided solution is not working as expected. I followed the link below, but I am not sure where I went wrong.
https://learn.microsoft.com/en-us/answers/questions/349584/index.html

Could you please help me out here? Please find my test screenshots below.
234512-image.png
234430-image.png
234429-image.png
234479-image.png
234428-image.png

I have two kinds of files:
a.) The TEST1 folder has customer_20220820122033.csv and customer_20220820132324.csv.
b.) The TEST2 folder has multiple customer.csv files with different last modified dates, and I need to extract the latest file from it based on the last modified date.

So I need to extract the latest file for both scenarios. Please help me out as soon as possible. Thanks.

Azure Data Lake Storage
Azure Data Factory

Accepted answer
  1. AnnuKumari-MSFT 30,361 Reputation points Microsoft Employee
    2022-08-25T06:25:23.167+00:00

    Hi @pankaj chaturvedi ,

    Thank you for using the Microsoft Q&A platform and for posting your query.

    As I understand your query, you are trying to find the most recently modified file in each of two folders, TEST1 and TEST2. Please let me know if that is not the requirement.

    First of all, the solution you are following works if you want to copy the latest file from a single folder, but not from multiple folders. For your requirement, we need to introduce another (parent) pipeline that loops through the two folders and executes the already created child pipeline for each one.

    There are also a couple of changes you need to make in the child pipeline. I will point them out below.

    For the parent pipeline, follow these steps:

    1. Use a Get Metadata activity and select Child Items in the field list. In the dataset, don't provide any file path. Keep it blank so that it fetches all the folder names present in the storage account.
    2. Use a Filter activity with @activity('Get Metadata1').output.childItems in Items and @or(equals(item().name,'TEST1'),equals(item().name,'TEST2')) in Condition. (Note: in the parent pipeline GIF below, I am instead checking for folders having 'adls' in the folder name.)
    3. Use a ForEach activity and provide @activity('Filter1').output.value in Items.
    4. Inside the ForEach activity, use an Execute Pipeline activity to invoke the child pipeline. In the child pipeline, create a parameter called SinkFileName. Then, in the Settings tab of the Execute Pipeline activity in the parent pipeline, provide the value for that parameter as @item().name
    5. In the child pipeline, in the last Copy activity, parameterize the sink dataset by creating a dataset parameter called FileName and use it in the file path on the Connection tab as @concat(dataset().FileName,'.csv') . In the Sink tab of the Copy activity, provide the value for the parameter as
      @pipeline().parameters.SinkFileName
    6. In the first Get Metadata activity of the child pipeline, parameterize the dataset by creating a parameter called FolderName and provide its value as @pipeline().parameters.SinkFileName (the folder name passed in from the parent pipeline).
    7. In the sink dataset of the child pipeline, you are using a wildcard file path, which is not correct. Use a file path in the dataset instead.
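
To make the control flow of the steps above easier to follow, here is a minimal Python sketch that simulates it in memory. It is not Data Factory code and calls no Azure API. The STORAGE dictionary, the TEST2 and OTHER file names, and all function names are hypothetical, and the child pipeline's latest-file selection is reduced to a max() over lastModified, which is an assumption about what the linked single-folder solution does.

```python
from datetime import datetime

# Hypothetical stand-in for the storage account: folder name -> [(file name, lastModified)].
# The TEST2 and OTHER entries are made up for illustration.
STORAGE = {
    "TEST1": [
        ("customer_20220820122033.csv", datetime(2022, 8, 20, 12, 20, 33)),
        ("customer_20220820132324.csv", datetime(2022, 8, 20, 13, 23, 24)),
    ],
    "TEST2": [
        ("customer_a.csv", datetime(2022, 8, 21, 9, 0, 0)),
        ("customer_b.csv", datetime(2022, 8, 23, 17, 30, 0)),
    ],
    "OTHER": [("readme.txt", datetime(2022, 8, 24, 8, 0, 0))],
}


def get_child_items():
    """Step 1: mimic Get Metadata with Child Items on a blank file path."""
    return [{"name": name, "type": "Folder"} for name in STORAGE]


def filter_folders(child_items):
    """Step 2: mimic the Filter condition that keeps only TEST1 and TEST2."""
    return [i for i in child_items if i["name"] == "TEST1" or i["name"] == "TEST2"]


def child_pipeline(sink_file_name):
    """Steps 5-7: pick the latest file in one folder and name the sink after the folder."""
    folder_name = sink_file_name  # step 6: FolderName receives SinkFileName
    # Assumed selection logic: the file with the greatest lastModified wins.
    latest_name, _ = max(STORAGE[folder_name], key=lambda f: f[1])
    source_path = f"{folder_name}/{latest_name}"
    sink_path = f"{sink_file_name}.csv"  # step 5: concat(FileName, '.csv')
    return source_path, sink_path


def parent_pipeline():
    """Steps 3-4: loop over the filtered folders and invoke the child once per folder."""
    results = {}
    for item in filter_folders(get_child_items()):
        results[item["name"]] = child_pipeline(sink_file_name=item["name"])
    return results


if __name__ == "__main__":
    for folder, (source, sink) in parent_pipeline().items():
        print(f"{folder}: copy {source} -> {sink}")
```

Note that, as in the steps above, the same value (the folder name) is used both to locate the source folder and to name the sink file, so each folder produces one output file named after it.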

    Parent Pipeline:

    234730-getlatestfile1.gif

    Child Pipeline:

    234739-getlatestfile2.gif

    Hope this helps. Please let us know if you have any further queries.

