Retrieving the Latest Daily CSV Files When Copying from S3 to Azure Blob Storage

Brianna C 120 Reputation points
2023-11-14T20:28:29.0333333+00:00

Within an S3 bucket folder, CSV files like 'TableA_20230802' arrive daily, named with the table name and a date suffix (YYYYMMDD). After successfully copying files from S3 to a specific Azure Blob Storage container, I now want to automate the transfer of only the latest daily files dropped in S3. While attempting to use the Get Metadata activity in Azure Data Factory (ADF) to retrieve the last modified date, I ran into difficulties. I am aware of the filter-by-last-modified-date option, but it proved ineffective. I have also tried looping through child items and getting the last modified date for each, but this approach fell short. What is the best way to incrementally move files from AWS S3 to Blob Storage?

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  Smaran Thoomu 18,385 Reputation points Microsoft Vendor
    2023-11-15T12:02:40.21+00:00

    Hi @Brianna C

    Thank you for reaching out to us with your query. 

    Since every new file of yours follows the TableA_YYYYMMDD naming pattern, you can use a wildcard path in the Copy activity source to pick up only the latest files.

    Here I have used ADLS Gen2 as my source, but the process is the same for Amazon S3.

    My source folder contains daily CSV files named in the TableA_YYYYMMDD pattern (screenshot omitted).

    First, I used a Set variable activity to store the current date in YYYYMMDD format in a string variable (named mydate here):

    @utcNow('yyyyMMdd')

    For example, on 2023-11-15 this evaluates to 20231115.
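    For reference, here is a minimal sketch of what the Set variable activity looks like in pipeline JSON; the activity name SetLatestDate is a placeholder, and the pipeline itself must declare mydate as a String variable under its variables section:

    {
        "name": "SetLatestDate",
        "type": "SetVariable",
        "typeProperties": {
            "variableName": "mydate",
            "value": {
                "value": "@utcNow('yyyyMMdd')",
                "type": "Expression"
            }
        }
    }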

    Next, use a Copy activity to retrieve the latest files from the storage account based on that date.

    My files end with .csv, which is why I added a * at the end of the expression; keep it if your files have an extension. Enter the following as the wildcard file name:

    *@{variables('mydate')}*

    Enable the Recursively option if you also need to copy files from subfolders.

    In the sink settings, give your target folder path in the Blob Storage container.

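    To make this concrete, here is a hedged sketch of the Copy activity in pipeline JSON, assuming an Amazon S3 source dataset and delimited-text format; the activity and dataset names (CopyLatestFiles, S3SourceDataset, BlobSinkDataset) are placeholders you would replace with your own:

    {
        "name": "CopyLatestFiles",
        "type": "Copy",
        "dependsOn": [
            { "activity": "SetLatestDate", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
            "source": {
                "type": "DelimitedTextSource",
                "storeSettings": {
                    "type": "AmazonS3ReadSettings",
                    "recursive": true,
                    "wildcardFileName": {
                        "value": "@concat('*', variables('mydate'), '*')",
                        "type": "Expression"
                    }
                }
            },
            "sink": {
                "type": "DelimitedTextSink",
                "storeSettings": {
                    "type": "AzureBlobStorageWriteSettings"
                }
            }
        },
        "inputs": [ { "referenceName": "S3SourceDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "BlobSinkDataset", "type": "DatasetReference" } ]
    }

    Here "recursive": true corresponds to the Recursively option, and @concat('*', variables('mydate'), '*') is equivalent to the *@{variables('mydate')}* wildcard shown above.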

    Debug the pipeline; once it succeeds, only the latest file appears in the target folder (result screenshots omitted).

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click Accept Answer and Yes for "Was this answer helpful".


0 additional answers
