Azure Data Factory - Select only specific files from a folder, copy them into another ADLS folder, and copy only new and recently modified files

Abdulla Mahammad Khan Dawood 186 Reputation points
2021-06-07T14:09:24.107+00:00

Hi All,

Good Day,

We have a requirement to select only specific files from a folder and copy them into another folder in ADLS, and, beyond selecting specific files, we also need to copy only newly created and recently modified files. Also, the files in the source folder have no file extension; however, the data is in JSON format.

There are two different file name prefixes in the source folder:

BackendREQ_
BackendRESP_

Here we need to copy only the BackendRESP_ files into the destination folder with an incremental approach, picking up only files that are newly created or recently modified.

Can anyone help me with a Data Factory ingestion pipeline that meets this requirement? Source and sink are both ADLS Gen2, and the file format is JSON.

[Screenshot: source folder files]

Thank you in anticipation!!

Regards,
Mahammad Khan


Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 88,636 Reputation points Microsoft Employee
    2021-06-08T11:41:58.913+00:00

    Hello @Abdulla Mahammad Khan Dawood ,

    Thanks for the question and using MS Q&A platform.

    To copy specific files from a folder, you can use the Wildcard file path option in the Copy activity's Source tab.

    Also, to filter to the latest files, you can use the "Filter by last modified" field to specify a start time and end time.

    In the screenshot below, I specified a configuration that picks up files whose names start with BackendRESP and filters to only files last modified between 7 June 2021 00:00:00 and 8 June 2021 00:00:00.

    [Screenshot: Copy activity source settings with wildcard file path and last-modified filter]
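
    For reference, here is a minimal sketch of how those UI settings appear in the Copy activity's source JSON (the folder path here is a hypothetical placeholder):

        "source": {
            "type": "JsonSource",
            "storeSettings": {
                "type": "AzureBlobFSReadSettings",
                "recursive": true,
                "wildcardFolderPath": "sourcefolder",
                "wildcardFileName": "BackendRESP_*",
                "modifiedDatetimeStart": "2021-06-07T00:00:00Z",
                "modifiedDatetimeEnd": "2021-06-08T00:00:00Z"
            },
            "formatSettings": {
                "type": "JsonReadSettings"
            }
        }

    The wildcard BackendRESP_* matches the required file name prefix even though the files have no extension, and modifiedDatetimeStart/modifiedDatetimeEnd implement the last-modified filter.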

    Hope this helps. Do let us know if you have any further queries.

    ---------------------------------------------------------------------------

    Please "Accept the answer" if the information helped you. This will help us and others in the community as well.


1 additional answer

  1. PRADEEPCHEEKATLA-MSFT 88,636 Reputation points Microsoft Employee
    2021-06-09T12:43:09.11+00:00

    Hello @Abdulla Mahammad Khan Dawood ,

    To fetch new files dynamically, we should implement this through pipeline parameters.

    Please follow the below steps for the detailed implementation.

    Step 1: Create two parameters in the pipeline, "StartTime" and "EndTime".

    [Screenshot: pipeline parameters StartTime and EndTime]
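
    In the pipeline JSON, this corresponds to a parameters block like the following sketch:

        "parameters": {
            "StartTime": {
                "type": "string"
            },
            "EndTime": {
                "type": "string"
            }
        }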

    Step 2: In the Copy activity's Source tab, pass the parameters created above dynamically into "Filter by last modified", as shown below.

    [Screenshot: dynamic content expressions for Filter by last modified]
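
    In the activity JSON, dynamic content appears as Expression objects in the store settings, roughly like this sketch (combined with the wildcard settings from the accepted answer):

        "storeSettings": {
            "type": "AzureBlobFSReadSettings",
            "recursive": true,
            "wildcardFileName": "BackendRESP_*",
            "modifiedDatetimeStart": {
                "value": "@pipeline().parameters.StartTime",
                "type": "Expression"
            },
            "modifiedDatetimeEnd": {
                "value": "@pipeline().parameters.EndTime",
                "type": "Expression"
            }
        }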

    Step 3: Create a tumbling window trigger on this pipeline and pass the window start time (@trigger().outputs.windowStartTime) and window end time (@trigger().outputs.windowEndTime) dynamically into the pipeline.

    [Screenshot: tumbling window trigger parameter mapping]

    For example, I created a tumbling window trigger which runs once daily starting from 5 June 2021. For the 9 June 2021 execution, the trigger will pass 8 June as the window start time and 9 June as the window end time.

    [Screenshot: tumbling window trigger run showing window start and end times]
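
    A minimal sketch of such a trigger definition (the pipeline name is hypothetical; tumbling window triggers express "daily" as a 24-hour interval):

        {
            "name": "DailyTumblingWindowTrigger",
            "properties": {
                "type": "TumblingWindowTrigger",
                "typeProperties": {
                    "frequency": "Hour",
                    "interval": 24,
                    "startTime": "2021-06-05T00:00:00Z",
                    "maxConcurrency": 1
                },
                "pipeline": {
                    "pipelineReference": {
                        "type": "PipelineReference",
                        "referenceName": "CopyBackendRESPFiles"
                    },
                    "parameters": {
                        "StartTime": "@trigger().outputs.windowStartTime",
                        "EndTime": "@trigger().outputs.windowEndTime"
                    }
                }
            }
        }

    Because tumbling windows are contiguous and non-overlapping, each run picks up exactly the files modified since the previous window, which gives the incremental behavior requested.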

    Make sure to publish the changes.

    To learn more about tumbling window triggers, please check the documentation: https://learn.microsoft.com/en-us/azure/data-factory/how-to-create-tumbling-window-trigger

    Hope this helps. Do let us know if you have any further queries.

