adf/synapse - how to copy all files based on given lastmodified date from source container and save in sink container in respective folder based on extension

Wasim 0 Reputation points
2024-09-17T20:33:13.42+00:00

Req: Copy files from one container and store them in another container. I specify a last modified date and want to copy only the files modified before that date. The files can be of any type: csv, xml, json, txt, etc.

Also, in the source container my files could be anywhere; there is no defined hierarchy. Some files sit directly in the container, some in container/foldera/files, some in container/folderb/folderb1/files/.

I want to scan all of these based on the lastModified condition and save the files in the sink container like this:

container/ddata/yyyy/mm/dd/csv/files, and likewise xml, json, etc. in place of csv.

I have already built a pipeline that copies files from container/files and saves them into the sink as container/ddata/yyyy/mm/dd/extension/files.

What I want to do further is keep the sink structure like container/somefolder/allfilesinhere (any extension, based on lastModified).

I tried with just a Copy activity, using container/**/*.csv and the filter by last modified option in the Copy activity. It copies only the matching files, but it creates the same folder structure as the source in the sink.

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Vinodh247 20,976 Reputation points
    2024-09-18T05:35:13.67+00:00

    Hi Wasim,

    Thanks for reaching out to Microsoft Q&A.

    Try the following approach, which lets you keep a flat folder structure in the sink container, organized by file extension, while scanning files from a complex hierarchy in the source container.

    1. Get Metadata Activity:
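      • Use a Get Metadata activity with the "Child Items" field to list the contents of the source container. Child Items returns only the immediate files and subfolders of the given path (name and type), so to reach files in nested folders the listing has to be repeated for each subfolder, for example through a child pipeline that is invoked per folder.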
    2. Filter by Last Modified Date:
      • The Child Items output carries only the name and type of each item, so request the "Last modified" field for each file with a second Get Metadata activity. Then use a Filter activity or an If Condition on the lastModified property to keep only those files whose lastModified date is earlier than your specified date.
    3. For Each Activity:
      • After filtering the files, use a For Each activity to loop through the filtered files.
    4. Determine File Extension:
      • Inside the For Each, use an If Condition or Switch activity to check the file extension (for example .csv, .xml) and build the appropriate path for saving in the sink container based on that extension.
    5. Copy Activity:
      • Use a Copy Activity to copy the files from the source to the sink container. In the destination container path, dynamically build the folder structure as:
             
             container/somefolder/allfilesinhere/<extension>
        
    6. File Name Mapping:
      • In the copy activity, you can use dynamic content expressions to map the file names appropriately and avoid copying the folder structure. For example, in the sink's file path, you can use expressions to concatenate the year, month, and day based on the file's last modified date, and the file's extension for organizing files.
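
    The filtering and path-building logic in steps 2 to 6 can be sketched outside the pipeline as plain Python. This is only an illustration of the mapping, not ADF code: the function names `sink_path` and `plan_copies`, the `ddata` root, and the `noext` fallback folder are assumptions made for the example.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def sink_path(source_path, last_modified, root="ddata"):
    """Build a flattened sink path: root/yyyy/mm/dd/<extension>/<file name>.

    The source folders are dropped; only the file name is kept.
    """
    p = PurePosixPath(source_path)
    # Files without an extension go to a hypothetical "noext" folder.
    ext = p.suffix.lstrip(".").lower() or "noext"
    return f"{root}/{last_modified:%Y/%m/%d}/{ext}/{p.name}"


def plan_copies(files, cutoff):
    """Keep files modified strictly before cutoff and map each to its sink path."""
    return {
        src: sink_path(src, modified)
        for src, modified in files
        if modified < cutoff
    }


# Hypothetical listing: (source path, last modified) pairs at mixed depths.
files = [
    ("a.csv", datetime(2024, 9, 1, tzinfo=timezone.utc)),
    ("foldera/b.xml", datetime(2024, 9, 10, tzinfo=timezone.utc)),
    ("folderb/folderb1/c.json", datetime(2024, 9, 20, tzinfo=timezone.utc)),
]
cutoff = datetime(2024, 9, 17, tzinfo=timezone.utc)

for src, dst in plan_copies(files, cutoff).items():
    print(src, "->", dst)
```

    Because the source folders are dropped, two files with the same name and the same last modified date from different folders would map to the same sink path, so a real pipeline may need a rule for such collisions.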

    This approach ensures that the folder structure from the source is not recreated in the sink. Files are saved based on their extension and last modified date in the desired format.
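
    Since the source files can sit at any depth, the listing step has to visit every subfolder. A minimal Python sketch of that traversal follows, assuming a hypothetical `list_children(path)` callback that returns one level of items as name/type pairs, similar in shape to what Child Items returns. The in-memory `FAKE_CONTAINER` is made-up sample data.

```python
def walk_files(list_children, folder=""):
    """Yield the full path of every file under folder, at any depth.

    list_children(path) is a hypothetical callback returning one level of
    items as dicts with "name" and "type" ("File" or "Folder").
    """
    stack = [folder]
    while stack:
        current = stack.pop()
        for child in list_children(current):
            path = f"{current}/{child['name']}" if current else child["name"]
            if child["type"] == "Folder":
                stack.append(path)  # visit this subfolder later
            else:
                yield path


# Made-up container layout keyed by folder path ("" is the container root).
FAKE_CONTAINER = {
    "": [{"name": "a.csv", "type": "File"},
         {"name": "foldera", "type": "Folder"},
         {"name": "folderb", "type": "Folder"}],
    "foldera": [{"name": "b.xml", "type": "File"}],
    "folderb": [{"name": "folderb1", "type": "Folder"}],
    "folderb/folderb1": [{"name": "c.json", "type": "File"}],
}

for path in sorted(walk_files(lambda p: FAKE_CONTAINER.get(p, []))):
    print(path)
```

    An explicit stack is used instead of recursion here because a pipeline would typically express the same idea as a queue of folders still to be listed.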

    Please 'Upvote'(Thumbs-up) and 'Accept' as an answer if the reply was helpful. This will benefit other community members who face the same issue.

