Copy activity in Azure synapse

Bikramjit Sanjenbam 21 Reputation points
2022-12-06T03:01:40.327+00:00

We are performing copy activity using SFTP to azure ADLS Gen 2 storage. It is schedule trigger (every 1 hr). Is there any way to skip those files, if it is already copied? Currently, it is overriding those files in ADLS Gen2 storage. One way, I see is to use delete activity and delete those files after copy activity is done. But I want to check if there is any other better way of implementing the same.

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
0 comments No comments
{count} votes

Answer accepted by question author
  1. Nandan Hegde 36,716 Reputation points MVP Volunteer Moderator
    2022-12-06T03:18:48.033+00:00

    Hey,
    Assuming you have multiples files piled up in SFTP and have to filter out specific file to copy , the best way would be to archive the files from source folder in SFTP to another folder or delete them post Copy activity for better performance in future.

    Need to check whether Get meta data activity is supported for SFTP:
    Else you can use get meta data activity to get list of all files and filter that list based on file timestamp with a watermark which you would have to preserve after copying the files.


1 additional answer

Sort by: Most helpful
  1. Bikramjit Sanjenbam 21 Reputation points
    2022-12-06T03:52:33.63+00:00

    Thank you very much. I think I will incline to the delete approach.
    For the watermark, irrespective of the source timestamp in the SFTP folder, when it is copied to the ADLS, the modified time is always the current pipeline run time.
    e.g. say file1 in SFTP is 12/5/2022 1 PM and the pipeline is run at 12/5/2022 2 PM, the file1 modified time in ADLS is 12/5/2022 2PM

    Another way, I am thinking is to use tumbling window trigger and create two parameters
    LastModified_From = @Trigger ().outputs.windowStartTime
    LastModified_To = @Trigger ().outputs.windowEndTime
    Use the above two times in my source (SFTP) COPY ACTIVITY - "Filter by last modified" Start time (UTC) and End time (UTC).
    https://learn.microsoft.com/en-us/azure/data-factory/solution-template-copy-new-files-last-modified-date


Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.