How to filter by modified time when copying files from FTP to ADLS in Data Factory

Bruce Sheng Jun Wu 20 Reputation points
2024-08-12T06:20:36.11+00:00

I want to use Data Factory to copy data from a local file system to ADLS. If the source is a shared folder or SFTP, I can filter by file modified time. If the source is FTP, I cannot, because the filter option is not available.


Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. Smaran Thoomu 24,260 Reputation points Microsoft External Staff Moderator
    2024-08-12T16:57:47.74+00:00

    @Bruce Sheng Jun Wu - Thank you for posting your query on the Microsoft Q&A platform.

    Filtering by last modified time directly in the copy activity is not currently possible for an FTP source. See the FTP connector documentation for the fields available for an FTP source in the copy activity.

    Instead, first identify the recently modified files using the Get Metadata, ForEach, and If Condition activities, and then copy only those files.

    The link below discusses a similar implementation. Kindly check it: https://stackoverflow.com/questions/50298122/azure-data-factory-incremental-data-load-from-sftp-to-blob
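
    The Get Metadata, ForEach, and If Condition logic described above could be sketched as a pipeline along the following lines. This is a minimal illustration rather than a tested definition: the pipeline, activity, dataset, and parameter names (`CopyRecentFtpFiles`, `FtpFolderDataset`, `FtpFileDataset`, `AdlsFolderDataset`, `windowStart`) are all hypothetical, the sink assumes ADLS Gen2, and it assumes that Get Metadata returns `lastModified` for your FTP dataset, which should be verified against the Get Metadata activity's supported-connector table before relying on it.

    ```json
    {
        "name": "CopyRecentFtpFiles",
        "properties": {
            "parameters": {
                "windowStart": { "type": "String", "defaultValue": "2024-08-01T00:00:00Z" }
            },
            "activities": [
                {
                    "name": "ListFtpFiles",
                    "type": "GetMetadata",
                    "typeProperties": {
                        "dataset": { "referenceName": "FtpFolderDataset", "type": "DatasetReference" },
                        "fieldList": [ "childItems" ]
                    }
                },
                {
                    "name": "ForEachFile",
                    "type": "ForEach",
                    "dependsOn": [
                        { "activity": "ListFtpFiles", "dependencyConditions": [ "Succeeded" ] }
                    ],
                    "typeProperties": {
                        "items": {
                            "value": "@activity('ListFtpFiles').output.childItems",
                            "type": "Expression"
                        },
                        "activities": [
                            {
                                "name": "GetFileInfo",
                                "type": "GetMetadata",
                                "typeProperties": {
                                    "dataset": {
                                        "referenceName": "FtpFileDataset",
                                        "type": "DatasetReference",
                                        "parameters": {
                                            "fileName": { "value": "@item().name", "type": "Expression" }
                                        }
                                    },
                                    "fieldList": [ "lastModified" ]
                                }
                            },
                            {
                                "name": "IfModifiedAfterWindowStart",
                                "type": "IfCondition",
                                "dependsOn": [
                                    { "activity": "GetFileInfo", "dependencyConditions": [ "Succeeded" ] }
                                ],
                                "typeProperties": {
                                    "expression": {
                                        "value": "@greater(ticks(activity('GetFileInfo').output.lastModified), ticks(pipeline().parameters.windowStart))",
                                        "type": "Expression"
                                    },
                                    "ifTrueActivities": [
                                        {
                                            "name": "CopyFileToAdls",
                                            "type": "Copy",
                                            "inputs": [
                                                {
                                                    "referenceName": "FtpFileDataset",
                                                    "type": "DatasetReference",
                                                    "parameters": {
                                                        "fileName": { "value": "@item().name", "type": "Expression" }
                                                    }
                                                }
                                            ],
                                            "outputs": [
                                                { "referenceName": "AdlsFolderDataset", "type": "DatasetReference" }
                                            ],
                                            "typeProperties": {
                                                "source": {
                                                    "type": "BinarySource",
                                                    "storeSettings": { "type": "FtpReadSettings" }
                                                },
                                                "sink": {
                                                    "type": "BinarySink",
                                                    "storeSettings": { "type": "AzureBlobFSWriteSettings" }
                                                }
                                            }
                                        }
                                    ]
                                }
                            }
                        ]
                    }
                }
            ]
        }
    }
    ```

    Note that `childItems` lists subfolders as well as files, so a real pipeline would probably also check `item().type` (for example with a Filter activity) before requesting per-file metadata.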

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click Accept Answer and Yes for "Was this answer helpful". If you have any further queries, do let us know.

    1 person found this answer helpful.

1 additional answer

  1. Pinaki Ghatak 5,600 Reputation points Microsoft Employee Volunteer Moderator
    2024-08-13T13:44:05.09+00:00

    Hello @Bruce Sheng Jun Wu

    In Data Factory, filtering files by modified time is normally done with the modifiedDatetimeStart and modifiedDatetimeEnd properties, for the connectors that support them.

    However, as you mentioned, the FTP connector does not support filtering by modified time. As a workaround, you can first copy all the files from FTP to a staging location, and then use the modifiedDatetimeStart and modifiedDatetimeEnd properties to filter the files in the staging location before copying them to ADLS. Whatever its name, the linked service referenced by the staging dataset must point at the staging store, not at the FTP server. Here's an example of how you can define the dataset for the staging location:

    {
        "name": "ftpStagingDataset",
        "properties": {
            "type": "FileShare",
            "linkedServiceName": {
                "referenceName": "ftpLinkedService",
                "type": "LinkedServiceReference"
            },
            "structure": [
                { "name": "fileName", "type": "String" }
            ],
            "typeProperties": {
                "folderPath": "ftpFolderPath",
                "modifiedDatetimeStart": "2022-01-01T00:00:00Z",
                "modifiedDatetimeEnd": "2022-01-31T23:59:59Z"
            }
        }
    }
    

    In this example, the modifiedDatetimeStart and modifiedDatetimeEnd properties are set to filter files modified between January 1, 2022, and January 31, 2022. You can then use this dataset as the source for your copy activity to copy the filtered files to ADLS.
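
    In the current dataset model, the same filter is usually set on the copy activity source rather than on the dataset. A rough sketch of the staging-to-ADLS copy step might look like the following. It assumes the staging location is Azure Blob Storage and the sink is ADLS Gen2; the activity name, the dataset names (`StagingBinaryDataset`, `AdlsBinaryDataset`), and the pipeline parameters `windowStart` and `windowEnd` are hypothetical.

    ```json
    {
        "name": "CopyFilteredStagingToAdls",
        "type": "Copy",
        "inputs": [
            { "referenceName": "StagingBinaryDataset", "type": "DatasetReference" }
        ],
        "outputs": [
            { "referenceName": "AdlsBinaryDataset", "type": "DatasetReference" }
        ],
        "typeProperties": {
            "source": {
                "type": "BinarySource",
                "storeSettings": {
                    "type": "AzureBlobStorageReadSettings",
                    "recursive": true,
                    "modifiedDatetimeStart": {
                        "value": "@pipeline().parameters.windowStart",
                        "type": "Expression"
                    },
                    "modifiedDatetimeEnd": {
                        "value": "@pipeline().parameters.windowEnd",
                        "type": "Expression"
                    }
                }
            },
            "sink": {
                "type": "BinarySink",
                "storeSettings": { "type": "AzureBlobFSWriteSettings" }
            }
        }
    }
    ```

    Passing the window in as parameters lets a trigger supply a new range on each run. One thing worth verifying: the filter applies to the modified time recorded by the staging store, which for many stores is the time the file was written there rather than the original timestamp on the FTP server.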


    I hope that this response has addressed your query and helped you overcome your challenges. If so, please mark this response as Answered. This will not only acknowledge our efforts, but also assist other community members who may be looking for similar solutions.

