Continuous incremental copy from FTP to Azure Blob Storage using ADF pipeline

Rahele Allahverdi 1 Reputation point
2021-07-28T08:40:44.047+00:00

Hi

I'd like to copy new files from an FTP server using an ADF pipeline and load them into Azure Blob Storage.

The files are dropped on the FTP server every 15 minutes and stay there for 2 hours.

The existing ADF pipeline runs on a 15-minute schedule and copies every file matching the defined wildcard to Azure Blob Storage. The problem with this approach is that each run copies the old files again along with the new ones, for as long as they remain in the FTP location, which is wasteful.

I want to create a pipeline to copy only the new files.

I was thinking of getting the file names with a Get Metadata activity and comparing them with those already copied, so that I copy only the new ones. However, there are around 5,000 files in the FTP path and I only want to pick up 2 files on each run. According to the MS documentation, I can't use a wildcard in the Get Metadata activity to get the names of just those files, so listing the file names would return all 5,000 files.
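
As a rough sketch of that comparison outside ADF (Python, not an ADF activity): it assumes the full listing is obtained separately, for example with the standard library's `ftplib.FTP.nlst`, and that the names already copied are tracked somewhere. `select_uncopied` is a hypothetical helper name, not part of any ADF or Azure API.

```python
def select_uncopied(listed_names, copied_names):
    """Return the listed names that have not been copied yet, in listing order."""
    copied = set(copied_names)  # set lookup keeps this cheap even for ~5,000 names
    return [name for name in listed_names if name not in copied]
```

The result is the set difference between what is on the server and what has already been loaded, so only those names would need to be passed to the copy step.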

It is worth mentioning that these files have timestamps in their names, which might be useful for a workaround.
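
To illustrate the timestamp idea, here is a minimal Python sketch that keeps only the files whose name timestamp is later than a stored watermark (the time of the last successful copy). It assumes names follow the pattern `<prefix>_<YYYYMMDDHHMMSS>.<ext>`, inferred from the single sample name `IDN65901_20210727000511.hcs`; `parse_stamp` and `newer_than` are hypothetical helper names.

```python
import re
from datetime import datetime

# Assumed pattern: an underscore, 14 digits (YYYYMMDDHHMMSS), then the extension.
_STAMP = re.compile(r"_(\d{14})\.[^.]+$")


def parse_stamp(name):
    """Return the timestamp embedded in a file name, or None if there is none."""
    match = _STAMP.search(name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
    except ValueError:
        # 14 digits that do not form a valid date/time
        return None


def newer_than(names, watermark):
    """Return names stamped strictly after the watermark, oldest first."""
    stamped = [(parse_stamp(name), name) for name in names]
    recent = [(ts, name) for ts, name in stamped if ts is not None and ts > watermark]
    return [name for ts, name in sorted(recent)]
```

With a 15-minute schedule, the watermark could be the newest timestamp copied in the previous run, so each run would select only the files dropped since then. Whether the name timestamp matches the time the file actually lands on the server is an assumption that would need checking.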

I also tried the MS template for copying new files by LastModifiedDate, but unfortunately it doesn't work because LastModifiedDate is not supported on FTP. This is the error:

ErrorCode=UserErrorLastModifiedNotSupported,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Cannot get 'LastModified' infomation from file 'IDN65901_20210727000511.hcs'. Please make sure this connector has supported relevant properties and source file is in valid status.,Source=Microsoft.DataTransfer.ClientLibrary,'

Could anyone please help me achieve this?

Suggestions other than ADF are also welcome :)

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.