Hello shankar vishal,
Welcome to the MS Q&A platform.
You can use a Copy activity with the SFTP connector as your source.
Choose Binary as the dataset file format and set the compression type to "gzip (.gz)".
And yes, no transformations are needed here.
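For reference, the Binary source dataset can also be expressed as JSON. The Python sketch below builds such a definition, assuming the ADF dataset JSON schema (type "Binary", an "SftpLocation" location, and a "compression" block with type "GZip"); the dataset, linked service, folder and file names are hypothetical placeholders. Please verify the property names against the Binary format documentation before using them.

```python
import json


def binary_sftp_dataset(name: str, linked_service: str, folder: str, file_name: str) -> dict:
    """Build an ADF Binary dataset definition pointing at a gzip file on an SFTP server."""
    return {
        "name": name,
        "properties": {
            "type": "Binary",
            "linkedServiceName": {
                # Hypothetical name of an existing SFTP linked service.
                "referenceName": linked_service,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {
                "location": {
                    "type": "SftpLocation",
                    "folderPath": folder,
                    "fileName": file_name,
                },
                # Compression type set to gzip, matching the setting described above.
                "compression": {"type": "GZip"},
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    ds = binary_sftp_dataset("SftpGzBinary", "SftpLinkedService", "exports", "data.csv.gz")
    print(json.dumps(ds, indent=2))
```

The printed JSON can be compared with what the ADF authoring UI shows in the dataset's code view.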
Regarding the region, keeping the source and target in the same region is recommended to reduce network latency.
Here are a few things you can try to improve the performance of your copy activity:
- Increase the degree of copy parallelism (and, when using an Azure integration runtime, the Data Integration Units) in the "Settings" tab of the copy activity. Try higher values and check whether throughput improves.
- Use the "Binary" format: ADF copies the files as-is without parsing their contents or mapping a schema, which can improve performance.
- Check the Azure region of your ADF instance and its integration runtime. If it is in a different region than your storage account, network latency can slow down the copy.
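The tuning options in the list above map to properties of the copy activity JSON. A minimal sketch, assuming the ADF copy activity schema: "parallelCopies" corresponds to "Degree of copy parallelism" and "dataIntegrationUnits" to the DIU setting. The sink is assumed to be ADLS Gen2 ("AzureBlobFSWriteSettings"), and the activity and dataset names are hypothetical. Note that DIUs only apply to the Azure integration runtime, not to a self-hosted one, and how much parallelism helps depends on the number and size of your files.

```python
import json
from typing import Optional


def binary_copy_activity(name: str, source_dataset: str, sink_dataset: str,
                         parallel_copies: Optional[int] = None,
                         data_integration_units: Optional[int] = None) -> dict:
    """Build a Copy activity that moves files as-is (Binary) from SFTP to ADLS Gen2."""
    type_properties = {
        "source": {
            "type": "BinarySource",
            "storeSettings": {"type": "SftpReadSettings", "recursive": True},
        },
        "sink": {
            "type": "BinarySink",
            "storeSettings": {"type": "AzureBlobFSWriteSettings"},
        },
    }
    # Only set the tuning knobs when given; otherwise ADF chooses its own defaults.
    if parallel_copies is not None:
        type_properties["parallelCopies"] = parallel_copies
    if data_integration_units is not None:
        type_properties["dataIntegrationUnits"] = data_integration_units
    return {
        "name": name,
        "type": "Copy",
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": type_properties,
    }


if __name__ == "__main__":
    # Example values only, not recommendations; tune them against your own test runs.
    activity = binary_copy_activity("CopySftpToAdls", "SftpGzBinary", "AdlsBinary",
                                    parallel_copies=16, data_integration_units=32)
    print(json.dumps(activity, indent=2))
```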
Quicker alternatives are to use the AzCopy command or Azure File Sync.
Using AzCopy:
The ADF copy activity can be slower than tools like AzCopy because it prioritizes reliability, data integrity, and consistency over raw transfer speed.
You could use Azure Blob Storage as an intermediate location, since AzCopy cannot read from an SFTP server directly. First copy the data from the SFTP server to Azure Blob Storage with ADF, then use an ADF Custom activity (which runs on an Azure Batch pool) to execute an AzCopy command from a batch or PowerShell script.
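The second step (Blob Storage to ADLS) could be wrapped in a small script that the Custom activity calls. The Python sketch below is one way to do it, assuming `azcopy` is installed and on the PATH of the Batch node, and that the source and destination URLs carry SAS tokens (the URLs shown are placeholders). It uses `azcopy copy <source> <destination> --recursive` and the `AZCOPY_CONCURRENCY_VALUE` environment variable to tune concurrency; the function names are hypothetical.

```python
import os
import subprocess
from typing import List, Optional


def azcopy_command(source_url: str, dest_url: str, recursive: bool = True) -> List[str]:
    """Build the argv for 'azcopy copy <source> <destination> [--recursive]'."""
    cmd = ["azcopy", "copy", source_url, dest_url]
    if recursive:
        cmd.append("--recursive")
    return cmd


def run_azcopy(source_url: str, dest_url: str, concurrency: Optional[int] = None) -> int:
    """Run AzCopy; raises CalledProcessError if it exits with a non-zero code."""
    env = dict(os.environ)
    if concurrency is not None:
        # AzCopy reads this variable to decide how many concurrent requests to use.
        env["AZCOPY_CONCURRENCY_VALUE"] = str(concurrency)
    result = subprocess.run(azcopy_command(source_url, dest_url), env=env, check=True)
    return result.returncode


if __name__ == "__main__":
    # Hypothetical placeholders: replace with real account, container, path and SAS tokens.
    src = "https://<account>.blob.core.windows.net/<staging-container>/<folder>?<SAS>"
    dst = "https://<account>.dfs.core.windows.net/<filesystem>/<folder>?<SAS>"
    # Only prints the command; call run_azcopy(src, dst) to actually execute it.
    print(" ".join(azcopy_command(src, dst)))
```

Keeping the command builder separate from the runner makes it easy to log or review the exact AzCopy invocation before it runs on the Batch node.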
Azure File Sync:
Another option for moving large files from an SFTP server to ADLS more efficiently is Azure File Sync.
Azure File Sync synchronizes files between an on-premises Windows file server and Azure Files, which can then serve as a staging location for ADLS.
Please see the documents below for more details.
Reference documents:
https://learn.microsoft.com/en-us/azure/data-factory/connector-sftp?tabs=data-factory
https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance
https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance-troubleshooting
https://learn.microsoft.com/en-us/azure/storage/file-sync/file-sync-introduction
I hope this helps. Please let me know if you have any further questions.
If this answers your question, please consider accepting the answer by clicking "Accept answer" and upvoting, as it helps the community find answers to similar questions.