Copy data from SFTP to ADLS Gen 2 via ADF

Anonymous 0 Reputation points
2023-03-16T06:21:50.2266667+00:00

I have around 200 GB of data on an SFTP server in .gz format. I need to copy it into ADLS via ADF. Can this be done? Which dataset file format should I choose? The files should stay in the same .gz format in ADLS. No transformation is needed.

Also, does the region of the data matter here? If yes, how much quicker would it be if the source and target are in the same region?

Is there any quicker way than ADF to achieve this?


1 answer

  1. BhargavaGunnam-MSFT 26,496 Reputation points Microsoft Employee
    2023-03-17T22:55:34.45+00:00

    Hello shankar vishal,

    Welcome to the MS Q&A platform.

    Yes, you can use the copy activity with the SFTP connector as your source.

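    Choose Binary as the dataset file format. If you select the compression type "gzip (.gz)", select it on both the source and the sink dataset so the output is still .gz; setting it on the source only makes the copy activity decompress the files. Leaving the compression type unset on both datasets copies the .gz files unchanged.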

    And yes, no transformation is needed here.

    Regarding the region, keeping the source and target in the same region is recommended to reduce latency.
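
    As a rough illustration of the setup described above, the sketch below builds the JSON for two Binary datasets and a copy activity as plain Python dictionaries. All dataset, linked service, folder, and file system names are hypothetical placeholders. The sketch assumes an as-is copy, so it leaves the compression block out of both datasets; adding the same gzip compression block to both is the alternative.

```python
import json


def binary_dataset(name, linked_service, location):
    """Build a Binary dataset definition with no compression block,
    so the files are copied without being decompressed."""
    return {
        "name": name,
        "properties": {
            "type": "Binary",
            "linkedServiceName": {
                "referenceName": linked_service,
                "type": "LinkedServiceReference",
            },
            "typeProperties": {"location": location},
        },
    }


# Hypothetical names and paths; replace with your own.
source_ds = binary_dataset(
    "SftpGzFiles",
    "SftpLinkedService",
    {"type": "SftpLocation", "folderPath": "exports"},
)
sink_ds = binary_dataset(
    "AdlsGzFiles",
    "AdlsGen2LinkedService",
    {"type": "AzureBlobFSLocation", "fileSystem": "raw", "folderPath": "exports"},
)

copy_activity = {
    "name": "CopyGzFromSftpToAdls",
    "type": "Copy",
    "inputs": [{"referenceName": source_ds["name"], "type": "DatasetReference"}],
    "outputs": [{"referenceName": sink_ds["name"], "type": "DatasetReference"}],
    "typeProperties": {
        "source": {
            "type": "BinarySource",
            "storeSettings": {
                "type": "SftpReadSettings",
                "recursive": True,
                "wildcardFileName": "*.gz",  # pick up only the .gz files
            },
        },
        "sink": {
            "type": "BinarySink",
            "storeSettings": {"type": "AzureBlobFSWriteSettings"},
        },
    },
}

print(json.dumps(copy_activity, indent=2))
```

    The printed JSON can be compared with what the ADF authoring UI generates in its code view for the same pipeline.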

    Here are a few things you can try to improve the performance of your copy activity:

    • Increase the degree of copy parallelism in ADF. This can be set in the "Settings" tab of the copy activity. Try raising the value to see whether throughput improves.
    • Use the "Binary" option. It copies files as-is without parsing their contents or validating metadata, which can improve performance.
    • Check the region of your ADF instance. If it is in a different region than your storage account, copies can be slower due to network latency.
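
    To get a feel for what these tuning options are worth on a 200 GB copy, here is a small back-of-the-envelope sketch. The throughput figures are hypothetical, not measured values, and the numbers in `performance_settings` are placeholders to experiment with; `parallelCopies` and `dataIntegrationUnits` are the copy activity properties behind the "Settings" tab options.

```python
def estimated_copy_minutes(size_gb: float, throughput_mb_per_s: float) -> float:
    """Estimate the copy duration in minutes for a sustained throughput."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    size_mb = size_gb * 1024  # 1 GB taken as 1024 MB
    return size_mb / throughput_mb_per_s / 60


# Copy activity performance properties; the values are placeholders.
performance_settings = {
    "parallelCopies": 8,
    "dataIntegrationUnits": 16,
}

# Hypothetical sustained throughputs in MB/s for the 200 GB in the question.
for rate in (25, 50, 100):
    minutes = estimated_copy_minutes(200, rate)
    print(f"{rate} MB/s -> about {minutes:.0f} minutes")
```

    The throughput the copy activity actually achieved is shown in its monitoring output, which gives a real number to plug in after a test run.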

    Other, potentially quicker options are the AzCopy command and Azure File Sync.

    Using AzCopy:

    The copy activity in ADF can be slower than other tools like AzCopy because it is optimized for reliability and not performance. ADF prioritizes data integrity and consistency over speed, which can result in slower copy times.

    You could consider using Azure Blob Storage as an intermediary location. You could first copy the data from the SFTP server to Azure Blob Storage and then use an ADF custom activity to run the AzCopy command from a batch or PowerShell script.

    Azure File Sync:
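
    A minimal sketch of the second step, assuming the data has already landed in a staging Blob container. It only assembles an `azcopy copy` command line; the account, container, folder, and SAS token values are placeholders you would fill in, and the helper name `build_azcopy_command` is made up for this example.

```python
import shlex


def build_azcopy_command(source_url: str, dest_url: str, recursive: bool = True) -> list:
    """Assemble the argument list for an `azcopy copy` invocation."""
    cmd = ["azcopy", "copy", source_url, dest_url]
    if recursive:
        cmd.append("--recursive")  # include everything under the source folder
    return cmd


# Placeholder URLs; fill in your own account names, paths, and SAS tokens.
src = "https://<staging-account>.blob.core.windows.net/<container>/<folder>?<SAS-token>"
dst = "https://<adls-account>.blob.core.windows.net/<file-system>/<folder>?<SAS-token>"

cmd = build_azcopy_command(src, dst)
print(shlex.join(cmd))

# To actually run it (requires AzCopy on the PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

    Passing the arguments as a list instead of one shell string avoids quoting problems with the `&` characters that SAS tokens contain.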

    Another option to copy large files from an SFTP server to ADLS more efficiently could be to use Azure File Sync.

    Azure File Sync allows you to synchronize files between your on-premises file server and Azure Files, which can be used as a staging location for ADLS.

    Please see the documents below for more details.

    Reference documents:
    https://learn.microsoft.com/en-us/azure/data-factory/connector-sftp?tabs=data-factory
    https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance
    https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance-troubleshooting
    https://learn.microsoft.com/en-us/azure/storage/file-sync/file-sync-introduction


    I hope this helps. Please let me know if you have any further questions.

    If this answers your question, please consider accepting it by selecting Accept answer and up-voting, as this helps the community find answers to similar questions.
