Copy Method in ADF - Polybase / Copy Command / Bulk Insert

M, Murugeswari (Cognizant) 456 Reputation points
2023-02-02T07:19:52.46+00:00

Hi,

We have million-row data in delimited files stored in Azure Blob storage. We need to load it into a dedicated SQL pool in Azure Synapse via Azure Data Factory.

Which copy method in ADF would be most suitable: COPY command, PolyBase, or bulk insert? Please let me know.

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. KranthiPakala-MSFT 46,442 Reputation points Microsoft Employee
    2023-02-03T00:02:22.0966667+00:00

    Hi Murugeswari,

    Thank you for using Microsoft Q&A forum and posting your query.

    For best performance, the ADF product team recommends loading data into Azure Synapse Analytics with PolyBase or the COPY statement rather than bulk insert, which is a slower load method.

    PolyBase vs. COPY statement: It depends on the scenario. If you want to load data from a data store other than Blob storage, you can enable copying through an interim staging Blob storage. In that case, Data Factory performs the required data transformations so that the data meets the requirements of PolyBase, and then uses PolyBase to load it into Azure Synapse Analytics. The ADF documentation refers to this as staged copy.
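
    As a rough illustration of the staged path described above, here is a minimal Python sketch that assembles the typeProperties section of a Copy activity. The property names (SqlDWSink, allowPolyBase, allowCopyCommand, enableStaging, stagingSettings) follow the ADF connector documentation as I understand it; the function name, the linked service name and the staging path are hypothetical placeholders, so please check the result against your own pipeline JSON.

```python
import json


def build_copy_type_properties(method, source_type="DelimitedTextSource",
                               staging_linked_service=None, staging_path=None):
    """Return a dict shaped like the typeProperties of an ADF Copy activity.

    method: "polybase", "copy_command" or "bulk_insert".
    staging_linked_service / staging_path: hypothetical names for the
    interim staging Blob storage; leave as None for a direct copy.
    """
    if method not in ("polybase", "copy_command", "bulk_insert"):
        raise ValueError(f"unknown copy method: {method}")

    sink = {"type": "SqlDWSink"}
    if method == "polybase":
        sink["allowPolyBase"] = True
    elif method == "copy_command":
        sink["allowCopyCommand"] = True
    # For bulk insert neither flag is set on the sink.

    props = {"source": {"type": source_type}, "sink": sink}

    if staging_linked_service is not None:
        # Staged copy: data lands in interim Blob storage first.
        props["enableStaging"] = True
        props["stagingSettings"] = {
            "linkedServiceName": {
                "referenceName": staging_linked_service,
                "type": "LinkedServiceReference",
            },
            "path": staging_path,
        }
    return props


if __name__ == "__main__":
    staged = build_copy_type_properties(
        "polybase",
        staging_linked_service="StagingBlobLS",  # assumed name
        staging_path="staging/adf",              # assumed path
    )
    print(json.dumps(staged, indent=2))
```

    Calling the function without the staging arguments gives the direct-copy shape, with no enableStaging entry.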

    On the other hand, if the data you want to load is already in Blob storage, you can use the COPY statement in Azure Synapse Analytics to load it directly, with no interim staging step. The ADF documentation refers to this as direct copy.
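
    To give an idea of what the COPY statement looks like for delimited files in Blob storage, here is a small Python sketch that builds the T-SQL text. The helper name, table name and storage URL are hypothetical, and the options shown (FILE_TYPE, FIELDTERMINATOR, FIRSTROW, CREDENTIAL with a managed identity) are only one possible combination, so adjust them to your files and authentication setup.

```python
def build_copy_into(table, source_url, field_terminator=",", first_row=2):
    """Build a COPY INTO statement for CSV files in Blob storage.

    table and source_url are caller-supplied; this sketch does no
    escaping, so it rejects values containing single quotes.
    """
    for value in (table, source_url, field_terminator):
        if "'" in value:
            raise ValueError("single quotes are not supported in this sketch")

    return (
        f"COPY INTO {table}\n"
        f"FROM '{source_url}'\n"
        "WITH (\n"
        "    FILE_TYPE = 'CSV',\n"
        f"    FIELDTERMINATOR = '{field_terminator}',\n"
        f"    FIRSTROW = {first_row},\n"  # 2 skips a single header row
        "    CREDENTIAL = (IDENTITY = 'Managed Identity')\n"
        ")"
    )


if __name__ == "__main__":
    # Assumed table name and storage URL, for illustration only.
    print(build_copy_into(
        "dbo.SalesStaging",
        "https://myaccount.blob.core.windows.net/mycontainer/sales/*.csv",
    ))
```

    When the COPY command option is selected on the ADF sink, Data Factory issues the statement for you; the sketch is only meant to show roughly what runs on the dedicated SQL pool.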

    Both methods have high performance.

    PolyBase is an efficient way to load data into Azure Synapse Analytics with high throughput. It is a strong choice when you are loading large volumes of data or need faster performance. Use the staging blob feature to achieve high load speeds from all types of data stores, including Azure Blob storage and Data Lake Store. (PolyBase supports Azure Blob storage and Azure Data Lake Store by default.)

    For more information about these features, please refer to this documentation: ADF - Azure Synapse Analytics as sink

    Here is a tutorial on how to load 1 TB of data into Synapse Analytics in under 15 minutes: Load 1 TB into Azure Synapse Analytics under 15 minutes with Data Factory

    Hope this info helps.


    Please don’t forget to Accept Answer and Up-Vote wherever the information provided helps you, this can be beneficial to other community members.
