Parallelizing an ADF copy activity from a Synapse view to a Synapse table

pmscorca 792 Reputation points
2024-03-21T18:57:44.7833333+00:00

Hi,

Is it possible to parallelize a Data Factory copy activity that reads data from a Synapse Analytics view and writes it to a Synapse table?

Thanks

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. Harishga 3,095 Reputation points Microsoft Vendor
    2024-03-22T06:39:31.82+00:00

    In addition to the above answer
    Hi @pmscorca

    Welcome to Microsoft Q&A platform and thanks for posting your question here.

    When you run a Data Factory copy activity, it reads data from a source and writes it to a destination. By default, the copy activity runs a single query to read data from the source and write it to the destination. However, when dealing with large amounts of data, this can be slow and inefficient.

    Parallelizing a Data Factory Copy Activity in Azure Synapse Analytics

    To improve the performance of a Data Factory copy activity, you can parallelize it, so that the copy activity runs multiple queries in parallel to read data from the source and write it to the destination. The Azure Synapse Analytics connector in the copy activity provides built-in data partitioning to copy data in parallel.

    Enabling Partitioned Copy in Azure Synapse Analytics

    To enable partitioned copy in Azure Synapse Analytics, select a partition option on the copy activity source (physical partitions of the table, or dynamic range). The degree of parallelism is then controlled by the "parallelCopies" setting on the copy activity, which determines how many parallel queries are generated and run against the Azure Synapse Analytics source to load data by partitions.

    Example Scenario

    Let's say you have a large table in Azure Synapse Analytics with 1 million rows of data. You want to copy this data to another table in Azure Synapse Analytics using a copy activity. With a partition option selected on the source, you can set "parallelCopies" to 4. The copy activity will then concurrently generate and run 4 queries against the Azure Synapse Analytics source to load data by partitions.
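
    As a rough illustration of the scenario above, the following Python sketch builds the JSON for such a copy activity. The property names ("partitionOption", "partitionSettings", "parallelCopies", "SqlDWSource", "SqlDWSink") are written as I recall them from the connector documentation and should be checked against it. The activity name, dataset names, partition column, and bounds are hypothetical placeholders, and the sketch assumes the source exposes an integer column that is suitable for range partitioning.

```python
import json


def build_copy_activity(source_dataset, sink_dataset, partition_column,
                        lower_bound, upper_bound, parallel_copies=4):
    """Return a copy activity definition as a plain dict (serializable to JSON).

    All names passed in are placeholders chosen by the caller; nothing here
    calls Azure, it only assembles the JSON structure.
    """
    return {
        "name": "CopyViewToTable",  # hypothetical activity name
        "type": "Copy",
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            "source": {
                "type": "SqlDWSource",
                # Partition option that splits the read by value ranges.
                "partitionOption": "DynamicRange",
                "partitionSettings": {
                    "partitionColumnName": partition_column,
                    "partitionLowerBound": str(lower_bound),
                    "partitionUpperBound": str(upper_bound),
                },
            },
            "sink": {"type": "SqlDWSink"},
            # Degree of parallelism: how many partition queries run at once.
            "parallelCopies": parallel_copies,
        },
    }


# Hypothetical dataset and column names, for illustration only.
activity = build_copy_activity(
    source_dataset="SynapseViewDataset",
    sink_dataset="SynapseTableDataset",
    partition_column="Id",
    lower_bound=1,
    upper_bound=1_000_000,
)
print(json.dumps(activity, indent=2))
```

    The resulting JSON can be compared with what the pipeline editor shows in its code view after the same options are set in the UI.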

    Retrieving Data by Partitions

     Each query will retrieve a portion of the data from the Azure Synapse Analytics source, based on your specified partition option and settings.

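    For example, with the dynamic range option the data is split by ranges of values in a specified integer or datetime column, while the physical partitions option reads each existing physical partition of the table with its own query. Since the source in the question is a view, which has no physical partitions of its own, dynamic range is the option most likely to apply.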
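
    To make the idea of range-based partitioning concrete, here is a simplified sketch of how a range of column values could be divided into one predicate per parallel query. It is not the service's actual algorithm, only an illustration of the principle; the function name and the column name "Id" are hypothetical.

```python
def range_partition_predicates(column, lower, upper, partitions):
    """Split the inclusive integer range [lower, upper] into contiguous
    sub-ranges and return one WHERE predicate string per sub-range."""
    if partitions < 1:
        raise ValueError("partitions must be >= 1")
    if upper < lower:
        raise ValueError("upper must be >= lower")

    total = upper - lower + 1
    # Never create more partitions than there are distinct values.
    partitions = min(partitions, total)
    base, extra = divmod(total, partitions)

    predicates = []
    start = lower
    for i in range(partitions):
        # The first `extra` partitions take one additional value each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        predicates.append(f"{column} >= {start} AND {column} <= {end}")
        start = end + 1
    return predicates


# Four ranges over 1 million key values, matching parallelCopies = 4.
for predicate in range_partition_predicates("Id", 1, 1_000_000, 4):
    print(predicate)
```

    Each predicate corresponds to one query that reads its own slice of the source, so the slices do not overlap and together cover the whole range.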

    Optimizing the Copy Activity

     It's important to note that the optimal value for the "parallelCopies" property depends on the size of the data, the available resources, and the network bandwidth. Additionally, the copy activity can be further optimized by configuring the batch size and timeout settings, as well as the source and sink settings.

     
    Hope this helps. Do let us know if you have any further queries.




1 additional answer

  1. Amira Bedhiafi 14,251 Reputation points
    2024-03-21T20:47:01.9566667+00:00

    Based on the documentation:

    You can set the parallelCopies property to indicate the parallelism you want the copy activity to use. Think of this property as the maximum number of threads within the copy activity. The threads operate in parallel. The threads either read from your source, or write to your sink data stores.
