Azure Data Factory - sink and settings configuration in Copy data activity to copy 300 million rows from one SQL table to another

Prakhar Parimal 26 Reputation points
2021-12-08T05:48:27.023+00:00

Hi,

I have a requirement to copy a table from one SQL DB to another. The source table has 300 million rows, which add up to about 61 GB of data. With the default configuration of the Copy data activity in ADF, it takes around 13-14 hours to complete the copy.

How can I determine the values of the following fields for better performance?

  1. Write batch size
  2. Data integration unit
  3. Degree of copy parallelism

Or is there any better way in ADF to achieve this?

Thanks

Azure SQL Database
Azure Data Factory

Accepted answer
    MartinJaffer-MSFT 26,086 Reputation points
    2021-12-09T17:22:54.573+00:00

    Hello @Prakhar Parimal and welcome to Microsoft Q&A. Looks like you have done some reading already and found the main 'levers to pull'. Generally, the more Data Integration Units and the higher the degree of copy parallelism, the better. However, like anything in life, too much is not a good thing, as the database has a limit on concurrent connections. When values are not specified, Data Factory tries to find optimal settings.

    There is a factor you have left out: the number of data partitions in your table(s). The Data Integration Units and degree of copy parallelism are apportioned across the data partitions. See the excerpts below.

    Copy from partition-option-enabled data stores (including Azure Database for PostgreSQL, Azure SQL Database, Azure SQL Managed Instance, Azure Synapse Analytics, Oracle, Netezza, SQL Server, and Teradata): 2-256 when writing to a folder, and 2-4 when writing to one single file. Note that each source data partition can use up to 4 DIUs.

    When copying data into Azure SQL Database or Azure Cosmos DB, default parallel copy also depends on the sink tier (number of DTUs/RUs).
    When copying data from partition-option-enabled data store (including Azure SQL Database, Azure SQL Managed Instance, Azure Synapse Analytics, Oracle, Amazon RDS for Oracle, Netezza, SAP HANA, SAP Open Hub, SAP Table, SQL Server, Amazon RDS for SQL Server and Teradata), default parallel copy is 4.
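
    To make these settings concrete, here is a minimal sketch of where they sit in a Copy activity definition. The dataset names and the partition column are placeholders for illustration, and the numeric values are only starting points to test against your source and sink, not recommendations:

        {
            "name": "CopySqlToSql",
            "type": "Copy",
            "inputs": [ { "referenceName": "SourceSqlTableDataset", "type": "DatasetReference" } ],
            "outputs": [ { "referenceName": "SinkSqlTableDataset", "type": "DatasetReference" } ],
            "typeProperties": {
                "source": {
                    "type": "AzureSqlSource",
                    "partitionOption": "DynamicRange",
                    "partitionSettings": { "partitionColumnName": "Id" }
                },
                "sink": { "type": "AzureSqlSink" },
                "dataIntegrationUnits": 32,
                "parallelCopies": 8
            }
        }

    If your table has physical partitions, the "PhysicalPartitionsOfTable" option lets the service split the read along them instead of computing ranges over a column.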

    The Write batch size is something I have less experience with; for SQL sinks it is expressed as the number of rows inserted per batch.
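
    If you do want to set it explicitly, it sits on the sink object of the Copy activity sketch above. The value below is purely illustrative; test a few sizes against your sink's DTU tier, since larger batches mean fewer round trips but more memory and log pressure per insert:

        "sink": {
            "type": "AzureSqlSink",
            "writeBatchSize": 100000,
            "writeBatchTimeout": "00:30:00"
        }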

    I recommend you take a look at the Copy activity monitoring to see where the bottlenecks may be happening.

    Recommended articles:
    copy-activity-performance
    copy-activity-performance-features
    copy-activity-performance-troubleshooting
    Azure SQL specific performance troubleshooting

