ADF copy activity waiting for source to return data

Jacky 41 Reputation points
2023-03-27T06:13:46.7233333+00:00

Hello,

When my copy activity pipeline runs, the query takes a very long time to return data. Can anyone advise how to improve the performance of the run?

Thank you



Accepted answer
  BhargavaGunnam-MSFT 25,976 Reputation points · Microsoft Employee
    2023-03-27T21:36:10.15+00:00

    Hello Jacky,

    Welcome to the MS Q&A platform.

    Several factors can impact the performance of your copy activity pipeline, such as the size of your data, network bandwidth, and the resources of your self-hosted integration runtime.

    Since you are using a cloud-based data source, it is recommended to use an Azure IR in the same region as your source data store, or as close to it as possible.

    Here are a few other things you can consider to improve the copy activity performance.

    • Check the performance of your SHIR: make sure the machine running the self-hosted integration runtime (SHIR) has enough resources, such as CPU and memory, to handle the workload, and install the SHIR on a machine close to the source and sink data stores to minimize network latency.
    • Optimize your source database: You can improve the performance of your query by optimizing your source database. This may include creating indexes on the tables you are querying, tuning the query to avoid unnecessary joins or subqueries, and using the appropriate data types.
    • If the data you want to copy is large, adjust your business logic to partition it further using the slicing mechanism in Data Factory, then schedule the Copy activity to run more frequently so that each run moves less data.
    • Check network bandwidth: Ensure that your bandwidth is sufficient to handle the data you are copying.
    • Parallel copy: You can set parallel copy (parallelCopies property in the JSON definition of the Copy activity, or Degree of parallelism setting in the Settings tab of the Copy activity properties in the user interface) on copy activity to indicate the parallelism that you want the copy activity to use. You can think of this property as the maximum number of threads within the copy activity that read from your source or write to your sink data stores in parallel.
    • Use staging in the destination Azure Data Lake Storage Gen2 (ADLS Gen2) to store data temporarily before loading it into the final destination.
    • Use a binary format: if you copy large amounts of data, consider a binary format such as ORC or Parquet. These formats compress data and reduce the amount of data transferred during the pipeline run.
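
    As a rough illustration of the parallel copy and staging settings mentioned above, the relevant properties sit in the Copy activity's JSON definition roughly as follows. The dataset, linked-service, and path names here are placeholders, not values from your pipeline:

    ```json
    {
      "name": "CopySourceToAdls",
      "type": "Copy",
      "inputs": [ { "referenceName": "SourceDataset", "type": "DatasetReference" } ],
      "outputs": [ { "referenceName": "SinkParquetDataset", "type": "DatasetReference" } ],
      "typeProperties": {
        "source": { "type": "SqlSource" },
        "sink": { "type": "ParquetSink" },
        "parallelCopies": 8,
        "enableStaging": true,
        "stagingSettings": {
          "linkedServiceName": { "referenceName": "StagingBlobStorage", "type": "LinkedServiceReference" },
          "path": "staging-container"
        }
      }
    }
    ```

    The same settings are exposed in the UI as "Degree of copy parallelism" and "Enable staging" on the Copy activity's Settings tab, so you do not have to edit the JSON by hand.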
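
    To make the slicing idea above concrete, here is a minimal Python sketch (the function name, date range, and window size are made-up illustration values, not part of ADF) that splits a date range into consecutive windows; each window could then drive one smaller Copy activity run, for example via a tumbling window trigger:

    ```python
    from datetime import datetime, timedelta

    def time_slices(start, end, window):
        """Split the half-open range [start, end) into consecutive windows.

        Each (lower, upper) slice would parameterize one Copy activity run,
        so each run moves a smaller chunk of the source data.
        """
        slices = []
        cursor = start
        while cursor < end:
            upper = min(cursor + window, end)
            slices.append((cursor, upper))
            cursor = upper
        return slices

    # Example: one week split into daily slices -> 7 smaller copy runs.
    week = time_slices(datetime(2023, 3, 20), datetime(2023, 3, 27), timedelta(days=1))
    print(len(week))  # 7
    ```

    In a real pipeline you would pass each slice's bounds into the source query (e.g. a WHERE clause on a date column) instead of copying the whole table in one run.
    
    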

    Additionally, you can establish a baseline, test against representative data samples, and monitor copy activity performance to tune the performance further.

    See the documentation on performance tuning tips and on troubleshooting copy activity performance issues.

    Other reference documents:

    Copy activity Performance Tuning Steps:

    https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance-features

    I hope this helps. Please let me know if you have any further questions.

    If this answers your question, please consider clicking Accept answer and up-voting, as it helps the community find answers to similar questions.

