Low throughput and connection failures in Azure Synapse Copy Data Activity when transferring large data volumes from an on-premises SQL Server can stem from various factors, such as network limitations, integration runtime constraints, and SQL Server performance bottlenecks.
How does network bandwidth impact the Copy Data activity in Synapse?
Network bandwidth directly influences the speed and reliability of data transfer between your on-premises SQL Server and Azure Data Lake Storage (ADLS). Insufficient bandwidth can lead to slow data movement and increase the likelihood of connection timeouts or failures. Ensuring adequate network capacity is essential for optimal performance.
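As a rough illustration of why bandwidth matters, the minimum transfer time can be estimated from the data volume and the link speed. This sketch assumes the whole link is available to the copy and ignores protocol and compression overhead:

```python
# Rough estimate of transfer time for a given data volume and link speed.
# Assumes the full link is available to the copy; ignores protocol overhead.

def transfer_hours(data_gb: float, bandwidth_gbps: float) -> float:
    data_gigabits = data_gb * 8              # convert gigabytes to gigabits
    seconds = data_gigabits / bandwidth_gbps
    return seconds / 3600

# Example: moving 1 TB (1000 GB) over links of different speeds
for gbps in (0.1, 1, 10):
    print(f"{gbps:>4} Gbps -> ~{transfer_hours(1000, gbps):.1f} h")
# 0.1 Gbps -> ~22.2 h, 1 Gbps -> ~2.2 h, 10 Gbps -> ~0.2 h
```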
Is throughput dependent on the on-premises network or Integration Runtime (IR) bandwidth?
Yes, both matter. The self-hosted IR runs on a machine in your on-premises environment, so that machine's CPU, memory, and network interface, together with the bandwidth of the network path from your on-premises network to Azure, set the effective ceiling for the copy.
If using ExpressRoute, are there any data transfer limitations in Synapse?
Azure ExpressRoute has no hard data transfer limits, but your circuit's bandwidth tier (e.g., 1 Gbps, 10 Gbps) caps the maximum transfer speed; higher bandwidth tiers enable faster data movement and improve Synapse copy performance. Refer to the ExpressRoute documentation and pricing pages for the available bandwidth tiers and rates.
What are the best practices to improve throughput in Synapse pipelines?
Should batch size and parallelism be increased?
Optimizing batch size and parallelism can enhance data movement performance. Batch size is a trade-off: larger write batch sizes generally increase throughput but consume more memory on the integration runtime and the sink, while smaller batches reduce memory pressure at the cost of more round trips. Increasing the degree of parallelism lets multiple partitions of the data be copied simultaneously, which usually speeds up data movement.
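For reference, here is a minimal sketch of the relevant settings, written as a Python dict that mirrors the copy activity JSON. parallelCopies and writeBatchSize are documented copy activity and SQL sink settings; the activity name and the specific values are placeholders to tune for your workload, not recommendations.

```python
# Minimal sketch of a copy activity fragment (Python dict mirroring the activity JSON).
# "parallelCopies" sets how many copy sessions run in parallel; "writeBatchSize"
# applies when the sink is a database such as a dedicated SQL pool.
# All names and values below are illustrative placeholders.
copy_activity = {
    "name": "CopyOnPremSqlToSynapse",        # hypothetical activity name
    "type": "Copy",
    "typeProperties": {
        "source": {"type": "SqlServerSource"},
        "sink": {
            "type": "SqlDWSink",
            "writeBatchSize": 100000,         # rows written to the sink per batch
        },
        "parallelCopies": 8,                  # parallel copy sessions
    },
}
```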
Would adjusting the Integration Runtime configuration help?
Tuning the IR configuration can significantly impact performance. Allocating more Data Integration Units (DIUs) can increase throughput when the copy runs on an Azure IR; the allowed DIUs for a copy activity range from 2 to 256. When a self-hosted IR is used (as it must be for an on-premises SQL Server source), DIUs do not apply; instead, ensure the IR machine has sufficient CPU, memory, and network bandwidth, and consider scaling out to multiple IR nodes for higher concurrency.
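A minimal sketch of the DIU setting, again as a Python dict mirroring the copy activity JSON; the value shown is a placeholder:

```python
# "dataIntegrationUnits" only takes effect when the copy runs on an Azure IR.
# With a self-hosted IR (required for an on-premises SQL Server source), scale
# up the host machine or scale out to additional nodes instead.
copy_type_properties = {
    "dataIntegrationUnits": 32,   # allowed range per copy activity: 2-256
}
```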
Should staging be enabled for better performance?
Enabling a staged copy can improve performance, especially for large datasets. When the sink is Azure Synapse Analytics (dedicated SQL pool), staging the data in Blob storage or ADLS Gen2 and loading it with PolyBase or the COPY statement bulk-loads the data instead of inserting it row by row, which drastically reduces load times.
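A minimal sketch of a staged copy into a dedicated SQL pool, again as a Python dict mirroring the copy activity JSON; "MyStagingStore" and the staging path are hypothetical placeholders:

```python
# Minimal staged-copy sketch (Python dict mirroring the copy activity JSON).
# "allowCopyCommand" switches the SqlDWSink load to the COPY statement
# ("allowPolyBase" is the PolyBase equivalent). Staging names are placeholders.
copy_type_properties = {
    "sink": {
        "type": "SqlDWSink",
        "allowCopyCommand": True,            # bulk load via COPY instead of row-by-row inserts
    },
    "enableStaging": True,
    "stagingSettings": {
        "linkedServiceName": {
            "referenceName": "MyStagingStore",   # hypothetical Blob/ADLS Gen2 linked service
            "type": "LinkedServiceReference",
        },
        "path": "staging-container/copy-staging",
    },
}
```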
Are there Synapse-specific optimizations for handling large-scale data movement?
For SQL sources (on-premises SQL Server as well as Azure SQL Database), enabling source partitioning can improve read times by opening parallel connections against the source. Choose a partition column with high cardinality, such as an identity or date column; the degree of parallelism on the copy activity controls how many partitions are read concurrently.
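A minimal sketch of a partitioned SQL source, as a Python dict mirroring the source JSON; "OrderId" and the bounds are hypothetical placeholders for a high-cardinality column in your table:

```python
# Minimal partitioned-source sketch (Python dict mirroring the source JSON).
# "DynamicRange" splits reads by ranges of the partition column;
# "PhysicalPartitionsOfTable" is an alternative for physically partitioned tables.
copy_source = {
    "type": "SqlServerSource",
    "partitionOption": "DynamicRange",
    "partitionSettings": {
        "partitionColumnName": "OrderId",    # hypothetical high-cardinality column
        "partitionLowerBound": "1",
        "partitionUpperBound": "10000000",
    },
}
```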
On the consumption side, set up a cache rather than returning millions of records directly to a dashboard, and use scheduled refreshes so that parallel ad hoc queries do not drain serverless SQL pool resources.
Please refer to the Microsoft Docs below:
https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance
https://learn.microsoft.com/en-us/azure/expressroute/expressroute-faqs
https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance#troubleshoot-copy-activity-performance
Hope this helps. If this answers your query, do click Accept Answer and Yes for "Was this answer helpful". And if you have any further queries, do let us know.