Why is my Copy Data Activity getting Low Throughput and Connection Failures in Azure Synapse Analytics?

Vedant Desai 656 Reputation points
Mar 17, 2025, 12:36 p.m.

Scenario:

Large volumes of data are being pulled from an on-prem SQL Server into Azure Data Lake using an Azure Synapse pipeline.

  • The data is partitioned into 30 batches of 20 million records each.
  • The Copy Data activity loops through these partitions and writes them to Parquet format in ADLS.
  • Some partitions succeed, while others fail with different errors.

Errors Observed:

Transport-Level Error:

  • "A transport-level error has occurred when receiving results from the server. The connection was forcibly closed by the remote host."

SQL Connection Failure:

  • "A network-related error occurred while establishing a connection to SQL Server. Could not open a connection to SQL Server."

How does network bandwidth impact the Copy Data activity in Synapse?

  • Is throughput dependent on the on-premises network or Integration Runtime (IR) bandwidth?
  • If using ExpressRoute, are there any data transfer limitations in Synapse?

What are the best practices to improve throughput in Synapse pipelines?

  • Should batch size and parallelism be increased?
  • Would adjusting the Integration Runtime configuration help?
  • Should staging be enabled for better performance?
  • Are there Synapse-specific optimizations for handling large-scale data movement?

1 answer

  1. J N S S Kasyap 1,065 Reputation points Microsoft External Staff
    Mar 17, 2025, 5:52 p.m.

    Hi @vedant desai

    Low throughput and connection failures in Azure Synapse Copy Data Activity when transferring large data volumes from an on-premises SQL Server can stem from various factors, such as network limitations, integration runtime constraints, and SQL Server performance bottlenecks.
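
    For the transport-level errors specifically, a first mitigation is the Copy activity's built-in retry policy, which re-runs the activity after transient connection drops. A minimal sketch of the relevant activity JSON (the activity name and the retry values are placeholders to tune, not from your pipeline):

    ```json
    {
      "name": "CopyOnePartition",
      "description": "Illustrative sketch; retry values are placeholders",
      "type": "Copy",
      "policy": {
        "timeout": "0.12:00:00",
        "retry": 3,
        "retryIntervalInSeconds": 120
      },
      "typeProperties": {
        "source": { "type": "SqlServerSource" },
        "sink": { "type": "ParquetSink" }
      }
    }
    ```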

    How does network bandwidth impact the Copy Data activity in Synapse?

    Network bandwidth directly influences the speed and reliability of data transfer between your on-premises SQL Server and Azure Data Lake Storage (ADLS). Insufficient bandwidth can lead to slow data movement and increase the likelihood of connection timeouts or failures. Ensuring adequate network capacity is essential for optimal performance.

    Is throughput dependent on the on-premises network or Integration Runtime (IR) bandwidth?

    Yes, throughput is influenced by both the on-premises network and the Integration Runtime (IR) bandwidth.

    If using ExpressRoute, are there any data transfer limitations in Synapse?

    Azure ExpressRoute imposes no hard data-transfer limits, but your circuit's bandwidth tier (e.g., 1 Gbps, 10 Gbps) caps the maximum transfer speed; higher tiers enable faster data movement and better Synapse copy performance. Refer to the ExpressRoute pricing page for bandwidth tiers and rates.
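
    As a rough, illustrative calculation (the ~200 bytes per row figure is an assumption, not from your scenario): 30 partitions × 20 million rows = 600 million rows ≈ 120 GB of raw data. A fully utilized 1 Gbps circuit moves about 125 MB/s, so the theoretical floor is roughly 120 GB ÷ 0.125 GB/s ≈ 960 seconds, or about 16 minutes; in practice, protocol overhead, source read speed, and circuit contention will push this higher.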

    What are the best practices to improve throughput in Synapse pipelines?

    Should batch size and parallelism be increased?

    Optimizing batch size and increasing parallelism can both improve data-movement performance. Smaller batches reduce memory consumption and make failed partitions cheaper to retry, while larger batches may increase throughput at the cost of more memory. Increasing the degree of parallelism allows multiple data partitions to be processed simultaneously, leading to faster overall data movement; see the sketch below.
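
    A sketch of both knobs together, assuming your partitions are iterated with a ForEach activity (all names and counts here are placeholders): `batchCount` controls how many partitions copy concurrently, and `parallelCopies` controls parallelism within each copy.

    ```json
    {
      "name": "ForEachPartition",
      "type": "ForEach",
      "typeProperties": {
        "isSequential": false,
        "batchCount": 8,
        "items": {
          "value": "@pipeline().parameters.partitionList",
          "type": "Expression"
        },
        "activities": [
          {
            "name": "CopyOnePartition",
            "type": "Copy",
            "typeProperties": {
              "source": { "type": "SqlServerSource" },
              "sink": { "type": "ParquetSink" },
              "parallelCopies": 4
            }
          }
        ]
      }
    }
    ```

    Note that batchCount × parallelCopies concurrent connections will hit the SQL Server, which can itself trigger the connection errors above, so increase these gradually.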

    Would adjusting the Integration Runtime configuration help?

    Tuning the IR configuration can significantly impact performance. When the copy runs on an Azure Integration Runtime, allocating more Data Integration Units (DIUs) increases throughput; a copy activity may use between 2 and 256 DIUs. With a self-hosted IR, as is typical for on-premises SQL Server sources, DIUs do not apply, so ensure the IR machine has sufficient CPU, memory, and network bandwidth, or scale out to additional nodes.
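
    A minimal sketch of the DIU setting on a copy activity's typeProperties (the value 64 is just an example starting point):

    ```json
    "typeProperties": {
      "source": { "type": "SqlServerSource" },
      "sink": { "type": "ParquetSink" },
      "dataIntegrationUnits": 64
    }
    ```

    Again, this setting only takes effect when the copy runs on an Azure IR; on a self-hosted IR, throughput is governed by the host machine and the number of registered nodes.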

    Should staging be enabled for better performance?

    Enabling staged copy can improve performance for large datasets, particularly when the sink is an Azure Synapse dedicated SQL pool: staging allows the service to load with PolyBase or the COPY statement, bulk-loading batches instead of inserting row by row and drastically reducing load times.
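
    A sketch of staged-copy settings, assuming a hypothetical staging linked service named StagingBlobLS and a dedicated SQL pool sink (both placeholders):

    ```json
    "typeProperties": {
      "source": { "type": "SqlServerSource" },
      "sink": { "type": "SqlDWSink", "allowCopyCommand": true },
      "enableStaging": true,
      "stagingSettings": {
        "linkedServiceName": {
          "referenceName": "StagingBlobLS",
          "type": "LinkedServiceReference"
        },
        "path": "staging-container"
      }
    }
    ```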

    Are there Synapse-specific optimizations for handling large-scale data movement?

    For SQL Server and Azure SQL Database sources, enabling source partitioning can improve read times by opening parallel connections against the source. Specify the number of partitions and choose a partition column with high cardinality.

    Separately, if the data feeds a dashboard, set up a cache rather than returning millions of records per query, and use scheduled refreshes to avoid parallel query executions that can drain serverless SQL pool resources.
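
    A sketch of a dynamic-range-partitioned source (RecordId is a hypothetical high-cardinality column; the bounds are placeholders):

    ```json
    "source": {
      "type": "SqlServerSource",
      "partitionOption": "DynamicRange",
      "partitionSettings": {
        "partitionColumnName": "RecordId",
        "partitionLowerBound": "1",
        "partitionUpperBound": "600000000"
      }
    }
    ```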

    Please refer to the Microsoft Docs below:
    https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance
    https://learn.microsoft.com/en-us/azure/expressroute/expressroute-faqs
    https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance#troubleshoot-copy-activity-performance

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, please click Accept Answer and Yes for "Was this answer helpful?".
