ADF transfer rate is low

Yang Chowmun 411 Reputation points
2021-10-01T02:35:33.52+00:00

I am getting data from Google BigQuery using a self-hosted IR and saving it into an Azure MySQL database using an Azure Managed IR with VNET integration. The transfer rate is currently slow: it took about an hour to transfer 1 GB of data.
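For reference, the implied average rate is easy to work out. A minimal sketch, assuming "1 GB" means 1 GiB (1024 MiB) and a transfer time of exactly one hour:

```python
def throughput_mib_per_s(size_mib: float, seconds: float) -> float:
    """Average transfer rate in MiB/s for `size_mib` MiB moved in `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return size_mib / seconds


# 1 GiB moved in one hour (assumed figures from the description above)
rate = throughput_mib_per_s(1024, 3600)
print(f"{rate:.3f} MiB/s")  # 0.284 MiB/s
```

That is roughly 0.28 MiB/s, or about 290 KiB/s.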

I have read the official documentation about tuning performance by adjusting the DIUs and parallel copies. I have tried increasing these parameters, but the throughput does not change much.

I have checked the Azure MySQL database: memory consumption is about 50% with 2 vCores. I tried scaling up to 4 vCores and memory consumption dropped to about 25%, so I presume the database should not be the bottleneck.

How should I identify the bottleneck of the process and improve the throughput?
Any advice would be greatly appreciated!

Azure Data Factory

Accepted answer
  1. ShaikMaheer-MSFT 38,441 Reputation points Microsoft Employee
    2021-10-04T05:17:34.153+00:00

    Hi @Yang Chowmun ,

    Thank you for posting your query on the Microsoft Q&A platform.

    Data movement throughput can depend on many factors, such as:

    • Network bandwidth between source and destination data stores.
    • Source or destination data store input/output operations per second (IOPS) and bandwidth
    • For the Azure IR, the number of DIUs configured
    • For a self-hosted IR, the machine capacity and the number of nodes in use
    • How the data is copied: with a single copy activity, or with multiple copy activities over partitioned data
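    To see which of these factors is limiting a particular run, the copy activity's Output JSON in the monitoring view breaks the run down by stage. A minimal sketch that pulls out the slowest stage, assuming the field names `executionDetails`, `profile`, `queue`, `transfer.details`, `readingFromSource`, `writingToSink` and `workingDuration` as we understand the copy activity monitoring output (verify them against your own run's output before relying on this):

```python
from typing import Any


def summarize_copy_run(output: dict[str, Any]) -> dict[str, Any]:
    """Summarize where a copy activity run spent its time.

    Field names are assumptions based on the copy activity monitoring
    output (executionDetails -> profile -> queue / transfer.details);
    missing fields are tolerated and reported as 0 or None.
    """
    summary: dict[str, Any] = {
        "throughput": output.get("throughput"),
        "used_diu": output.get("usedDataIntegrationUnits"),
        "used_parallel_copies": output.get("usedParallelCopies"),
    }
    details = output.get("executionDetails") or [{}]
    profile = details[0].get("profile", {})
    # Time spent waiting in the queue before the transfer started
    summary["queue_s"] = profile.get("queue", {}).get("duration", 0)
    stages = profile.get("transfer", {}).get("details", {})
    # Each stage (e.g. readingFromSource, writingToSink) reports a
    # workingDuration in seconds; the largest is the likeliest bottleneck.
    working = {
        name: stage.get("workingDuration", 0)
        for name, stage in stages.items()
        if isinstance(stage, dict)
    }
    summary["stage_seconds"] = working
    summary["slowest_stage"] = max(working, key=working.get) if working else None
    return summary


# Made-up durations, purely to exercise the function
sample = {
    "usedDataIntegrationUnits": 4,
    "usedParallelCopies": 1,
    "executionDetails": [{
        "profile": {
            "queue": {"status": "Completed", "duration": 5},
            "transfer": {"status": "Completed", "duration": 3590, "details": {
                "readingFromSource": {"workingDuration": 400},
                "writingToSink": {"workingDuration": 3500},
            }},
        }
    }],
}
print(summarize_copy_run(sample)["slowest_stage"])  # writingToSink
```

    If most of the time is in one stage, tuning the other side (for example adding DIUs or parallel copies when the sink is saturated) is unlikely to change throughput much.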

    Below are a few recommendations you can try to increase data movement throughput:

    • When using the Azure integration runtime (IR), you can specify up to 256 data integration units (DIUs) for each copy activity, in a serverless manner.
    • When using self-hosted IR, you can take either of the following approaches:
      * Manually scale up the machine.
      * Scale out to multiple machines (up to 4 nodes), and a single copy activity will partition its file set across all nodes.
    • Use a ForEach activity to partition the data and spawn multiple concurrent copy activities.
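    The ForEach approach needs a list of partitions to iterate over. A minimal sketch, assuming the BigQuery table has an integer column that can be range-split; the table name, the key column `id`, and the query shape are all hypothetical. Each dict could be passed as one ForEach item, with a copy activity inside the loop using the item's query as its source:

```python
import json


def split_range(lo: int, hi: int, parts: int) -> list[tuple[int, int]]:
    """Split the inclusive range [lo, hi] into `parts` contiguous,
    non-overlapping inclusive sub-ranges of near-equal size."""
    if parts < 1 or hi < lo:
        raise ValueError("need parts >= 1 and hi >= lo")
    total = hi - lo + 1
    parts = min(parts, total)  # never produce empty ranges
    base, extra = divmod(total, parts)
    ranges = []
    start = lo
    for i in range(parts):
        # The first `extra` ranges get one additional value
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def foreach_items(table: str, key: str, lo: int, hi: int, parts: int) -> list[dict]:
    """Build one source query per partition (hypothetical table/key names)."""
    return [
        {"query": f"SELECT * FROM {table} WHERE {key} BETWEEN {a} AND {b}"}
        for a, b in split_range(lo, hi, parts)
    ]


items = foreach_items("my_dataset.my_table", "id", 1, 1_000_000, 4)
print(json.dumps(items, indent=2))
```

    Contiguous ranges keep each query simple, but they only balance well if the key values are spread evenly, which is an assumption here. A ForEach runs its iterations in parallel unless `isSequential` is set, and `batchCount` caps how many run at once.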

    For a self-hosted IR, the recommendation is to use a dedicated machine to host the IR, separate from the server hosting the data store. Start with the default parallel copy setting and a single node for the self-hosted IR.

    Please check the documentation below, which lists many recommendations for improving copy activity performance.
    https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-performance

    Hope this helps. Please let us know if you have any further queries.


1 additional answer

  1. Poreddy Sandya 0 Reputation points
    2024-03-04T08:42:29.6266667+00:00

    Azure Data Factory

