Self-hosted integration runtime performance for copy activity on on-premises files

Philip Mabon 21 Reputation points
2021-10-06T14:46:50.747+00:00

We want to use a copy activity where our source is multiple large files on premises (10 GB+) and the sink is on the same on-premises file system, but in a different location.

Is the self-hosted integration runtime smart enough to avoid transferring the files up to ADF and then back down?
We want to avoid ingress costs and get good transfer speed.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. MartinJaffer-MSFT 26,021 Reputation points
    2021-10-07T20:19:00.62+00:00

    Hello @Philip Mabon and welcome to Microsoft Q&A.

    I think the SHIR should be smart enough to avoid the transfer, provided you select it for that dataset pair. To be sure, I propose the following test. Tell me whether you think this test is good enough.

    I will install SHIR on my computer and create two datasets for different locations on my computer's file system. I will create a file of a significant, known size.
    I will schedule a run, then turn off all my other applications and watch the network traffic via Task Manager or a similar tool.
    I expect a small amount of traffic to fetch instructions for the task. The traffic should be much less than the file size.
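
    The "file of a significant, known size" could be produced with a short script such as the sketch below. This is only an illustration, not part of the test as I ran it: the function name `make_test_file`, the file name `shir_copy_test.bin`, and the 64 MiB size are assumptions. Random bytes are used so that any compression along the way cannot make the traffic look smaller than it is.

    ```python
    import os
    import tempfile


    def make_test_file(path, size_bytes, chunk_bytes=1024 * 1024):
        """Write size_bytes of random data to path; return the size on disk."""
        remaining = size_bytes
        with open(path, "wb") as f:
            while remaining > 0:
                n = min(chunk_bytes, remaining)
                # Random data is effectively incompressible.
                f.write(os.urandom(n))
                remaining -= n
        return os.path.getsize(path)


    if __name__ == "__main__":
        # Hypothetical location and size; use something larger for a real test.
        target = os.path.join(tempfile.gettempdir(), "shir_copy_test.bin")
        print(make_test_file(target, 64 * 1024 * 1024), "bytes written to", target)
    ```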

    If the traffic is much less than the file size and the copy succeeds, then SHIR is smart enough not to upload and download the data.
    If the traffic is close to the file size and the copy succeeds, then SHIR is not smart enough.
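
    The decision rule can be written down as a small helper, assuming you note the network bytes sent during the run by hand (for example from Task Manager) and pass them in. The name `classify_copy` and the 0.5 cutoff are assumptions made for illustration; the test itself only distinguishes "much less than" from "close to" the file size.

    ```python
    def classify_copy(network_bytes, file_bytes, ratio_threshold=0.5):
        """Return "local" if traffic is well below the file size, else "relayed".

        ratio_threshold is an assumed cutoff, not a value from ADF or SHIR.
        """
        if file_bytes <= 0:
            raise ValueError("file_bytes must be positive")
        if network_bytes < 0:
            raise ValueError("network_bytes must not be negative")
        ratio = network_bytes / file_bytes
        return "local" if ratio < ratio_threshold else "relayed"


    if __name__ == "__main__":
        ten_gib = 10 * 1024 ** 3
        # A few MiB of control traffic against a 10 GiB file.
        print(classify_copy(5 * 1024 ** 2, ten_gib))
        # Traffic roughly equal to the file size.
        print(classify_copy(int(1.02 * ten_gib), ten_gib))
    ```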

    Sound good?

    Update: I have confirmed, to the best of my ability, that the outbound network b/s is less than the disk write b/s.
