Is integration runtime shared across dataflow runs?

Pawel Minkina 20 Reputation points
2024-11-05T11:02:07.35+00:00

Hi everyone,

We recently started using Azure Data Factory and are trying to find the best way to use Integration Runtimes.
We use an Azure public IR without a managed virtual network, but we set the region ourselves for performance reasons (North Europe, so it is not auto-resolve). We also set a custom compute size, for example 64 (+ 16 driver cores). We want to use the integration runtime we are paying for as efficiently as possible.

We also have a few pipelines running on one timer trigger. Each pipeline executes heavy transformation operations. We know that each pipeline has a data flow vCore-hour figure associated with it, which for simplicity we can call the cost of a pipeline run.

Please help me understand a few things.

  1. When we use the same integration runtime for all our pipelines, are its resources shared? That is, when 4 pipelines run on a 4-core machine, does each get roughly 25% of the machine, or does each pipeline get its own 4-core machine? Please bear in mind that we are not using auto-resolve but a single integration runtime with a fixed region. The follow-up question: if a pipeline running alone on the machine takes 10 minutes, can it be slower because other pipelines are using the same machine?
  2. If the first point is true, so one machine is shared across data flows, will pipelines running together use the same number of vCore-hours as if we ran them one by one?
    Say each pipeline run takes 10 vCore-hours. We have 3 pipelines, so that gives 30 vCore-hours. Is it 30 vCore-hours regardless of whether we run them on one integration runtime at the same time or on one integration runtime one after the other? In other words, will we pay the same amount for running 3 pipelines together on one shared machine as for running them one after another on that same machine?

I'm aware of TTL for the integration runtime, including startup time, but for this scenario let's leave it out of the equation and focus only on data flow cost in vCore-hours.

If you have any questions about our case, please let me know.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Answer accepted by question author

Amira Bedhiafi 42,941 Reputation points MVP Volunteer Moderator
2024-11-05T18:15:54.5266667+00:00

Is integration runtime shared across dataflow runs?

In ADF, an IR can indeed be used across multiple dataflows or pipelines, but its resources may be allocated differently based on whether the IR is auto-resolved or set to a fixed region, as in your case with North Europe. When using a fixed-region IR, resources are allocated within that region and shared among dataflows or pipelines, which can impact performance if multiple heavy data transformations run simultaneously.

Does each pipeline receive an equal share of resources when running concurrently?

If you are running multiple pipelines on a single IR, the resources (like CPU cores and memory) are effectively shared. For example, if your IR is configured with 4 cores and 4 pipelines are running simultaneously, they may each receive a portion of those resources, resulting in a reduced share for each. This means each pipeline might experience a performance hit, as available cores are distributed across them, potentially causing the process time to increase if multiple pipelines compete for resources. Unlike dedicated clusters, IR resources are dynamically allocated, which can lead to varying performance outcomes based on concurrent usage.
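To put rough numbers on the slowdown described above, here is a minimal sketch. It assumes the simplest possible model, in which concurrent runs split the cores evenly and the work scales linearly with cores. That is an illustrative assumption, not a statement of how ADF actually schedules data flows, and the function name is hypothetical.

```python
def shared_runtime_minutes(solo_minutes: float, concurrent_runs: int) -> float:
    """Estimate wall-clock time for one run when cores are split evenly.

    Assumption (illustrative only): a run that gets 1/N of the cores
    takes N times as long as it does when running alone.
    """
    if concurrent_runs < 1:
        raise ValueError("concurrent_runs must be at least 1")
    return solo_minutes * concurrent_runs


# The example from the question: a run that takes 10 minutes alone,
# now sharing the machine with 3 other runs (4 in total).
estimate = shared_runtime_minutes(solo_minutes=10, concurrent_runs=4)
print(f"Estimated runtime when sharing: {estimate} minutes")
```

Under this model each of the 4 runs would take about 40 minutes instead of 10. Real behaviour may differ, so it is worth measuring actual run durations in the pipeline monitoring view.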

Does running multiple pipelines on the same IR impact vCore-hour billing?

Your concern about vCore-hour costs when running pipelines concurrently or sequentially is valid. Generally, vCore-hour billing applies to the amount of computational power consumed, irrespective of whether pipelines run concurrently or sequentially. In other words, if you have 3 pipelines each consuming 10 vCore-hours, you will still incur a total cost of 30 vCore-hours, regardless of whether they run simultaneously on a shared IR or one after another. The vCore-hour metric is based on the actual processing power used, so there would be no cost savings from adjusting the pipeline run order on a single IR.
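The arithmetic behind that statement can be sketched as follows. This is only an illustration: the `Run` class and its fields are hypothetical, and it assumes the 64 cores plus 16 driver cores from your question add up to 80 billed cores per run. A vCore-hour total is cores multiplied by hours, summed over runs, so the order of the runs never enters the calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    cores: int    # cores allotted to the run (assumed: 64 + 16 driver = 80)
    hours: float  # duration of the data flow execution in hours

    @property
    def vcore_hours(self) -> float:
        # vCore-hours = number of cores x hours of execution
        return self.cores * self.hours


def total_vcore_hours(runs: list[Run]) -> float:
    """Sum vCore-hours over all runs; start times play no role."""
    return sum(run.vcore_hours for run in runs)


# Three runs of 10 vCore-hours each: 80 cores for 0.125 h (7.5 minutes).
runs = [Run(cores=80, hours=0.125) for _ in range(3)]
total = total_vcore_hours(runs)
print(f"Total: {total} vCore-hours")
```

The total is 30 vCore-hours whether the three runs overlap or follow one another, as long as each run's own duration stays the same. If concurrency makes an individual run take longer, its `hours` value grows and the total grows with it.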

Would running pipelines sequentially or in parallel on the same IR influence costs if TTL settings were disregarded?

If we set aside TTL considerations, the cost remains the same in both scenarios, as ADF charges are based on vCore-hour usage tied to the processing needs of each dataflow or pipeline. Running them sequentially might result in longer processing times but wouldn’t impact the total vCore-hours billed. However, TTL configurations could influence costs slightly, as they determine when IRs spin up or down, affecting minor additional costs, but this factor is disregarded in this scenario as per your question.
