
Azure Data Factory Cost and Optimization

Sudhakar P 165 Reputation points
2025-07-25T05:12:35.9566667+00:00

Hello,

We have an Azure Data Factory (ADF) pipeline that runs daily at 7 AM and performs the following steps:

  1. Ingests data from an on-premises SQL Server into the Landing Zone (ADLS Gen2).
  2. Uses a Lookup activity to fetch active tables (based on a watermark table). Some tables are incremental, others are full load.
  3. Triggers a Databricks notebook to load data to the Bronze layer.
  4. Sequentially runs Databricks notebooks for Bronze to Silver, and then Silver to Gold (for both dimension and fact tables).
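For reference, the per-table load logic driven by the watermark table (step 2) looks roughly like this; the metadata schema and table names are simplified illustrations, not our exact setup:

```python
def build_source_query(table: dict) -> str:
    """Build the extraction query for one table from Lookup metadata.

    `table` mimics one row returned by the Lookup activity, e.g.
    {"name": ..., "load_type": "incremental" or "full",
     "watermark_column": ..., "last_watermark": ...}
    (illustrative schema - adapt to the real watermark table).
    """
    if table.get("load_type") == "incremental":
        # Incremental: only pull rows changed since the last successful run.
        return (
            f"SELECT * FROM {table['name']} "
            f"WHERE {table['watermark_column']} > '{table['last_watermark']}'"
        )
    # Full load: re-extract the entire table.
    return f"SELECT * FROM {table['name']}"
```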

[User's screenshot of the ADF pipeline]

The total pipeline run time is about 2 hours and 30 minutes daily.

We want to explore parallelizing these notebook activities to reduce the overall duration. However, we're also cautious about the cost impact, especially considering ADF activity costs.

Could you please suggest:

Best practices for optimizing pipeline performance through parallel execution.

Guidelines to manage and monitor cost impact when running multiple notebook activities in parallel.

Any recommended architectural patterns or configurations for improving performance in ADF-Databricks integrated workflows?

Thank you in advance for your help!

Azure Data Factory

An Azure service for ingesting, preparing, and transforming data at scale.


1 answer

  1. Chandra Boorla 15,475 Reputation points Microsoft External Staff Moderator
    2025-07-25T05:26:17.0733333+00:00

    Hi Sudhakar P,

    Thanks for the detailed overview of your Azure Data Factory (ADF) and Databricks pipeline. It's great that you're looking to optimize both performance and cost. Based on your current setup and the screenshot you shared, here are some suggestions:

    Performance Optimization with Parallel Execution

    • You are already on the right track using a ForEach loop with Sequential unchecked, which enables parallel execution.
    • To avoid overwhelming the Databricks cluster or exceeding ADF concurrency limits, I recommend setting a controlled Batch count - for example, start with 5 or 10, depending on cluster size and table workload.
    • You can dynamically process tables in parallel by grouping them logically (e.g., dimensions vs. facts, or Bronze vs. Silver vs. Gold layers) and creating separate ForEach loops for each group.
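    If it helps, the relevant ForEach settings in the pipeline JSON look roughly like this (activity names and the batch count are placeholders to adapt):

```json
{
  "name": "ForEachActiveTable",
  "type": "ForEach",
  "typeProperties": {
    "isSequential": false,
    "batchCount": 5,
    "items": {
      "value": "@activity('LookupActiveTables').output.value",
      "type": "Expression"
    },
    "activities": [
      { "name": "RunBronzeNotebook", "type": "DatabricksNotebook" }
    ]
  }
}
```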

    Cost Management Tips

    • Each Databricks Notebook activity in ADF incurs compute and activity costs; running too many in parallel can spike spend.
    • Use ADF's cost monitoring (via Azure Cost Management or Log Analytics) to track how activity parallelism impacts daily spend.
    • To optimize Databricks usage:

    • Reuse existing clusters with auto-scaling and auto-termination enabled.
    • Avoid spinning up separate clusters per notebook if not required.
    • Use Job Clusters only when isolation is needed.
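    As a reference point, auto-scaling and auto-termination are set on the cluster definition itself; a minimal sketch (node type, worker counts, and the Spark version are illustrative values, not a recommendation):

```json
{
  "cluster_name": "adf-shared-etl",
  "spark_version": "15.4.x-scala2.12",
  "node_type_id": "Standard_DS3_v2",
  "autoscale": { "min_workers": 2, "max_workers": 8 },
  "autotermination_minutes": 20
}
```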

    Architecture & Design Best Practices

    • Consider breaking the pipeline into modular stages: e.g., one pipeline for the Bronze load, one for Silver, and one for Gold. Trigger each stage conditionally or sequentially.
    • Use pipeline parameters and a metadata-driven design so the logic remains dynamic and maintainable.
    • For high-volume tables, isolate them into their own ForEach loop or child pipeline to prevent bottlenecks.
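    The "one ForEach per logical group" pattern can be sketched outside ADF as bounded parallel dispatch; this is a hedged illustration only (the notebook call is stubbed, and batch_count plays the role of the ForEach Batch count):

```python
from concurrent.futures import ThreadPoolExecutor

def run_notebook(table: str) -> str:
    # Stub - in a real pipeline this would be the Databricks Notebook
    # activity (or a Jobs API call) for one table.
    return f"loaded {table}"

def run_group(tables: list[str], batch_count: int = 5) -> list[str]:
    """Run one logical group (e.g. all dimensions) with bounded parallelism,
    like a ForEach with Sequential unchecked and a controlled Batch count."""
    with ThreadPoolExecutor(max_workers=batch_count) as pool:
        # map preserves input order, one result per table.
        return list(pool.map(run_notebook, tables))

# Groups run sequentially (dimensions before facts);
# tables within a group run in parallel.
for group in (["DimCustomer", "DimProduct", "DimDate"], ["FactSales"]):
    run_group(group)
```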

    I hope this information helps. Please do let us know if you have any further queries.

    Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.

    Thank you.


