How to parameterize pipeline concurrency (number of parallel runs) in Azure Data Factory

Azurin4848484 0 Reputation points
2024-04-25T11:49:35.2833333+00:00

I have two pipelines:

parent pipeline to read configuration file (lookup activity), filter it with filter activity and pass the resulting array (e.g. 50 items) to ForEach activity to run child pipeline to perform per-item work in a predefined number of parallel runs (e.g. 3 at once).

The child pipeline is executing its tasks using SQL servers,

and different servers have different number of session allowed to run in parallel for this solution, e.g.

  • 1 session for SQLsrv1
  • 3 sessions for SQLsrv2
  • 5 sessions for SQLsrv3

so I need to somehow parameterize that number of parallel runs for the child pipeline, but:

  • It could be done using child pipeline's concurrency parameter if I had only 1 SQL server,

but what to do with many if pipeline's concurrency parameter does not accept dynamic content?

  • ForEach activity does not allow to parameterize its batch count either

Should I opt to:

  • creating 3rd pipeline for offloading just ForEach activity to it,
  • then clone this 3rd pipeline to have e.g. 5 pipelines with different number of parallel runs in ForEach activity (1...5)
  • adding switch activity to parent pipeline to have e.g. 5 branches for 1...5 parallels and add ExecutePipeline activity to trigger one of those '3rd' pipelines clones?

Is there a better/simplier solution?

Thanks!

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,600 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Amira Bedhiafi 15,676 Reputation points
    2024-04-25T20:32:26.5033333+00:00

    ADF currently lacks direct support for dynamically setting the concurrency parameter of the ForEach activity based on external factors such as SQL server limits.

    Instead of having multiple pipelines or a complex branching strategy, you could use a combination of Lookup and If Condition activities to determine which concurrency setting to apply:

    1. Lookup Activity: to read the configuration for the SQL server (which might include the maximum number of sessions allowed).
    2. Set Variable Activity: Use the result from the Lookup to set a variable that holds the maximum concurrency level for the current execution context.
    3. If Condition or Switch Activity: Use an If Condition or Switch activity to check the variable and determine the concurrency parameter to apply to the ForEach activity.
    4. ForEach Activities: Have separate ForEach activities configured with fixed concurrency parameters (1, 3, 5,...) and execute the one that matches your requirements based on the server's session limit.

    Another approach is to use an Azure Function or an API call from ADF.

    As you suggested, creating dedicated pipelines for each type of server could work but might lead to maintenance challenges (each with a fixed concurrency setting that matches the server's limits) and then use a Switch activity in the main pipeline to decide which dedicated pipeline to trigger based on the server type or configuration.

    https://medium.com/@shuklaprashant9264/azure-data-factory-concurrency-and-dependencies-2f7c76f977f7

    0 comments No comments