Running a ForEach activity in parallel mode

pmscorca 792 Reputation points
2024-03-15T16:03:21.3666667+00:00

Hi,

I've implemented a pipeline with a Lookup activity that reads a Synapse table to get the parameters to pass to a stored procedure inside the subsequent ForEach activity.

Inside the ForEach activity I've used some Set variable activities to get the parameter values for the procedure (using item().myparam).

Now, can I run the ForEach activity in parallel mode safely, without a variable ending up set to the value of another iteration's item?

Azure Data Factory

Accepted answer
  1. Amira Bedhiafi 15,216 Reputation points
    2024-03-15T17:48:47.66+00:00

    The ForEach activity in ADF allows you to iterate over a collection and perform certain actions for each item in the collection. When you enable parallel execution, the ForEach activity can process multiple items in the collection concurrently, which can significantly improve the performance of your pipeline, especially when dealing with large collections.

    1. Isolation of Iterations: When iterations run in parallel, item() always refers to the current iteration's own element, so expressions such as item().myparam are isolated per iteration. Variables are different: in ADF every variable is defined at the pipeline level, and there are no variables local to a ForEach iteration. A Set variable activity inside a parallel ForEach therefore writes to a value shared by all iterations, and one iteration can read a value that another iteration has just set.

    To avoid conflicts with variables when running in parallel, either:

    • Reference item().myparam directly in the stored procedure parameters instead of copying it into a variable first, or
    • If you must use variables, treat them as read-only inside the ForEach (set them before the loop), or run the ForEach sequentially so that only one iteration modifies a variable at a time, though this gives up the benefit of parallel processing.

    In the settings of the ForEach activity, you can specify the "Batch Count," which determines how many iterations can run in parallel. This allows you to control the degree of parallelism according to the capabilities of your environment and the requirements of your process.

    Since you are using item().myparam to access the parameters for each iteration, each iteration will inherently work with its specific set of data fetched from the Lookup activity.
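
    As a rough sketch of a parallel ForEach that reads each parameter from item() rather than from a variable, the fragment below may help. It is not taken from your pipeline: the activity names (LookupParams, ForEachParamRow, RunProcedure), the linked service name (ExampleSynapseLinkedService), the procedure name ([dbo].[example_procedure]), the String parameter type and the batch count of 10 are all placeholders assumed for illustration. It also assumes the Lookup has "First row only" unchecked, so that its rows are available under output.value.

    ```json
    {
      "name": "ForEachParamRow",
      "type": "ForEach",
      "dependsOn": [
        { "activity": "LookupParams", "dependencyConditions": [ "Succeeded" ] }
      ],
      "typeProperties": {
        "isSequential": false,
        "batchCount": 10,
        "items": {
          "value": "@activity('LookupParams').output.value",
          "type": "Expression"
        },
        "activities": [
          {
            "name": "RunProcedure",
            "type": "SqlServerStoredProcedure",
            "linkedServiceName": {
              "referenceName": "ExampleSynapseLinkedService",
              "type": "LinkedServiceReference"
            },
            "typeProperties": {
              "storedProcedureName": "[dbo].[example_procedure]",
              "storedProcedureParameters": {
                "myparam": {
                  "value": { "value": "@item().myparam", "type": "Expression" },
                  "type": "String"
                }
              }
            }
          }
        ]
      }
    }
    ```

    Because @item().myparam is evaluated separately for each iteration, no Set variable activity is needed and nothing is shared between the parallel runs.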


1 additional answer

  1. AnnuKumari-MSFT 30,751 Reputation points Microsoft Employee
    2024-03-20T09:46:50.63+00:00

    Hi pmscorca,

    I understand you are asking whether a ForEach activity can run in parallel mode without a variable being set to the value of another iteration's item. Please let me know if that is not what you meant.

    Variables are scoped at the pipeline level. This means that they're not thread safe and can cause unexpected and undesired behavior if they're accessed from within a parallel iteration activity such as a ForEach loop, especially when the value is also being modified within that foreach activity.

    Try using the Sequential option if you are using variables. Parameters, however, are much safer for parallel execution.

    Please check out the documentation here: https://learn.microsoft.com/en-us/azure/data-factory/control-flow-set-variable-activity
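
    One way to rely on parameters instead of variables, offered only as a sketch, is to move the per-item work into a child pipeline and call it from the ForEach with an Execute Pipeline activity. Each child run receives its own parameter values, and any variables it sets belong to that run alone. The names RunChildPipeline and ExampleChildPipeline and the child parameter myparam are assumed for illustration; the child pipeline would need to declare a parameter with that name.

    ```json
    {
      "name": "RunChildPipeline",
      "type": "ExecutePipeline",
      "typeProperties": {
        "pipeline": {
          "referenceName": "ExampleChildPipeline",
          "type": "PipelineReference"
        },
        "waitOnCompletion": true,
        "parameters": {
          "myparam": { "value": "@item().myparam", "type": "Expression" }
        }
      }
    }
    ```

    Inside the child pipeline the value is read with @pipeline().parameters.myparam.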

    Hope it helps. Thank you.