How to use variable for delta loading in foreach pipeline

zmsoft 575 Reputation points
2025-12-18T03:12:50.38+00:00

Hi there,

I currently have an incremental loading pipeline that retrieves data incrementally based on a base requestid column.

I implemented the incremental loading logic for each table inside a ForEach activity.

I use a Lookup activity to first find the largest requestid in the target table, record that value with a Set Variable activity, and then query the incremental data and copy it using this variable.

Now I have found a problem: when multiple tables are loaded incrementally at the same time, the values recorded in this variable become mixed up and no longer correspond to the individual tables.

When I set the ForEach to sequential, no problems occurred. In parallel mode, however, the variable keeps being overwritten. Could anyone please share some good solutions?

Thanks & Regards,

zmsoft

Azure Data Factory

Answer accepted by question author
  1. VRISHABHANATH PATIL 2,635 Reputation points Microsoft External Staff Moderator
    2025-12-18T05:20:17.8333333+00:00

    Hi @zmsoft

    Thank you for reaching out on Microsoft Q&A. We’ve reviewed your question, and here are a few practical steps that may help you address the issue:

    Avoid pipeline variables inside a parallel ForEach

    Instead of variables, use one of the following:

    Option A — Use item() properties directly

    Pass values through the ForEach items array rather than relying on a shared variable.

    Often the best design is to embed the needed metadata directly in the items source being iterated.
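    As a sketch (activity and property names like GetTableList, tableName, and maxId are illustrative, not from your pipeline), the ForEach items array can carry the watermark for each table so no shared variable is needed:

```json
{
  "name": "ForEachTable",
  "type": "ForEach",
  "typeProperties": {
    "items": {
      "value": "@activity('GetTableList').output.value",
      "type": "Expression"
    },
    "activities": [
      {
        "name": "CopyIncremental",
        "type": "Copy",
        "typeProperties": {
          "source": {
            "type": "AzureSqlSource",
            "sqlReaderQuery": {
              "value": "SELECT * FROM @{item().tableName} WHERE requestid > @{item().maxId}",
              "type": "Expression"
            }
          }
        }
      }
    ]
  }
}
```

    Each item would be an object such as { "tableName": "dbo.Orders", "maxId": 12345 }, read via @item().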

    Option B — Use an activity-scoped output instead of a variable

    ADF supports referencing the output of an activity that ran within the same iteration, so the value is scoped to that iteration.

    Example: @activity('LookupMaxReqId').output.firstRow.requestid

    Each iteration runs its own Lookup, so this avoids global state.
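    For instance (activity name and query are illustrative), a per-iteration Lookup inside the ForEach could be defined as:

```json
{
  "name": "LookupMaxReqId",
  "type": "Lookup",
  "typeProperties": {
    "source": {
      "type": "AzureSqlSource",
      "sqlReaderQuery": {
        "value": "SELECT MAX(requestid) AS requestid FROM @{item().tableName}",
        "type": "Expression"
      }
    },
    "firstRowOnly": true
  }
}
```

    The downstream Copy then uses @activity('LookupMaxReqId').output.firstRow.requestid; because each iteration runs its own Lookup instance, the reference resolves per table.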

    Option C — Use a child pipeline

    Create a child pipeline that:

    • Accepts parameters (like requestId, tableName)
    • Processes one table at a time

    Then call it from the parent pipeline’s ForEach.

    This gives clean per-iteration isolation.
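    A minimal sketch of the parent-side call (pipeline and parameter names are placeholders):

```json
{
  "name": "LoadOneTable",
  "type": "ExecutePipeline",
  "typeProperties": {
    "pipeline": {
      "referenceName": "pl_incremental_load_table",
      "type": "PipelineReference"
    },
    "parameters": {
      "tableName": { "value": "@item().tableName", "type": "Expression" },
      "requestId": { "value": "@item().maxId", "type": "Expression" }
    },
    "waitOnCompletion": true
  }
}
```

    Each invocation gets its own parameter values, so parallel iterations cannot overwrite each other.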

    Option D — Use sequential ForEach when correctness > performance

    If the number of tables is small or volume is low, the simplest fix is to force:

    ForEach -> Sequential = true

    You sacrifice speed but ensure correctness.
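    In the ForEach activity's JSON this is a single property (sketch; the items expression is a placeholder):

```json
{
  "name": "ForEachTable",
  "type": "ForEach",
  "typeProperties": {
    "isSequential": true,
    "items": { "value": "@pipeline().parameters.tableList", "type": "Expression" },
    "activities": []
  }
}
```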

    Option E — Store iteration-specific values in a data structure

    Instead of pipeline variables:

    • Use an array variable that stores an array of objects
    • Append values per iteration (this still risks concurrency unless ForEach is sequential)

    So this works only if sequential mode is acceptable.

    Recommended approach

    Use a child pipeline with parameters. This is the recommended enterprise-grade approach because isolation is guaranteed and parallelism is maintained without shared-state conflicts.

    The general rule holds across Azure workflow services: parallel loops plus shared variables produce race conditions.

    Conclusion

    The issue is not a bug — it is a designed behavior: ADF pipeline variables are global, not per-iteration.

    To fix the problem while keeping parallel execution, you must eliminate pipeline variables from the loop and rely instead on iteration-specific outputs, parameters, or child pipelines.

    Reference - https://learn.microsoft.com/en-us/azure/data-factory/control-flow-for-each-activity 

     


Answer accepted by question author
  1. Vinodh247 40,141 Reputation points MVP Volunteer Moderator
    2025-12-18T04:19:58.7066667+00:00

    Hi,

    Thanks for reaching out to Microsoft Q&A.

    Pipeline variables are global. They are not safe in parallel ForEach.

    What can you do instead?

    Do not use variables. Use the Lookup output directly in the Copy activity: @activity('LookupMaxId').output.firstRow.max_id

    Use ForEach item scope. Pass { tableName, maxId } as the ForEach item and read @item().maxId.

    Use a child pipeline + parameters. Run an Execute Pipeline activity per table; pipeline parameters are isolated and parallel-safe.
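    Sketch of the child pipeline's parameter declaration (names are placeholders); activities inside it read the values via @pipeline().parameters.tableName:

```json
{
  "name": "pl_incremental_load_table",
  "properties": {
    "parameters": {
      "tableName": { "type": "string" },
      "maxId": { "type": "int" }
    },
    "activities": []
  }
}
```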

    Fact: parallel ForEach + variables = race condition. Sequential mode works only because iterations never overlap.

    Please 'Upvote' (Thumbs-up) and 'Accept' as answer if the reply was helpful. This will benefit other community members who face the same issue.

    0 comments No comments
