Getting the output of all ADF ForEach inner activity iterations outside the loop

Shobhit Maheshwari 1 Reputation point
2022-07-01T11:18:55.817+00:00

Hi Team,

I hope you are having a great week!

My question is about Azure Data Factory For Each Activity inner activity output.

I have a Web activity inside the ForEach activity that calls an API (GET) to fetch data one by one for 3,500 IDs, so the inner activity in the loop runs 3,500 times.
How can I get the output of all the inner activity iterations in one go outside the ForEach activity, so that I can process it all at once later?
Right now, when I try to do that, I get a strange error like the one below whenever I hit Debug or trigger the pipeline.

You can find my setup below in screenshots.

Error  
Notifications  
{  
"code": "BadRequest",  
"message": null,  
"target": "pipeline//runid/a046723c-bd69-4dc9-8518-94f281fcd312",  
"details": null,  
"error": null  
}  

216857-screenshot-2022-07-01-at-65824-pm.png

216951-screenshot-2022-07-01-at-70627-pm.png

216971-screenshot-2022-07-01-at-70601-pm.png

The idea is to get the output of all iterations of the inner activity outside the ForEach, so that we avoid either running an upload activity inside the loop for every iteration or appending to a variable 3,500 times. Either of those costs 3,500 + 3,500 = 7,000 activity runs, which is about $7. If I can extract the output in one go, it will be 3,500 + 1 runs, which is only around $3.50 (a saving of about 50%).

We are doing a lot of activities like this. If we could implement the above pattern, it would save a lot of cost. Also, the Copy activity is not an option because it is far too costly to run for each item one by one.
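The arithmetic above can be checked with a minimal sketch. The rate of $1 per 1,000 activity runs is an assumption inferred from the figures in this question (7,000 runs for about $7), not a quoted price, and the function name is hypothetical. The sketch counts only the inner activity runs and ignores any other pipeline charges.

```python
# Assumed rate, inferred from the question's own figures (7,000 runs ~ $7).
PRICE_PER_1000_RUNS_USD = 1.00


def orchestration_cost(activity_runs: int,
                       price_per_1000: float = PRICE_PER_1000_RUNS_USD) -> float:
    """Return the estimated cost in USD for a number of activity runs."""
    return activity_runs / 1000 * price_per_1000


ids = 3500

# One Web activity plus one upload/append activity per iteration.
per_iteration_cost = orchestration_cost(ids + ids)

# One Web activity per iteration plus a single upload after the loop.
single_upload_cost = orchestration_cost(ids + 1)

# Fraction of the cost saved by the single-upload pattern.
saving = 1 - single_upload_cost / per_iteration_cost

print(f"per-iteration upload: ${per_iteration_cost:.2f}")
print(f"single upload:        ${single_upload_cost:.2f}")
print(f"saving:               {saving:.1%}")
```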

So in my case.

Azure Synapse Analytics
Azure Data Factory

2 answers

  1. skumarrana 321 Reputation points
    2022-07-01T16:13:20.763+00:00

    @Shobhit Maheshwari

    You are getting that pipeline error when you hit Debug because a Set Variable activity outside the ForEach loop is trying to access the output of an activity inside it (@activity('Get Project Data').output[0].resourceIdentifier). You will need to move that Set Variable activity inside the ForEach loop for it to be able to access that output.

    The output of the inner activity inside the ForEach loop has to be persisted somewhere on every iteration (either write the output to a variable and keep appending, or persist it to some type of storage). This is the only way you can access all of that output in one go outside the ForEach loop.

    Some other ways I would try:

    1. Use an External Call transformation in an ADF mapping data flow. Send the 3,500 IDs row by row, and it will return the output for all 3,500 IDs row by row.
    2. Check whether the API endpoint can accept a batch of IDs and return the output for the whole batch in a single call.
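If the API does turn out to accept several IDs per request, the batching idea could look like the sketch below. Whether the endpoint supports this at all is unknown here, and the batch size of 100 and the helper name `chunked` are purely hypothetical. The sketch only shows how batching reduces the number of calls, and therefore the number of loop iterations.

```python
from typing import Iterator, List


def chunked(ids: List[int], batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of `ids`, each holding at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


# Stand-in for the 3,500 real IDs from the question.
all_ids = list(range(1, 3501))

# Hypothetical batch size; the real limit depends on the API.
batches = list(chunked(all_ids, 100))

# Each batch would be one API call, so one loop iteration instead of 100.
print(f"{len(all_ids)} IDs -> {len(batches)} calls")
```

With a batch size like this, the loop would iterate over the batches rather than over the individual IDs.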

    Hope it helps!


  2. Shobhit Maheshwari 1 Reputation point
    2022-07-06T02:29:48.283+00:00

    Hi @skumarrana ,

    Thanks for your detailed response.

    However, I have a few concerns:

    1. Adding an Append Variable activity is not an option for me, as it adds another 3,500 activity runs and an additional $3.50 to my pipeline run, which is a cost I want to avoid. (That said, this is what I am doing for now: one Web activity for the API call plus one Web activity to upload to the storage account per iteration costs me about $7, which is far cheaper than 3,500 Copy activity runs at around $20.)
    2. I tried the data flow External Call transformation, but I am facing the issue described in this Stack Overflow question: https://stackoverflow.com/questions/71595275/in-adf-i-am-not-able-to-define-the-response-body-definition-in-rest-call-ext. When I try to import the projection, it fails with the error in this screenshot: 217972-image.png

    Do let me know your thoughts.

