How to process more than 5000 records in Lookup Activity?

Prashanth T 26 Reputation points Microsoft Employee
2020-07-02T07:05:28.473+00:00

My scenario is--> look up a table from database and post in service now one record at a time. So, I am using "lookup activity" to fetch all records and using "for each activity" to pass one record. Inside "for each" there is "web activity" to POST in service now. Now I want to run this pipeline end of the month. But my lookup table may have more than 5000 records. In docs "Lookup activity" having limitation of process 5000 records or 2MB data at a time. How do I process more than 5000 or 2MB?

--My lookup table having json data with one column

Azure Data Factory
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
9,487 questions
0 comments No comments
{count} votes

2 answers

Sort by: Most helpful
  1. Courtney Haedke 151 Reputation points
    2020-11-10T20:10:30.8+00:00

    Yes the instructions weren't too clear.
    I ended up not needing to use a pipeline in the while loop for this too.
    Perquisites:
    Create the three variables:
    Iterations - set to 0
    index - set to 0
    count -set to 0
    Instructions:

    1. I set the iterations variable outside of the while loop ( I determine the amounts of iterations I need by a lookup task):
    2. Set up an until loop, I use the following expression
      @equals(variables('count'),variables('iterations'))
    3. In this I didn't need to use a for each loop any more and created a look up in the until loop using the count variable as my offset. In your case you would probably input the pipeline that uses the foreach loop 5000 + records. In your foreach you loop pipeline you could create a count parameter and pass in the count variable value from the while loop pipeline. Then use the count parameter in your offset.
      I plan to fetch 100 rows at a time so I multiply the 100 by the count variable(in your case count parameter). In your case you could probably do 5000 times the count variable
      ORDER BY fieldname ASC OFFSET ',string(mul(100,int(variables('count')))),' ','ROWS FETCH NEXT 100 ROWS ONLY)a')
    4. After activities in the while loop I set the index variable and increment it by 1
      @ hide (add(int(variables('count')),1))
    5. Then I set the count variable the index variable (this is in the while loop)
      I tried adding my screen shots but Microsoft is saying they're all too big for some reason.
      Hope this helps!
    3 people found this answer helpful.
    0 comments No comments

  2. ChiragMishra-MSFT 951 Reputation points
    2020-07-06T06:23:13.987+00:00

    Hi @PrashanthT-2849,

    The limitation of 5000 records for a Lookup activity is by design and there's no in-house way to get past this limitation. In your case, you can implement a workaround as follows :

    Create a new pipeline with 2 integer variables: iterations and count with 0 as defaults.

    First determine the needed number of iterations. Do a lookup to determine the total number of datasets. In your query divide this by 5000, add one and round it upwards. Set the value of the iterations variable to this value using the set variable activity.

    Next, add a while loop with expression something like @less(variables('count'),variables('iterations')). in this while loop call your current pipeline and pass the count variable as a parameter. After the execute pipeline activity, set the count variable to +1.

    In your current pipeline you can use the limit/offset clause in combination with the passed parameter in a SQL query to get the first 0-5000 results for your first iteration, 5000-10000 for your second iteration etc.

    Ref - https://stackoverflow.com/a/55856185/10653466

    Hope this helps.