How to use the startAt parameter and break up into multiple queries to call the JIRA REST API (JIRA Cloud) in ADF

Amar Agnihotri 926 Reputation points
2022-09-27T05:43:15.06+00:00

Hi,
I am calling the JIRA API and it is returning this result:

[screenshot of the API response: total is 20000, maxResults is 100]

We can see that the total is 20000, but only 100 results are returned in a single page. I searched a lot and learned that this is a restriction of the JIRA REST API: the maximum number of records in one call is 100.
I thought of using the startAt parameter and breaking the request into multiple queries:

Query 1

https://[yoursitename].atlassian.net/rest/api/2/search?jql=ORDER%20BY%20Created&maxResults=100&startAt=0

Query 2

https://[yoursitename].atlassian.net/rest/api/2/search?jql=ORDER%20BY%20Created&maxResults=100&startAt=100

Query 3

https://[yoursitename].atlassian.net/rest/api/2/search?jql=ORDER%20BY%20Created&maxResults=100&startAt=200

and so on.

I want to implement the above logic in ADF to fetch all the result sets through different iterations based on the startAt value, and then combine them into a single JSON file, but I am confused about the workflow. Can anybody suggest how we can implement this logic inside ADF? Which activities do we need, and how do we pass the value to startAt dynamically?

Thanks


Accepted answer
  1. MartinJaffer-MSFT 26,236 Reputation points
    2022-09-28T04:33:54.153+00:00

    Hello @Amar Agnihotri and welcome back to Microsoft Q&A.

    While you could get the data using the Copy activity's REST pagination feature, it looks like you want to split the work into multiple copy activities to go faster, given the small page size and large result set. Is this correct?
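    For reference, the built-in approach is a pagination rule on the copy activity's REST source. A minimal sketch, assuming the connector's RANGE rule and using the total from your screenshot as the end bound (both bounds here are illustrative):

        "source": {
            "type": "RestSource",
            "paginationRules": {
                "QueryParameters.startAt": "RANGE:0:20000:100"
            }
        }

    The built-in rule pages sequentially, which is part of why splitting into parallel copy activities can be faster.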

    Do note that splitting into multiple copy activities also means producing many small output files, because ADF doesn't do append-type writes to files. To end up with a single file, run another copy activity afterwards to merge them (sketched after the screenshots below).

    [screenshots of the pipeline setup]
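    A sketch of the merge copy's sink, assuming the per-page files land in one ADLS Gen2 folder and a JSON dataset is used (dataset names are yours to choose):

        "sink": {
            "type": "JsonSink",
            "storeSettings": {
                "type": "AzureBlobFSWriteSettings",
                "copyBehavior": "MergeFiles"
            }
        }

    Note that MergeFiles concatenates the page files as-is; it does not rewrite them into one combined JSON document.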

    So there are a few parts to doing this. I'm assuming you want to put the copy activity inside a ForEach loop and run it with a high degree of parallelism.
    The first step is to generate a set of items to iterate over: one value per page/call. For that we need to know how many results there are in total. Use a Web activity to get the total, like in your screenshot.

    [screenshot of the Web activity]
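    A sketch of the Web activity and a Set Variable that stores the count (activity and variable names are illustrative; authentication is omitted). Requesting maxResults=0 should return the total without any issue payload:

        {
            "name": "GetTotal",
            "type": "WebActivity",
            "typeProperties": {
                "url": "https://[yoursitename].atlassian.net/rest/api/2/search?jql=ORDER%20BY%20Created&maxResults=0",
                "method": "GET"
            }
        },
        {
            "name": "SetTotal",
            "type": "SetVariable",
            "dependsOn": [ { "activity": "GetTotal", "dependencyConditions": [ "Succeeded" ] } ],
            "typeProperties": {
                "variableName": "total",
                "value": {
                    "value": "@string(activity('GetTotal').output.total)",
                    "type": "Expression"
                }
            }
        }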

    Create an array-type variable. Use Set Variable to create all the elements at once, using the range function:
    total / maxResults = number of pages
    if (0 < (total % maxResults)) then we need one more page for the remainder
    Put together, this takes the form of:

    @range(0,
        if(less(0, mod(int(variables('total')), pipeline().parameters.maxResults)),
            add(1, div(int(variables('total')), pipeline().parameters.maxResults)),
            div(int(variables('total')), pipeline().parameters.maxResults)))
    

    This outputs an array of integers, one per page; with total = 20000 and maxResults = 100 that is 200 pages, i.e. [0, 1, 2, 3 ... 197, 198, 199].
    Each element is the startAt value divided by maxResults (100). We pass this array to the ForEach. In the copy activity we need to multiply back by maxResults (100):

    @mul(item(), pipeline().parameters.maxResults)  
    

    This value is used not only in the API call but also in the parameterized sink dataset filename, so each call doesn't overwrite the output of the others.
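    Concretely, inside the ForEach (the variable and dataset parameter names here are illustrative): the ForEach's items setting takes the array, the copy source's relative URL builds startAt from item(), and the sink dataset's filename keeps each page separate.

    ForEach items (with Sequential unchecked and a suitable batch count):

        @variables('pages')

    Copy source relative URL:

        @concat('/rest/api/2/search?jql=ORDER%20BY%20Created&maxResults=',
            string(pipeline().parameters.maxResults),
            '&startAt=', string(mul(item(), pipeline().parameters.maxResults)))

    Sink dataset fileName:

        @concat('jira-page-', string(item()), '.json')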


1 additional answer

  1. Diptesh Bose 1 Reputation point
    2022-12-13T13:38:38.397+00:00

    @MartinJaffer-MSFT: Would it be possible to share some more screenshots of the variables and of how you have set up the variables within the web connector?

