How to configure ADF copy data REST API pagination with offset in POST request body
Hi. I'm trying to configure pagination for a REST API source in ADF copy data activity. None of the documented options seem to suit my situation which is for the offset and limit values to be sent as part of a JSON structure in the body of the POST request.
The API I am working with is this one https://developers.podio.com/doc/items/filter-items-4496747
I expect I can get it to work with a pipeline loop using variables but I was hoping to use the built-in copy data pagination capability if at all possible.
thanks in advance for any pointers.
Azure Data Factory
-
MartinJaffer-MSFT 26,061 Reputation points
2023-03-13T18:26:40.3366667+00:00 @Don Hello and welcome to Microsoft Q&A.
As I understand you want help with pagination on REST API copy activity. From your link I think you want to do multiple post requests like...
Call 1 body { "sort_by": The sort order to use, "sort_desc": True to sort descending, false otherwise, "filters": The filters to apply { "{key}": The value for the key filtering, ... (more filters) }, "limit": 30, "offset": 0, "remember": false } Call 2 body { "sort_by": The sort order to use, "sort_desc": True to sort descending, false otherwise, "filters": The filters to apply { "{key}": The value for the key filtering, ... (more filters) }, "limit": 30, "offset": 30, "remember": false } Call 3 body { "sort_by": The sort order to use, "sort_desc": True to sort descending, false otherwise, "filters": The filters to apply { "{key}": The value for the key filtering, ... (more filters) }, "limit": 30, "offset": 60, "remember": false }
Where the response returns
$.total
as the total number of items, so we want to stop when total >= offset.Is this correct?
.
.
.
Oh... I see, the pagination rules don't seem to have an easy way to write to the post body. Weird how this hasn't come up before, I thought surely it must be possible. Pagination to body is not a strange ask.
Hmm, I'm testing the pagination-rule-to-parameters-like-
{id}
. If there is a way, I suspect this. So far, no luck. I was hoping I could do like:I checked the legacy format, that isn't much help either.
Alright. I'll inquire internally about this, possibly a feature request. In the meantime I'll help you with the pipeline loop, if you would like? I have a couple models.
-
Don 20 Reputation points
2023-03-13T23:25:57.98+00:00 Hi Martin. Yes, your assumption about the multiple requests with the offset and limit in the POST body is correct. That is indeed what I am hoping to achieve.
I was also surprised that feature isn't built into the pagination options.
I do already have a pipeline loop working using an Until loop with pipeline variables to track the offset so I don't need help with that thanks ... unless you have a model using a For each loop which can do multiple batches in parallel. I did come across that parallel idea somewhere on the web but can't seem to find it again.
-
MartinJaffer-MSFT 26,061 Reputation points
2023-03-15T19:32:07.2033333+00:00 Well I do have a faster way than until loop.
First make a call using web activity so we can get total number of records.
Once we have total number of records, divide that by the number of records per page. I.e. total = 100, 20 per page, makes 5 pages.
Then use Set Variable to assign to array type variable page numbers using the
range
function. So we get[0,1,2,3,4]
. This we pass to a ForEach loop. We can use ForEach to do these in parallel, unlike Until loop which is in sequence.Inside the ForEach loop, make the copy activity, and in the body, we do math to determine the record offset. i.e. 0 x 20=0, 1 x 20=20 ... 4 x 20 = 80. Given we are doing 20 per page, the last one is records 80-100.
-
MartinJaffer-MSFT 26,061 Reputation points
2023-03-15T19:38:50.7566667+00:00 I have submitted a feature request, but you can also help gain traction by submitting your own. You can submit through the ADF / Synapse workspace using icon shown below.
-
Don 20 Reputation points
2023-03-15T23:38:51.4233333+00:00 I found two existing feature requests for pagination within POST body so I voted for those rather than add another feature request for the same thing.
https://feedback.azure.com/d365community/idea/d274c15d-5a8a-ec11-a81b-000d3a7cc2c1
and
https://feedback.azure.com/d365community/idea/02085203-21ab-ed11-a81b-00224850345d
-
Janne Kujanpää 226 Reputation points
2023-04-18T07:22:03.2433333+00:00 @MartinJaffer-MSFT is there any news about feature request? Using loop construct is really bad solution: on our testing it increased runtime 60x and cost 90x.
-
Bensun cambell Fong 0 Reputation points
2023-10-02T15:02:02.23+00:00 @MartinJaffer-MSFT Hi there Martin. I am hoping to follow up on the discussion above, and see if there has been any progress on the request body front.
You mentioned that you might have a couple models for the workaround. For my specific use case (similarly a GraphQL POST against a REST endpoint), I don't have the option to offset, but rather the only way to paginate is through a cursor that must be included as part of the POST request body.
It follows the specification here https://relay.dev/graphql/connections.htm, in short limiting me to paginate by changing the value of "Cursor" I would get in the response payload:
(first: 100, after: "Cursor")
I have 50000 records (and counting) with a maximum page size of 100, and a JWT that expires after only 15 minutes. The built-in pagination would really have helped me here.
Beyond a sequential loop (storing cursor as a variable and using it as input for the next loop) is there a solution or potential model here that you could see? (Of course, I would much rather the devs had included an offset ability, let alone simply followed RFC standard.)
-
Janne Kujanpää 226 Reputation points
2023-10-02T15:18:38.2866667+00:00 I've communicated about this issue through TPD service request. Last time I hear about them was in the May when they told that they contacted product group and will give more information when PG answers.
Currently email of the person handling the request(CAS-1079535-H0G8D7) does not work. trying emailing again tomorrow to see if there is just transient issue.
Sign in to comment