How to Get Data from an API Using Azure Data Factory (ADF) Without Dataflows and Handle Cursor-Based Pagination

Onkar More 0 Reputation points
2025-02-17T13:10:14.1266667+00:00

Hi, I'm working with Azure Data Factory (ADF) and need to fetch data from an API that uses cursor-based pagination. I want to:

  1. Make an initial API call to get the first set of data.
  2. Apply pagination using the cursor field to fetch subsequent pages.
  3. Extract the time_entries data from each page's API response.
  4. Store the data in a different file for each page (for example, each page’s data should be written to a separate file in Blob Storage or any other destination).
  5. Avoid using Dataflows for this process.

Could someone help me with the following?

  • How to set up pagination in ADF (using next URL) to fetch all pages of data?
  • How to extract the time_entries from the API response and use it in a Copy Activity?
  • How can I configure ADF to write the data from each page to a separate file (e.g., using dynamic file names)?

Any advice or example configurations would be greatly appreciated!

Thank you!

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Amira Bedhiafi 33,071 Reputation points Volunteer Moderator
    2025-02-17T15:21:14.6966667+00:00

    You can start by creating a new pipeline and adding a Web Activity to make the initial API call (a minimal JSON sketch follows the list below):

    • Set the URL to the API endpoint
    • Use the GET method
    • Configure Headers and Authentication as required by the API
    • Store the response in a variable APIResponse
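
    For reference, a stripped-down Web Activity definition for that first call could look roughly like this. The activity name, endpoint URL, and Authorization header are placeholders; replace them with whatever your API actually requires (or use the activity's built-in authentication settings instead of a header):

        {
            "name": "WebActivity1",
            "type": "WebActivity",
            "typeProperties": {
                "url": "https://api.example.com/time_entries",
                "method": "GET",
                "headers": {
                    "Authorization": "Bearer <your-token>"
                }
            }
        }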

    Then you need to parse the initial API response:

    • Add a Set Variable Activity to store the initial API response and set the variable APIResponse to @activity('WebActivity1').output (wrap it in @string() if the variable is of type String).
    • The Web Activity output is already parsed JSON, so no separate parsing activity is needed; extract the cursor and time_entries with expressions that match the structure of your API response, e.g. @activity('WebActivity1').output.cursor (see the example below).
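
    As a concrete example, a Set Variable Activity that pulls the cursor out of the first call could be defined along these lines. The property name cursor is an assumption about your API's response shape, so adjust the path as needed:

        {
            "name": "SetInitialCursor",
            "type": "SetVariable",
            "dependsOn": [
                { "activity": "WebActivity1", "dependencyConditions": [ "Succeeded" ] }
            ],
            "typeProperties": {
                "variableName": "cursor",
                "value": "@string(activity('WebActivity1').output.cursor)"
            }
        }

    Pipeline variables are strings by default, hence the @string() wrapper; if the cursor is already returned as a string you can reference it directly.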

    Now you need to set up the pagination (a consolidated JSON sketch of the loop follows these steps):

    • Add an Until Activity to handle pagination.
      • Set the condition to check if the cursor is null or empty (@empty(variables('cursor'))).
      • Inside the Until Activity:
        • Add a Web Activity to fetch the next page of data using the cursor.
          • Set the URL dynamically using the cursor value (@concat('https://api.example.com/time_entries?cursor=', variables('cursor'))).
        • Use another Set Variable Activity to update the cursor from the subsequent API response (and reference its time_entries the same way).
    • Inside the Until Activity, add a Copy Data Activity to write the time_entries data to a file.
      • Configure the Source:
        • A Copy Activity reads from a dataset rather than directly from a Web Activity's output, so point the source at a REST (or HTTP) dataset for the same endpoint and pass the cursor in as a dataset parameter.
        • Map the time_entries array in the Copy Activity's mapping so only that part of the response is written out.
      • Configure the Sink:
        • Set the destination (e.g., Blob Storage).
        • Use dynamic content for the file name to ensure each page’s data is written to a separate file (@concat('time_entries_', pipeline().RunId, '_', variables('cursor'), '.json')).
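
    Putting the loop together, a pared-down Until Activity could look like the sketch below. Treat it as a starting point rather than a drop-in definition: the activity names, the dataset references (RestTimeEntriesSource, BlobJsonSink), the dataset parameters (cursor, fileName), and the cursor property path are all assumptions you will need to adapt to your own API and datasets:

        {
            "name": "PaginateUntilDone",
            "type": "Until",
            "typeProperties": {
                "expression": {
                    "value": "@empty(variables('cursor'))",
                    "type": "Expression"
                },
                "activities": [
                    {
                        "name": "GetNextPage",
                        "type": "WebActivity",
                        "typeProperties": {
                            "url": "@concat('https://api.example.com/time_entries?cursor=', variables('cursor'))",
                            "method": "GET"
                        }
                    },
                    {
                        "name": "CopyTimeEntries",
                        "type": "Copy",
                        "dependsOn": [
                            { "activity": "GetNextPage", "dependencyConditions": [ "Succeeded" ] }
                        ],
                        "inputs": [
                            {
                                "referenceName": "RestTimeEntriesSource",
                                "type": "DatasetReference",
                                "parameters": { "cursor": "@variables('cursor')" }
                            }
                        ],
                        "outputs": [
                            {
                                "referenceName": "BlobJsonSink",
                                "type": "DatasetReference",
                                "parameters": {
                                    "fileName": "@concat('time_entries_', pipeline().RunId, '_', variables('cursor'), '.json')"
                                }
                            }
                        ],
                        "typeProperties": {
                            "source": { "type": "RestSource" },
                            "sink": { "type": "JsonSink" }
                        }
                    },
                    {
                        "name": "SetNextCursor",
                        "type": "SetVariable",
                        "dependsOn": [
                            { "activity": "CopyTimeEntries", "dependencyConditions": [ "Succeeded" ] }
                        ],
                        "typeProperties": {
                            "variableName": "cursor",
                            "value": "@string(activity('GetNextPage').output.cursor)"
                        }
                    }
                ]
            }
        }

    Two caveats with this layout: the API is called twice per page (the Web Activity retrieves the cursor while the Copy Activity re-reads the page through the REST dataset), and as sketched the first page fetched before the loop is not copied, so either add a Copy Activity for it before the Until or build the URL conditionally so the first iteration fetches page one. Also make sure the API returns an empty or missing cursor on the last page so the @empty(variables('cursor')) condition eventually terminates the loop.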
