How to Get Data from an API Using Azure Data Factory (ADF) Without Dataflows and Handle Cursor-Based Pagination

Onkar More 0 Reputation points
2025-02-17T13:10:14.1266667+00:00

Hi, I'm working with Azure Data Factory (ADF) and need to fetch data from an API that uses cursor-based pagination. I want to:

  1. Make an initial API call to get the first set of data.
  2. Apply pagination using the cursor field to fetch subsequent pages.
  3. Extract the time_entries data from each page's API response.
  4. Store the data in a different file for each page (for example, each page’s data should be written to a separate file in Blob Storage or any other destination).
  5. Avoid using Dataflows for this process.

Could someone help me with the following?

  • How to set up pagination in ADF (using next URL) to fetch all pages of data?
  • How to extract the time_entries from the API response and use it in a Copy Activity?
  • How can I configure ADF to write the data from each page to a separate file (e.g., using dynamic file names)?

Any advice or example configurations would be greatly appreciated!

Thank you!

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Amira Bedhiafi 33,071 Reputation points Volunteer Moderator
    2025-02-17T15:21:14.6966667+00:00

    You can start by creating a new pipeline and adding a Web Activity to make the initial API call (a minimal JSON sketch follows the list below):

    • Set the URL to the API endpoint
    • Use the GET method
    • Configure Headers and Authentication as required by the API
    • Store the response in a variable APIResponse
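
    For reference, a stripped-down Web Activity definition for that first call could look roughly like this. The activity name, endpoint URL, and Authorization header are placeholders; replace them with whatever your API actually requires (or use the activity's built-in authentication settings instead of a header):

        {
            "name": "WebActivity1",
            "type": "WebActivity",
            "typeProperties": {
                "url": "https://api.example.com/time_entries",
                "method": "GET",
                "headers": {
                    "Authorization": "Bearer <your-token>"
                }
            }
        }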

    Then you need to parse the initial API response:

    • Add a Set Variable Activity to store the initial API response and set the variable APIResponse to @activity('WebActivity1').output (wrap it in @string() if the variable is of type String).
    • The Web Activity output is already parsed JSON, so no separate parsing activity is needed; extract the cursor and time_entries with expressions that match the structure of your API response, e.g. @activity('WebActivity1').output.cursor (see the example below).
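
    As a concrete example, a Set Variable Activity that pulls the cursor out of the first call could be defined along these lines. The property name cursor is an assumption about your API's response shape, so adjust the path as needed:

        {
            "name": "SetInitialCursor",
            "type": "SetVariable",
            "dependsOn": [
                { "activity": "WebActivity1", "dependencyConditions": [ "Succeeded" ] }
            ],
            "typeProperties": {
                "variableName": "cursor",
                "value": "@string(activity('WebActivity1').output.cursor)"
            }
        }

    Pipeline variables are strings by default, hence the @string() wrapper; if the cursor is already returned as a string you can reference it directly.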

    Now you need to set up the pagination (a consolidated JSON sketch of the loop follows these steps):

    • Add an Until Activity to handle pagination.
      • Set the condition to check if the cursor is null or empty (@empty(variables('cursor'))).
      • Inside the Until Activity:
        • Add a Web Activity to fetch the next page of data using the cursor.
          • Set the URL dynamically using the cursor value (@concat('https://api.example.com/time_entries?cursor=', variables('cursor'))).
        • Use another Set Variable Activity to update the cursor from the subsequent API response (and reference its time_entries the same way).
    • Inside the Until Activity, add a Copy Data Activity to write the time_entries data to a file.
      • Configure the Source:
        • A Copy Activity reads from a dataset rather than directly from a Web Activity's output, so point the source at a REST (or HTTP) dataset for the same endpoint and pass the cursor in as a dataset parameter.
        • Map the time_entries array in the Copy Activity's mapping so only that part of the response is written out.
      • Configure the Sink:
        • Set the destination (e.g., Blob Storage).
        • Use dynamic content for the file name to ensure each page’s data is written to a separate file (@concat('time_entries_', pipeline().RunId, '_', variables('cursor'), '.json')).
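
    Putting the loop together, a pared-down Until Activity could look like the sketch below. Treat it as a starting point rather than a drop-in definition: the activity names, the dataset references (RestTimeEntriesSource, BlobJsonSink), the dataset parameters (cursor, fileName), and the cursor property path are all assumptions you will need to adapt to your own API and datasets:

        {
            "name": "PaginateUntilDone",
            "type": "Until",
            "typeProperties": {
                "expression": {
                    "value": "@empty(variables('cursor'))",
                    "type": "Expression"
                },
                "activities": [
                    {
                        "name": "GetNextPage",
                        "type": "WebActivity",
                        "typeProperties": {
                            "url": "@concat('https://api.example.com/time_entries?cursor=', variables('cursor'))",
                            "method": "GET"
                        }
                    },
                    {
                        "name": "CopyTimeEntries",
                        "type": "Copy",
                        "dependsOn": [
                            { "activity": "GetNextPage", "dependencyConditions": [ "Succeeded" ] }
                        ],
                        "inputs": [
                            {
                                "referenceName": "RestTimeEntriesSource",
                                "type": "DatasetReference",
                                "parameters": { "cursor": "@variables('cursor')" }
                            }
                        ],
                        "outputs": [
                            {
                                "referenceName": "BlobJsonSink",
                                "type": "DatasetReference",
                                "parameters": {
                                    "fileName": "@concat('time_entries_', pipeline().RunId, '_', variables('cursor'), '.json')"
                                }
                            }
                        ],
                        "typeProperties": {
                            "source": { "type": "RestSource" },
                            "sink": { "type": "JsonSink" }
                        }
                    },
                    {
                        "name": "SetNextCursor",
                        "type": "SetVariable",
                        "dependsOn": [
                            { "activity": "CopyTimeEntries", "dependencyConditions": [ "Succeeded" ] }
                        ],
                        "typeProperties": {
                            "variableName": "cursor",
                            "value": "@string(activity('GetNextPage').output.cursor)"
                        }
                    }
                ]
            }
        }

    Two caveats with this layout: the API is called twice per page (the Web Activity retrieves the cursor while the Copy Activity re-reads the page through the REST dataset), and as sketched the first page fetched before the loop is not copied, so either add a Copy Activity for it before the Until or build the URL conditionally so the first iteration fetches page one. Also make sure the API returns an empty or missing cursor on the last page so the @empty(variables('cursor')) condition eventually terminates the loop.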
