Pagination with next_offset REST API

Ruben Betjes 55 Reputation points
2023-05-29T15:12:08.6033333+00:00

Hello,

I'm trying to load data from a REST API into Azure Data Factory. Everything is going well but I am stuck at the pagination part. The body of the API call returns a 'next_offset' which looks like this:

                "pii_cleared": "active",
                "channel": "web",
                "resource_version": 1683583371228,
                "deleted": false,
                "object": "customer",
                "card_status": "no_card",
                "promotional_credits": 0,
                "refundable_credits": 0,
                "excess_payments": 0,
                "unbilled_charges": 0,
                "preferred_currency_code": "EUR",
                "tax_providers_fields": []
            }
        }
    ],
    "next_offset": "[\"1682321640000\",\"37957245\"]"

I've been trying to find a way to do this all day, but no luck so far. I've used this documentation: https://learn.microsoft.com/en-us/azure/data-factory/connector-rest?tabs=data-factory#pagination-support.

Also adding the API documentation:

PAGINATION

Pagination can be controlled by following parameters:

  • limit: This limits the number of resources to be returned in the response. The value ranges from 1 to 100 and defaults to 10.
  • offset: If not specified, the first set of resources (number of resources limited by the limit parameter) will be returned. If more resources are present, a 'next_offset' attribute is returned in the result. To fetch the next set (page) of resources, pass it as the offset parameter. Its value should be the next_offset attribute returned by your previous list API invocation.
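
The loop this documentation describes can be sketched in plain Python, outside Data Factory, to make the stop condition explicit. This is only an illustration under assumptions: the `list` key for the result array, the `fetch_page` callable, and the fake pages are hypothetical and not taken from the API documentation. The one grounded detail is that `next_offset` is an opaque string that gets passed back unchanged as `offset`, and that paging stops when the response no longer contains it.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_resources(
    fetch_page: Callable[[Optional[str]], Page],
    max_pages: int = 1000,
) -> Iterator[Any]:
    """Yield every resource, following next_offset until it is absent."""
    offset: Optional[str] = None  # no offset on the first call
    for _ in range(max_pages):
        page = fetch_page(offset)
        # "list" is an assumed name for the array of results.
        yield from page.get("list", [])
        # Pass the returned value back verbatim; do not parse or modify it.
        offset = page.get("next_offset")
        if not offset:
            return  # no next_offset means this was the last page
    raise RuntimeError("max_pages reached; possible pagination loop")


def fake_fetch_page(offset: Optional[str]) -> Page:
    """Stand-in for the real HTTP call (hypothetical data)."""
    pages: Dict[Optional[str], Page] = {
        None: {
            "list": [{"id": 1}, {"id": 2}],
            "next_offset": '["1682321640000","37957245"]',
        },
        '["1682321640000","37957245"]': {"list": [{"id": 3}]},
    }
    return pages[offset]


all_ids = [resource["id"] for resource in iter_resources(fake_fetch_page)]
```

In a real call, `fetch_page` would send `offset` as a URL-encoded query parameter, since the value contains brackets and quotes.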

Hopefully someone can help me with this. Thanks in advance.

Azure Data Factory

Accepted answer
  1. QuantumCache 20,366 Reputation points Moderator
    2023-05-31T01:28:13.7766667+00:00

    Hello @Ruben Betjes,

    Yes, you are on the right path for implementing the offset parameter.

    Coming to the implementation of incremental loading:

    One approach to handling incremental updates is to use a watermark. A watermark is a column in the source data whose value only increases: it could be a datetime column (such as last modified time) or an auto-incrementing column. You store the maximum watermark value each time data is loaded. During the next data load, you filter the source data to retrieve only rows where the watermark column is greater than the stored value. Example: take a SQL table as the source dataset.
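
    A minimal sketch of the watermark logic, in Python rather than in Data Factory, assuming rows are dictionaries with a hypothetical `last_modified` datetime column. The function name and sample rows are illustrative only.

```python
from datetime import datetime
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def load_increment(
    rows: List[Row],
    last_watermark: datetime,
    column: str = "last_modified",  # assumed watermark column name
) -> Tuple[List[Row], datetime]:
    """Return rows newer than the stored watermark and the new watermark."""
    new_rows = [row for row in rows if row[column] > last_watermark]
    # If nothing is new, keep the old watermark unchanged.
    new_watermark = max((row[column] for row in new_rows), default=last_watermark)
    return new_rows, new_watermark


source_rows = [
    {"id": 1, "last_modified": datetime(2023, 5, 28, 9, 0)},
    {"id": 2, "last_modified": datetime(2023, 5, 29, 14, 30)},
    {"id": 3, "last_modified": datetime(2023, 5, 30, 8, 15)},
]
stored_watermark = datetime(2023, 5, 29, 0, 0)

loaded, stored_watermark = load_increment(source_rows, stored_watermark)
```

    After this run, only rows 2 and 3 are loaded and the stored watermark moves to the latest `last_modified` value, so a second run over the same data loads nothing.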


    Ref: Incrementally load data from a source data store to a destination data store


    Add a trigger: Finally, to run this pipeline every night at 2 AM, create a new trigger, set its start time to 2 AM, and set its recurrence to daily.

    Please let us know if you need any further help with this, or whether we can close this thread.
