I am splitting your question into 3 parts:
Q1: In your case, you can configure the pagination rules in the dataset settings. Use parameters like offset and limit in your API calls to fetch chunks of data iteratively. Make sure the pagination rule is defined so that it captures all records, for example by setting NextPage to dynamically generate the next page URL from the current page's response. This is how you can fetch large datasets in manageable chunks without overwhelming the API or the ADF pipeline.
Q2: You can use the Copy Activity in ADF to store all the records in one CSV file. Configure the source to use pagination as described, and set the destination as a single CSV file in your preferred storage (Azure Blob Storage or Data Lake).
The Sink settings in the Copy Activity need to be configured to append data to the CSV file rather than overwrite it (change the Copy Behavior setting to Append).
This way, each batch of data fetched from the API will be added to the same CSV file, allowing you to consolidate all records into one file.
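Conceptually, the append behavior works like the Python sketch below: every batch fetched from the API is written to the same CSV file, with the header emitted only once. The file name and column names are illustrative, not part of your setup.

```python
import csv
import os

OUTPUT_FILE = "all_records.csv"              # assumption: consolidated output file
FIELDNAMES = ["id", "name", "created_at"]    # assumption: columns returned by the API

def append_batch(batch):
    """Append one batch of records (list of dicts) to the shared CSV file."""
    write_header = not os.path.exists(OUTPUT_FILE)
    with open(OUTPUT_FILE, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES, extrasaction="ignore")
        if write_header:                     # write the header only for the first batch
            writer.writeheader()
        writer.writerows(batch)
```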
Q3: To avoid hard mapping and make the Copy Activity generic for different REST API calls, configure the dataset schema to be flexible.
You can set the Schema to None in the dataset definition, allowing ADF to dynamically map the source data to the sink without predefined mappings. With this in place, your pipeline can handle APIs with varying columns.
In the Copy Activity settings, enable Auto Mapping, which automatically maps source columns to sink columns based on their names and data types, accommodating different schemas dynamically without manual intervention.
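The effect of leaving the schema unset and letting the mapping happen automatically is similar to the Python sketch below, which derives the column list from whatever keys the records actually contain instead of hard-coding them. The function and file names are illustrative only.

```python
import csv

def write_dynamic_csv(records, path="output.csv"):
    """Write a list of dicts to CSV, deriving columns from the data itself."""
    # The union of keys across all records becomes the column set, in first-seen order.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns, restval="")
        writer.writeheader()
        writer.writerows(records)            # missing keys fall back to restval ("")
```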