How to get REST API data when returned rows are more than 30,000 and store them in one single CSV

Avnish Yadav 0 Reputation points
2024-05-09T08:27:59.8866667+00:00

How can I get REST API data when the returned rows are more than 30,000 and store them in one single CSV? I tried a pagination rule with AbsoluteUrl pointing to $.Response.

  1. How can I get the Copy Activity to go through each record with offset and limit automatically?
  2. How can I write all records into one CSV file without breaking the pipeline's Copy Activity?
  3. How can I avoid mapping? This Copy Activity should be generic and reusable across multiple REST API calls, which could produce different columns each time, so hard-coded mapping is not an option in the Copy Activity.

Any help is greatly appreciated.

Azure Data Factory

1 answer

  1. Amira Bedhiafi 16,071 Reputation points
    2024-05-09T21:55:14.2333333+00:00

    I am splitting your question into 3 parts:

    Q1: In your case, you can configure the pagination rules in the dataset settings. Use parameters like offset and limit in your API calls to fetch chunks of data iteratively. Just make sure your pagination rule is defined so that it captures all records, for example by setting the next-page rule to dynamically generate the next page URL based on the current page's response. This is how you can fetch large datasets in manageable chunks without overwhelming the API or the ADF pipeline.
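    For illustration, here is a minimal source sketch. The offset parameter name, the page size of 1000, and the $.Response path are assumptions taken from your question, so adjust them to match your API (the comments mark the assumed parts):

    ```json
    {
        "source": {
            "type": "RestSource",
            "requestMethod": "GET",
            "paginationRules": {
                // assumed: the API accepts ?offset=N; start at 0, no fixed end, step 1000
                "QueryParameters.offset": "RANGE:0::1000",
                // assumed: paging stops once the array under $.Response comes back empty
                "EndCondition:$.Response": "Empty"
            }
        }
    }
    ```

    Pass the page size as a fixed limit query parameter in the dataset's relative URL (for example ?limit=1000), so each request fetches one chunk while the RANGE rule advances the offset.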

    Q2 : You can use the Copy Activity in ADF to store all the records in one CSV file. Configure the source to use pagination as described, and set the destination as a single CSV file in your preferred storage (Azure Blob Storage or Data Lake).

    The Sink settings in the Copy Activity need to be configured to append data to the CSV file rather than overwrite it (change the Copy Behavior setting to Append).

    This way, each batch of data fetched from the API will be added to the same CSV file, allowing you to consolidate all records into one file.
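    Note that within a single Copy Activity run, all paginated pages already land in the same sink file, so pointing the sink dataset at one named CSV is usually enough. Here is a sketch of such a DelimitedText sink dataset (the container and file names are placeholders):

    ```json
    {
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "output",
                "fileName": "all_records.csv"
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true
        }
    }
    ```

    Because the dataset names a single file, every page fetched in that run is written into all_records.csv rather than into one file per page.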

    Q3 : To avoid hard mapping and make the Copy Activity generic for different REST API calls, configure the dataset schema to be flexible.

    You can set the Schema to None in the dataset definition, allowing ADF to dynamically map the source data to the sink without predefined mappings. With this in place, your pipeline can handle APIs with varying columns.

    In the Copy Activity settings, rely on automatic mapping (leave the mapping empty), which maps source columns to sink columns based on their names and data types, accommodating different schemas dynamically without manual intervention.
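    As a sketch of that idea, the Copy Activity simply omits the translator (mapping) block, and the datasets carry an empty schema, so columns are resolved at runtime (the activity name below is a placeholder):

    ```json
    {
        "name": "GenericRestToCsv",
        "type": "Copy",
        "typeProperties": {
            "source": { "type": "RestSource" },
            "sink": { "type": "DelimitedTextSink" }
        }
    }
    ```

    The key point is what is absent: with no translator section here and "schema": [] on both datasets, ADF falls back to its default dynamic column mapping, so the same pipeline can serve APIs that return different columns each time.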
