Efficient way to make millions of API calls using ADF

PS 401 Reputation points
2023-03-06T21:24:10.9933333+00:00

Hi All,

I am looking for some advice here.

I need to make tens of millions of API calls to pull historical data from a 3rd party vendor and load it into Azure Synapse. What is an efficient way to orchestrate the requests using ADF? Is there a way to make concurrent calls?

I'm also open to any advice outside of ADF (like a custom solution).

Current Tech Stack - ADF, Azure Synapse

Thank you!


Accepted answer
  1. KranthiPakala-MSFT 46,642 Reputation points Microsoft Employee Moderator
    2023-03-07T22:51:04.4333333+00:00

    Hi @PS ,

    Thanks for using MS Q&A forum and posting your query.

    When making millions of API calls to a third-party vendor, it is important to ensure that the process is efficient and scalable. I agree with @Dillon Silzer 's input. In addition, note that many API providers enforce rate limits, caps on the data volume returned per call, or other restrictions that can affect the performance and scalability of your solution. It is therefore best to check with the API provider or their documentation and guidelines before implementing anything, and then choose the appropriate solution in Azure based on those limits.

    In addition to the above, below are some options for making concurrent API calls using ADF or a custom solution:

    1. Azure Data Factory/Azure Synapse: You can use the ADF/Synapse Web Activity inside a ForEach activity to call the vendor API in parallel. You can also batch multiple records into a single HTTP request where the vendor API supports it. Please note that the Web Activity has a hard limit on the response payload: the maximum supported output size is 4 MB. Beyond the Web Activity limits, there are additional API call limits imposed by Azure Resource Manager that apply to all Azure services. Please refer to this doc for ADF limitations: Azure Data Factory limits
    2. Azure Function: You can create an Azure Function that makes concurrent API calls (for example with the .NET HttpWebRequest/HttpClient classes, or asyncio in Python). This lets you send multiple requests in parallel and receive the responses asynchronously; a rough Python sketch of this pattern follows the list.
    3. Custom Solution using Azure Batch: You can use Azure Batch to create a custom solution that can make millions of API calls in parallel. Azure Batch allows you to distribute the workload across multiple virtual machines and process the data in parallel.
    4. Custom Solution using Apache Spark: You can use Apache Spark to create a custom solution that makes millions of API calls in parallel. Spark allows you to distribute the workload across multiple nodes and process the data in parallel (see the Spark sketch after the note below).
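
    For illustration, here is a minimal Python sketch of the concurrent-call pattern from option 2 (the same code would also run inside a Python Azure Function). The endpoint URL, the page parameter, and the concurrency limit are placeholders rather than the vendor's real API; adjust them to whatever the vendor documents, and add retry/back-off handling if they enforce rate limits:

    ```python
    # Minimal sketch, not production code: concurrent API calls with asyncio + aiohttp.
    # API_URL, the "page" parameter, and CONCURRENCY are hypothetical placeholders.
    import asyncio
    import aiohttp

    API_URL = "https://vendor.example.com/api/v1/records"   # placeholder endpoint
    CONCURRENCY = 20                                         # tune to the vendor's rate limits

    async def fetch_page(session: aiohttp.ClientSession, sem: asyncio.Semaphore, page: int) -> dict:
        """Fetch one page of historical data; the semaphore caps in-flight requests."""
        async with sem:
            async with session.get(API_URL, params={"page": page}) as resp:
                resp.raise_for_status()
                return await resp.json()

    async def fetch_all(pages: range) -> list:
        sem = asyncio.Semaphore(CONCURRENCY)
        async with aiohttp.ClientSession() as session:
            tasks = [fetch_page(session, sem, p) for p in pages]
            return await asyncio.gather(*tasks)

    if __name__ == "__main__":
        # e.g. pull the first 1,000 pages; chunk the full history into runs like this
        results = asyncio.run(fetch_all(range(1, 1001)))
        print(f"Fetched {len(results)} pages")
    ```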

    Please note that the Azure Batch and Apache Spark (Azure Synapse or Azure Databricks) options may be the most suitable for your scenario, but they can also be expensive at this scale and require custom code.
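
    And here is a minimal PySpark sketch of option 4, assuming a Spark pool in Synapse (or Databricks), a hypothetical GET-by-id vendor endpoint, and the requests library available on the workers. Each partition makes its own batch of calls, so the total work is spread across executor cores; in practice you would also add throttling and retries to respect the vendor's rate limits:

    ```python
    # Minimal sketch, not production code: distributing API calls across Spark partitions.
    # API_URL and the output path are hypothetical placeholders.
    import json
    import requests
    from pyspark.sql import SparkSession

    API_URL = "https://vendor.example.com/api/v1/records/{id}"   # placeholder endpoint

    def fetch_partition(ids):
        """Runs on each executor: one HTTP session per partition, one call per id."""
        with requests.Session() as session:
            for record_id in ids:
                resp = session.get(API_URL.format(id=record_id), timeout=30)
                resp.raise_for_status()
                yield resp.json()

    spark = SparkSession.builder.getOrCreate()

    # Split the id range into partitions; each partition is processed in parallel.
    id_rdd = spark.sparkContext.parallelize(range(1, 1_000_001), numSlices=200)
    records = id_rdd.mapPartitions(fetch_partition)

    # Read the JSON responses into a DataFrame and land them for loading into Synapse.
    df = spark.read.json(records.map(json.dumps))
    df.write.mode("overwrite").parquet("abfss://raw@<yourstorage>.dfs.core.windows.net/vendor_history/")
    ```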

    Hope this helps.


    Please don’t forget to Accept Answer and mark Yes for "was this answer helpful" wherever the information provided helps you, as this can be beneficial to other community members.

    1 person found this answer helpful.

0 additional answers
