Hi @Amrale, Siddhesh
Your use case involves ingesting data from multiple APIs with varying frequencies, handling different data formats (JSON/XML), and ensuring it is stored optimally for querying and analysis. Below is a recommended architecture using Azure services:
Data Ingestion Layer
Azure Function Apps vs. Azure Data Factory (ADF)
| Criteria | Azure Function Apps | Azure Data Factory |
| --- | --- | --- |
| Triggering Flexibility | Event-driven, supports CRON scheduling, and handles frequent invocations (every 2 min, 10 min, etc.). | Best for batch-oriented workloads; not ideal for very high-frequency triggers. |
| Performance & Cost | Cost-effective for frequent small payloads; scales dynamically. | Better suited to large-scale ETL processes with less frequent execution. |
| Complex Orchestration | More control over API calls, including custom pagination handling. | Ideal for workflows involving multiple dependencies, transformations, and monitoring. |
| Pagination & API Handling | Requires custom logic but provides full control over API request handling. | Some pagination support, but less flexible than Functions. |
| Monitoring & Debugging | Integrates with Azure Application Insights for detailed logging. | Built-in monitoring with logs and execution history in the Azure portal. |
- For high-frequency, small-to-medium-sized API calls (e.g., every 2 mins, 10 mins) → Use Azure Function Apps due to cost efficiency, flexibility, and event-driven execution.
- For scheduled batch pulls or complex workflows (e.g., daily ingestion from multiple APIs) → Use Azure Data Factory (especially if needing data transformations and orchestration).
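Since the custom pagination handling mentioned above is the main thing you would write yourself in a Function App, here is a minimal sketch of that logic. It assumes a common Azure-style response shape (`value` array plus a `nextLink` continuation URL); adjust the field names to whatever your APIs actually return. The HTTP call is injected as a callable so the pagination loop can be tested without a network:

```python
from typing import Callable, Iterator

def fetch_all_pages(fetch: Callable[[str], dict], first_url: str) -> Iterator[dict]:
    """Follow a 'nextLink'-style continuation token until exhausted.

    `fetch` wraps the real HTTP call (e.g. requests.get(url).json()).
    Field names 'value' and 'nextLink' are assumptions; adapt per API.
    """
    url = first_url
    while url:
        page = fetch(url)
        # Yield the items on this page, then move to the next one.
        yield from page.get("value", [])
        url = page.get("nextLink")  # absent/None on the last page
```

Inside a timer-triggered Function App, the same generator can be driven once per schedule tick, writing each batch of items to ADLS before requesting the next page.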
Storage Layer
- Raw Storage: Store data in Azure Data Lake Storage (ADLS) Gen2 for scalability and hierarchical structure.
- File Format:
- JSON: If minimal transformation is needed before querying.
- Parquet: If optimized querying is required (better for Power BI and analytical queries).
- Schema Evolution: Use Azure Synapse or Databricks if schema transformations or merging of different API data sources are needed.
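One common prerequisite for writing Parquet is flattening nested JSON into flat columns, since Parquet works best with a tabular schema. A small stdlib-only sketch (the payload shape is purely illustrative):

```python
import json

def flatten(record: dict, parent: str = "", sep: str = ".") -> dict:
    """Flatten nested dicts into dotted column names so each record
    maps cleanly onto a columnar (Parquet-friendly) row."""
    out = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            out.update(flatten(value, name, sep))  # recurse into nesting
        else:
            out[name] = value
    return out

payload = json.loads('{"id": 7, "device": {"type": "sensor", "fw": "1.2"}}')
row = flatten(payload)
# row now has columns: id, device.type, device.fw
```

From there, a library such as pandas or pyarrow can write the flattened rows to Parquet in ADLS.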
Processing & Transformation Layer
- Direct Querying: If querying raw JSON/XML, use Azure Synapse Serverless SQL to query directly from ADLS.
- Transformation Needs:
- Databricks or Synapse Pipelines: If further transformation, aggregation, or normalization is required before exposing to internal APIs.
- Azure Function Apps: If lightweight transformation logic is needed.
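As an example of the lightweight transformation a Function App can handle, here is a sketch that normalizes a flat XML payload into the same list-of-dicts shape as a JSON source, using only the standard library (the `<item>` tag name and sample payload are assumptions):

```python
import xml.etree.ElementTree as ET

def xml_to_records(xml_text: str, item_tag: str = "item") -> list[dict]:
    """Normalize a flat XML payload into a list of dicts so XML and
    JSON sources land in one common shape before storage."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: child.text for child in item}
        for item in root.iter(item_tag)
    ]

sample = (
    "<feed>"
    "<item><id>1</id><name>a</name></item>"
    "<item><id>2</id><name>b</name></item>"
    "</feed>"
)
records = xml_to_records(sample)
```

Note that element text comes back as strings; type coercion (ints, timestamps) would be a second small step if downstream queries need it.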
API Exposure & Consumption Layer
- Azure API Management (APIM): To expose the processed data as an internal API for consumers.
- Power BI Integration: Power BI can connect directly to ADLS (using Synapse) or via an API exposed through APIM.
When does ADF outperform Azure Functions?
- If dealing with large-scale batch processing with dependency chaining.
- When built-in connectors simplify ingestion (e.g., database ingestion instead of API calls).
- If monitoring, retry mechanisms, and logging via ADF UI are preferred.
Control & Flexibility
- Azure Function Apps provide better flexibility for API interactions and custom logic.
- ADF is more suitable when orchestration across multiple data sources is needed.
To summarize the points above:
- Use Azure Function Apps for API ingestion due to the need for frequent, dynamic calls.
- Store data in ADLS Gen2 in Parquet format for optimized querying.
- If necessary, use Azure Synapse or Databricks for further transformations.
- Expose processed data via Azure API Management for internal teams.
- Power BI can connect to Synapse or APIs based on reporting needs.
I hope this helps. Please let us know if you need any further clarification.
Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.