Recurring API call and processing JSON files to add data to a dedicated SQL pool

Kumar, Amit 41 Reputation points
2023-05-29T12:17:19.1533333+00:00

Hi,

I have the below requirements for an enterprise product. Please suggest the best design.

1. A recurring API call needs to be made to fetch JSON data.

2. This JSON data should be written as JSON files to Azure Data Lake Storage.

3. We have developed Python code that will process these JSON files as per our requirements.

4. This Python code should be executed on an hourly basis to process all JSON files for that hour.

Where should we deploy this code for best performance? One hour of data would be around 7-8 GB and should be processed in at most 15 minutes.

Can we link all of these requirements in an Azure Synapse pipeline? And if we use Azure Logic Apps, how would the Python code be executed?

Please share the best design as per your Azure expertise.

Regards

Amit


Accepted answer
  1. VasimTamboli 5,215 Reputation points
    2023-05-30T14:38:46.2266667+00:00

    Azure Synapse Analytics and Azure Logic Apps can be used together to create an effective solution. Here's a suggested design:

    Recurring API Calls:

    • Use Azure Logic Apps to schedule and trigger recurring API calls. Logic Apps provides built-in connectors to interact with various APIs.
    • Configure the Logic App to call the API at the desired interval (hourly in your case) and retrieve the JSON data.

    Writing JSON data to Azure Data Lake Storage:

    • After retrieving the JSON data, use the Azure Blob Storage connector in Logic Apps to write the JSON data as files to Azure Data Lake Storage (ADLS). ADLS provides a scalable and reliable storage solution for big data workloads. (If you prefer to implement this step in code, see the sketch below.)
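
    For illustration, if you implement the fetch-and-land step in your own code instead of the Logic App connectors (for example inside an Azure Function that the Logic App or a timer triggers), a minimal sketch might look like the following. The API URL, storage account, file-system name, and folder layout are placeholders, not details from the original question.

    ```python
    # Minimal sketch: fetch JSON from an API and land it in ADLS Gen2.
    # The API URL, account/file-system names and folder layout are placeholders.
    import json
    from datetime import datetime, timezone

    import requests
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    API_URL = "https://example.com/api/data"                 # placeholder
    ACCOUNT_URL = "https://<account>.dfs.core.windows.net"   # placeholder
    FILE_SYSTEM = "raw"                                      # placeholder

    def fetch_and_land():
        # 1. Call the API and get the JSON payload.
        response = requests.get(API_URL, timeout=60)
        response.raise_for_status()
        payload = response.json()

        # 2. Write the payload into an hourly folder in ADLS.
        now = datetime.now(timezone.utc)
        path = now.strftime("landing/%Y/%m/%d/%H/data_%Y%m%d%H%M%S.json")

        service = DataLakeServiceClient(ACCOUNT_URL, credential=DefaultAzureCredential())
        file_client = service.get_file_system_client(FILE_SYSTEM).get_file_client(path)
        file_client.upload_data(json.dumps(payload), overwrite=True)

    if __name__ == "__main__":
        fetch_and_land()
    ```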

    Processing JSON files with Python code:

    • Deploy your Python code as an Azure Function or Azure Databricks notebook. Both options support executing Python code in a scalable and managed environment.
    • Within your Python code, use the ADLS Gen2 SDK (azure-storage-file-datalake) to read the JSON files from ADLS and process them as per your requirements, as in the sketch after this list.
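
    As a rough illustration of that read-and-process step, the sketch below enumerates one hour's folder in ADLS and parses each JSON file. The account and file-system names, the folder layout, and the transform() function are assumptions standing in for your actual logic.

    ```python
    # Minimal sketch: read and process one hour's worth of JSON files from ADLS Gen2.
    # Account/file-system names, folder layout and transform() are placeholders.
    import json

    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    ACCOUNT_URL = "https://<account>.dfs.core.windows.net"   # placeholder
    FILE_SYSTEM = "raw"                                      # placeholder

    def transform(document: dict) -> dict:
        # Stand-in for your existing processing logic.
        return document

    def process_hour(hour_folder: str) -> list:
        service = DataLakeServiceClient(ACCOUNT_URL, credential=DefaultAzureCredential())
        fs = service.get_file_system_client(FILE_SYSTEM)

        results = []
        # Enumerate every JSON file landed in that hour's folder.
        for item in fs.get_paths(path=hour_folder):
            if item.is_directory or not item.name.endswith(".json"):
                continue
            raw = fs.get_file_client(item.name).download_file().readall()
            results.append(transform(json.loads(raw)))
        return results

    if __name__ == "__main__":
        processed = process_hour("landing/2023/05/29/10")    # example hour folder
        print(f"Processed {len(processed)} files")
    ```

    Note that for 7-8 GB per hour under a 15-minute limit, a single-process loop like this is unlikely to be fast enough; the Spark-based approach in the performance section below parallelizes the same work.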

    Executing Python code on an hourly basis:

    • Create an Azure Synapse Pipeline to orchestrate the execution of your Python code on an hourly basis.
    • Use the "Schedule" trigger in the pipeline to define the recurrence interval (hourly).
    • Within the pipeline, add an activity (such as an Azure Databricks Notebook activity or an Azure Function activity) to run your Python code; a sketch of the Azure Function option follows below.
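
    If you go the Azure Function route, the pipeline's Azure Function activity calls an HTTP-triggered function. A minimal sketch using the Python v2 programming model is below; the route name, the request shape (an "hourFolder" field), and the call into your processing code are assumptions for the example.

    ```python
    # function_app.py -- minimal sketch of an HTTP-triggered Azure Function (Python v2
    # programming model) that a Synapse pipeline's Azure Function activity can invoke.
    # The route name, request shape ("hourFolder") and process_hour() are placeholders.
    import json
    import logging

    import azure.functions as func

    app = func.FunctionApp()

    def process_hour(hour_folder: str) -> int:
        # Stand-in for your existing processing code (see the earlier sketch).
        logging.info("Processing folder %s", hour_folder)
        return 0

    @app.route(route="process-hour", auth_level=func.AuthLevel.FUNCTION)
    def process_hour_http(req: func.HttpRequest) -> func.HttpResponse:
        # The pipeline passes the hour partition to process in the request body,
        # e.g. {"hourFolder": "landing/2023/05/29/10"}.
        body = req.get_json()
        files_processed = process_hour(body["hourFolder"])
        return func.HttpResponse(
            json.dumps({"filesProcessed": files_processed}),
            mimetype="application/json",
            status_code=200,
        )
    ```

    A schedule or tumbling window trigger can pass the window start time into the pipeline, and a pipeline expression can turn it into the hour folder sent in the activity's request body.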

    Regarding performance considerations:

    • Azure Synapse Analytics provides a dedicated SQL pool (formerly SQL Data Warehouse) that can handle large data volumes and provide high-performance querying capabilities. You can consider storing the processed data in the dedicated SQL pool if it aligns with your requirements.
    • While processing the JSON files, you can leverage Azure Synapse Spark capabilities for distributed data processing to achieve better performance and scalability (see the sketch after this list).
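
    A rough PySpark sketch of that pattern, intended to run in a Synapse Spark pool notebook, is shown below. The ABFSS path, the transformation, and the target table name are placeholders, and the synapsesql() write assumes the built-in Azure Synapse Dedicated SQL Pool connector, which is only available inside the Synapse Spark runtime.

    ```python
    # Minimal PySpark sketch for a Synapse Spark pool notebook: read one hour of JSON
    # from ADLS, transform it, and load the result into a dedicated SQL pool table.
    # The ABFSS path, transformation and table name are placeholders.
    import com.microsoft.spark.sqlanalytics  # Synapse runtime only; registers synapsesql()
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # One hour's landing folder in ADLS (placeholder path).
    hour_path = "abfss://raw@<account>.dfs.core.windows.net/landing/2023/05/29/10/*.json"

    # Spark parallelizes the read across all files in the folder.
    df = spark.read.json(hour_path)

    # Example transformation step -- replace with your actual processing logic.
    processed = df.withColumn("load_ts", F.current_timestamp())

    # Append the processed data to a table in the dedicated SQL pool.
    processed.write.mode("append").synapsesql("mydb.dbo.hourly_data")
    ```

    The distributed read and load is what makes the 15-minute target for 7-8 GB per hour more realistic than a single-node process.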

    Overall, this design allows you to schedule API calls, store JSON data in ADLS, process the data using Python code, and execute the Python code on an hourly basis with the help of Azure Logic Apps and Azure Synapse Analytics. It provides a scalable and managed environment to handle large data volumes efficiently.

