How to create an ETL pipeline for Python code that returns 2 JSON files

Utsav Mori 20 Reputation points
2024-07-13T15:19:01.0733333+00:00

Requirement:

I have Python code in a folder on my local PC that returns 2 JSON files as output (the code calls an API and stores data from a website in the JSON files).

I want to automate running that Python code at a particular time using an ADF pipeline, so that I don't need to run it manually on my local PC.

Please help me create the pipeline for this. I need a detailed explanation of how to achieve this requirement (I am new to ADF).
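For context, a minimal sketch of the kind of script described above: call an API and write the results out as two JSON files. The URL-fetching step, field names, and file names here are placeholders, not the asker's actual code; a static payload stands in for the API call so the sketch runs without network access.

```python
import json


def fetch_site_data():
    # In the real script this would call the website's API (e.g. with
    # urllib.request or the requests library). A static payload stands
    # in here so the sketch is self-contained.
    return (
        {"source": "example-api", "items": [1, 2, 3]},
        {"source": "example-api", "summary": {"count": 3}},
    )


def main(out_a="data.json", out_b="summary.json"):
    # Write the two payloads out as the two JSON result files.
    data, summary = fetch_site_data()
    with open(out_a, "w") as f:
        json.dump(data, f, indent=2)
    with open(out_b, "w") as f:
        json.dump(summary, f, indent=2)
    return out_a, out_b


if __name__ == "__main__":
    main()
```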


3 answers

Sort by: Most helpful
  1. (Deleted answer)

  2. Luis Arias 6,786 Reputation points
    2024-07-13T16:27:14.5966667+00:00

    Hi Utsav Mori,

    I would suggest you start with the Data Factory quickstart (https://learn.microsoft.com/en-us/azure/data-factory/quickstart-get-started), then set up a pipeline based on your requirements. It would look like this:

    1. Create an Azure Batch Account and Pool.
    2. Upload your Python script to Azure Blob Storage.
    3. Create a Data Factory and a new pipeline in it.
    4. Add a Custom Activity in the pipeline to run your Python script.
    5. Define the input and output datasets linked to your Blob Storage.
    6. Validate and publish the pipeline.
    7. Create a schedule trigger for automation.
    8. Monitor the pipeline’s performance.
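    Steps 1–2 above can be sketched with the Azure CLI. All resource names, the region, the VM size, and the image are placeholders to adapt, not values from the answer:

```shell
# Step 1: create a Batch account and a small pool (placeholder names/sizes)
az group create --name rg-adf-demo --location eastus

az batch account create \
  --name mybatchacct --resource-group rg-adf-demo --location eastus

az batch account login --name mybatchacct --resource-group rg-adf-demo

az batch pool create \
  --id python-pool --vm-size Standard_D2s_v3 --target-dedicated-nodes 1 \
  --image canonical:0001-com-ubuntu-server-jammy:22_04-lts \
  --node-agent-sku-id "batch.node.ubuntu 22.04"

# Step 2: stage the Python script in Blob Storage so the
# Custom Activity can download and run it on the pool
az storage account create --name mystorageacct --resource-group rg-adf-demo
az storage container create --name adf-scripts --account-name mystorageacct
az storage blob upload --account-name mystorageacct \
  --container-name adf-scripts --name my_script.py --file ./my_script.py
```

    The remaining steps (pipeline, Custom Activity, trigger) are done in the ADF authoring UI as the quickstart shows.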


    If the information helped address your question, please Accept the answer.

    Luis


  3. NIKHILA NETHIKUNTA 2,155 Reputation points Microsoft Vendor
    2024-07-16T16:50:56+00:00

    Hi @Utsav Mori
    Thank you for your question and for using the Microsoft Q&A platform.

    To run a Python script in Azure Data Factory (ADF), you can use the following approaches:

    1. Azure Batch: You can use Azure Batch to run your Python script in parallel on multiple virtual machines (VMs) to improve performance. This approach is suitable for large-scale data processing.
    2. Azure Functions: You can use Azure Functions to run your Python script as a serverless function. This approach is suitable for small-scale data processing.
    3. Azure Databricks: You can use Azure Databricks to run your Python script in a distributed environment. This approach is suitable for large-scale data processing and machine learning workloads.

    It is also possible to run a Python script in ADF without creating a Batch account and pool. To automate the Python script execution and store the JSON output using ADF with Azure Functions, you may check out the link below:
    https://learn.microsoft.com/en-us/azure/azure-functions/functions-create-function-app-portal?pivots=programming-language-python#create-a-function-app
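    A sketch of the Azure Functions route, assuming a Python function app: keep the ETL work in a plain function so it can also run locally, and wire it to a timer trigger in the app. The function names, payloads, and schedule below are illustrative only, not part of the answer; the trigger wiring is shown in comments because it needs the azure-functions package.

```python
import json
import os


def run_etl(out_dir="."):
    """Produce the two JSON output files; stands in for the asker's script."""
    payloads = {
        "data.json": {"items": [1, 2, 3]},
        "summary.json": {"count": 3},
    }
    paths = []
    for name, payload in payloads.items():
        path = os.path.join(out_dir, name)
        with open(path, "w") as f:
            json.dump(payload, f)
        paths.append(path)
    return paths


# In the deployed function app (requires the azure-functions package),
# a timer trigger could call run_etl() on a schedule, e.g. daily at 06:00:
#
#   import azure.functions as func
#   app = func.FunctionApp()
#
#   @app.timer_trigger(schedule="0 0 6 * * *", arg_name="timer")
#   def daily_etl(timer: func.TimerRequest) -> None:
#       run_etl(out_dir="/tmp")
```

    An ADF pipeline can then invoke the function via an Azure Function activity instead of a Batch Custom Activity.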

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for "was this answer helpful".

