How to move Apache Airflow DAGs to Azure?

Mudassar 1 Reputation point
2021-07-21T23:32:50.463+00:00


I have Apache Airflow DAGs running on GCP that contain Python code, and I would like to know how I can move them to Azure.

Is there an ETL tool that will convert the DAGs to some other ETL program?

Should the solution be moved as is, or must it be re-architected? If so, what would be the tool of choice?

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.
SQL Server Integration Services
A Microsoft platform for building enterprise-level data integration and data transformation solutions.

1 answer

  1. KranthiPakala-MSFT 46,422 Reputation points Microsoft Employee
    2021-07-22T22:52:10.687+00:00

    Hi @Mudassar ,

    Welcome to Microsoft Q&A forum and thanks for posting your query.

    In Azure, Azure Data Factory (ADF) is the cloud-based ETL and data integration service that enables data-driven workflows for orchestrating data movement and transforming data at scale. With Azure Data Factory, you can create pipelines (scheduled data-driven workflows) that ingest data from disparate data stores. You can also build complex ETL processes that transform data visually with data flows, or by using compute services such as Azure HDInsight Hadoop, Azure Databricks, and Azure SQL Database.

    As per your requirement, I believe the service you are looking for might be Azure Databricks, since there is tight integration between Airflow and Azure Databricks. The Airflow Azure Databricks integration lets you take advantage of the optimized Spark engine offered by Azure Databricks combined with the scheduling features of Airflow.
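    As a rough sketch of what that integration looks like in practice, the Airflow Databricks provider ships a `DatabricksSubmitRunOperator` that submits a notebook run to an Azure Databricks workspace. The DAG name, notebook path, cluster sizing, and Databricks runtime version below are illustrative assumptions, not values from this thread:

    ```python
    # Hypothetical Airflow DAG showing the Airflow-Azure Databricks integration.
    # Assumes apache-airflow and the apache-airflow-providers-databricks package
    # are installed, and that a "databricks_default" connection is configured.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.databricks.operators.databricks import (
        DatabricksSubmitRunOperator,
    )

    with DAG(
        dag_id="example_databricks_dag",  # hypothetical DAG name
        start_date=datetime(2021, 7, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        # Submit a one-time notebook run on a new job cluster.
        run_notebook = DatabricksSubmitRunOperator(
            task_id="run_notebook",
            databricks_conn_id="databricks_default",
            new_cluster={
                "spark_version": "8.3.x-scala2.12",  # assumed runtime version
                "node_type_id": "Standard_DS3_v2",   # assumed Azure VM size
                "num_workers": 2,
            },
            notebook_task={"notebook_path": "/Shared/my-notebook"},  # hypothetical path
        )
    ```

    Existing DAGs that orchestrate GCP services would still need their operators swapped for Azure equivalents, but DAGs that mainly run Spark or notebook workloads can keep their Airflow scheduling logic largely intact with this approach.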

    For more details about how to install the Airflow Azure Databricks integration and configure a Databricks connection, please refer to this article: Managing dependencies in data pipelines
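    For reference, one way to register that Databricks connection is via the Airflow CLI; the workspace URL and token below are placeholders you would replace with your own values:

    ```shell
    # Hypothetical example: register an Azure Databricks connection in Airflow.
    # The host is your workspace URL; the password is a personal access token.
    airflow connections add databricks_default \
        --conn-type databricks \
        --conn-host https://adb-1234567890123456.7.azuredatabricks.net \
        --conn-password <personal-access-token>
    ```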

    Also, please refer to this article to see if it helps with your requirement: Deploying Apache Airflow in Azure to build and run data pipelines

    Hope this info helps.

    ----------

    Please don’t forget to Accept Answer and Up-Vote wherever the information provided helps you, as this can be beneficial to other community members.