What is Azure Data Factory Managed Airflow?
APPLIES TO: Azure Data Factory Azure Synapse Analytics
This feature is in public preview. For questions or feature suggestions, please send an email to ManagedAirflow@microsoft.com with the details.
Managed Airflow for Azure Data Factory relies on the open-source Apache Airflow application. Documentation and more tutorials for Airflow can be found on the Apache Airflow Documentation or Community pages.
Azure Data Factory offers serverless pipelines for data process orchestration, data movement with 100+ managed connectors, and visual transformations with the mapping data flow.
Azure Data Factory's Managed Airflow service is a simple and efficient way to create and manage Apache Airflow environments, so you can run data pipelines at scale. Apache Airflow is an open-source platform used to programmatically create, schedule, and monitor complex data workflows. You define tasks, each implemented by an operator, and combine them into directed acyclic graphs (DAGs) that represent data pipelines. Airflow lets you execute these DAGs on a schedule or in response to an event, monitor the progress of workflows, and see the state of each task. It's widely used in data engineering and data science to orchestrate data pipelines, and is known for its flexibility, extensibility, and ease of use.
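To make the DAG idea concrete, here is a minimal sketch that uses only the Python standard library (Python 3.9 or later), not Airflow's own API. The task names are hypothetical. It shows how declaring dependencies between tasks yields a valid execution order, and why a cycle makes a pipeline unschedulable.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "transform_a": {"extract"},
    "transform_b": {"extract"},
    "load": {"transform_a", "transform_b"},
}


def run_order(dag):
    """Return one valid execution order for the tasks in `dag`.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(dag).static_order())


def is_acyclic(dag):
    """Return True if `dag` has no dependency cycle, so it can be scheduled."""
    try:
        run_order(dag)
        return True
    except CycleError:
        return False


if __name__ == "__main__":
    print(run_order(pipeline))
    print(is_acyclic({"a": {"b"}, "b": {"a"}}))
```

In Airflow itself, the same structure is expressed by instantiating operators inside a DAG definition and declaring their dependencies; the scheduler then works out the run order.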
When to use Managed Airflow?
Azure Data Factory offers pipelines to visually orchestrate data processes (UI-based authoring), while Managed Airflow offers Airflow-based Python DAGs (Python code-centric authoring) for defining the data orchestration process. If you have an Airflow background, or currently use Apache Airflow, you may prefer Managed Airflow over pipelines. Conversely, if you'd rather not write and manage Python-based DAGs for data process orchestration, you may prefer pipelines.
With Managed Airflow, Azure Data Factory now offers multi-orchestration capabilities that span visual, code-centric, and OSS orchestration requirements.
Managed Airflow in Azure Data Factory offers a range of powerful features, including:
- Fast and simple deployment - You can quickly and easily set up Apache Airflow by selecting an Apache Airflow version when you create a Managed Airflow environment.
- Cloud scale - Managed Airflow automatically scales Apache Airflow nodes when required based on range specification (min, max).
- Azure Active Directory integration - You can enable Azure AD RBAC against your Airflow environment for a single sign-on experience that is secured by Azure Active Directory.
- Managed Virtual Network integration (coming soon) - You can access your data source via private endpoints or on-premises using an ADF Managed Virtual Network, which provides extra network isolation.
- Metadata encryption - Managed Airflow automatically encrypts metadata using Azure-managed keys to ensure your environment is secure by default. It also supports double encryption with a Customer-Managed Key (CMK).
- Azure Monitoring and alerting - All the logs generated by Managed Airflow are exported to Azure Monitor. Managed Airflow also provides metrics to track critical conditions and notify you when needed.
Region availability (public preview)
- East US
- South Central US
- West US
- UK South
- North Europe
- West Europe
- Southeast Asia
- East US 2 (coming soon)
- West US 2 (coming soon)
- Germany West Central (coming soon)
- Australia East (coming soon)
By GA, all ADF regions will be supported. The Airflow environment region defaults to the Data Factory region and isn't configurable, so make sure your Data Factory is in one of the supported regions listed above to access the Managed Airflow preview.
Supported Apache Airflow versions
Changing the Airflow version of an existing integration runtime (IR) isn't supported. Instead, create a new Airflow IR with the desired version.
Apache Airflow integrates with Microsoft Azure services through the microsoft.azure provider.
You can install any provider package by editing the Airflow environment from the Azure Data Factory UI. Installing the package takes a couple of minutes.
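After installing a provider, one way to confirm that it's present is to query the installed package metadata from Python, for example inside a task. This is a minimal sketch using only the standard library. It assumes the microsoft.azure provider is distributed under its PyPI name `apache-airflow-providers-microsoft-azure`, and the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution name of the microsoft.azure provider.
AZURE_PROVIDER = "apache-airflow-providers-microsoft-azure"


def installed_version(package: str):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version(AZURE_PROVIDER)
    if found is None:
        print(f"{AZURE_PROVIDER} is not installed in this environment")
    else:
        print(f"{AZURE_PROVIDER} {found} is installed")
```

Returning `None` instead of raising keeps the check easy to use in a conditional, for example to skip Azure-specific tasks when the provider is missing.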
- Managed Airflow will be available in other regions by GA.
- Data sources connecting through Airflow should be publicly accessible.
- Blob Storage behind a VNet isn't supported during the public preview.
- DAGs stored in Blob Storage inside a VNet or behind a firewall are currently not supported.
- Azure Key Vault isn't supported in linked services used to import DAGs.
- Airflow officially supports Blob Storage and ADLS, with some limitations.