What is Workflow Orchestration Manager?

APPLIES TO: Azure Data Factory, Azure Synapse Analytics

Tip

Try out Data Factory in Microsoft Fabric, an all-in-one analytics solution for enterprises. Microsoft Fabric covers everything from data movement to data science, real-time analytics, business intelligence, and reporting. Learn how to start a new trial for free!

Note

Apache Airflow is now available through Microsoft Fabric, which offers a wide range of Apache Airflow capabilities via Data Workflows. We recommend migrating your existing Workflow Orchestration Manager (Apache Airflow in ADF) workflows to Data Workflows (Apache Airflow in Microsoft Fabric) for a broader set of features. Apache Airflow capabilities will be Generally Available in Q1 CY2025 only in Microsoft Fabric. For new Apache Airflow projects, we recommend using Apache Airflow in Microsoft Fabric. More details can be found here. New users can't create a new Workflow Orchestration Manager in ADF. Existing users who already have one can continue to use it, but should plan a migration soon.

Note

Workflow Orchestration Manager for Azure Data Factory relies on the open source Apache Airflow application. Documentation and more tutorials for Airflow can be found on the Apache Airflow Documentation or Community pages.

Azure Data Factory offers serverless pipelines for data process orchestration, data movement with 100+ managed connectors, and visual transformations with the mapping data flow.

Azure Data Factory's Workflow Orchestration Manager service is a simple and efficient way to create and manage Apache Airflow environments, enabling you to run data pipelines at scale with ease. Apache Airflow is an open-source platform used to programmatically create, schedule, and monitor complex data workflows. It allows you to define a set of tasks, built from operators, that are combined into directed acyclic graphs (DAGs) to represent data pipelines. Airflow lets you execute these DAGs on a schedule or in response to an event, monitor the progress of workflows, and see the state of each task. It's widely used in data engineering and data science to orchestrate data pipelines, and is known for its flexibility, extensibility, and ease of use.
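
To make the DAG idea concrete, here is a minimal sketch that uses only Python's standard library (graphlib), not Airflow's own API. The task names and the execution_order helper are hypothetical and chosen for illustration. It shows how declaring which tasks depend on which others is enough to derive a valid run order, and why a cycle makes a workflow invalid.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each key is a task, and each value is the set of
# tasks that must finish before that task can start.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}


def execution_order(deps):
    """Return one valid run order for the tasks in deps.

    Raises graphlib.CycleError if the dependencies contain a cycle,
    which is what makes the graph "acyclic" by requirement.
    """
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)  # "extract" comes first and "load" comes last

# A cyclic graph cannot be ordered, so it is rejected.
try:
    execution_order({"a": {"b"}, "b": {"a"}})
    cycle_detected = False
except CycleError:
    cycle_detected = True
print(cycle_detected)
```

In Airflow itself, the same dependencies are declared between operator instances inside a DAG definition file, and the scheduler decides when each task runs.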

Screenshot shows data integration.

When to use Workflow Orchestration Manager?

Azure Data Factory offers pipelines to visually orchestrate data processes (UI-based authoring), while Workflow Orchestration Manager offers Airflow-based Python DAGs (Python code-centric authoring) for defining the data orchestration process. If you have an Airflow background, or are currently using Apache Airflow, you might prefer to use Workflow Orchestration Manager instead of pipelines. Conversely, if you don't want to write or manage Python-based DAGs for data process orchestration, you might prefer to use pipelines.

With Workflow Orchestration Manager, Azure Data Factory now offers multi-orchestration capabilities that span visual, code-centric, and open-source software (OSS) orchestration requirements.

Features

Workflow Orchestration Manager in Azure Data Factory offers a range of powerful features, including:

  • Fast and simple deployment - You can quickly and easily set up Apache Airflow by selecting an Apache Airflow version when you create a Workflow Orchestration Manager.
  • Cloud scale - Workflow Orchestration Manager automatically scales Apache Airflow nodes when required, within the node range (min, max) that you specify.
  • Microsoft Entra integration - You can enable Microsoft Entra RBAC against your Airflow environment for a single sign-on experience that is secured by Microsoft Entra ID.
  • Metadata encryption - Workflow Orchestration Manager automatically encrypts metadata using Azure-managed keys to ensure your environment is secure by default. It also supports double encryption with a Customer-Managed Key (CMK).
  • Azure Monitoring and alerting - All the logs generated by Workflow Orchestration Manager are exported to Azure Monitor. It also provides metrics to track critical conditions and notify you when needed.

Architecture

Screenshot shows architecture in Workflow Orchestration Manager.

Region availability (public preview)

  • East US
  • South Central US
  • West US
  • Brazil South
  • UK South
  • North Europe
  • West Europe
  • Southeast Asia

Note

The Airflow environment region defaults to the Data Factory region and is not configurable, so make sure you use a Data Factory in one of the supported regions listed above to access the Workflow Orchestration Manager preview.
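
Because the region can't be changed after the fact, it can help to check a Data Factory's region before you rely on the preview. The following is a minimal sketch, assuming you already know the region's display name or short name. The helper names (normalize_region, is_supported_region) are hypothetical and not part of any Azure SDK. The set simply mirrors the regions listed above, written as Azure's short region names.

```python
# Regions from the list above, written as Azure short region names.
SUPPORTED_REGIONS = frozenset({
    "eastus",
    "southcentralus",
    "westus",
    "brazilsouth",
    "uksouth",
    "northeurope",
    "westeurope",
    "southeastasia",
})


def normalize_region(name: str) -> str:
    """Turn a display name such as 'East US' into the short form 'eastus'."""
    return "".join(name.split()).lower()


def is_supported_region(name: str) -> bool:
    """Return True if the given region is in the preview list above."""
    return normalize_region(name) in SUPPORTED_REGIONS


print(is_supported_region("East US"))
print(is_supported_region("West US 2"))
```

Update the set if the list of supported regions changes.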

Supported Apache Airflow versions

  • 2.6.3

Note

Changing the Airflow version within an existing integration runtime (IR) is not supported. Instead, create a new Airflow IR with the desired version.

Integrations

Apache Airflow integrates with Microsoft Azure services through the microsoft.azure provider (the apache-airflow-providers-microsoft-azure package).

You can install any provider package by editing the Airflow environment from the Azure Data Factory UI. It takes a couple of minutes to install the package.

Screenshot shows airflow integration.

Limitations

  • Workflow Orchestration Manager will be available in other regions by general availability (GA).
  • Data sources that Airflow connects to must be accessible through a public endpoint (network).
  • DAGs stored in Blob Storage that is inside a VNet or behind a firewall are currently not supported. Instead, we recommend using the Git sync feature of Workflow Orchestration Manager. See Sync a GitHub repository in Workflow Orchestration Manager.
  • Importing DAGs from Azure Key Vault isn't supported in linked services.