What's new and planned for Data Factory in Microsoft Fabric

Important

The release plans describe functionality that may or may not have been released yet. Delivery timelines and projected functionality may change, and some features may not ship. Refer to Microsoft policy for more information.

Data Factory in Microsoft Fabric combines citizen data integration and pro data integration capabilities into a single, modern data integration experience. It provides connectivity to more than 100 data sources, including relational and nonrelational databases, lakehouses, data warehouses, and generic interfaces such as REST APIs and OData.

Dataflows: Dataflow Gen2 lets you perform large-scale data transformations and supports various output destinations, including Azure SQL Database, Lakehouse, and Data Warehouse. The dataflows editor offers more than 300 transformations, including AI-based options, and gives you a flexible way to transform data. Whether you're extracting data from an unstructured data source such as a web page or reshaping an existing table in the Power Query editor, you can apply Power Query's Data Extraction By Example, which uses artificial intelligence (AI) to simplify the process.

Data pipelines: Data pipelines let you create versatile data orchestration workflows that combine tasks such as data extraction, loading into preferred data stores, notebook execution, SQL script execution, and more. You can quickly build powerful metadata-driven data pipelines that automate repetitive tasks, for example, extracting and loading data from different tables in a database or iterating through multiple containers in Azure Blob Storage. With data pipelines, you can also access data from Microsoft 365 by using the Microsoft Graph Data Connect (MGDC) connector.
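The metadata-driven pattern can be sketched in plain Python: a single generic copy step is driven by a list of table definitions instead of one hand-built step per table. This is an illustrative sketch, not the Fabric pipeline definition format; `CopyTask`, `build_copy_plan`, and the schema and table names are hypothetical. In a data pipeline, a Lookup activity that reads a control table and feeds a ForEach activity typically plays this role.

```python
from dataclasses import dataclass


@dataclass
class CopyTask:
    """Hypothetical metadata row describing one table to copy."""
    source_schema: str
    source_table: str
    destination_table: str
    enabled: bool = True


def build_copy_plan(tasks):
    """Expand metadata rows into (source, destination) pairs, one per copy to run."""
    plan = []
    for task in tasks:
        if not task.enabled:
            continue  # rows can be switched off in the metadata without editing the pipeline
        plan.append((f"{task.source_schema}.{task.source_table}", task.destination_table))
    return plan


# Metadata that would normally live in a control table (hypothetical names).
tasks = [
    CopyTask("sales", "orders", "bronze_orders"),
    CopyTask("sales", "customers", "bronze_customers"),
    CopyTask("hr", "salaries", "bronze_salaries", enabled=False),
]

for source, destination in build_copy_plan(tasks):
    print(f"copy {source} -> {destination}")  # a real pipeline would run a Copy activity here
```

With this design, adding a table to the workload means adding a metadata row rather than editing the pipeline itself.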

To learn more, see the documentation.

Investment areas

Over the next few months, Data Factory in Microsoft Fabric will expand its connectivity options and continue to add to its rich library of transformations and data pipeline activities. It will also enable real-time, high-performance data replication from operational databases, bringing this data into the lake for analytics.

Feature | Estimated release timeline
--- | ---
Data pipeline support for DBT | Q1 2024
Fast Copy support in Dataflow Gen2 | Q3 2024
Data source identity management (Managed Identity) | Q3 2024
Data Factory Git integration for dataflows | Q4 2024
Copilot for Data Factory (Data pipeline) | Q4 2024
On-premises data gateway (OPDG) support added to data pipelines | Shipped (Q3 2024)
Support for invoking cross-workspace data pipelines | Shipped (Q3 2024)
Azure Data Factory in Fabric | Shipped (Q3 2024)
Incremental refresh support in Dataflow Gen2 | Shipped (Q3 2024)
Data pipeline support for Azure Databricks Jobs | Shipped (Q3 2024)
Improved email notifications for Refresh failures | Shipped (Q3 2024)
Copy Job | Shipped (Q3 2024)
Copilot for Data Factory (Dataflow) | Shipped (Q3 2024)
Staging defaults for Dataflow Gen2 Output destination | Shipped (Q2 2024)
Data pipeline support for Event-Driven Triggers | Shipped (Q2 2024)
Data pipeline support for SparkJobDefinition | Shipped (Q2 2024)
Data pipeline support for Azure HDInsight | Shipped (Q2 2024)
New connectors for Copy Activity | Shipped (Q2 2024)
Apache Airflow job: Build data pipelines powered by Apache Airflow | Shipped (Q2 2024)
Data source identity management (SPN) | Shipped (Q2 2024)
Get data experience improvements (Browse Azure Resources) | Shipped (Q1 2024)
On-premises data gateway (OPDG) support added to data pipelines | Shipped (Q1 2024)
Data Factory Git integration for data pipelines | Shipped (Q1 2024)
Enhancements to output destinations in Dataflow Gen2 (query schema) | Shipped (Q1 2024)
Fast Copy support in Dataflow Gen2 | Shipped (Q1 2024)
Cancel refresh support in Dataflow Gen2 | Shipped (Q4 2023)

Data pipeline support for DBT

Estimated release timeline: Q1 2024

Release Type: Public preview

dbt CLI orchestration (data build tool): Incorporates dbt into data pipelines for data transformation workflows.

Fast Copy support in Dataflow Gen2

Estimated release timeline: Q3 2024

Release Type: General availability

We're adding support for large-scale data ingestion directly within the Dataflow Gen2 experience, using the pipelines Copy Activity capability. This enhancement significantly scales up the data processing capacity of Dataflow Gen2, providing high-scale ELT (Extract-Load-Transform) capabilities.

Data source identity management (Managed Identity)

Estimated release timeline: Q3 2024

Release Type: Public preview

This feature lets you configure a managed identity at the workspace level. You can use Fabric managed identities to connect securely to your data sources.

Data Factory Git integration for dataflows

Estimated release timeline: Q4 2024

Release Type: Public preview

You can connect to a Git repository and develop your dataflows there. This capability integrates with version control and supports commits and pull requests.

Copilot for Data Factory (Data pipeline)

Estimated release timeline: Q4 2024

Release Type: Public preview

Copilot for Data Factory (Data pipeline) empowers customers to build data pipelines using natural language and provides troubleshooting guidance.

Shipped feature(s)

On-premises data gateway (OPDG) support added to data pipelines

Shipped (Q3 2024)

Release Type: General availability

This feature enables data pipelines to use Fabric data gateways to access data that is on-premises and behind a virtual network. Users of self-hosted integration runtimes (SHIR) can move to on-premises data gateways in Fabric.

Support for invoking cross-workspace data pipelines

Shipped (Q3 2024)

Release Type: Public preview

Invoke Pipeline activity update: We're making several updates to the Invoke Pipeline activity. In response to strong customer and community demand, you can now run data pipelines across workspaces: the activity can invoke pipelines in other workspaces that you have permission to execute. This enables data workflow patterns in which data engineering and integration teams collaborate across workspaces and across functional teams.

Azure Data Factory in Fabric

Shipped (Q3 2024)

Release Type: Public preview

Bring your existing Azure Data Factory (ADF) to your Fabric workspace! This is a new preview capability that allows you to connect to your existing ADF factories from your Fabric workspace.

You can now fully manage your ADF factories directly from the Fabric workspace UI! Once your ADF is linked to your Fabric workspace, you can trigger, execute, and monitor your pipelines as you do in ADF, but directly inside Fabric.

Incremental refresh support in Dataflow Gen2

Shipped (Q3 2024)

Release Type: Public preview

We're adding incremental refresh support in Dataflow Gen2. This feature enables you to incrementally extract data from data sources, apply Power Query transformations, and load the results into various output destinations.
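This section doesn't describe how changed data is detected. One common approach, and the assumption behind this sketch, is to split rows into date buckets and reprocess only the buckets whose change-detection value (for example, a last-modified column) has moved since the last refresh. The column names, monthly bucket size, and helper functions are hypothetical; this is plain Python, not Power Query or the Dataflow Gen2 implementation.

```python
from collections import defaultdict
from datetime import date


def bucket_key(d: date) -> str:
    """Assumed monthly bucket for a row's date column, e.g. '2024-06'."""
    return f"{d.year:04d}-{d.month:02d}"


def changed_buckets(rows, previous_max):
    """Find buckets whose max change-detection value differs from the last refresh."""
    current_max = defaultdict(int)
    for row in rows:
        key = bucket_key(row["order_date"])
        current_max[key] = max(current_max[key], row["modified"])
    changed = {key for key, value in current_max.items() if previous_max.get(key) != value}
    return changed, dict(current_max)


def incremental_refresh(rows, destination, state, transform):
    """Extract, transform, and reload only the changed buckets; return their keys."""
    changed, new_state = changed_buckets(rows, state)
    for key in changed:
        # Replace the whole bucket in the destination with freshly transformed rows.
        destination[key] = [transform(r) for r in rows if bucket_key(r["order_date"]) == key]
    state.clear()
    state.update(new_state)  # remember the change-detection values for the next run
    return sorted(changed)


def to_cents(row):
    """Example transformation step (hypothetical)."""
    return {**row, "amount_cents": row["amount"] * 100}


rows = [
    {"order_date": date(2024, 6, 3), "modified": 1, "amount": 10},
    {"order_date": date(2024, 7, 1), "modified": 1, "amount": 20},
]
destination, state = {}, {}

print(incremental_refresh(rows, destination, state, to_cents))  # ['2024-06', '2024-07']
rows[1]["modified"] = 2  # a July row changes in the source
print(incremental_refresh(rows, destination, state, to_cents))  # ['2024-07']
```

The sketch doesn't handle buckets that disappear from the source (for example, after deletions); a production design would need a policy for those.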

Data pipeline support for Azure Databricks Jobs

Shipped (Q3 2024)

Release Type: Public preview

We're updating the Azure Databricks activities in Data Factory data pipelines to use the latest Jobs API, enabling workflow capabilities such as executing Delta Live Tables (DLT) jobs.

Improved email notifications for Refresh failures

Shipped (Q3 2024)

Release Type: Public preview

Email notifications allow Dataflow Gen2 creators to monitor the results (success/failure) of a dataflow’s refresh operation.

Copy Job

Shipped (Q3 2024)

Release Type: Public preview

Copy Job simplifies data ingestion: customers can ingest data without having to create a dataflow or data pipeline. Copy Job supports full and incremental copy from any data source to any data destination. Sign up for the private preview now.
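To illustrate the difference between full and incremental copy, here is a minimal Python sketch, assuming a source whose rows are only appended and carry an increasing `updated_at` value used as a watermark. `copy_job` and the row fields are hypothetical and are not the Copy Job API; a real incremental copy may also need to merge updated rows rather than append them.

```python
def copy_job(source, destination, mode, watermark=None):
    """Copy rows from source into destination and return the new watermark.

    mode="full": replace the destination with every source row.
    mode="incremental": append only rows newer than the watermark.
    """
    if mode == "full":
        destination.clear()
        new_rows = list(source)
    elif mode == "incremental":
        new_rows = [r for r in source if watermark is None or r["updated_at"] > watermark]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    destination.extend(new_rows)
    # Advance the watermark to the newest row copied; keep the old one if nothing was new.
    return max((r["updated_at"] for r in new_rows), default=watermark)


source = [{"id": 1, "updated_at": 100}, {"id": 2, "updated_at": 105}]
destination = []

watermark = copy_job(source, destination, "full")   # initial full load
source.append({"id": 3, "updated_at": 110})         # a new row arrives at the source
watermark = copy_job(source, destination, "incremental", watermark)
print([r["id"] for r in destination], watermark)    # [1, 2, 3] 110
```

Storing the watermark between runs is what lets each incremental run skip rows that were already copied.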

Copilot for Data Factory (Dataflow)

Shipped (Q3 2024)

Release Type: General availability

Copilot for Data Factory (Dataflow) empowers customers to express their requirements using natural language when creating data integration solutions with Dataflow Gen2.

Staging defaults for Dataflow Gen2 Output destination

Shipped (Q2 2024)

Release Type: Public preview

Dataflow Gen2 provides capabilities to ingest data from a wide range of data sources into Fabric OneLake. Once this data is staged, it can be transformed at high scale using the high-scale Dataflow Gen2 engine (based on Fabric Lakehouse/Warehouse SQL compute).

By default, Dataflow Gen2 stages data in OneLake to enable high-scale data transformations. While this works well for high-scale scenarios, it's less efficient when ingesting small amounts of data, because staging adds an extra hop before the data is ultimately loaded into the dataflow output destination.

With this enhancement, we're changing the default staging behavior to disabled for queries whose output destination doesn't require staging (namely, Fabric Lakehouse and Azure SQL Database).

Staging behavior can be manually configured on a per-query basis via the Query Settings pane or the query contextual menu in the Queries pane.
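The default described here amounts to a simple per-query rule. This Python sketch encodes it under stated assumptions: destinations are identified by display name, a query with no output destination keeps staging on, and a manual per-query setting always wins. The names are hypothetical and don't come from a Fabric API.

```python
from typing import Optional

# Destinations this section names as not requiring staging.
NO_STAGING_REQUIRED = {"Fabric Lakehouse", "Azure SQL Database"}


def staging_enabled(destination: Optional[str], override: Optional[bool] = None) -> bool:
    """Return whether a query stages its data in OneLake before loading.

    destination: the query's output destination, or None if it has none.
    override: True/False when staging was set manually for this query.
    """
    if override is not None:
        return override  # a manual per-query setting takes precedence
    return destination not in NO_STAGING_REQUIRED


for dest in ["Fabric Lakehouse", "Azure SQL Database", "Fabric Warehouse", None]:
    print(dest, staging_enabled(dest))
```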

Data pipeline support for Event-Driven Triggers

Shipped (Q2 2024)

Release Type: Public preview

A common use case for invoking Data Factory data pipelines is to trigger the pipeline on file events such as file arrival and file deletion. For customers coming to Fabric from ADF or Synapse, using ADLS/Blob storage events is a very common way to either signal a new pipeline execution or capture the names of the files created. Triggers in Fabric Data Factory use Fabric platform capabilities, including EventStreams and Reflex triggers. In the Fabric Data Factory pipeline design canvas, you can select the Trigger button to create a Reflex trigger for your pipeline, or you can create the trigger directly from the Data Activator experience.
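To show the kind of information a file event carries, this Python sketch filters Azure Blob Storage events in the Azure Event Grid schema (an `eventType` such as `Microsoft.Storage.BlobCreated`, and a `subject` of the form `/blobServices/default/containers/<container>/blobs/<path>`) and extracts the file names a pipeline could receive as parameters. It doesn't use Fabric's trigger APIs; `files_from_events`, the suffix filter, and the sample payload are hypothetical.

```python
import json

# Map Blob Storage event types to the file actions a pipeline might react to.
FILE_EVENTS = {
    "Microsoft.Storage.BlobCreated": "created",
    "Microsoft.Storage.BlobDeleted": "deleted",
}
SUBJECT_PREFIX = "/blobServices/default/containers/"


def files_from_events(events, suffix=""):
    """Return (action, container, path) for blob created/deleted events matching suffix."""
    results = []
    for event in events:
        action = FILE_EVENTS.get(event.get("eventType"))
        if action is None:
            continue  # ignore unrelated event types
        subject = event.get("subject", "")
        if not subject.startswith(SUBJECT_PREFIX):
            continue
        rest = subject[len(SUBJECT_PREFIX):]
        if "/blobs/" not in rest:
            continue
        container, path = rest.split("/blobs/", 1)
        if path.endswith(suffix):
            results.append((action, container, path))
    return results


payload = json.loads("""[
  {"eventType": "Microsoft.Storage.BlobCreated",
   "subject": "/blobServices/default/containers/landing/blobs/sales/2024/06/orders.csv"},
  {"eventType": "Microsoft.Storage.BlobDeleted",
   "subject": "/blobServices/default/containers/landing/blobs/tmp/old.json"},
  {"eventType": "Microsoft.Storage.BlobCreated",
   "subject": "/blobServices/default/containers/landing/blobs/sales/readme.txt"}
]""")

print(files_from_events(payload, suffix=".csv"))
```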

Data pipeline support for SparkJobDefinition

Shipped (Q2 2024)

Release Type: General availability

Now you can execute your Spark code, including JAR files, directly from a pipeline activity. Just point to your Spark code, and the pipeline executes the job on your Spark cluster in Fabric. This new activity enables data workflow patterns that use the power of Fabric's Spark engine while including Data Factory control flow and data flow capabilities in the same pipeline as your Spark jobs.

Data pipeline support for Azure HDInsight

Shipped (Q2 2024)

Release Type: General availability

HDInsight is the Azure PaaS service for Hadoop that enables developers to build powerful big data solutions in the cloud. The new HDInsight pipeline activity lets you run HDInsight jobs inside your Data Factory data pipelines, similar to the functionality you've used for years in ADF and Synapse pipelines. This capability is now available directly in Fabric data pipelines.

New connectors for Copy Activity

Shipped (Q2 2024)

Release Type: Public preview

New connectors are added to the Copy activity so that customers can ingest data from the following sources in data pipelines: Oracle, MySQL, Azure AI Search, Azure Files, Dynamics AX, and Google BigQuery.

Apache Airflow job: Build data pipelines powered by Apache Airflow

Shipped (Q2 2024)

Release Type: Public preview

Apache Airflow jobs (previously called Data workflows) are powered by Apache Airflow and offer an integrated Apache Airflow runtime environment, enabling you to author, execute, and schedule Python DAGs with ease.

Data source identity management (SPN)

Shipped (Q2 2024)

Release Type: General availability

Service principal: To access resources secured by an Azure AD (now Microsoft Entra ID) tenant, the entity that requires access must be represented by a security principal. You can connect to your data sources by using a service principal.
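A service principal typically authenticates with the OAuth 2.0 client credentials flow against the Microsoft identity platform. This Python sketch only builds (and doesn't send) such a token request, to show which values a service principal connection depends on: a tenant ID, a client (application) ID, a secret, and a scope. The IDs, secret, and storage scope are placeholders, and this isn't how you configure the connection in Fabric.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_token_request(tenant_id, client_id, client_secret, scope):
    """Build (but don't send) an OAuth 2.0 client-credentials token request
    for a service principal against the Microsoft identity platform v2.0 endpoint."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }).encode("ascii")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder values; urllib.request.urlopen(req) would send the request and
# return a JSON body containing an access_token if the credentials are valid.
req = build_token_request(
    tenant_id="00000000-0000-0000-0000-000000000000",
    client_id="11111111-1111-1111-1111-111111111111",
    client_secret="<secret>",
    scope="https://storage.azure.com/.default",
)
print(req.get_method(), req.full_url)
print(parse_qs(req.data.decode())["grant_type"])
```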

Get data experience improvements (Browse Azure Resources)

Shipped (Q1 2024)

Release Type: Public preview

Browse Azure resources provides seamless navigation of your Azure resources. You can easily navigate your Azure subscriptions and connect to your data sources through an intuitive user interface, which helps you quickly find and connect to the data you need.

On-premises data gateway (OPDG) support added to data pipelines

Shipped (Q1 2024)

Release Type: Public preview

This feature enables data pipelines to use Fabric data gateways to access data that is on-premises and behind a virtual network. Users of self-hosted integration runtimes (SHIR) can move to on-premises data gateways in Fabric.

Data Factory Git integration for data pipelines

Shipped (Q1 2024)

Release Type: Public preview

You can connect to your Git repository to develop data pipelines in a collaborative way. The integration of data pipelines with the Fabric platform's Application Lifecycle Management (ALM) capability enables version control, branching, commits, and pull requests.

Enhancements to output destinations in Dataflow Gen2 (query schema)

Shipped (Q1 2024)

Release Type: Public preview

We're enhancing the output destinations in Dataflow Gen2 with the following highly requested capabilities:

  • Ability to handle query schema changes after configuring an output destination.
  • Default destination settings to accelerate dataflow creation.

To learn more, see Dataflow Gen2 data destinations and managed settings.

Fast Copy support in Dataflow Gen2

Shipped (Q1 2024)

Release Type: Public preview

We're adding support for large-scale data ingestion directly within the Dataflow Gen2 experience, using the pipelines Copy Activity capability. This supports sources such as Azure SQL Database, and CSV and Parquet files in Azure Data Lake Storage and Blob Storage.

This enhancement significantly scales up the data processing capacity of Dataflow Gen2, providing high-scale ELT (Extract-Load-Transform) capabilities.

Cancel refresh support in Dataflow Gen2

Shipped (Q4 2023)

Release Type: Public preview

We're adding support for canceling ongoing Dataflow Gen2 refreshes from the workspace items view.