What's new and planned for Data Factory in Microsoft Fabric

Important

The release plans describe functionality that may or may not have been released yet. The delivery timelines and projected functionality may change or may not ship. Refer to Microsoft policy for more information.

Data Factory in Microsoft Fabric combines citizen data integration and pro data integration capabilities into a single, modern data integration experience. It provides connectivity to more than 100 relational and nonrelational databases, lakehouses, data warehouses, and generic interfaces like REST APIs, OData, and more.

Dataflows: Dataflow Gen2 enables you to perform large-scale data transformations, and supports various output destinations that write to Azure SQL Database, Lakehouse, Data Warehouse, and more. The dataflows editor offers more than 300 transformations, including AI-based options, and lets you transform data easily and flexibly. Whether you're extracting data from an unstructured data source such as a web page or reshaping an existing table in the Power Query editor, you can apply Power Query's data extraction by example, which uses artificial intelligence (AI) to simplify the process.

Data pipelines: Data pipelines offer the capability to create versatile data orchestration workflows that bring together tasks like data extraction, loading into preferred data stores, notebook execution, SQL script execution, and more. You can quickly build powerful metadata-driven data pipelines that automate repetitive tasks: for example, loading and extracting data from different tables in a database, or iterating through multiple containers in Azure Blob Storage. Furthermore, with data pipelines, you can access data from Microsoft 365 by using the Microsoft Graph Data Connect (MGDC) connector.
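
The metadata-driven pattern can be sketched in plain Python. This is an illustration only: the table names and the `build_copy_task` helper are hypothetical, not a Fabric API; in a real pipeline a Lookup activity would read the control list and a ForEach activity would run one Copy activity per entry.

```python
# Sketch of a metadata-driven copy pattern: iterate a control list of
# source tables and produce one copy task per table.
# (Illustrative only; not the Fabric pipeline object model.)

def build_copy_task(source_table: str, destination: str) -> dict:
    """Describe a single copy operation as a plain dictionary."""
    return {
        "activity": "Copy",
        "source": {"table": source_table},
        "sink": {"store": destination, "table": source_table},
    }

def build_pipeline(control_tables: list[str], destination: str) -> list[dict]:
    """One copy task per entry in the control (metadata) list."""
    return [build_copy_task(t, destination) for t in control_tables]

tasks = build_pipeline(["dbo.Orders", "dbo.Customers"], "Lakehouse")
print(len(tasks))          # 2
print(tasks[0]["source"])  # {'table': 'dbo.Orders'}
```

Adding a new table to the control list is then the only change needed to extend the pipeline, which is what makes the pattern attractive for repetitive loads.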

To learn more, see the documentation.

Investment areas

Over the next few months, Data Factory in Microsoft Fabric will expand its connectivity options and continue to add to its rich library of transformations and data pipeline activities. It will also enable you to perform real-time, high-performance data replication from operational databases and bring that data into the lake for analytics.

| Feature | Estimated release timeline |
| --- | --- |
| Data source identity management (SPN) | Q2 2024 |
| Data pipeline support for SparkJobDefinition | Q2 2024 |
| Data pipeline support for Azure HDInsight | Q2 2024 |
| Support for invoking cross-workspace data pipelines | Q2 2024 |
| Data pipeline support for Event-Driven Triggers | Q2 2024 |
| New connectors for Copy Activity | Q2 2024 |
| Data workflows: Build data pipelines powered by Apache Airflow | Q2 2024 |
| Copilot for Data Factory (Dataflow) | Q3 2024 |
| Staging defaults for Dataflow Gen2 output destinations | Q3 2024 |
| Incremental refresh support in Dataflow Gen2 | Q3 2024 |
| Data pipeline support for DBT CLI | Q3 2024 |
| Data pipeline support for Azure Databricks Jobs | Q3 2024 |
| Copy Job | Q3 2024 |
| Copilot for Data Factory (Data pipeline) | Q3 2024 |
| Improved email notifications for refresh failures | Q3 2024 |
| Dataflow Gen2 partition-based parallel ingestion | Q3 2024 |
| Data source identity management (Managed Identity) | Q3 2024 |
| Data source identity management (Azure Key Vault) | Q3 2024 |
| Enabling customers to parameterize their connections | Q4 2024 |
| Cancel refresh support in Dataflow Gen2 | Shipped (Q4 2023) |
| Get data experience improvements (Browse Azure Resources) | Shipped (Q1 2024) |
| On-premises data gateway (OPDG) support added to data pipelines | Shipped (Q1 2024) |
| Fast Copy support in Dataflow Gen2 | Shipped (Q1 2024) |
| Data Factory Git integration for data pipelines | Shipped (Q1 2024) |
| Enhancements to output destinations in Dataflow Gen2 (query schema) | Shipped (Q1 2024) |

Data source identity management (SPN)

Estimated release timeline: Q2 2024

Release Type: General availability

Service principal: To access resources that are secured by an Azure AD tenant, the entity that requires access must be represented by a security principal. You'll be able to connect to your data sources by using a service principal.
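
Under the hood, service principal authentication follows the standard OAuth 2.0 client credentials grant against Azure AD. The sketch below only constructs the token request (no network call is made); the tenant, client, and scope values are placeholders.

```python
# Sketch of the OAuth 2.0 client-credentials request that a service
# principal (SPN) uses to obtain an access token from Azure AD.
# The request is only built here, never sent; all IDs are placeholders.
from urllib.parse import urlencode

def build_token_request(tenant_id: str, client_id: str,
                        client_secret: str, scope: str) -> tuple[str, str]:
    """Return (token_endpoint, urlencoded_form_body) for the grant."""
    endpoint = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    })
    return endpoint, body

url, body = build_token_request(
    "my-tenant-id", "my-app-id", "my-secret",
    "https://database.windows.net/.default")
print(url)
```

POSTing that form body to the endpoint returns a bearer token the connection can then present to the data source.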

Data pipeline support for SparkJobDefinition

Estimated release timeline: Q2 2024

Release Type: General availability

Now you can execute your Spark code, including JAR files, directly from a pipeline activity. Just point to your Spark code, and the pipeline will execute the job on your Spark cluster in Fabric. This new activity enables data workflow patterns that leverage the power of Fabric's Spark engine while including Data Factory's control flow and data flow capabilities in the same pipeline as your Spark jobs.

Data pipeline support for Azure HDInsight

Estimated release timeline: Q2 2024

Release Type: General availability

HDInsight is the Azure PaaS service for Hadoop that enables developers to build powerful big data solutions in the cloud. The new HDInsight pipeline activity enables HDInsight job activities inside your Data Factory data pipelines, similar to the functionality you've used for years in ADF and Synapse pipelines. We've now brought this capability directly into Fabric data pipelines.

Support for invoking cross-workspace data pipelines

Estimated release timeline: Q2 2024

Release Type: Public preview

Invoke Pipeline activity update: In response to overwhelming customer and community requests, we're enabling running data pipelines across workspaces. You'll be able to invoke pipelines from other workspaces that you have permission to execute. This enables data workflow patterns that draw on collaboration between your data engineering and integration teams across workspaces and functional teams.

Data pipeline support for Event-Driven Triggers

Estimated release timeline: Q2 2024

Release Type: Public preview

A common use case for invoking Data Factory data pipelines is to trigger the pipeline upon file events such as file arrival and file deletion. For customers coming to Fabric from ADF or Synapse, using ADLS/Blob storage events is a very common way to either signal a new pipeline execution or capture the names of the files created. Triggers in Fabric Data Factory leverage Fabric platform capabilities, including Eventstreams and Reflex triggers. Inside the Fabric Data Factory pipeline design canvas, you'll have a Trigger button that you can press to create a Reflex trigger for your pipeline, or you can create the trigger directly from the Data Activator experience.
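
Conceptually, such a trigger filters incoming storage events before starting a run. The sketch below shows that filtering step with a simplified event shape; the field names are illustrative and not the exact Event Grid or Eventstream schema.

```python
# Sketch of the filtering an event-driven trigger performs: given blob
# events, keep only "created" events for .csv files under one folder.
# (Simplified event shape for illustration, not the real event schema.)

def matching_blobs(events: list[dict], folder: str, suffix: str) -> list[str]:
    """Return paths of newly created blobs that match folder and suffix."""
    return [
        e["blobPath"]
        for e in events
        if e["eventType"] == "BlobCreated"
        and e["blobPath"].startswith(folder)
        and e["blobPath"].endswith(suffix)
    ]

events = [
    {"eventType": "BlobCreated", "blobPath": "landing/sales/2024-06.csv"},
    {"eventType": "BlobDeleted", "blobPath": "landing/sales/2024-05.csv"},
    {"eventType": "BlobCreated", "blobPath": "landing/sales/readme.txt"},
]
print(matching_blobs(events, "landing/sales/", ".csv"))
# ['landing/sales/2024-06.csv']
```

The matching file names are exactly what a pipeline would typically receive as parameters, so it can process just the files that arrived.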

New connectors for Copy Activity

Estimated release timeline: Q2 2024

Release Type: Public preview

New connectors will be added for Copy activity to empower customer to ingest from the following sources, while leveraging data pipeline: Oracle, MySQL, Azure MySQL Database, Azure AI Search, Azure Files, Dynamics AX, Azure Files, Google BigQuery.

Data workflows: Build data pipelines powered by Apache Airflow

Estimated release timeline: Q2 2024

Release Type: Public preview

Data workflows are powered by Apache Airflow and offer an integrated Apache Airflow runtime environment, enabling you to author, execute, and schedule Python DAGs with ease.
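
A DAG in this context is an ordinary Python file that declares tasks and their dependencies. The following is a minimal sketch of such a definition; it assumes an Apache Airflow environment (Airflow 2.4 or later for the `schedule` argument), and the DAG and task names are illustrative only.

```python
# Minimal Apache Airflow DAG sketch: one daily task defined in Python.
# Requires an Airflow runtime to execute; names are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("extracting...")

with DAG(
    dag_id="sample_daily_extract",   # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="extract", python_callable=extract)
```

Dropping a file like this into the environment's DAGs folder is all that's needed for the scheduler to pick it up and run it on the declared cadence.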

Copilot for Data Factory (Dataflow)

Estimated release timeline: Q3 2024

Release Type: General availability

Copilot for Data Factory (Dataflow) empowers customers to express their requirements using natural language when creating data integration solutions with Dataflows Gen2.

Staging defaults for Dataflow Gen2 output destinations

Estimated release timeline: Q3 2024

Release Type: Public preview

Dataflow Gen2 provides capabilities to ingest data from a wide range of data sources into the Fabric OneLake. Upon staging this data, it can be transformed at high scale by leveraging the high-scale Dataflow Gen2 engine (based on Fabric Lakehouse/Warehouse SQL compute).

The default behavior for Dataflow Gen2 is to stage data in OneLake to enable high-scale data transformations. While this works well for high-scale scenarios, it works less well for scenarios involving small amounts of data, because it introduces an extra hop (staging) before the data is ultimately loaded into the dataflow output destination.

With the planned enhancements, we're changing the default staging behavior to disabled for queries with an output destination that doesn't require staging (namely, Fabric Lakehouse and Azure SQL Database).

Staging behavior can be manually configured on a per-query basis via the Query Settings pane or the query contextual menu in the Queries pane.

Incremental refresh support in Dataflow Gen2

Estimated release timeline: Q3 2024

Release Type: Public preview

We're adding incremental refresh support in Dataflow Gen2. This feature enables you to incrementally extract data from data sources, apply Power Query transformations, and load into various output destinations.
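
Incremental extraction is commonly implemented with a watermark: each run fetches only the rows modified since the last stored watermark, then advances it. The sketch below demonstrates that pattern against an in-memory SQLite table; the table and column names are illustrative, not the Dataflow Gen2 implementation.

```python
# Sketch of the watermark pattern behind incremental refresh: each run
# extracts only rows changed since the previously stored watermark.
# Demonstrated on an in-memory SQLite table; schema is illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, modified_at TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    (1, "2024-06-01"), (2, "2024-06-10"), (3, "2024-06-20"),
])

def incremental_extract(conn, watermark: str) -> list[tuple]:
    """Fetch only the rows modified after the previous watermark."""
    return conn.execute(
        "SELECT id, modified_at FROM orders WHERE modified_at > ? ORDER BY id",
        (watermark,),
    ).fetchall()

rows = incremental_extract(conn, "2024-06-05")   # incremental run
new_watermark = max(m for _, m in rows)          # persist for the next run
print(rows)           # [(2, '2024-06-10'), (3, '2024-06-20')]
print(new_watermark)  # 2024-06-20
```

Row 1 (modified before the watermark) is skipped, so repeated runs touch only new or changed data rather than re-reading the whole source.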

Data pipeline support for DBT CLI

Estimated release timeline: Q3 2024

Release Type: Public preview

DBT CLI Orchestration (Data Build Tool): Incorporates the data build tool (dbt) for data transformation workflows.

Data pipeline support for Azure Databricks Jobs

Estimated release timeline: Q3 2024

Release Type: Public preview

We're updating the Data Factory data pipeline Azure Databricks activities to use the latest Jobs API, enabling workflow capabilities such as executing DLT jobs.

Copy Job

Estimated release timeline: Q3 2024

Release Type: Public preview

Copy Job simplifies the experience for customers who need to ingest data without having to create a dataflow or data pipeline. Copy Job supports full and incremental copy from any data source to any data destination.

Copilot for Data Factory (Data pipeline)

Estimated release timeline: Q3 2024

Release Type: Public preview

Copilot for Data Factory (Data pipeline) empowers customers to build data pipelines using natural language and provides troubleshooting guidance.

Improved email notifications for Refresh failures

Estimated release timeline: Q3 2024

Release Type: Public preview

Email notifications allow Dataflow Gen2 creators to monitor the results (success/failure) of a dataflow’s refresh operation.

Dataflow Gen2 partition-based parallel ingestion

Estimated release timeline: Q3 2024

Release Type: Public preview

Currently, when a Dataflow Gen2 contains queries against a data source that supports partitions, the partitions within those queries are refreshed sequentially. An example of this behavior is a query that runs against a folder and ingests all files within it (parsing them into tables, combining them into a single table, and so on).

With the planned enhancements, we're optimizing the orchestration of such queries so that the source partitions can be processed in parallel. This optimization can significantly reduce overall dataflow run durations.
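
The sequential-versus-parallel difference can be sketched with a thread pool over a set of partitions. The file names and row counts below are simulated stand-ins for source partitions; in the dataflow the engine performs the equivalent scheduling per partition.

```python
# Sketch of partition-based parallel ingestion: each source partition
# (here, a file) is processed concurrently rather than one after another.
# Partition contents are simulated for illustration.
from concurrent.futures import ThreadPoolExecutor

partitions = {
    "2024-01.csv": [1, 2, 3],
    "2024-02.csv": [4, 5],
    "2024-03.csv": [6],
}

def ingest(name: str) -> int:
    """Process one partition; return the number of rows ingested."""
    return len(partitions[name])

with ThreadPoolExecutor(max_workers=3) as pool:
    counts = list(pool.map(ingest, partitions))

print(sum(counts))  # 6 rows across all partitions
```

With I/O-bound ingestion, total elapsed time approaches that of the slowest partition instead of the sum of all partitions, which is where the run-duration reduction comes from.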

Data source identity management (Managed Identity)

Estimated release timeline: Q3 2024

Release Type: Public preview

This enables managed identities to be configured at the workspace level. You can use Fabric managed identities to connect to your data sources securely.

Data source identity management (Azure Key Vault)

Estimated release timeline: Q3 2024

Release Type: Public preview

Support for Azure Key Vault: You can store your keys and secrets in Azure Key Vault and connect to it, so you can manage your keys in a single place.

Enabling customers to parameterize their connections

Estimated release timeline: Q4 2024

Release Type: Public preview

Connections provide a common framework for defining connectivity and authentication for your data stores. These connections can be shared across different items. With parameterization support, you'll be able to build complex and reusable pipelines, notebooks, dataflows, and other item types.
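
Parameterization amounts to resolving placeholders in a single connection definition against values supplied at run time, so one definition can serve dev, test, and prod. The template syntax and field names below are illustrative, not the Fabric connection schema.

```python
# Sketch of connection parameterization: one connection template with
# placeholders is resolved against parameter values at run time.
# Field names and $-placeholder syntax are illustrative only.
from string import Template

connection_template = {
    "server": "$env-sql.database.windows.net",
    "database": "$db",
}

def resolve_connection(template: dict, params: dict) -> dict:
    """Substitute parameter values into every field of the template."""
    return {k: Template(v).substitute(params) for k, v in template.items()}

conn = resolve_connection(connection_template, {"env": "prod", "db": "sales"})
print(conn["server"])  # prod-sql.database.windows.net
```

Switching environments then means changing only the parameter values, not the pipelines, notebooks, or dataflows that share the connection.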

Shipped feature(s)

Cancel refresh support in Dataflow Gen2

Shipped (Q4 2023)

Release Type: Public preview

We're adding support to cancel ongoing Dataflow Gen2 refreshes from the workspace items view.

Get data experience improvements (Browse Azure Resources)

Shipped (Q1 2024)

Release Type: Public preview

Browse Azure resources provides seamless navigation of your Azure resources. You can easily navigate your Azure subscriptions and connect to your data sources through an intuitive user interface, helping you quickly find and connect to the data you need.

On-premises data gateway (OPDG) support added to data pipelines

Shipped (Q1 2024)

Release Type: Public preview

This feature enables data pipelines to use Fabric data gateways to access data that is on-premises and behind a virtual network. Users of self-hosted integration runtimes (SHIR) will be able to move to on-premises data gateways in Fabric.

Fast Copy support in Dataflow Gen2

Shipped (Q1 2024)

Release Type: Public preview

We're adding support for large-scale data ingestion directly within the Dataflow Gen2 experience, utilizing the pipeline Copy activity capability. This supports sources such as Azure SQL Database, and CSV and Parquet files in Azure Data Lake Storage and Blob Storage.

This enhancement significantly scales up the data processing capacity of Dataflow Gen2 providing high-scale ELT (Extract-Load-Transform) capabilities.

Data Factory Git integration for data pipelines

Shipped (Q1 2024)

Release Type: Public preview

You can connect to your Git repository to develop data pipelines in a collaborative way. The integration of data pipelines with the Fabric platform's Application Lifecycle Management (ALM) capability enables version control, branching, commits, and pull requests.

Enhancements to output destinations in Dataflow Gen2 (query schema)

Shipped (Q1 2024)

Release Type: Public preview

We're enhancing the output destinations in Dataflow Gen2 with the following highly requested capabilities:

  • Ability to handle query schema changes after configuring an output destination.
  • Default destination settings to accelerate dataflows creation.

To learn more, see Dataflow Gen2 data destinations and managed settings.