What's new and planned for Data Factory in Microsoft Fabric

Important

The release plans describe functionality that may or may not have been released yet. The delivery timelines and projected functionality may change or may not ship. Refer to Microsoft policy for more information.

Data Factory in Microsoft Fabric combines citizen data integration and pro data integration capabilities into a single, modern data integration experience. It provides connectivity to more than 100 relational and nonrelational databases, lakehouses, data warehouses, generic interfaces like REST APIs, OData, and more.

Dataflows: Dataflow Gen2 enables you to perform large-scale data transformations and supports various output destinations, including Azure SQL Database, Lakehouse, Data Warehouse, and more. The dataflows editor offers more than 300 transformations, including AI-based options, and lets you transform data easily and flexibly. Whether you're extracting data from an unstructured data source such as a web page or reshaping an existing table in the Power Query editor, you can apply Power Query's Data Extraction By Example, which uses artificial intelligence (AI) to simplify the process.

Data pipelines: Data pipelines let you create versatile data orchestration workflows that bring together tasks like data extraction, loading into preferred data stores, notebook execution, SQL script execution, and more. You can quickly build powerful metadata-driven data pipelines that automate repetitive tasks. For example, you can load and extract data from different tables in a database, iterate through multiple containers in Azure Blob Storage, and more. With data pipelines, you can also access data from Microsoft 365 by using the Microsoft Graph Data Connection (MGDC) connector.
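The metadata-driven pattern can be illustrated with a minimal Python sketch. It is not Data Factory code: the table list, `run_pipeline`, and `fake_copy` are hypothetical names used only to show one copy step driven by a list of metadata entries instead of one hand-built step per table.

```python
from typing import Callable, Dict, List

# Hypothetical metadata: one entry per table the pipeline should copy.
TABLES: List[Dict[str, str]] = [
    {"source": "dbo.Customers", "destination": "customers"},
    {"source": "dbo.Orders", "destination": "orders"},
]


def run_pipeline(
    tables: List[Dict[str, str]], copy: Callable[[str, str], str]
) -> List[str]:
    """Run the same copy step once for every metadata entry."""
    return [copy(entry["source"], entry["destination"]) for entry in tables]


def fake_copy(source: str, destination: str) -> str:
    # Stand-in for a real copy step; it only reports what it would do.
    return f"copied {source} -> {destination}"


log = run_pipeline(TABLES, fake_copy)
```

Adding a table to the workflow then means adding one entry to the metadata list, while the copy logic stays unchanged.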

To learn more, see the documentation and visit our announcement blog.

Investment areas

Over the next few months, Data Factory in Microsoft Fabric will expand its connectivity options and continue to add to its rich library of transformations and data pipeline activities. It will also enable you to perform real-time, high-performance data replication from operational databases and bring this data into the lake for analytics.

Feature | Estimated release timeline
Data Factory Git integration for data pipelines | Shipped
Enhancements to output destinations in Dataflow Gen2 | Shipped
Upsert operation support for data destinations in Dataflow Gen2 | Q3 2024
Incremental refresh support in Dataflow Gen2 | Q2 2024
Fast Copy support in Dataflow Gen2 | Shipped
Cancel refresh support in Dataflow Gen2 | Shipped
Data source identity management | Q4 2024
Get data experience improvements (Browse Azure resources) | Shipped
Get data experience improvements (Browse SharePoint and OneDrive for Business) | Q3 2024
On-premises data gateway (OPDG) support added to data pipelines | Shipped
Copilot in Data Factory | Shipped
Enabling customers to parameterize their connections | Q4 2024
Data Factory Git integration for dataflows | Q4 2024

Data Factory Git integration for data pipelines

Estimated release timeline: Shipped

You can connect to your Git repository to develop data pipelines in a collaborative way. The integration of data pipelines with the Fabric platform's Application Lifecycle Management (ALM) capability enables version control, branching, commits, and pull requests.

Enhancements to output destinations in Dataflow Gen2

Estimated release timeline: Shipped

We're enhancing the output destinations in Dataflow Gen2 with the following highly requested capabilities:

  • Ability to handle query schema changes after configuring an output destination.
  • Default destination settings to accelerate dataflows creation.

To learn more, see Dataflow Gen2 data destinations and managed settings.

Upsert operation support for data destinations in Dataflow Gen2

Estimated release timeline: Q3 2024

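This feature adds support for merging data into data destinations during subsequent refreshes (also known as upsert): rows that already exist in the destination are updated, and new rows are inserted.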
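The following Python sketch illustrates upsert semantics in general. It is not how Dataflow Gen2 implements the feature; the `upsert` function, the `id` key column, and the sample rows are assumptions made for illustration.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def upsert(existing: Iterable[Row], incoming: Iterable[Row], key: str) -> List[Row]:
    """Merge incoming rows into existing rows, matching on a key column."""
    # Index copies of the destination rows by key; dicts keep insertion order.
    merged: Dict[object, Row] = {row[key]: dict(row) for row in existing}
    for row in incoming:
        if row[key] in merged:
            # Key already present: update the stored row.
            merged[row[key]].update(row)
        else:
            # New key: insert the row.
            merged[row[key]] = dict(row)
    return list(merged.values())


destination = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Lin", "city": "Oslo"},
]
refresh = [
    {"id": 2, "name": "Lin", "city": "Bergen"},  # existing key: update
    {"id": 3, "name": "Sam", "city": "Lagos"},   # new key: insert
]
result = upsert(destination, refresh, key="id")
```

After the call, `result` holds three rows: row 1 is unchanged, row 2 carries the new city, and row 3 is appended. The original `destination` list is left untouched because the sketch works on copies.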

Incremental refresh support in Dataflow Gen2

Estimated release timeline: Q2 2024

We're adding incremental refresh support in Dataflow Gen2. This feature enables you to incrementally extract data from data sources, apply Power Query transformations, and load into various output destinations.
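One common way to extract data incrementally is a high-water mark on a change-tracking column. The Python sketch below shows that general technique under stated assumptions: the `modified_at` column and the `incremental_extract` function are hypothetical and do not describe the Dataflow Gen2 implementation or its settings.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]


def incremental_extract(
    source: List[Row], watermark: Optional[datetime], column: str = "modified_at"
) -> Tuple[List[Row], Optional[datetime]]:
    """Return rows changed after the watermark, plus the new watermark."""
    # With no watermark yet (first run), every row counts as changed.
    changed = [r for r in source if watermark is None or r[column] > watermark]
    # Advance the watermark only when changed rows were found.
    new_watermark = max((r[column] for r in changed), default=watermark)
    return changed, new_watermark


source = [
    {"id": 1, "modified_at": datetime(2024, 1, 1)},
    {"id": 2, "modified_at": datetime(2024, 2, 1)},
]
first_batch, mark = incremental_extract(source, None)   # full initial load
source.append({"id": 3, "modified_at": datetime(2024, 3, 1)})
second_batch, mark = incremental_extract(source, mark)  # only the new row
```

The first call loads both rows; the second call returns only the row added afterwards, so each refresh moves less data than a full reload.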

Fast Copy support in Dataflow Gen2

Estimated release timeline: Shipped

We're adding support for large-scale data ingestion directly within the Dataflow Gen2 experience, using the pipelines Copy activity capability. This supports sources such as Azure SQL Database, and CSV and Parquet files in Azure Data Lake Storage and Blob Storage.

This enhancement significantly scales up the data processing capacity of Dataflow Gen2, providing high-scale ELT (Extract-Load-Transform) capabilities.

Cancel refresh support in Dataflow Gen2

Estimated release timeline: Shipped

We're adding support to cancel ongoing Dataflow Gen2 refreshes from the workspace items view.

Data source identity management

Estimated release timeline: Q4 2024

  • Support for Azure Key Vault - You can store your keys and secrets in Azure Key Vault and connect to it. This way, you can manage your keys in a single place.

  • Managed identity - Managed identity can be configured at a workspace level. You can use the Fabric managed identities to connect to your data source securely.

  • Service principal - To access resources that are secured by an Azure AD tenant, the entity that requires access must be represented by a security principal. You'll be able to connect to your data sources with the service principal.

Get data experience improvements (Browse Azure resources)

Estimated release timeline: Shipped

This feature provides seamless navigation of your Azure resources. You can easily browse your Azure subscriptions and connect to your data sources through an intuitive user interface. It helps you quickly find and connect to the data you need.

Get data experience improvements (Browse SharePoint and OneDrive for Business)

Estimated release timeline: Q3 2024

Browsing OneDrive for Business and SharePoint Online will allow you to easily navigate through files, folders, and lists, and connect to your data sources through an intuitive user interface. It helps you quickly find and connect to the data you need.

On-premises data gateway (OPDG) support added to data pipelines

Estimated release timeline: Shipped

This feature enables data pipelines to use Fabric data gateways to access data that is on-premises and behind a virtual network. Users of self-hosted integration runtimes (SHIR) will be able to move to on-premises data gateways in Fabric.

Copilot in Data Factory

Estimated release timeline: Shipped

Copilot for Data Factory lets customers express their requirements in natural language when creating data integration solutions. Currently, you can use Copilot for Data Factory in Dataflow Gen2. In the future (Q2 CY2024), we'll also introduce Copilot for Data Factory in data pipelines.

Further reading: Official documentation for Copilot in Data Factory.

Enabling customers to parameterize their connections

Estimated release timeline: Q4 2024

Connections provide a common framework for defining connectivity and authentication for your data stores. These connections can be shared across different items. With parameterization support, you'll be able to build complex and reusable pipelines, notebooks, dataflows, and other item types.
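The idea of a parameterized connection can be sketched in Python as a template whose placeholders are resolved per environment. This is an illustration only: `ConnectionTemplate`, its fields, the `env` parameter, and the `example.com` host are hypothetical and are not the Fabric connection model or API.

```python
from dataclasses import dataclass
from string import Template
from typing import Dict


@dataclass(frozen=True)
class ConnectionTemplate:
    """Hypothetical connection definition containing ${placeholders}."""

    server: str
    database: str

    def resolve(self, params: Dict[str, str]) -> Dict[str, str]:
        # Template.substitute raises KeyError when a parameter is missing,
        # so an incomplete parameter set fails early instead of silently.
        return {
            "server": Template(self.server).substitute(params),
            "database": Template(self.database).substitute(params),
        }


template = ConnectionTemplate(
    server="${env}-sql.example.com", database="sales_${env}"
)
dev = template.resolve({"env": "dev"})
prod = template.resolve({"env": "prod"})
```

Here one definition yields both a development and a production connection, which is the kind of reuse that parameterization is meant to enable across pipelines and other items.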

Data Factory Git integration for dataflows

Estimated release timeline: Q4 2024

You can connect to a Git repository and develop your dataflows. This capability enables integration with version control, and offers commits and pull requests.