What's new and planned for Data Factory in Microsoft Fabric
Important
The release plans describe functionality that may or may not have been released yet. The delivery timelines and projected functionality may change or may not ship. Refer to Microsoft policy for more information.
Data Factory in Microsoft Fabric combines citizen data integration and pro data integration capabilities into a single, modern data integration experience. It provides connectivity to more than 100 relational and nonrelational databases, lakehouses, and data warehouses, as well as generic interfaces such as REST APIs and OData.
Dataflows: Dataflow Gen2 enables you to perform large-scale data transformations and supports a variety of output destinations, including Azure SQL Database, Lakehouse, Data Warehouse, and more. The dataflows editor offers more than 300 transformations, including AI-based options, and lets you transform data easily and flexibly. Whether you're extracting data from an unstructured source such as a web page or reshaping an existing table in the Power Query editor, you can apply Power Query's Data Extraction By Example, which uses artificial intelligence (AI) to simplify the process.
Data pipelines: Data pipelines offer the capability to create versatile data orchestration workflows that bring together tasks like data extraction, loading into preferred data stores, notebook execution, SQL script execution, and more. You can quickly build powerful metadata-driven data pipelines that automate repetitive tasks. For example, you can load and extract data from different tables in a database, iterate through multiple containers in Azure Blob Storage, and more. Furthermore, with data pipelines, you can access data from Microsoft 365 using the Microsoft Graph Data Connect (MGDC) connector.
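The metadata-driven pattern described above can be sketched in plain Python. This is a conceptual illustration only, not an actual Data Factory API: the table list, `copy_table` helper, and status values are all hypothetical.

```python
# Illustrative sketch of a metadata-driven pipeline: a control list of
# table names drives a ForEach-style loop, and each table is processed
# by the same reusable step. All names here are hypothetical.
TABLES_TO_LOAD = ["dbo.Customers", "dbo.Orders", "dbo.Invoices"]

def copy_table(table_name: str) -> dict:
    """Stand-in for a pipeline Copy activity run against one table."""
    return {"table": table_name, "status": "Succeeded"}

def run_pipeline(tables: list) -> list:
    """Iterate over the metadata list, mirroring a ForEach activity."""
    return [copy_table(t) for t in tables]

results = run_pipeline(TABLES_TO_LOAD)
```

The point of the pattern is that adding a new table means editing the metadata list, not the pipeline logic itself.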
To learn more, see the documentation and visit our announcement blog.
Investment areas
Over the next few months, Data Factory in Microsoft Fabric will expand its connectivity options and continue to add to its rich library of transformations and data pipeline activities. Moreover, it will enable you to perform real-time, high-performance data replication from operational databases and bring this data into the lake for analytics.
Data Factory Git integration for data pipelines
Estimated release timeline: Shipped
You can connect to your Git repository to develop data pipelines in a collaborative way. The integration of data pipelines with the Fabric platform's Application Lifecycle Management (ALM) capability enables version control, branching, commits, and pull requests.
Enhancements to output destinations in Dataflow Gen2
Estimated release timeline: Shipped
We're enhancing the output destinations in Dataflow Gen2 with the following highly requested capabilities:
- Ability to handle query schema changes after configuring an output destination.
- Default destination settings to accelerate dataflow creation.
To learn more, see Dataflow Gen2 data destinations and managed settings.
Upsert operation support for data destinations in Dataflow Gen2
Estimated release timeline: Q3 2024
We're adding support for merging data into data destinations during subsequent refreshes (also known as upsert).
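As a rough illustration of upsert semantics (not the Dataflow Gen2 implementation), incoming rows are merged into the destination by key: rows with a matching key are updated, and unmatched rows are inserted.

```python
def upsert(destination: dict, incoming: list, key: str) -> dict:
    """Merge incoming rows into a destination keyed by `key`: rows with
    an existing key are updated, others are inserted. This is a
    simplified, hypothetical model of upsert semantics."""
    for row in incoming:
        destination[row[key]] = row
    return destination

dest = {1: {"id": 1, "status": "stale"}}
upsert(dest, [{"id": 1, "status": "fresh"}, {"id": 2, "status": "new"}], "id")
```

Contrast this with an append-only load, which would leave the stale copy of row 1 in place alongside the new one.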
Incremental refresh support in Dataflow Gen2
Estimated release timeline: Q2 2024
We're adding incremental refresh support in Dataflow Gen2. This feature enables you to incrementally extract data from data sources, apply Power Query transformations, and load into various output destinations.
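Conceptually, incremental refresh tracks a watermark (such as a last-modified timestamp) and extracts only rows changed since the previous run. A minimal sketch, with hypothetical row and watermark shapes:

```python
def incremental_extract(rows: list, watermark: str):
    """Return rows modified after the stored watermark, plus the new
    watermark to persist for the next refresh. ISO 8601 timestamp
    strings compare correctly in lexicographic order."""
    changed = [r for r in rows if r["modified"] > watermark]
    new_watermark = max((r["modified"] for r in changed), default=watermark)
    return changed, new_watermark

rows = [
    {"id": 1, "modified": "2024-05-01T08:00:00"},
    {"id": 2, "modified": "2024-05-03T12:30:00"},
]
changed, wm = incremental_extract(rows, "2024-05-02T00:00:00")
```

Only row 2 is re-extracted here; the persisted watermark advances so the next refresh skips it unless it changes again.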
Fast Copy support in Dataflow Gen2
Estimated release timeline: Shipped
We're adding support for large-scale data ingestion directly within the Dataflow Gen2 experience, utilizing the pipeline Copy activity capability. Supported sources include Azure SQL Database, and CSV and Parquet files in Azure Data Lake Storage and Blob Storage.
This enhancement significantly scales up the data processing capacity of Dataflow Gen2, providing high-scale ELT (extract, load, transform) capabilities.
Cancel refresh support in Dataflow Gen2
Estimated release timeline: Shipped
We're adding support to cancel ongoing Dataflow Gen2 refreshes from the workspace items view.
Data source identity management
Estimated release timeline: Q4 2024
- Support for Azure Key Vault: You can store your keys and secrets in Azure Key Vault and connect to it, so you can manage your keys in a single place.
- Managed identity: Managed identity can be configured at the workspace level. You can use Fabric managed identities to connect to your data sources securely.
- Service principal: To access resources secured by an Azure AD tenant, the entity that requires access must be represented by a security principal. You'll be able to connect to your data sources with a service principal.
Get data experience improvements (Browse Azure Resources)
Estimated release timeline: Shipped
The Browse Azure resources experience provides seamless navigation through your Azure resources. You can easily navigate your Azure subscriptions and connect to your data sources through an intuitive user interface, helping you quickly find and connect to the data you need.
Get data experience improvements (Browse OneDrive for Business & SharePoint resources)
Estimated release timeline: Q3 2024
Browsing OneDrive for Business and SharePoint Online will let you easily navigate through files, folders, and lists, and connect to your data sources through an intuitive user interface, helping you quickly find and connect to the data you need.
On-premises data gateway (OPDG) support added to data pipelines
Estimated release timeline: Shipped
This feature enables data pipelines to use Fabric data gateways to access data that is on-premises or behind a virtual network. Users of self-hosted integration runtimes (SHIR) will be able to move to on-premises data gateways in Fabric.
Copilot in Data Factory
Estimated release timeline: Shipped
Copilot for Data Factory empowers customers to express their requirements using natural language when creating data integration solutions. Currently, you can utilize Copilot for Data Factory in Dataflows Gen2. In the future (Q2 CY2024), we'll also introduce Copilot for Data Factory in Data Pipelines.
Further reading: Official documentation for Copilot in Data Factory.
Enabling customers to parameterize their connections
Estimated release timeline: Q4 2024
Connections provide a common framework for defining connectivity and authentication for your data stores. These connections can be shared across different items. With parameterization support, you'll be able to build complex and reusable pipelines, notebooks, dataflows, and other item types.
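The idea behind parameterized connections can be sketched as a templated connection definition whose placeholders are resolved at run time, letting one definition be reused across items and environments. The template shape and parameter names below are hypothetical, not a Fabric API.

```python
import string

# Hypothetical connection definition with run-time parameters, so one
# definition can serve multiple environments (dev, test, prod).
CONNECTION_TEMPLATE = "Server=$server;Database=$database"

def resolve_connection(template: str, params: dict) -> str:
    """Substitute parameter values into the connection template."""
    return string.Template(template).substitute(params)

conn = resolve_connection(
    CONNECTION_TEMPLATE, {"server": "prod-sql", "database": "sales"}
)
# → "Server=prod-sql;Database=sales"
```

Swapping the parameter set retargets every item that shares the connection, without editing the items themselves.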
Data Factory Git integration for dataflows
Estimated release timeline: Q4 2024
You can connect to a Git repository and develop your dataflows there. This capability enables version control integration, including commits and pull requests.