What's new and planned for OneLake in Microsoft Fabric
Important
The release plans describe functionality that may or may not have been released yet. The delivery timelines and projected functionality may change or may not ship. Refer to Microsoft policy for more information.
OneLake is a single, unified, logical data lake for your whole organization. Like OneDrive, OneLake comes automatically with every Microsoft Fabric tenant and is designed to be the single place for all your analytics data.
Any data in OneLake works with out-of-the-box governance, such as data lineage, data protection, certification, and catalog integration, and is ultimately under the control of a tenant admin. Within a tenant, workspaces enable different parts of the organization to work independently while still contributing to the same data lake.
OneLake is open at every level. OneLake supports the same ADLS Gen2 APIs and SDKs to be compatible with existing ADLS Gen2 applications and can support any type of file, structured or unstructured.
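Because OneLake exposes ADLS Gen2-style endpoints, existing tools typically only need a OneLake address in place of a storage account address. The following is a minimal sketch of how such addresses could be composed. The host name and the `<workspace>/<item>.<itemtype>/<path>` layout are assumptions based on the publicly documented OneLake URI convention, and the workspace and item names are hypothetical.

```python
from urllib.parse import quote

# Assumed OneLake DFS endpoint (not stated in this release plan).
ONELAKE_DFS_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_abfs_uri(workspace: str, item: str, item_type: str, path: str) -> str:
    """Build an abfss:// URI for a file or folder inside a Fabric item."""
    relative = path.strip("/")
    return f"abfss://{workspace}@{ONELAKE_DFS_HOST}/{item}.{item_type}/{relative}"


def onelake_https_url(workspace: str, item: str, item_type: str, path: str) -> str:
    """Build the equivalent https:// URL, percent-encoding each name."""
    relative = quote(path.strip("/"))  # '/' separators are left intact
    return (
        f"https://{ONELAKE_DFS_HOST}/{quote(workspace)}/"
        f"{quote(item)}.{item_type}/{relative}"
    )


if __name__ == "__main__":
    # Hypothetical workspace and lakehouse names, for illustration only.
    print(onelake_abfs_uri("SalesWorkspace", "SalesLakehouse", "Lakehouse", "Files/raw/orders.csv"))
    print(onelake_https_url("SalesWorkspace", "SalesLakehouse", "Lakehouse", "Files/raw/orders.csv"))
```

An ADLS Gen2 client or Spark reader that accepts `abfss://` paths could then be pointed at the resulting URI, subject to authentication that this sketch does not cover.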
OneLake aims to give you the most value possible out of a single copy of data. With OneLake shortcuts, you can unify your data across domains, clouds, and accounts by creating a reference to data stored in other file locations, such as other OneLake locations, ADLS, or S3, without data movement or duplication. You can also use the same data across multiple analytical engines because Fabric engines store all tabular data in the open Parquet format. There's no longer a need to copy data just to use it with another engine.
To learn more, see the documentation.
Investment areas
Feature | Estimated release timeline |
---|---|
Databricks Unity Catalog support for OneLake | Q3 2024 |
OneLake data access roles general availability | Q4 2024 |
OneLake security model | Q1 2025 |
OneLake shortcuts to on-premises data | Shipped (Q2 2024) |
Shortcuts to Google Cloud Storage | Shipped (Q1 2024) |
Shortcuts API | Shipped (Q1 2024) |
Smart caching for Amazon S3 shortcuts | Shipped (Q4 2023) |
Databricks Unity Catalog support for OneLake
Estimated release timeline: Q3 2024
Release Type: Public preview
Azure Databricks Unity Catalog Integration with Microsoft Fabric
You will be able to access Azure Databricks Unity Catalog tables directly in Microsoft Fabric, making it even easier to unify Azure Databricks with Microsoft Fabric. From the Fabric portal, you can create and configure a new Azure Databricks Unity Catalog item in Fabric with just a few clicks. You can link a full catalog, a schema, or even individual tables. The resulting Azure Databricks item in OneLake, a shortcut connected to Unity Catalog, is managed for you automatically. This data acts like any other data in OneLake: you can write SQL queries against it or use it with any other workload in Fabric, including Power BI via Direct Lake mode. When the data is modified or tables are added, removed, or renamed in Azure Databricks, the data in Fabric stays in sync. This integration makes it simple to unify Azure Databricks data in Fabric and use it across every Fabric workload.
Federate OneLake as a Remote Catalog in Azure Databricks
Microsoft Fabric users will be able to access Fabric data items like lakehouses as a catalog in Azure Databricks. While the data remains in OneLake, you can access and view data lineage and other metadata in Azure Databricks, and leverage the full power of Unity Catalog.
OneLake data access roles general availability
Estimated release timeline: Q4 2024
Release Type: General availability
OneLake data access roles build upon the existing capabilities of OneLake’s security model to increase the granularity at which security can be applied within a Fabric data item. This feature adds an inheritable RBAC (role-based access control) model that simplifies user and permissions management for data in OneLake. You can define security roles that grant read access to specific folders in OneLake, and assign them to users or groups. The access permissions determine what folders users see when accessing the lake view of the data, either through the lakehouse UX, notebooks, or OneLake APIs.
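The folder-level, inheritable behavior described above can be illustrated with a small conceptual model. This is only a sketch of the idea, not the Fabric API: the `DataAccessRole` class, the `can_read` and `visible_folders` helpers, and the assumption that a grant on a folder extends to everything beneath it are all hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class DataAccessRole:
    """Hypothetical model of a role that grants read access to folders."""
    name: str
    folders: tuple[str, ...]   # folders the role grants read access to
    members: frozenset[str]    # users or groups assigned to the role


def _is_within(path: str, folder: str) -> bool:
    """True if path is the folder itself or anything nested under it."""
    p, f = PurePosixPath(path), PurePosixPath(folder)
    return p == f or f in p.parents


def can_read(user: str, path: str, roles: list[DataAccessRole]) -> bool:
    """A user can read a path if any of their roles covers it (inherited)."""
    return any(
        user in role.members and any(_is_within(path, f) for f in role.folders)
        for role in roles
    )


def visible_folders(user: str, all_folders: list[str], roles: list[DataAccessRole]) -> list[str]:
    """Filter the lake view down to the folders the user is allowed to see."""
    return [folder for folder in all_folders if can_read(user, folder, roles)]


if __name__ == "__main__":
    roles = [DataAccessRole("SalesReaders", ("Files/sales",), frozenset({"alice"}))]
    print(visible_folders("alice", ["Files/sales", "Files/sales/2024", "Files/hr"], roles))
```

Comparing paths by component rather than by string prefix keeps a grant on `Files/sales` from accidentally matching a sibling such as `Files/salesforce`.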
OneLake security model
Estimated release timeline: Q1 2025
Release Type: Public preview
Managing data security across multiple analytical engines and copies of data is challenging. OneLake and Fabric simplify this by enabling the use of a single data copy across multiple analytical engines without any data movement or duplication. Taking the "one copy" concept further, OneLake is also enhancing security with a finer-grained model that allows table and folder access in addition to row-level and column-level security. These security definitions live with the data and travel across shortcuts to wherever the data is used. Security defined in OneLake is enforced universally, no matter which analytical engine is used to access the data.
Shipped feature(s)
OneLake shortcuts to on-premises data
Shipped (Q2 2024)
Release Type: Public preview
Microsoft OneLake shortcuts are expanding to include on-premises and network-restricted data sources. With this capability, you can unify your on-premises and cloud data in OneLake.
When you create shortcuts to Amazon S3, Google Cloud Storage, or S3-compatible buckets, you can optionally select an on-premises data gateway (OPDG) to establish connectivity.
Shortcuts to Google Cloud Storage
Shipped (Q1 2024)
Release Type: Public preview
OneLake expands shortcut support to Google Cloud Storage, allowing virtualization of data without moving or duplicating it. This lets you integrate Google Cloud Storage data with data from the other shortcut sources, such as ADLS Gen2, OneLake, Dataverse, and Amazon S3. The data appears and works as if it were in OneLake, giving you a simple data lake that can span clouds.
Shortcuts API
Shipped (Q1 2024)
Release Type: Public preview
A public REST API lets you automate the creation and management of shortcuts.
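As a rough illustration of what automating shortcut creation could look like, the sketch below builds (but does not send) an HTTP request. The endpoint path, the payload field names (`name`, `path`, `target`, `oneLake`), and all IDs are assumptions based on the publicly documented Fabric REST API shape rather than on this release plan; check the current API reference before relying on them.

```python
import json
import urllib.request

# Assumed base URL of the Fabric REST API (not stated in this release plan).
FABRIC_API = "https://api.fabric.microsoft.com/v1"


def build_create_shortcut_request(
    workspace_id: str,
    item_id: str,
    token: str,
    name: str,
    path: str,
    target: dict,
) -> urllib.request.Request:
    """Build a POST request that would create a shortcut inside an item."""
    url = f"{FABRIC_API}/workspaces/{workspace_id}/items/{item_id}/shortcuts"
    body = json.dumps({"name": name, "path": path, "target": target}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical IDs and an assumed OneLake-to-OneLake target shape.
    request = build_create_shortcut_request(
        workspace_id="<workspace-id>",
        item_id="<lakehouse-id>",
        token="<access-token>",
        name="orders",
        path="Tables",
        target={"oneLake": {"workspaceId": "<source-workspace-id>",
                            "itemId": "<source-item-id>",
                            "path": "Tables/orders"}},
    )
    print(request.get_method(), request.full_url)
    # To send it: urllib.request.urlopen(request)  (requires a valid token)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit test before any call is made against a tenant.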
Smart caching for Amazon S3 shortcuts
Shipped (Q4 2023)
Release Type: Public preview
Smart caching for Amazon S3 shortcuts reduces egress costs and enhances performance by bringing data closer to the compute engine. Smart caching egresses data from S3 once and caches it locally in OneLake for a certain period, which eliminates the need for repeated data retrieval from S3. The cached data can be reused across multiple users, analytical engines, and scenarios, which maximizes the value of a single egress.
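The "egress once, reuse for a period" behavior resembles a read-through cache with a time-to-live. The sketch below is a conceptual illustration only: the `SmartCacheSketch` class, its `ttl_seconds` parameter, and the `egress_count` counter are hypothetical, and the actual retention period and eviction rules used by OneLake are not described here.

```python
import time
from typing import Callable


class SmartCacheSketch:
    """Hypothetical read-through cache: fetch from the source once per TTL window."""

    def __init__(
        self,
        fetch_from_source: Callable[[str], bytes],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_source
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, bytes]] = {}
        self.egress_count = 0  # how many times data left the remote store

    def read(self, key: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # cache hit: no egress from the source
        data = self._fetch(key)  # cache miss or expired: one egress
        self.egress_count += 1
        self._entries[key] = (now, data)
        return data


if __name__ == "__main__":
    # Stand-in for an S3 read; a real fetch would call the object store.
    cache = SmartCacheSketch(lambda key: f"contents of {key}".encode(), ttl_seconds=3600)
    for _ in range(3):  # three readers, e.g. different engines or users
        cache.read("sales/orders.parquet")
    print(cache.egress_count)
```

Because every reader goes through the same cache, repeated reads of one object by different users or engines within the retention window cost a single egress.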