Lakehouse and Delta Lake tables
Microsoft Fabric Lakehouse is a data architecture platform for storing, managing, and analyzing structured and unstructured data in a single location. To give every compute engine in Microsoft Fabric seamless access to the same data, Delta Lake is used as the unified table format.
Microsoft Fabric is in preview.
For a more comprehensive introduction to the Delta Lake table format, follow the links in the Next steps section.
Big data, Apache Spark and legacy table formats
Microsoft Fabric Runtime for Apache Spark uses the same foundation as Azure Synapse Analytics Runtime for Apache Spark, but contains key differences that provide more consistent behavior across all engines in the Microsoft Fabric service. In Microsoft Fabric, key performance features are turned on by default. Advanced Apache Spark users can revert configurations to their previous values to better align with specific scenarios.
Microsoft Fabric Lakehouse and the Apache Spark engine support all table types, both managed and unmanaged. This includes views and regular non-Delta Hive table formats. Tables defined using PARQUET, CSV, AVRO, JSON, or any Apache Hive-compatible file format work as expected.
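As a rough illustration of defining tables in the formats listed above, the following sketch builds Spark SQL `CREATE TABLE ... USING` statements. The helper `create_table_sql`, the table names, the columns, and the `Files/sales` path are hypothetical and only serve the example. The resulting strings would be passed to `spark.sql(...)` in a notebook where a `spark` session already exists.

```python
# Formats named in the paragraph above, plus Delta.
SUPPORTED_FORMATS = ("DELTA", "PARQUET", "CSV", "AVRO", "JSON")


def create_table_sql(name, columns, fmt=None, location=None):
    """Build a Spark SQL CREATE TABLE statement (hypothetical helper).

    name     -- table name
    columns  -- mapping of column name to Spark SQL type
    fmt      -- one of SUPPORTED_FORMATS, or None to rely on the session's
                default data source (spark.sql.sources.default)
    location -- optional path; with a LOCATION clause Spark creates an
                unmanaged (external) table instead of a managed one
    """
    if fmt is not None and fmt.upper() not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    cols = ", ".join(f"{col} {typ}" for col, typ in columns.items())
    sql = f"CREATE TABLE {name} ({cols})"
    if fmt is not None:
        sql += f" USING {fmt.upper()}"
    if location is not None:
        sql += f" LOCATION '{location}'"
    return sql


# Hypothetical tables used only for illustration.
columns = {"id": "INT", "name": "STRING"}
managed_csv = create_table_sql("sales_csv", columns, fmt="csv")
external_parquet = create_table_sql(
    "sales_ext", columns, fmt="parquet", location="Files/sales"
)
default_format = create_table_sql("sales_default", columns)

# In a notebook with an active session:
#   spark.sql(managed_csv)
```

Leaving `fmt` unset produces a statement without a `USING` clause, so the table format is decided by the session default rather than by the statement.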
The Lakehouse explorer user interface experience varies depending on table type. Currently, the Lakehouse explorer only renders table objects.
Configuration differences with Azure Synapse Analytics
The following table contains the configuration differences between Azure Synapse Analytics and Microsoft Fabric Runtime for Apache Spark.
|Apache Spark configuration|Microsoft Fabric value|Azure Synapse Analytics value|Notes|
|spark.sql.sources.default|delta|parquet|Default table format|
|spark.sql.parquet.vorder.dictionaryPageSize|2 GB|N/A|Dictionary page size limit for V-Order|
|spark.microsoft.delta.optimizeWrite.enabled|true|unset (false)|Optimize Write|
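For users who want to revert to the Azure Synapse Analytics values, the table above can be turned into a small set of session overrides. The following is a minimal sketch, not an official procedure: the helper names are hypothetical, "unset (false)" is treated as the string `"false"`, and settings marked N/A are skipped because the table gives no Synapse value for them. `spark` is assumed to be an existing SparkSession, as in a notebook.

```python
# Values copied from the table above as (Fabric value, Synapse value).
# None stands for "N/A": no Synapse value is listed for that setting.
CONFIG_DIFFERENCES = {
    "spark.sql.sources.default": ("delta", "parquet"),
    "spark.sql.parquet.vorder.dictionaryPageSize": ("2 GB", None),
    # Assumption: "unset (false)" is represented here as "false".
    "spark.microsoft.delta.optimizeWrite.enabled": ("true", "false"),
}


def synapse_overrides(differences=CONFIG_DIFFERENCES):
    """Return the settings whose listed Synapse value differs from Fabric's."""
    return {
        key: synapse
        for key, (fabric, synapse) in differences.items()
        if synapse is not None and synapse != fabric
    }


def apply_overrides(spark, overrides):
    """Apply each override to the current session via spark.conf.set."""
    for key, value in overrides.items():
        spark.conf.set(key, value)


overrides = synapse_overrides()

# In a notebook with an active session:
#   apply_overrides(spark, overrides)
```

Keeping the differences in one mapping makes it easy to revert a single setting, for example only the default table format, instead of all of them.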
Auto discovery of tables
The Lakehouse explorer provides a tree-like view of the objects in the Microsoft Fabric Lakehouse item. It has a key capability of discovering and displaying tables that are described in the metadata repository and in OneLake storage. The table references are displayed under the Tables section of the Lakehouse explorer user interface. Auto discovery also applies to tables defined over OneLake shortcuts.
Tables over shortcuts
Microsoft Fabric Lakehouse supports tables defined over OneLake shortcuts, which provides maximum compatibility without moving data. The following table lists the best practice for each kind of shortcut destination.
|Shortcut destination|Where to create the shortcut|Best practice|
|Delta Lake table| |If multiple tables are present in the destination, create one shortcut per table.|
|Folders with files| |Use Apache Spark to access the destination directly using relative paths. Load the data into Lakehouse native Delta tables for maximum performance.|
|Legacy Apache Hive tables| |Use Apache Spark to access the destination directly using relative paths, or create a metadata catalog reference using
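The "Folders with files" practice above (read through a relative path, then load into a native Delta table) could look roughly like the following sketch. Everything named here is an assumption made for illustration: the shortcut is assumed to sit under a `Files` section and to be called `raw_sales`, the target table name is made up, and `spark` is an existing SparkSession such as the one available in a notebook.

```python
def shortcut_path(shortcut_name, *parts, section="Files"):
    """Build a lakehouse-relative path, for example 'Files/raw_sales/2023'.

    Assumption: the shortcut lives under the given section of the lakehouse.
    """
    return "/".join([section, shortcut_name, *parts])


def load_shortcut_to_delta(spark, shortcut_name, table_name, fmt="parquet"):
    """Read the files behind a shortcut and save them as a Delta table.

    spark         -- an existing SparkSession
    shortcut_name -- hypothetical shortcut name
    table_name    -- hypothetical name of the Delta table to create
    fmt           -- format of the source files (parquet, csv, json, ...)
    """
    source = shortcut_path(shortcut_name)
    # Read the destination in place through the relative path.
    df = spark.read.format(fmt).load(source)
    # Persist the data as a Delta table; an existing table is replaced.
    df.write.format("delta").mode("overwrite").saveAsTable(table_name)
    return source


example_path = shortcut_path("raw_sales", "2023")

# In a notebook with an active session:
#   load_shortcut_to_delta(spark, "raw_sales", "sales")
```

Using `mode("overwrite")` keeps the sketch rerunnable; an append or merge strategy may suit incremental loads better.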
Load to Tables
Microsoft Fabric Lakehouse provides a convenient and productive user interface to streamline loading data into Delta tables. The Load to Tables feature offers a visual experience for loading common file formats into Delta tables, boosting analytical productivity for all personas. To learn more about the Load to Tables feature, read the Lakehouse Load to Tables reference documentation.
Delta Lake table optimization
Keeping tables in shape for the broad scope of analytics scenarios is no minor feat. Microsoft Fabric Lakehouse proactively enables the important parameters to minimize common problems associated with big data tables, such as compaction and small file sizes, and to maximize query performance. Still, there are many scenarios where those parameters need to be changed. The Delta Lake table optimization and V-Order article covers some key scenarios and provides an in-depth guide on how to efficiently maintain Delta tables for maximum performance.
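As a small illustration of manual maintenance for the compaction and small-file problems mentioned above, the sketch below builds the standard Delta Lake SQL commands `OPTIMIZE` (compacts small files) and `VACUUM` (removes files no longer referenced by the table). The helper name and the table name are hypothetical, and the 168-hour retention simply mirrors Delta Lake's default of seven days. Refer to the linked article for the options specific to Microsoft Fabric.

```python
def maintenance_statements(table_name, retain_hours=168):
    """Return Delta Lake SQL statements for compaction and cleanup.

    table_name   -- hypothetical Delta table name
    retain_hours -- retention window for VACUUM; 168 hours is 7 days
    """
    if retain_hours < 0:
        raise ValueError("retain_hours must be non-negative")
    return [
        # Rewrite many small files into fewer, larger ones.
        f"OPTIMIZE {table_name}",
        # Delete unreferenced data files older than the retention window.
        f"VACUUM {table_name} RETAIN {retain_hours} HOURS",
    ]


statements = maintenance_statements("sales")

# In a notebook with an active session:
#   for statement in statements:
#       spark.sql(statement)
```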