Lakehouse and Delta Lake tables

11/15/2023

Microsoft Fabric Lakehouse is a data architecture platform for storing, managing, and analyzing structured and unstructured data in a single location. To provide seamless data access across all compute engines in Microsoft Fabric, it uses Delta Lake as its unified table format.

When you save data in the Lakehouse by using capabilities such as Load to Tables or the methods described in Options to get data into the Fabric Lakehouse, all data is saved in Delta format.

For a more comprehensive introduction to the Delta Lake table format, follow the links in the Next steps section.

Big data, Apache Spark and legacy table formats

Microsoft Fabric Runtime for Apache Spark uses the same foundation as Azure Synapse Analytics Runtime for Apache Spark, but contains key differences that provide more streamlined behavior across all engines in the Microsoft Fabric service. In Microsoft Fabric, key performance features are turned on by default. Advanced Apache Spark users can revert configurations to their previous values to better align with specific scenarios.

Microsoft Fabric Lakehouse and the Apache Spark engine support all table types, both managed and unmanaged, including views and regular non-Delta Hive table formats. Tables defined using PARQUET, CSV, AVRO, JSON, and any Apache Hive-compatible file format work as expected.
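Because Delta is the default table format in Fabric, a non-Delta table has to name its format explicitly. The sketch below builds a Spark SQL `CREATE TABLE ... USING ... LOCATION` statement for the formats listed above, intended to be passed to `spark.sql(...)` in a notebook. The helper `non_delta_table_ddl`, the table name, and the `Files/raw/sales_csv` path are hypothetical; depending on the environment, a full OneLake `abfss://` path may be needed instead of a relative one.

```python
from typing import Dict, Optional

# Formats named in the text above; Spark's USING clause accepts these data source names.
SUPPORTED_FORMATS = {"PARQUET", "CSV", "AVRO", "JSON"}


def non_delta_table_ddl(table: str, fmt: str, location: str,
                        options: Optional[Dict[str, str]] = None) -> str:
    """Return a CREATE TABLE statement for an unmanaged (external) non-Delta table.

    Hypothetical helper for illustration: pass the result to spark.sql(...).
    """
    fmt = fmt.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    ddl = f"CREATE TABLE IF NOT EXISTS {table} USING {fmt}"
    if options:
        # OPTIONS takes comma-separated key 'value' pairs, e.g. header 'true' for CSV.
        opts = ", ".join(f"{k} '{v}'" for k, v in options.items())
        ddl += f" OPTIONS ({opts})"
    # LOCATION points at existing files, so dropping the table leaves the data in place.
    ddl += f" LOCATION '{location}'"
    return ddl
```

For example, `non_delta_table_ddl("sales_csv", "csv", "Files/raw/sales_csv", {"header": "true"})` yields `CREATE TABLE IF NOT EXISTS sales_csv USING CSV OPTIONS (header 'true') LOCATION 'Files/raw/sales_csv'`.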

The Lakehouse explorer user interface experience varies depending on table type. Currently, the Lakehouse explorer only renders table objects.

Configuration differences with Azure Synapse Analytics

The following table contains the configuration differences between Azure Synapse Analytics and Microsoft Fabric Runtime for Apache Spark.

Apache Spark configuration	Microsoft Fabric value	Azure Synapse Analytics value	Notes
spark.sql.sources.default	delta	parquet	Default table format
spark.sql.parquet.vorder.enabled	true	N/A	V-Order writer
spark.sql.parquet.vorder.dictionaryPageSize	2 GB	N/A	Dictionary page size limit for V-Order
spark.microsoft.delta.optimizeWrite.enabled	true	unset (false)	Optimize Write
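Advanced users who want Synapse-like behavior can override these values for the current Spark session. The following is a minimal sketch, assuming a Fabric notebook where `spark` is the active `SparkSession`. `SYNAPSE_LIKE_SETTINGS` and `apply_settings` are hypothetical names, and treating Synapse's "N/A" for V-Order as `false` is an assumption. The V-Order dictionary page size setting is left unchanged.

```python
from typing import Callable, Dict

# Azure Synapse-style values for the settings in the table above.
# Assumption: "N/A" for V-Order in Synapse is approximated by disabling it,
# and "unset (false)" for Optimize Write by setting it to "false".
SYNAPSE_LIKE_SETTINGS: Dict[str, str] = {
    "spark.sql.sources.default": "parquet",
    "spark.sql.parquet.vorder.enabled": "false",
    "spark.microsoft.delta.optimizeWrite.enabled": "false",
}


def apply_settings(set_conf: Callable[[str, str], None], settings: Dict[str, str]) -> None:
    """Apply each setting through a setter such as spark.conf.set."""
    for key, value in settings.items():
        set_conf(key, value)
```

In a notebook, the call would be `apply_settings(spark.conf.set, SYNAPSE_LIKE_SETTINGS)`. Passing the setter rather than the session keeps the helper independent of Spark. Because `spark.conf.set` changes only the running session, a new session starts with the Fabric defaults again.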

Auto discovery of tables

The Lakehouse explorer provides a tree-like view of the objects in the Microsoft Fabric Lakehouse item. A key capability is discovering and displaying tables that are described in the metadata repository and in OneLake storage. The table references are displayed under the Tables section of the Lakehouse explorer user interface. Auto discovery also applies to tables defined over OneLake shortcuts.
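Conceptually, discovery means recognizing which folders under `Tables` hold Delta tables. The sketch below illustrates the idea on a local directory. It is not Fabric's implementation, and `discover_delta_tables` is a hypothetical helper. It relies only on the Delta Lake convention that every table directory contains a `_delta_log` folder holding the transaction log.

```python
from pathlib import Path
from typing import List


def discover_delta_tables(tables_root: Path) -> List[str]:
    """Return the sorted names of subfolders of tables_root that look like Delta tables.

    A folder counts as a Delta table if it contains a _delta_log directory,
    which is where Delta Lake keeps the table's transaction log.
    """
    return sorted(
        child.name
        for child in tables_root.iterdir()
        if child.is_dir() and (child / "_delta_log").is_dir()
    )
```

A folder of plain CSV or Parquet files without `_delta_log` is skipped by this check, which matches how Delta Lake itself distinguishes a table from a directory of data files.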

Tables over shortcuts

Microsoft Fabric Lakehouse supports tables defined over OneLake shortcuts, providing maximum compatibility with no data movement. The following table lists best practices for each item type when it is used over shortcuts.

Shortcut destination	Where to create the shortcut	Best practice
Delta Lake table	`Tables` section	If multiple tables are present in the destination, create one shortcut per table.
Folders with files	`Files` section	Use Apache Spark to access the destination directly through relative paths. For maximum performance, load the data into native Lakehouse Delta tables.
Legacy Apache Hive tables	`Files` section	Use Apache Spark to access the destination directly through relative paths, or create a metadata catalog reference by using `CREATE EXTERNAL TABLE` syntax. For maximum performance, load the data into native Lakehouse Delta tables.
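For folders and legacy Hive data exposed through a shortcut in the `Files` section, Spark can read the data through a relative path and write it into a native Delta table. This is a minimal PySpark-style sketch, assuming a notebook with the Lakehouse attached as its default so that relative `Files/...` paths resolve. `shortcut_path`, `load_shortcut_to_delta`, and the shortcut and table names are hypothetical.

```python
from typing import Dict, Optional


def shortcut_path(shortcut: str, subpath: str = "") -> str:
    """Build a relative path to data under a shortcut in the Files section."""
    parts = ["Files", shortcut.strip("/")]
    if subpath:
        parts.append(subpath.strip("/"))
    return "/".join(parts)


def load_shortcut_to_delta(spark, shortcut: str, subpath: str, fmt: str,
                           target_table: str,
                           options: Optional[Dict[str, str]] = None) -> None:
    """Read files under a shortcut with Spark and save them as a native Delta table."""
    path = shortcut_path(shortcut, subpath)
    df = spark.read.format(fmt).options(**(options or {})).load(path)
    # Overwrite makes the load repeatable; saveAsTable registers a managed table
    # in the attached Lakehouse.
    df.write.format("delta").mode("overwrite").saveAsTable(target_table)
```

For example, `load_shortcut_to_delta(spark, "ext_sales", "2023", "csv", "sales", {"header": "true"})` reads `Files/ext_sales/2023` as CSV and writes it to a Delta table named `sales`.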

Load to Tables

Microsoft Fabric Lakehouse provides a convenient and productive user interface that streamlines loading data into Delta tables. The Load to Tables feature offers a visual experience for loading common file formats into Delta, boosting analytical productivity for all personas. To learn more about the Load to Tables feature, read the Lakehouse Load to Tables reference documentation.

Delta Lake table optimization

Keeping tables in shape for the broad scope of analytics scenarios is no minor feat. Microsoft Fabric Lakehouse proactively enables the important parameters to minimize common problems associated with big data tables, such as small file sizes and the resulting need for compaction, and to maximize query performance. Still, in many scenarios those parameters need to be changed. The Delta Lake table optimization and V-Order article covers some key scenarios and provides an in-depth guide on how to efficiently maintain Delta tables for maximum performance.
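Compaction and cleanup are typically run as Spark SQL maintenance commands. The sketch below only builds the statement strings, to be run with `spark.sql(...)`. `OPTIMIZE` compacts small files. The `VORDER` keyword is assumed to be available as in Fabric's Spark runtime (confirm the exact syntax in the Delta Lake table optimization and V-Order article). `VACUUM` removes files the table no longer references; without `RETAIN`, Delta Lake applies its default retention of 7 days. `maintenance_statements` and the table and column names are hypothetical.

```python
from typing import List, Optional, Sequence


def maintenance_statements(table: str, zorder_by: Sequence[str] = (),
                           vorder: bool = True,
                           retain_hours: Optional[int] = None) -> List[str]:
    """Build OPTIMIZE and VACUUM statements for one Delta table."""
    optimize = f"OPTIMIZE {table}"
    if zorder_by:
        # Z-Order co-locates rows with similar values in these columns.
        optimize += f" ZORDER BY ({', '.join(zorder_by)})"
    if vorder:
        # Assumed Fabric-specific keyword that applies V-Order to the rewritten files.
        optimize += " VORDER"
    vacuum = f"VACUUM {table}"
    if retain_hours is not None:
        vacuum += f" RETAIN {retain_hours} HOURS"
    return [optimize, vacuum]
```

For example, `maintenance_statements("sales", zorder_by=["region", "order_date"], retain_hours=168)` returns `["OPTIMIZE sales ZORDER BY (region, order_date) VORDER", "VACUUM sales RETAIN 168 HOURS"]`. Running `OPTIMIZE` before `VACUUM` lets the vacuum clean up the small files that compaction replaced, once they are older than the retention window.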
