Редактиране

Споделяне чрез


Use table maintenance feature to manage delta tables in Fabric

The Lakehouse in Microsoft Fabric provides the Table maintenance feature to efficiently manage delta tables and to keep them always ready for analytics. This guide describes the table maintenance feature in Lakehouse and its capabilities.

Key capabilities of the lakehouse table maintenance feature:

  • Perform ad-hoc table maintenance using contextual right-click actions in a delta table within the Lakehouse explorer.
  • Apply bin-compaction, V-Order, and unreferenced old files cleanup.

Note

For advanced maintenance tasks, such as grouping multiple table maintenance commands, orchestrating it based on a schedule, a code-centric approach is the recommended choice. To learn more, see Delta Lake table optimization and V-Order article. It is also possible to use the Lakehouse API to automate table maintenance operations, to learn more see Manage the Lakehouse with Microsoft Fabric REST API.

Supported file types

Lakehouse table maintenance applies only to delta Lake tables. The legacy Hive tables that use PARQUET, ORC, AVRO, CSV, and other formats aren't supported.

Table maintenance operations

The table maintenance feature offers three operations.

  • Optimize: Consolidates multiple small Parquet files into large file. Big Data processing engines, and all Fabric engines, benefit from having larger files sizes. Having files of size above 128 MB, and optimally close to 1 GB, improves compression and data distribution, across the cluster nodes. It reduces the need to scan numerous small files for efficient read operations. It's a general best practice to run optimization strategies after loading large tables.
  • V-Order: Applies optimized sorting, encoding, and compression to Delta parquet files to enable fast read operations across all the Fabric engines. V-Order happens during the optimize command, and is presented as an option to the command group in the user experience. To learn more about V-Order, see Delta Lake table optimization and V-Order.
  • Vacuum: Removes old files no longer referenced by a Delta table log. Files need to be older than the retention threshold, and the default file retention threshold is seven days. All the delta tables in OneLake have the same retention period. File retention period is same regardless of the Fabric compute engine you are using. This maintenance is important to optimize storage cost. Setting a shorter retention period impacts Delta's time travel capabilities. It's a general best practice to set a retention interval to at least seven days, because old snapshots and uncommitted files can still be in use by the concurrent table readers and writers. Cleaning up active files with the VACUUM command might lead to reader failures or even table corruption if the uncommitted files are removed.

Execute ad-hoc table maintenance on a Delta table using Lakehouse

How to use the feature:

  1. From your Microsoft Fabric account, navigate to the desired Lakehouse.

  2. From the Lakehouse explorer's Tables section, either right-click on the table or use the ellipsis to access the contextual menu.

  3. Select the Maintenance menu entry.

  4. Check the maintenance options in the dialog per your requirement. For more information, see the Table maintenance operations section of this article.

  5. Select Run now to execute the table maintenance job.

  6. Track maintenance job execution by the notifications pane, or the Monitoring Hub.

    Screenshot showing the load to tables dialog box with filled table name.

How does table maintenance work?

After Run now is selected, a Spark maintenance job is submitted for execution.

  1. The Spark job is submitted using the user identity and table privileges.
  2. The Spark job consumes Fabric capacity of the workspace/user that submitted the job.
  3. If there is another maintenance job running on a table, a new one is rejected.
  4. Jobs on different tables can execute in parallel.
  5. Table maintenance jobs can be easily tracked in the Monitoring Hub. Look for "TableMaintenance" text within the activity name column in the monitoring hub main page.