Predictive optimization for Delta Lake


This feature is in Public Preview.

Predictive optimization removes the need to manually manage maintenance operations for Delta tables on Azure Databricks.

With predictive optimization enabled, Azure Databricks automatically identifies tables that would benefit from maintenance operations and runs them for the user. Maintenance operations are only run as necessary, eliminating both unnecessary runs for maintenance operations and the burden associated with tracking and troubleshooting performance.

What operations does predictive optimization run?

Predictive optimization runs the following operations automatically for enabled Delta tables:

Operation Description
OPTIMIZE Improves query performance by optimizing file sizes. See Compact data files with optimize on Delta Lake.
VACUUM Reduces storage costs by deleting data files no longer referenced by the table. See Remove unused data files with vacuum.


OPTIMIZE does not run ZORDER when executed with predictive optimization.


The retention window for the VACUUM command is determined by the delta.deletedFileRetentionDuration table property, which defaults to 7 days. This means VACUUM removes data files that are no longer referenced by a Delta table version in the last 7 days. If you’d like to retain data for longer (such as to support time travel for longer durations), you must set this table property appropriately before you enable predictive optimization, as in the following example:

ALTER TABLE table_name SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = '30 days');

Where does predictive optimization run?

Predictive optimization identifies tables that would benefit from OPTIMIZE and VACUUM operations and queues them to run using jobs compute. Your account is billed for compute associated with these workloads using a SKU specific to Databricks Managed Services. See pricing for Databricks managed services. Azure Databricks provides system tables for observability into predictive optimization operations, costs, and impact. See Use system tables to track predictive optimization.


Predictive optimization does not run OPTIMIZE commands on tables that use liquid clustering.

Prerequisites for predictive optimization

You must fulfill the following requirements to enable predictive optimization:

  • Your Azure Databricks workspace must be on the Premium plan in a region that supports predictive optimization. See Azure Databricks regions.

  • You must use SQL warehouses or Databricks Runtime 12.2 LTS or above when you enable predictive optimization.

  • Only Unity Catalog managed tables are supported.

  • If you require private connectivity for your storage accounts, you must configure serverless private connectivity. See Configure private connectivity from serverless compute.

Enable predictive optimization

You must enable predictive optimization at the account level.

You must have the following privileges to enable or disable predictive optimization at the specified level:

Unity Catalog object Privilege
Account Account admin
Catalog Catalog owner
Schema Schema owner


When you enable predictive optimization for the first time, Azure Databricks automatically creates a service principal in your Azure Databricks account. Azure Databricks uses this service principal to perform the requested maintenance operations. See Manage service principals.

Enable predictive optimization for your account

An account admin must complete the following steps to enable predictive optimization for all metastores in an account:

  1. Access the accounts console.
  2. Navigate to Settings, then Feature enablement.
  3. Select Enabled next to Predictive optimization.


Metastores in regions that don’t support predictive optimization aren’t enabled.

Enable or disable predictive optimization for a catalog or schema

Predictive optimization uses an inheritance model. When enabled for a catalog, schemas inherit the property. Tables within an enabled schema inherit predictive optimization. To override this inheritance behavior, you can explicitly disable predictive optimization for a catalog or schema.


You can disable predictive optimization at the catalog or schema level before enabling it at the account level. If predictive optimization is later enabled on the account, it is blocked for tables in these objects.

Use the following syntax to enable or disable predictive optimization:


Check whether predictive optimization is enabled

The Predictive Optimization field is a Unity Catalog property that details if predictive optimization is enabled. If predictive optimization is inherited from a parent object, this is indicated in the field value.

Use the following syntax to see if predictive optimization is enabled:


Use system tables to track predictive optimization

Azure Databricks provides a system table to track the history of predictive optimization operations. See Predictive optimization system table reference.


Predictive optimization is not available in all regions. See Azure Databricks regions.

Predictive optimization does not run OPTIMIZE commands on tables that use liquid clustering or Z-order.

Predictive optimization does not run VACUUM operations on tables with a file retention window configured below the default of 7 days. See Configure data retention for time travel queries.

Predictive optimization does not perform maintenance operations on the following tables: