Predictive optimization for Unity Catalog managed tables

Predictive optimization removes the need to manually manage maintenance operations for Unity Catalog managed tables on Azure Databricks.

With predictive optimization enabled, Azure Databricks automatically identifies tables that would benefit from maintenance operations and runs them for the user. Maintenance operations are only run as necessary, eliminating both unnecessary runs for maintenance operations and the burden associated with tracking and troubleshooting performance.

What operations does predictive optimization run?

Predictive optimization runs the following operations automatically for enabled tables:

Operation Description
OPTIMIZE (1) Triggers incremental clustering for enabled tables. See Use liquid clustering for Delta tables.

Improves query performance by optimizing file sizes. See Optimize data file layout.
VACUUM Reduces storage costs by deleting data files no longer referenced by the table. See Remove unused data files with vacuum.
ANALYZE (2) Triggers incremental update of statistics to improve query performance.

(1) OPTIMIZE does not run ZORDER when executed with predictive optimization.

(2) Predictive optimization with ANALYZE is in Public Preview. It include intelligent stastistics collection during writes. Use this form to sign up for the Public Preview. During the initial Public Preview, ANALYZE commands can run on tables with up to 1 TB of data and 500 columns or fewer.

Warning

The retention window for the VACUUM command is determined by the delta.deletedFileRetentionDuration table property, which defaults to 7 days. This means VACUUM removes data files that are no longer referenced by a Delta table version in the last 7 days. If you’d like to retain data for longer (such as to support time travel for longer durations), you must set this table property appropriately before you enable predictive optimization, as in the following example:

ALTER TABLE table_name SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = '30 days');

Where does predictive optimization run?

Predictive optimization identifies tables that would benefit from ANALYZE, OPTIMIZE, and VACUUM operations and queues them to run using serverless compute for jobs. Your account is billed for compute associated with these workloads using a SKU specific to Databricks Managed Services. See pricing for Databricks managed services. Azure Databricks provides system tables for observability into predictive optimization operations, costs, and impact. See Use system tables to track predictive optimization.

Prerequisites for predictive optimization

You must fulfill the following requirements to enable predictive optimization:

  • Your Azure Databricks workspace must be on the Premium plan in a region that supports predictive optimization. See Azure Databricks regions.

  • You must use SQL warehouses or Databricks Runtime 12.2 LTS or above when you enable predictive optimization.

  • Only Unity Catalog managed tables are supported.

  • If you require private connectivity for your storage accounts, you must configure serverless private connectivity. See Configure private connectivity from serverless compute.

Enable predictive optimization

You must enable predictive optimization at the account level.

You must have the following privileges to enable or disable predictive optimization at the specified level:

Unity Catalog object Privilege
Account Account admin
Catalog Catalog owner
Schema Schema owner

Note

When you enable predictive optimization for the first time, Azure Databricks automatically creates a service principal in your Azure Databricks account. Azure Databricks uses this service principal to perform the requested maintenance operations. See Manage service principals.

Enable predictive optimization for your account

An account admin must complete the following steps to enable predictive optimization for all metastores in an account:

  1. Access the accounts console.
  2. Navigate to Settings, then Feature enablement.
  3. Select Enabled next to Predictive optimization.

Note

Metastores in regions that don’t support predictive optimization aren’t enabled.

Enable or disable predictive optimization for a catalog or schema

Predictive optimization uses an inheritance model. When enabled for a catalog, schemas inherit the property. Tables within an enabled schema inherit predictive optimization. To override this inheritance behavior, you can explicitly disable predictive optimization for a catalog or schema.

Note

You can disable predictive optimization at the catalog or schema level before enabling it at the account level. If predictive optimization is later enabled on the account, it is blocked for tables in these objects.

Use the following syntax to enable or disable predictive optimization:

ALTER CATALOG [catalog_name] {ENABLE | DISABLE} PREDICTIVE OPTIMIZATION;
ALTER {SCHEMA | DATABASE} schema_name {ENABLE | DISABLE} PREDICTIVE OPTIMIZATION;

Check whether predictive optimization is enabled

The Predictive Optimization field is a Unity Catalog property that details if predictive optimization is enabled. If predictive optimization is inherited from a parent object, this is indicated in the field value.

Important

You must enable predictive optimization at the account level to view this field.

Use the following syntax to see if predictive optimization is enabled:

DESCRIBE (CATALOG | SCHEMA | TABLE) EXTENDED name

Use system tables to track predictive optimization

Azure Databricks provides a system table to track the history of predictive optimization operations. See Predictive optimization system table reference.

If the system table marks operations as failed with FAILED: PRIVATE_LINK_SETUP_ERROR, you might not have correctly configured private link for serverless compute. See Configure private connectivity from serverless compute.

Limitations

Predictive optimization is not available in all regions. See Azure Databricks regions.

Predictive optimization does not run OPTIMIZE commands on tables that use Z-order.

Predictive optimization does not run VACUUM operations on tables with a file retention window configured below the default of 7 days. See Configure data retention for time travel queries.

Predictive optimization does not perform maintenance operations on the following tables: