Predictive optimization for Unity Catalog managed tables
Predictive optimization removes the need to manually manage maintenance operations for Unity Catalog managed tables on Azure Databricks.
With predictive optimization enabled, Azure Databricks automatically identifies tables that would benefit from maintenance operations and runs them for the user. Maintenance operations are only run as necessary, eliminating both unnecessary runs for maintenance operations and the burden associated with tracking and troubleshooting performance.
What operations does predictive optimization run?
Predictive optimization runs the following operations automatically for enabled tables:
Operation | Description |
---|---|
OPTIMIZE (1) |
Triggers incremental clustering for enabled tables. See Use liquid clustering for Delta tables. Improves query performance by optimizing file sizes. See Optimize data file layout. |
VACUUM |
Reduces storage costs by deleting data files no longer referenced by the table. See Remove unused data files with vacuum. |
ANALYZE (2) |
Triggers incremental update of statistics to improve query performance. |
(1) OPTIMIZE
does not run ZORDER
when executed with predictive optimization.
(2) Predictive optimization with ANALYZE
is in Public Preview. It include intelligent stastistics collection during writes. Use this form to sign up for the Public Preview. During the initial Public Preview, ANALYZE
commands can run on tables with up to 1 TB of data and 500 columns or fewer.
Warning
The retention window for the VACUUM
command is determined by the delta.deletedFileRetentionDuration
table property, which defaults to 7 days. This means VACUUM
removes data files that are no longer referenced by a Delta table version in the last 7 days. If you’d like to retain data for longer (such as to support time travel for longer durations), you must set this table property appropriately before you enable predictive optimization, as in the following example:
ALTER TABLE table_name SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = '30 days');
Where does predictive optimization run?
Predictive optimization identifies tables that would benefit from ANALYZE
, OPTIMIZE
, and VACUUM
operations and queues them to run using serverless compute for jobs. Your account is billed for compute associated with these workloads using a SKU specific to Databricks Managed Services. See pricing for Databricks managed services. Azure Databricks provides system tables for observability into predictive optimization operations, costs, and impact. See Use system tables to track predictive optimization.
Prerequisites for predictive optimization
You must fulfill the following requirements to enable predictive optimization:
Your Azure Databricks workspace must be on the Premium plan in a region that supports predictive optimization. See Azure Databricks regions.
You must use SQL warehouses or Databricks Runtime 12.2 LTS or above when you enable predictive optimization.
Only Unity Catalog managed tables are supported.
If you require private connectivity for your storage accounts, you must configure serverless private connectivity. See Configure private connectivity from serverless compute.
Enable predictive optimization
You must enable predictive optimization at the account level.
You must have the following privileges to enable or disable predictive optimization at the specified level:
Unity Catalog object | Privilege |
---|---|
Account | Account admin |
Catalog | Catalog owner |
Schema | Schema owner |
Note
When you enable predictive optimization for the first time, Azure Databricks automatically creates a service principal in your Azure Databricks account. Azure Databricks uses this service principal to perform the requested maintenance operations. See Manage service principals.
Enable predictive optimization for your account
An account admin must complete the following steps to enable predictive optimization for all metastores in an account:
- Access the accounts console.
- Navigate to Settings, then Feature enablement.
- Select Enabled next to Predictive optimization.
Note
Metastores in regions that don’t support predictive optimization aren’t enabled.
Enable or disable predictive optimization for a catalog or schema
Predictive optimization uses an inheritance model. When enabled for a catalog, schemas inherit the property. Tables within an enabled schema inherit predictive optimization. To override this inheritance behavior, you can explicitly disable predictive optimization for a catalog or schema.
Note
You can disable predictive optimization at the catalog or schema level before enabling it at the account level. If predictive optimization is later enabled on the account, it is blocked for tables in these objects.
Use the following syntax to enable or disable predictive optimization:
ALTER CATALOG [catalog_name] {ENABLE | DISABLE} PREDICTIVE OPTIMIZATION;
ALTER {SCHEMA | DATABASE} schema_name {ENABLE | DISABLE} PREDICTIVE OPTIMIZATION;
Check whether predictive optimization is enabled
The Predictive Optimization
field is a Unity Catalog property that details if predictive optimization is enabled. If predictive optimization is inherited from a parent object, this is indicated in the field value.
Important
You must enable predictive optimization at the account level to view this field.
Use the following syntax to see if predictive optimization is enabled:
DESCRIBE (CATALOG | SCHEMA | TABLE) EXTENDED name
Use system tables to track predictive optimization
Azure Databricks provides a system table to track the history of predictive optimization operations. See Predictive optimization system table reference.
Private link error message
If the system table marks operations as failed with FAILED: PRIVATE_LINK_SETUP_ERROR
, you might not have correctly configured private link for serverless compute. See Configure private connectivity from serverless compute.
Limitations
Predictive optimization is not available in all regions. See Azure Databricks regions.
Predictive optimization does not run OPTIMIZE
commands on tables that use Z-order.
Predictive optimization does not run VACUUM
operations on tables with a file retention window configured below the default of 7 days. See Configure data retention for time travel queries.
Predictive optimization does not perform maintenance operations on the following tables:
- Tables loaded to a workspace as Delta Sharing recipients.
- External tables.
- Materialized views. See Use materialized views in Databricks SQL.
- Streaming tables. See Load data using streaming tables in Databricks SQL.