Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Unity Catalog managed tables are the default and recommended table type in Azure Databricks for Delta Lake and Apache Iceberg. Unity Catalog manages all read, write, storage, and optimization responsibilities. See Convert an external Delta Lake table to a managed Unity Catalog table.
Data files for managed tables are stored in the schema or catalog containing them. See Specify a managed storage location in Unity Catalog.
Databricks recommends using managed tables to take advantage of the following benefits, compared to external and foreign tables:
- Reduced storage and compute costs.
- Faster query performance across all client types.
- Automatic table maintenance and optimization.
- Secure access for external clients via open APIs.
- Support for Delta Lake and Apache Iceberg formats.
- Automatic upgrades to the latest platform features.
You can work with managed tables across all languages and products supported in Azure Databricks. You need certain privileges to create, update, delete, or query managed tables. See Manage privileges in Unity Catalog.
Note
This page only describes Unity Catalog managed tables. For managed tables in the legacy Hive metastore, see Database objects in the legacy Hive metastore.
Benefits of Unity Catalog managed tables
Unity Catalog managed tables optimize storage costs and query speeds, and allow interoperability with third-party tools for Delta Lake and Apache Iceberg. To simplify data management and performance, these managed tables use AI-powered technologies, such as file size compaction and intelligent statistics collection.
Managed tables support interoperability by allowing access from Delta Lake and Apache Iceberg clients. See Access Databricks data using external systems.
The following features are unique to Unity Catalog managed tables, and aren't available for external tables and foreign tables:
| Feature | Benefits | Configuration |
|---|---|---|
| Catalog commits | Enables multi-statement transactions across tables, faster query planning by serving metadata directly from Unity Catalog, enforceable schema and constraint changes, and safe writes from external engines. | Turned off by default. To turn on, set the delta.feature.catalogManaged table property. See Enable catalog commits. |
| Predictive optimization | Automatically optimizes your data layout and compute using AI, without requiring manual maintenance operations. Databricks recommends enabling predictive optimization for all managed tables to reduce storage and compute costs. Automatically runs:
|
Enabled by default for all new accounts created on or after November 11, 2024. For current accounts, Azure Databricks is gradually enabling predictive optimization by default. See Verify whether predictive optimization is enabled. To configure, see Enable predictive optimization. |
| Multi-statement transactions | Allows you to run multiple SQL statements across one or more tables as a single atomic commit, with ACID guarantees. All changes succeed together or roll back together. Use for stored procedures and SQL scripting in mission-critical warehousing workloads. Transactions that write to managed Delta Lake tables are in Public Preview. Transactions that write to managed Apache Iceberg tables are in Private Preview. |
Turned off by default. Use BEGIN ATOMIC ... END; for non-interactive transactions or BEGIN TRANSACTION; ... COMMIT; for interactive transactions. See Transaction modes. |
| Automatic liquid clustering | For tables with predictive optimization, liquid clustering intelligently selects clustering keys and automatically updates them as query patterns change to improve performance and lower costs. | Turned off by default. To configure, see Enable liquid clustering. |
| Metadata caching | In-memory caching of transaction metadata enhances query performance by minimizing requests to the transaction log stored in the cloud. | Enabled by default. Not configurable. |
| Full-text search indexes | Accelerates substring and keyword lookups on text columns using the search and isearch functions. When an index applies, Azure Databricks skips files that can't contain matching rows, reducing the amount of data scanned.In Beta and requires Databricks Runtime 18.2 and above. |
Turned off by default. Create with CREATE SEARCH INDEX. |
Automatic file deletion after a DROP TABLE command |
If you DROP a managed table, Azure Databricks deletes the data files in cloud storage after the recovery period expires (default 7 days), reducing storage costs. For external tables, you must manually delete the files from your storage bucket. | Enabled by default. You can configure the recovery period at the catalog or schema level. See Drop a managed table. |
Access Databricks data using external systems
Managed tables support interoperability by allowing access from Delta Lake and Apache Iceberg clients.
Through open APIs and credential vending, Unity Catalog enables external engines such as Trino, DuckDB, Apache Spark, Daft, and Iceberg REST catalog-integrated engines, such as Dremio, to access managed tables. For external clients that don't support open APIs, you can use Compatibility Mode to read managed tables using any Delta Lake or Apache Iceberg client. OpenSharing, an open source protocol, enables secure, governed data sharing with external partners and platforms.
See integrations for a list of supported external engines, or check your engine's documentation if it's not included in this list.
The following open APIs allow external systems to access Unity Catalog managed tables:
- Unity REST API has read, write, and create access for Delta Lake clients to managed Delta Lake tables.
- Iceberg REST Catalog (IRC) has read, write, and create access for Apache Iceberg clients to managed Apache Iceberg tables and read-only access to Delta Lake tables with Apache Iceberg reads enabled (UniForm).
Both APIs support credential vending, which provides temporary, scoped credentials that inherit the privileges of the requesting Azure Databricks principal, maintaining governance and security controls.
OpenSharing is an open source protocol that allows secure and governed data access to external partners and platforms. You can use OpenSharing to grant partners temporary, read-only access.
All reads and writes to managed tables must use table names and catalog and schema names where they exist. For example, catalog_name.schema_name.table_name. Path-based access to Unity Catalog managed tables is not supported (except in Compatibility Mode) because it bypasses Unity Catalog access controls and prevents managed table features from working properly.
Create a managed table
To create a managed table, you must have:
USE SCHEMAon the table's parent schema.USE CATALOGon the table's parent catalog.CREATE TABLEon the table's parent schema.
Use the following SQL syntax to create an empty managed table using SQL. Replace the placeholder values:
<catalog-name>: The name of the catalog that will contain the table.<schema-name>: The schema's name containing the table.<table-name>: A name for the table.<column-specification>: Each column's name and data type.
-- Create a managed Delta table
CREATE TABLE <catalog-name>.<schema-name>.<table-name>
(
<column-specification>
);
-- Create a managed Iceberg table
CREATE TABLE <catalog-name>.<schema-name>.<table-name>
(
<column-specification>
)
USING iceberg;
To maintain performance on reads and writes, Azure Databricks periodically runs operations to optimize managed Apache Iceberg table metadata. This task is performed using serverless compute, which has MODIFY permissions on the Apache Iceberg table. This operation only writes to the table's metadata, and the compute only maintains permissions to the table for the duration of the job.
Note
To create an Apache Iceberg table, explicitly specify USING iceberg. Otherwise, Azure Databricks creates a Delta Lake table by default.
You can create managed tables from query results or DataFrame write operations. The following articles demonstrate some of the many patterns you can use to create a managed table on Azure Databricks:
To create a copy of an existing managed table, use clone. Managed Delta Lake tables support deep and shallow cloning. Managed Apache Iceberg tables support only deep cloning. See Clone a table on Azure Databricks and Clone a managed Iceberg table.
Drop a managed table
To drop a managed table, you must have:
MANAGEon the table or you must be the table owner.USE SCHEMAon the table's parent schema.USE CATALOGon the table's parent catalog.
To drop a managed table, run the following SQL command:
DROP TABLE IF EXISTS catalog_name.schema_name.table_name;
Unity Catalog supports the UNDROP TABLE command to recover accidentally dropped managed tables. By default, tables are recoverable for 7 days after being dropped. After the recovery period ends, Azure Databricks deletes the underlying data files from your cloud tenant within 48 hours.
Configure the recovery period
Important
Configurable recovery period is in Public Preview.
You can configure how long dropped managed tables remain recoverable at the catalog or schema level. If recovery periods are set at both levels, the schema-level setting takes precedence for tables in that schema.
To configure the recovery period, you must have MANAGE privilege or ownership on the catalog or schema. This setting only applies to tables dropped after it is configured. It doesn't affect tables that were already dropped.
The recovery period can be set to 0 hours (to disable recovery) or between 7–30 days, inclusive. A longer recovery period (up to 30 days) provides added protection against accidental drops of critical production data. A shorter recovery period, or setting it to 0, causes dropped data to be deleted faster — useful for cost savings in workloads that frequently create and drop tables as part of ETL pipelines. Setting the recovery period to 0 means dropped tables are not recoverable using UNDROP. Data files are deleted from cloud storage within 48 hours of the table being dropped.
To set the recovery period, use ALTER CATALOG or ALTER SCHEMA with the RETAIN DROPPED TO clause:
-- Set a 30-day recovery period on a catalog
ALTER CATALOG my_catalog RETAIN DROPPED TO 30 DAYS;
-- Set a 7-day recovery period on a schema (overrides the catalog setting)
ALTER SCHEMA my_catalog.my_schema RETAIN DROPPED TO 7 DAYS;
You can also set the recovery period when creating a catalog or schema with the RETAIN DROPPED FOR clause:
CREATE CATALOG my_catalog RETAIN DROPPED FOR 30 DAYS;
CREATE SCHEMA my_catalog.my_schema RETAIN DROPPED FOR 7 DAYS;
To check the current recovery period, run DESCRIBE EXTENDED. The output includes a Recovery Period Hours row:
DESCRIBE CATALOG EXTENDED my_catalog;
DESCRIBE SCHEMA EXTENDED my_catalog.my_schema;