Create tables in Unity Catalog
This article introduces the concept of managed and external tables in Unity Catalog and describes how to create tables in Unity Catalog.
Note
When you create a table, be sure to reference a catalog that is governed by Unity Catalog or set the default catalog to a catalog that is governed by Unity Catalog. See Manage the default catalog.
The catalog `hive_metastore` appears in Catalog Explorer but is not considered governed by Unity Catalog. It is managed by your Azure Databricks workspace's Hive metastore. All other catalogs listed are governed by Unity Catalog.
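To check which catalog your session currently targets, and to switch to a Unity Catalog-governed catalog, you can run the following SQL. The catalog name `main` is an example; use a catalog that exists in your metastore.
SQL
-- Return the catalog that unqualified table references resolve against
SELECT current_catalog();
-- Switch the session to a different catalog
USE CATALOG main;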
You can use the Unity Catalog table upgrade interface to upgrade existing tables registered in the Hive metastore to Unity Catalog. See Upgrade Hive tables and views to Unity Catalog.
Managed tables
Managed tables are the default way to create tables in Unity Catalog. Unity Catalog manages the lifecycle and file layout for these tables. You should not use tools outside of Azure Databricks to manipulate files in these tables directly.
Managed tables are stored in managed storage, either at the metastore, catalog, or schema level, depending on how the schema and catalog are configured. See Specify a managed storage location in Unity Catalog.
Managed tables always use the Delta table format.
When a managed table is dropped, its underlying data is deleted from your cloud tenant within 30 days.
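To confirm whether an existing table is managed, one option is `DESCRIBE TABLE EXTENDED`, which reports the table's type and storage location (the three-level name below is a placeholder):
SQL
-- The output includes a "Type" row, which is MANAGED for managed tables
DESCRIBE TABLE EXTENDED <catalog-name>.<schema-name>.<table-name>;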
External tables
External tables are tables whose data is stored outside of the managed storage location specified for the metastore, catalog, or schema. Use external tables only when you require direct access to the data outside of Azure Databricks clusters or Databricks SQL warehouses.
When you run `DROP TABLE` on an external table, Unity Catalog does not delete the underlying data. To drop a table, you must be its owner. You can manage privileges on external tables and use them in queries in the same way as managed tables. To create an external table with SQL, specify a `LOCATION` path in your `CREATE TABLE` statement. External tables can use the following file formats:
- DELTA
- CSV
- JSON
- AVRO
- PARQUET
- ORC
- TEXT
To manage access to the underlying cloud storage for an external table, you must set up storage credentials and external locations.
To learn more, see Create an external table.
Requirements
You must have the `CREATE TABLE` privilege on the schema in which you want to create the table, as well as the `USE SCHEMA` privilege on the schema and the `USE CATALOG` privilege on the parent catalog.
If you are creating an external table, see Create an external table for additional requirements.
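As a sketch, the grants for creating managed tables, assuming a catalog named `main`, a schema named `default`, and a hypothetical group `data_engineers`, might look like the following:
SQL
GRANT USE CATALOG ON CATALOG main TO `data_engineers`;
GRANT USE SCHEMA ON SCHEMA main.default TO `data_engineers`;
GRANT CREATE TABLE ON SCHEMA main.default TO `data_engineers`;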
Create a managed table
To create a managed table, run the following SQL command. Replace the placeholder values:
- `<catalog-name>`: The name of the catalog that will contain the table. This cannot be the `hive_metastore` catalog that is created automatically for the Hive metastore associated with your Azure Databricks workspace. You can drop the catalog name if you are creating the table in the workspace's default catalog.
- `<schema-name>`: The name of the schema that will contain the table.
- `<table-name>`: A name for the table.
- `<column-specification>`: The name and data type for each column.
SQL
CREATE TABLE <catalog-name>.<schema-name>.<table-name>
(
<column-specification>
);
Python
spark.sql("CREATE TABLE <catalog-name>.<schema-name>.<table-name> "
"("
" <column-specification>"
")")
R
library(SparkR)
sql(paste("CREATE TABLE <catalog-name>.<schema-name>.<table-name> ",
"(",
" <column-specification>",
")",
sep = ""))
Scala
spark.sql("CREATE TABLE <catalog-name>.<schema-name>.<table-name> " +
"(" +
" <column-specification>" +
")")
You can also create a managed table by using the Databricks Terraform provider and databricks_table. You can retrieve a list of table full names by using databricks_tables.
For example, to create the table `main.default.department` and insert five rows into it:
SQL
CREATE TABLE main.default.department
(
deptcode INT,
deptname STRING,
location STRING
);
INSERT INTO main.default.department VALUES
(10, 'FINANCE', 'EDINBURGH'),
(20, 'SOFTWARE', 'PADDINGTON'),
(30, 'SALES', 'MAIDSTONE'),
(40, 'MARKETING', 'DARLINGTON'),
(50, 'ADMIN', 'BIRMINGHAM');
Python
spark.sql("CREATE TABLE main.default.department "
"("
" deptcode INT,"
" deptname STRING,"
" location STRING"
")"
"INSERT INTO main.default.department VALUES "
" (10, 'FINANCE', 'EDINBURGH'),"
" (20, 'SOFTWARE', 'PADDINGTON'),"
" (30, 'SALES', 'MAIDSTONE'),"
" (40, 'MARKETING', 'DARLINGTON'),"
" (50, 'ADMIN', 'BIRMINGHAM')")
R
library(SparkR)
sql(paste("CREATE TABLE main.default.department ",
"(",
" deptcode INT,",
" deptname STRING,",
" location STRING",
")",
"INSERT INTO main.default.department VALUES ",
" (10, 'FINANCE', 'EDINBURGH'),",
" (20, 'SOFTWARE', 'PADDINGTON'),",
" (30, 'SALES', 'MAIDSTONE'),",
" (40, 'MARKETING', 'DARLINGTON'),",
" (50, 'ADMIN', 'BIRMINGHAM')",
sep = ""))
Scala
spark.sql("CREATE TABLE main.default.department " +
"(" +
" deptcode INT," +
" deptname STRING," +
" location STRING" +
")" +
"INSERT INTO main.default.department VALUES " +
" (10, 'FINANCE', 'EDINBURGH')," +
" (20, 'SOFTWARE', 'PADDINGTON')," +
" (30, 'SALES', 'MAIDSTONE')," +
" (40, 'MARKETING', 'DARLINGTON')," +
" (50, 'ADMIN', 'BIRMINGHAM')")
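To verify the result, query the new table:
SQL
SELECT * FROM main.default.department;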
Drop a managed table
You must be the table’s owner to drop a table. To drop a managed table, run the following SQL command:
DROP TABLE IF EXISTS catalog_name.schema_name.table_name;
When a managed table is dropped, its underlying data is deleted from your cloud tenant within 30 days.
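If a managed table was dropped by mistake, Databricks also supports restoring recently dropped Unity Catalog managed tables with `UNDROP TABLE` within a retention window (shown here as a sketch; see the UNDROP documentation for the exact retention period):
SQL
UNDROP TABLE catalog_name.schema_name.table_name;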
Create an external table
The data in an external table is stored in a path on your cloud tenant. To work with external tables, Unity Catalog introduces two objects to access and work with external cloud storage:
- A storage credential contains an authentication method for accessing a cloud storage location. The storage credential does not contain a mapping to the path to which it grants access. Storage credentials are access-controlled to determine which users can use the credential.
- An external location maps a storage credential with a cloud storage path to which it grants access. The external location grants access only to that cloud storage path and its contents. External locations are access-controlled to determine which users can use them. An external location is used automatically when your SQL command contains a `LOCATION` clause.
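For example, assuming a storage credential named `my_storage_credential` already exists, a hypothetical external location could be created like this:
SQL
-- Maps the credential to a specific cloud storage path
CREATE EXTERNAL LOCATION my_external_location
URL 'abfss://my-container@mystorageaccount.dfs.core.windows.net/external-tables'
WITH (STORAGE CREDENTIAL my_storage_credential);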
Requirements
To create an external table, you must have:
- The `CREATE EXTERNAL TABLE` privilege on an external location that grants access to the `LOCATION` accessed by the external table.
- The `USE SCHEMA` permission on the table's parent schema.
- The `USE CATALOG` permission on the table's parent catalog.
- The `CREATE TABLE` permission on the table's parent schema.
External locations and storage credentials are stored at the metastore level, rather than in a catalog. To create a storage credential, you must be an account admin or have the `CREATE STORAGE CREDENTIAL` privilege. To create an external location, you must be the metastore admin or have the `CREATE EXTERNAL LOCATION` privilege. See Connect to cloud object storage using Unity Catalog.
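As a sketch, assuming an external location named `my_external_location` and a hypothetical group `data_engineers`, the external-table-specific grant might look like:
SQL
GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION my_external_location TO `data_engineers`;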
Create a table
Use one of the following command examples in a notebook or the SQL query editor to create an external table.
You can also use an example notebook to create the storage credential, external location, and external table, and also manage permissions for them.
In the following examples, replace the placeholder values:
- `<catalog>`: The name of the catalog that will contain the table. This cannot be the `hive_metastore` catalog that is created automatically for the Hive metastore associated with your Azure Databricks workspace. You can drop the catalog name if you are creating the table in the workspace's default catalog.
- `<schema>`: The name of the schema that will contain the table.
- `<table-name>`: A name for the table.
- `<column-specification>`: The name and data type for each column.
- `<bucket-path>`: The path to the cloud storage bucket where the table will be created.
- `<table-directory>`: A directory where the table will be created. Use a unique directory for each table.
Important
Once a table is created in a path, users can no longer directly access the files in that path from Azure Databricks even if they have been given privileges on an external location or storage credential to do so. This is to ensure that users cannot circumvent access controls applied to tables by reading files from your cloud tenant directly.
SQL
CREATE TABLE <catalog>.<schema>.<table-name>
(
<column-specification>
)
LOCATION 'abfss://<bucket-path>/<table-directory>';
Python
spark.sql("CREATE TABLE <catalog>.<schema>.<table-name> "
"("
" <column-specification>"
") "
"LOCATION 'abfss://<bucket-path>/<table-directory>'")
R
library(SparkR)
sql(paste("CREATE TABLE <catalog>.<schema>.<table-name> ",
"(",
" <column-specification>",
") ",
"LOCATION 'abfss://<bucket-path>/<table-directory>'",
sep = ""))
Scala
spark.sql("CREATE TABLE <catalog>.<schema>.<table-name> " +
"(" +
" <column-specification>" +
") " +
"LOCATION 'abfss://<bucket-path>/<table-directory>'")
Unity Catalog checks that you have the following permissions:
- `CREATE EXTERNAL TABLE` on the external location that references the cloud storage path you specify.
- `CREATE TABLE` on the parent schema.
- `USE SCHEMA` on the parent schema.
- `USE CATALOG` on the parent catalog.
If you do, the external table is created. Otherwise, an error occurs and the external table is not created.
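To confirm that the table was created as external, you can describe it; the output reports the type as `EXTERNAL` along with the location you specified:
SQL
DESCRIBE TABLE EXTENDED <catalog>.<schema>.<table-name>;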
Note
You can instead migrate an existing external table in the Hive metastore to Unity Catalog without duplicating its data. See Upgrade a single Hive table to a Unity Catalog external table using the upgrade wizard.
You can also create an external table by using the Databricks Terraform provider and databricks_table. You can retrieve a list of table full names by using databricks_tables.
Example notebook: Create external tables
You can use the following example notebook to create a catalog, schema, and external table, and to manage permissions on them.
Create and manage an external table in Unity Catalog notebook
Create a table from files stored in your cloud tenant
You can populate a managed or external table with records from files stored in your cloud tenant. Unity Catalog reads the files at that location and inserts their contents into the table. In Unity Catalog, this is called path-based access.
You can follow the examples in this section or use the add data UI.
Explore the contents of the files
To explore data stored in an external location before you create tables from that data, you can use Catalog Explorer or the following commands.
Permissions required: You must have the `READ FILES` permission on the external location associated with the cloud storage path to return a list of data files in that location.
SQL
List the files in a cloud storage path:
LIST 'abfss://<path-to-files>';
Query the data in the files in a given path:
SELECT * FROM <format>.`abfss://<path-to-files>`;
Python
List the files in a cloud storage path:
display(spark.sql("LIST 'abfss://<path-to-files>'"))
Query the data in the files in a given path:
display(spark.read.load("abfss://<path-to-files>"))
R
List the files in a cloud storage path:
library(SparkR)
display(sql("LIST 'abfss://<path-to-files>'"))
Query the data in the files in a given path:
library(SparkR)
display(loadDF("abfss://<path-to-files>"))
Scala
List the files in a cloud storage path:
display(spark.sql("LIST 'abfss://<path-to-files>'"))
Query the data in the files in a given path:
display(spark.read.load("abfss://<path-to-files>"))
Create a table from the files
Follow the examples in this section to create a new table and populate it with data files on your cloud tenant.
Note
You can instead migrate an existing external table in the Hive metastore to Unity Catalog without duplicating its data. See Upgrade a single Hive table to a Unity Catalog external table using the upgrade wizard.
Important
- When you create a table using this method, the storage path is read only once, to prevent duplication of records. If you want to re-read the contents of the directory, you must drop and re-create the table. For an existing table, you can insert records from a storage path.
- The bucket path where you create a table cannot also be used to read or write data files.
- Only the files in the exact directory are read; the read is not recursive.
- You must have the following permissions:
  - `USE CATALOG` on the parent catalog and `USE SCHEMA` on the schema.
  - `CREATE TABLE` on the parent schema.
  - `READ FILES` on the external location associated with the bucket path where the files are located, or directly on the storage credential if you are not using an external location.
  - If you are creating an external table, you need `CREATE EXTERNAL TABLE` on the bucket path where the table will be created.
To create a new managed table and populate it with data in your cloud storage, use the following examples.
SQL
CREATE TABLE <catalog>.<schema>.<table-name>
(
<column-specification>
)
SELECT * FROM <format>.`abfss://<path-to-files>`;
Python
spark.sql("CREATE TABLE <catalog>.<schema>.<table-name> "
"( "
" <column-specification> "
") "
"SELECT * from <format>.`abfss://<path-to-files>`")
R
library(SparkR)
sql(paste("CREATE TABLE <catalog>.<schema>.<table-name> ",
"( ",
" <column-specification> ",
") ",
"SELECT * from <format>.`abfss://<path-to-files>`",
sep = ""))
Scala
spark.sql("CREATE TABLE <catalog>.<schema>.<table-name> " +
"( " +
" <column-specification> " +
") " +
"SELECT * from <format>.`abfss://<path-to-files>`")
To create an external table and populate it with data in your cloud storage, add a `LOCATION` clause:
SQL
CREATE TABLE <catalog>.<schema>.<table-name>
(
<column-specification>
)
USING <format>
LOCATION 'abfss://<table-location>'
SELECT * FROM <format>.`abfss://<path-to-files>`;
Python
spark.sql("CREATE TABLE <catalog>.<schema>.<table-name> "
"( "
" <column-specification> "
") "
"USING <format> "
"LOCATION 'abfss://<table-location>' "
"SELECT * from <format>.`abfss://<path-to-files>`")
R
library(SparkR)
sql(paste("CREATE TABLE <catalog>.<schema>.<table-name> ",
"( ",
" <column-specification> ",
") ",
"USING <format> ",
"LOCATION 'abfss://<table-location>' ",
"SELECT * from <format>.`abfss://<path-to-files>`",
sep = ""))
Scala
spark.sql("CREATE TABLE <catalog>.<schema>.<table-name> " +
"( " +
" <column-specification> " +
") " +
"USING <format> " +
"LOCATION 'abfss://<table-location>' " +
"SELECT * from <format>.`abfss://<path-to-files>`")
Insert records from a path into an existing table
To insert records from a bucket path into an existing table, use the `COPY INTO` command. In the following examples, replace the placeholder values:
- `<catalog>`: The name of the table's parent catalog.
- `<schema>`: The name of the table's parent schema.
- `<path-to-files>`: The bucket path that contains the data files.
- `<format>`: The format of the files, for example `delta`.
- `<table-location>`: The bucket path where the table will be created.
Important
- When you insert records into a table using this method, the bucket path you provide is read only once, to prevent duplication of records.
- The bucket path where you create a table cannot also be used to read or write data files.
- Only the files in the exact directory are read; the read is not recursive.
- You must have the following permissions:
  - `USE CATALOG` on the parent catalog and `USE SCHEMA` on the schema.
  - `MODIFY` on the table.
  - `READ FILES` on the external location associated with the bucket path where the files are located, or directly on the storage credential if you are not using an external location.
  - To insert records into an external table, you need `CREATE EXTERNAL TABLE` on the bucket path where the table is located.
To insert records from files in a bucket path into a managed table, using an external location to read from the bucket path:
SQL
COPY INTO <catalog>.<schema>.<table>
FROM (
SELECT *
FROM 'abfss://<path-to-files>'
)
FILEFORMAT = <format>;
Python
spark.sql("COPY INTO <catalog>.<schema>.<table> "
"FROM ( "
" SELECT * "
" FROM 'abfss://<path-to-files>' "
") "
"FILEFORMAT = <format>")
R
library(SparkR)
sql(paste("COPY INTO <catalog>.<schema>.<table> ",
"FROM ( ",
" SELECT * ",
" FROM 'abfss://<path-to-files>' ",
") ",
"FILEFORMAT = <format>",
sep = ""))
Scala
spark.sql("COPY INTO <catalog>.<schema>.<table> " +
"FROM ( " +
" SELECT * " +
" FROM 'abfss://<path-to-files>' " +
") " +
"FILEFORMAT = <format>")
To insert into an external table, add a `LOCATION` clause:
SQL
COPY INTO <catalog>.<schema>.<table>
LOCATION 'abfss://<table-location>'
FROM (
SELECT *
FROM 'abfss://<path-to-files>'
)
FILEFORMAT = <format>;
Python
spark.sql("COPY INTO <catalog>.<schema>.<table> "
"LOCATION 'abfss://<table-location>' "
"FROM ( "
" SELECT * "
" FROM 'abfss://<path-to-files>' "
") "
"FILEFORMAT = <format>")
R
library(SparkR)
sql(paste("COPY INTO <catalog>.<schema>.<table> ",
"LOCATION 'abfss://<table-location>' ",
"FROM ( ",
" SELECT * ",
" FROM 'abfss://<path-to-files>' ",
") ",
"FILEFORMAT = <format>",
sep = ""))
Scala
spark.sql("COPY INTO <catalog>.<schema>.<table> " +
"LOCATION 'abfss://<table-location>' " +
"FROM ( " +
" SELECT * " +
" FROM 'abfss://<path-to-files>' " +
") " +
"FILEFORMAT = <format>")
Add comments to a table
As a table owner or a user with the `MODIFY` privilege on a table, you can add comments to the table and its columns. You can add comments using the following functionality:
- The COMMENT ON command. This option does not support column comments.
- The `COMMENT` option when you use the CREATE TABLE and ALTER TABLE commands. This option supports column comments.
- The "manual" comment field in Catalog Explorer. This option supports column comments. See Document data in Catalog Explorer using markdown comments.
- AI-generated comments (also known as AI-generated documentation) in Catalog Explorer. You can view a comment suggested by a large language model (LLM) that takes into account the table metadata, such as the table schema and column names, and edit or accept the comment as-is to add it. See Add AI-generated comments to a table.
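For example, using the `main.default.department` table created earlier, a table-level comment and a column-level comment can be added as follows:
SQL
-- Table-level comment (COMMENT ON does not support column comments)
COMMENT ON TABLE main.default.department IS 'Company departments and their locations';
-- Column-level comment via ALTER TABLE
ALTER TABLE main.default.department ALTER COLUMN deptcode COMMENT 'Unique department code';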