Create tables in Unity Catalog

This article introduces the concept of managed and external tables in Unity Catalog and describes how to create tables in Unity Catalog.

Note

When you create a table, be sure to reference a catalog that is governed by Unity Catalog or set the default catalog to a catalog that is governed by Unity Catalog. See Manage the default catalog.

The catalog hive_metastore appears in Catalog Explorer but is not considered governed by Unity Catalog. It is managed by your Azure Databricks workspace’s Hive metastore. All other catalogs listed are governed by Unity Catalog.

You can use the Unity Catalog table upgrade interface to upgrade existing tables registered in the Hive metastore to Unity Catalog. See Upgrade Hive tables and views to Unity Catalog.
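For example, assuming your workspace has a Unity Catalog catalog named main, you can set it as the current catalog for a session so that unqualified table names resolve against it:

```sql
-- Set the current catalog for the session (assumes a Unity Catalog
-- catalog named `main` exists in your metastore).
USE CATALOG main;

-- Confirm which catalog is current.
SELECT current_catalog();
```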

Managed tables

Managed tables are the default way to create tables in Unity Catalog. Unity Catalog manages the lifecycle and file layout for these tables. You should not use tools outside of Azure Databricks to manipulate files in these tables directly.

Managed tables are stored in managed storage, either at the metastore, catalog, or schema level, depending on how the schema and catalog are configured. See Specify a managed storage location in Unity Catalog.

Managed tables always use the Delta table format.

When a managed table is dropped, its underlying data is deleted from your cloud tenant within 30 days.

External tables

External tables are tables whose data is stored outside of the managed storage location specified for the metastore, catalog, or schema. Use external tables only when you require direct access to the data outside of Azure Databricks clusters or Databricks SQL warehouses.

When you run DROP TABLE on an external table, Unity Catalog does not delete the underlying data. To drop a table you must be its owner. You can manage privileges on external tables and use them in queries in the same way as managed tables. To create an external table with SQL, specify a LOCATION path in your CREATE TABLE statement. External tables can use the following file formats:

  • DELTA
  • CSV
  • JSON
  • AVRO
  • PARQUET
  • ORC
  • TEXT

To manage access to the underlying cloud storage for an external table, you must set up storage credentials and external locations.

To learn more, see Create an external table.

Requirements

You must have the CREATE TABLE privilege on the schema in which you want to create the table, as well as the USE SCHEMA privilege on the schema and the USE CATALOG privilege on the parent catalog.
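A privileged user (such as the catalog owner) can grant these privileges ahead of time. The following sketch assumes a catalog named main, a schema named default, and a group named data-engineers:

```sql
-- Grant the minimum privileges needed to create tables in main.default.
GRANT USE CATALOG ON CATALOG main TO `data-engineers`;
GRANT USE SCHEMA ON SCHEMA main.default TO `data-engineers`;
GRANT CREATE TABLE ON SCHEMA main.default TO `data-engineers`;
```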

If you are creating an external table, see Create an external table for additional requirements.

Create a managed table

To create a managed table, run the following SQL command, replacing the placeholder values:

  • <catalog-name>: The name of the catalog that will contain the table.

    This cannot be the hive_metastore catalog that is created automatically for the Hive metastore associated with your Azure Databricks workspace. You can omit the catalog name if you are creating the table in the workspace’s default catalog.

  • <schema-name>: The name of the schema that will contain the table.

  • <table-name>: A name for the table.

  • <column-specification>: The name and data type for each column.

SQL

CREATE TABLE <catalog-name>.<schema-name>.<table-name>
(
  <column-specification>
);

Python

spark.sql("CREATE TABLE <catalog-name>.<schema-name>.<table-name> "
  "("
  "  <column-specification>"
  ")")

R

library(SparkR)

sql(paste("CREATE TABLE <catalog-name>.<schema-name>.<table-name> ",
  "(",
  "  <column-specification>",
  ")",
  sep = ""))

Scala

spark.sql("CREATE TABLE <catalog-name>.<schema-name>.<table-name> " +
  "(" +
  "  <column-specification>" +
  ")")

You can also create a managed table by using the Databricks Terraform provider and databricks_table. You can retrieve a list of table full names by using databricks_tables.

For example, to create the table main.default.department and insert five rows into it:

SQL

CREATE TABLE main.default.department
(
  deptcode  INT,
  deptname  STRING,
  location  STRING
);

INSERT INTO main.default.department VALUES
  (10, 'FINANCE', 'EDINBURGH'),
  (20, 'SOFTWARE', 'PADDINGTON'),
  (30, 'SALES', 'MAIDSTONE'),
  (40, 'MARKETING', 'DARLINGTON'),
  (50, 'ADMIN', 'BIRMINGHAM');

Python

spark.sql("CREATE TABLE main.default.department "
  "("
  "  deptcode  INT,"
  "  deptname  STRING,"
  "  location  STRING"
  ")")

spark.sql("INSERT INTO main.default.department VALUES "
  "  (10, 'FINANCE', 'EDINBURGH'),"
  "  (20, 'SOFTWARE', 'PADDINGTON'),"
  "  (30, 'SALES', 'MAIDSTONE'),"
  "  (40, 'MARKETING', 'DARLINGTON'),"
  "  (50, 'ADMIN', 'BIRMINGHAM')")

R

library(SparkR)

sql(paste("CREATE TABLE main.default.department ",
  "(",
  "  deptcode  INT,",
  "  deptname  STRING,",
  "  location  STRING",
  ")",
  sep = ""))

sql(paste("INSERT INTO main.default.department VALUES ",
  "  (10, 'FINANCE', 'EDINBURGH'),",
  "  (20, 'SOFTWARE', 'PADDINGTON'),",
  "  (30, 'SALES', 'MAIDSTONE'),",
  "  (40, 'MARKETING', 'DARLINGTON'),",
  "  (50, 'ADMIN', 'BIRMINGHAM')",
  sep = ""))

Scala

spark.sql("CREATE TABLE main.default.department " +
  "(" +
  "  deptcode  INT," +
  "  deptname  STRING," +
  "  location  STRING" +
  ")")

spark.sql("INSERT INTO main.default.department VALUES " +
  "  (10, 'FINANCE', 'EDINBURGH')," +
  "  (20, 'SOFTWARE', 'PADDINGTON')," +
  "  (30, 'SALES', 'MAIDSTONE')," +
  "  (40, 'MARKETING', 'DARLINGTON')," +
  "  (50, 'ADMIN', 'BIRMINGHAM')")

Drop a managed table

You must be the table’s owner to drop a table. To drop a managed table, run the following SQL command:

DROP TABLE IF EXISTS catalog_name.schema_name.table_name;

When a managed table is dropped, its underlying data is deleted from your cloud tenant within 30 days.

Create an external table

The data in an external table is stored in a path on your cloud tenant. To govern access to external cloud storage, Unity Catalog introduces two securable objects:

  • A storage credential contains an authentication method for accessing a cloud storage location. The storage credential does not contain a mapping to the path to which it grants access. Storage credentials are access-controlled to determine which users can use the credential.
  • An external location maps a storage credential with a cloud storage path to which it grants access. The external location grants access only to that cloud storage path and its contents. External locations are access-controlled to determine which users can use them. An external location is used automatically when your SQL command contains a LOCATION clause.

Requirements

To create an external table, you must have:

  • The CREATE EXTERNAL TABLE privilege on an external location that grants access to the LOCATION accessed by the external table.
  • The USE SCHEMA permission on the table’s parent schema.
  • The USE CATALOG permission on the table’s parent catalog.
  • The CREATE TABLE permission on the table’s parent schema.

External locations and storage credentials are stored at the metastore level, rather than in a catalog. To create a storage credential, you must be an account admin or have the CREATE STORAGE CREDENTIAL privilege. To create an external location, you must be the metastore admin or have the CREATE EXTERNAL LOCATION privilege. See Connect to cloud object storage using Unity Catalog.
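As a sketch, assuming a storage credential named my_credential already exists, a metastore admin could define an external location over a container path and grant table-creation rights on it (the names and path here are illustrative):

```sql
-- Map an existing storage credential to a cloud storage path.
CREATE EXTERNAL LOCATION IF NOT EXISTS my_location
URL 'abfss://container@storageaccount.dfs.core.windows.net/external-tables'
WITH (STORAGE CREDENTIAL my_credential);

-- Allow a group to create external tables under this location.
GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION my_location TO `data-engineers`;
```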

Create a table

Use one of the following command examples in a notebook or the SQL query editor to create an external table.

You can also use an example notebook to create the storage credential, external location, and external table, and to manage permissions for them.

In the following examples, replace the placeholder values:

  • <catalog>: The name of the catalog that will contain the table.

    This cannot be the hive_metastore catalog that is created automatically for the Hive metastore associated with your Azure Databricks workspace. You can omit the catalog name if you are creating the table in the workspace’s default catalog.

  • <schema>: The name of the schema that will contain the table.

  • <table-name>: A name for the table.

  • <column-specification>: The name and data type for each column.

  • <bucket-path>: The path to the cloud storage bucket where the table will be created.

  • <table-directory>: A directory where the table will be created. Use a unique directory for each table.

Important

Once a table is created in a path, users can no longer directly access the files in that path from Azure Databricks even if they have been given privileges on an external location or storage credential to do so. This is to ensure that users cannot circumvent access controls applied to tables by reading files from your cloud tenant directly.

SQL

CREATE TABLE <catalog>.<schema>.<table-name>
(
  <column-specification>
)
LOCATION 'abfss://<bucket-path>/<table-directory>';

Python

spark.sql("CREATE TABLE <catalog>.<schema>.<table-name> "
  "("
  "  <column-specification>"
  ") "
  "LOCATION 'abfss://<bucket-path>/<table-directory>'")

R

library(SparkR)

sql(paste("CREATE TABLE <catalog>.<schema>.<table-name> ",
  "(",
  "  <column-specification>",
  ") ",
  "LOCATION 'abfss://<bucket-path>/<table-directory>'",
  sep = ""))

Scala

spark.sql("CREATE TABLE <catalog>.<schema>.<table-name> " +
  "(" +
  "  <column-specification>" +
  ") " +
  "LOCATION 'abfss://<bucket-path>/<table-directory>'")

Unity Catalog checks that you have the following permissions:

  • CREATE EXTERNAL TABLE on the external location that references the cloud storage path you specify.
  • CREATE TABLE on the parent schema.
  • USE SCHEMA on the parent schema.
  • USE CATALOG on the parent catalog.

If you have these permissions, the external table is created. Otherwise, an error occurs and the external table is not created.

Note

You can instead migrate an existing external table in the Hive metastore to Unity Catalog without duplicating its data. See Upgrade a single Hive table to a Unity Catalog external table using the upgrade wizard.

You can also create an external table by using the Databricks Terraform provider and databricks_table. You can retrieve a list of table full names by using databricks_tables.

Example notebook: Create external tables

You can use the following example notebook to create a catalog, schema, and external table, and to manage permissions on them.

Create and manage an external table in Unity Catalog notebook

Create a table from files stored in your cloud tenant

You can populate a managed or external table with records from files stored in your cloud tenant. Unity Catalog reads the files at that location and inserts their contents into the table. In Unity Catalog, this is called path-based access.

You can follow the examples in this section or use the add data UI.

Explore the contents of the files

To explore data stored in an external location before you create tables from that data, you can use Catalog Explorer or the following commands.

Permissions required: You must have the READ FILES permission on the external location associated with the cloud storage path to return a list of data files in that location.

SQL

  1. List the files in a cloud storage path:

    LIST 'abfss://<path-to-files>';
    
  2. Query the data in the files in a given path:

    SELECT * FROM <format>.`abfss://<path-to-files>`;
    

Python

  1. List the files in a cloud storage path:

    display(spark.sql("LIST 'abfss://<path-to-files>'"))
    
  2. Query the data in the files in a given path:

    display(spark.read.load("abfss://<path-to-files>"))
    

R

  1. List the files in a cloud storage path:

    library(SparkR)
    
    display(sql("LIST 'abfss://<path-to-files>'"))
    
  2. Query the data in the files in a given path:

    library(SparkR)
    
    display(loadDF("abfss://<path-to-files>"))
    

Scala

  1. List the files in a cloud storage path:

    display(spark.sql("LIST 'abfss://<path-to-files>'"))
    
  2. Query the data in the files in a given path:

    display(spark.read.load("abfss://<path-to-files>"))
    

Create a table from the files

Follow the examples in this section to create a new table and populate it with data files on your cloud tenant.

Note

You can instead migrate an existing external table in the Hive metastore to Unity Catalog without duplicating its data. See Upgrade a single Hive table to a Unity Catalog external table using the upgrade wizard.

Important

  • When you create a table using this method, the storage path is read only once, to prevent duplication of records. If you want to re-read the contents of the directory, you must drop and re-create the table. For an existing table, you can insert records from a storage path.
  • The bucket path where you create a table cannot also be used to read or write data files.
  • Only the files in the exact directory are read; the read is not recursive.
  • You must have the following permissions:
    • USE CATALOG on the parent catalog and USE SCHEMA on the schema.
    • CREATE TABLE on the parent schema.
    • READ FILES on the external location associated with the bucket path where the files are located, or directly on the storage credential if you are not using an external location.
    • If you are creating an external table, you need CREATE EXTERNAL TABLE on the bucket path where the table will be created.

To create a new managed table and populate it with data in your cloud storage, use the following examples.

SQL

CREATE TABLE <catalog>.<schema>.<table-name>
(
  <column-specification>
)
SELECT * FROM <format>.`abfss://<path-to-files>`;

Python

spark.sql("CREATE TABLE <catalog>.<schema>.<table-name> "
  "( "
  "  <column-specification> "
  ") "
  "SELECT * FROM <format>.`abfss://<path-to-files>`")

R

library(SparkR)

sql(paste("CREATE TABLE <catalog>.<schema>.<table-name> ",
  "( ",
  "  <column-specification> ",
  ") ",
  "SELECT * FROM <format>.`abfss://<path-to-files>`",
  sep = ""))

Scala

spark.sql("CREATE TABLE <catalog>.<schema>.<table-name> " +
  "( " +
  "  <column-specification> " +
  ") " +
  "SELECT * FROM <format>.`abfss://<path-to-files>`")

To create an external table and populate it with data in your cloud storage, add a LOCATION clause:

SQL

CREATE TABLE <catalog>.<schema>.<table-name>
(
    <column-specification>
)
USING <format>
LOCATION 'abfss://<table-location>'
SELECT * FROM <format>.`abfss://<path-to-files>`;

Python

spark.sql("CREATE TABLE <catalog>.<schema>.<table-name> "
  "( "
  "  <column-specification> "
  ") "
  "USING <format> "
  "LOCATION 'abfss://<table-location>' "
  "SELECT * FROM <format>.`abfss://<path-to-files>`")

R

library(SparkR)

sql(paste("CREATE TABLE <catalog>.<schema>.<table-name> ",
  "( ",
  "  <column-specification> ",
  ") ",
  "USING <format> ",
  "LOCATION 'abfss://<table-location>' ",
  "SELECT * FROM <format>.`abfss://<path-to-files>`",
  sep = ""))

Scala

spark.sql("CREATE TABLE <catalog>.<schema>.<table-name> " +
  "( " +
  "  <column-specification> " +
  ") " +
  "USING <format> " +
  "LOCATION 'abfss://<table-location>' " +
  "SELECT * FROM <format>.`abfss://<path-to-files>`")

Insert records from a path into an existing table

To insert records from a bucket path into an existing table, use the COPY INTO command. In the following examples, replace the placeholder values:

  • <catalog>: The name of the table’s parent catalog.
  • <schema>: The name of the table’s parent schema.
  • <path-to-files>: The bucket path that contains the data files.
  • <format>: The format of the files, for example delta.
  • <table-location>: The bucket path where the table is located.

Important

  • When you insert records into a table using this method, the bucket path you provide is read only once, to prevent duplication of records.
  • The bucket path where you create a table cannot also be used to read or write data files.
  • Only the files in the exact directory are read; the read is not recursive.
  • You must have the following permissions:
    • USE CATALOG on the parent catalog and USE SCHEMA on the schema.
    • MODIFY on the table.
    • READ FILES on the external location associated with the bucket path where the files are located, or directly on the storage credential if you are not using an external location.
    • To insert records into an external table, you need CREATE EXTERNAL TABLE on the bucket path where the table is located.

To insert records from files in a bucket path into a managed table, using an external location to read from the bucket path:

SQL

COPY INTO <catalog>.<schema>.<table>
FROM (
  SELECT *
  FROM 'abfss://<path-to-files>'
)
FILEFORMAT = <format>;

Python

spark.sql("COPY INTO <catalog>.<schema>.<table> "
  "FROM ( "
  "  SELECT * "
  "  FROM 'abfss://<path-to-files>' "
  ") "
  "FILEFORMAT = <format>")

R

library(SparkR)

sql(paste("COPY INTO <catalog>.<schema>.<table> ",
  "FROM ( ",
  "  SELECT * ",
  "  FROM 'abfss://<path-to-files>' ",
  ") ",
  "FILEFORMAT = <format>",
  sep = ""))

Scala

spark.sql("COPY INTO <catalog>.<schema>.<table> " +
  "FROM ( " +
  "  SELECT * " +
  "  FROM 'abfss://<path-to-files>' " +
  ") " +
  "FILEFORMAT = <format>")

To insert into an external table, add a LOCATION clause:

SQL

COPY INTO <catalog>.<schema>.<table>
LOCATION 'abfss://<table-location>'
FROM (
  SELECT *
  FROM 'abfss://<path-to-files>'
)
FILEFORMAT = <format>;

Python

spark.sql("COPY INTO <catalog>.<schema>.<table> "
  "LOCATION 'abfss://<table-location>' "
  "FROM ( "
  "  SELECT * "
  "  FROM 'abfss://<path-to-files>' "
  ") "
  "FILEFORMAT = <format>")

R

library(SparkR)

sql(paste("COPY INTO <catalog>.<schema>.<table> ",
  "LOCATION 'abfss://<table-location>' ",
  "FROM ( ",
  "  SELECT * ",
  "  FROM 'abfss://<path-to-files>' ",
  ") ",
  "FILEFORMAT = <format>",
  sep = ""))

Scala

spark.sql("COPY INTO <catalog>.<schema>.<table> " +
  "LOCATION 'abfss://<table-location>' " +
  "FROM ( " +
  "  SELECT * " +
  "  FROM 'abfss://<path-to-files>' " +
  ") " +
  "FILEFORMAT = <format>")

Add comments to a table

As a table owner or a user with the MODIFY privilege on a table, you can add comments to a table and its columns. You can add comments using the following functionality:

  • The COMMENT ON command. This option does not support column comments.
  • The COMMENT option when you use the CREATE TABLE and ALTER TABLE commands. This option supports column comments.
  • The “manual” comment field in Catalog Explorer. This option supports column comments. See Document data in Catalog Explorer using markdown comments.
  • AI-generated comments (also known as AI-generated documentation) in Catalog Explorer. You can view a comment suggested by a large language model (LLM) that takes into account the table metadata, such as the table schema and column names, and edit or accept the comment as-is to add it. See Add AI-generated comments to a table.
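For example, using the main.default.department table created earlier, the first two options look like this in SQL:

```sql
-- Table-level comment (COMMENT ON does not support column comments).
COMMENT ON TABLE main.default.department IS 'Department reference data';

-- Column-level comment using ALTER TABLE.
ALTER TABLE main.default.department
  ALTER COLUMN deptname COMMENT 'Human-readable department name';
```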

Next steps