Manage external locations and storage credentials

This article introduces external locations and storage credentials and explains how to create and use them to manage access to external tables.

What are external locations and storage credentials?

External locations and storage credentials allow Unity Catalog to read and write data on your cloud tenant on behalf of users. Unity Catalog uses two objects for this purpose:

A storage credential represents an authentication and authorization mechanism for accessing data stored on your cloud tenant, using either an Azure managed identity (strongly recommended) or a service principal. Each storage credential is subject to Unity Catalog access-control policies that control which users and groups can access the credential. If a user does not have access to a storage credential in Unity Catalog, the request fails and Unity Catalog does not attempt to authenticate to your cloud tenant on the user’s behalf.

An external location is an object that combines a cloud storage path with a storage credential that authorizes access to the cloud storage path. Each external location is subject to Unity Catalog access-control policies that control which users and groups can access the external location. If a user does not have access to an external location in Unity Catalog, the request fails and Unity Catalog does not attempt to authenticate to your cloud tenant on the user’s behalf.

Databricks recommends using external locations rather than using storage credentials directly.
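
For example, a typical end-to-end flow in SQL looks like the following sketch. The object names and the storage path are illustrative placeholders rather than values from this article, and the sketch assumes a storage credential named example_credential already exists:

-- Define an external location for a container path, using an existing storage credential.
CREATE EXTERNAL LOCATION example_location
URL 'abfss://example-container@examplestorage.dfs.core.windows.net/data'
WITH (STORAGE CREDENTIAL example_credential);

-- Grant a group permission to create external tables and read files under that path.
GRANT CREATE EXTERNAL TABLE, READ FILES ON EXTERNAL LOCATION example_location TO `data-engineers`;

-- Create an external table whose files live under the external location’s path.
CREATE TABLE example_catalog.example_schema.example_table
LOCATION 'abfss://example-container@examplestorage.dfs.core.windows.net/data/example_table'
AS SELECT 1 AS id;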

Requirements

  • To create storage credentials, you must be an Azure Databricks account admin. The account admin who creates the storage credential can delegate ownership to another user or group to manage permissions on it.
  • To create external locations, you must be a metastore admin or a user with the CREATE EXTERNAL LOCATION privilege.
  • Unity Catalog only supports Azure Data Lake Storage Gen2 for external locations.

Manage storage credentials

The following sections show how to create and manage storage credentials.

Create a storage credential

You can use either an Azure managed identity or a service principal as the identity that authorizes access to your storage container. Managed identities are strongly recommended. They have the benefit of allowing Unity Catalog to access storage accounts protected by network rules, which isn’t possible using service principals, and they remove the need to manage and rotate secrets.

To create a storage credential using a managed identity:

  1. Create an Azure Databricks access connector and assign it permissions to the storage container that you would like to access, using the instructions in Configure a managed identity for Unity Catalog.

    An Azure Databricks access connector is a first-party Azure resource that lets you connect managed identities to an Azure Databricks account.

    Make note of the access connector’s resource ID.

  2. Log in to your Unity Catalog-enabled Azure Databricks workspace as a workspace admin.

  3. Click Data.

  4. Click the + menu at the upper right and select Add a storage credential.

  5. On the Create a new storage credential dialog, select Managed identity (recommended).

  6. Enter a name for the credential, and enter the access connector’s resource ID in the format:

    /subscriptions/12f34567-8ace-9c10-111c-aea8eba12345/resourceGroups/<resource_group>/providers/Microsoft.Databricks/accessConnectors/<connector-name>
    
  7. Click Save.

  8. Create an external location that references this storage credential.

To create a storage credential using a service principal:

  1. In the Azure Portal, create a service principal and grant it access to your storage account.
    1. Create a client secret for the service principal and make a note of it.
    2. Make a note of the directory ID and application ID for the service principal.
    3. Go to your storage account and grant the service principal the Storage Blob Data Contributor role.
  2. Log in to your Unity Catalog-enabled Azure Databricks workspace as a workspace admin.
  3. From the persona switcher at the top of the sidebar, select SQL.
  4. Click Data.
  5. Click the + menu at the upper right and select Add a storage credential.
  6. On the Create a new storage credential dialog, select Service principal.
  7. Enter a name for the credential, along with the directory ID, application ID, and client secret of the service principal that has been granted the Storage Blob Data Contributor role on the storage container you want to access.
  8. Click Save.
  9. Create an external location that references this storage credential.

You can also create a storage credential by using the Databricks Terraform provider and databricks_storage_credential.

List storage credentials

To view the list of all storage credentials in a metastore, you can use Data Explorer or a SQL command.

Data Explorer

  1. Log in to a workspace that is linked to the metastore.
  2. Click Data.
  3. At the bottom of the screen, click Storage Credentials.

SQL

Run the following command in a notebook or the Databricks SQL editor.

SHOW STORAGE CREDENTIALS;

Python

Run the following command in a notebook.

display(spark.sql("SHOW STORAGE CREDENTIALS"))

R

Run the following command in a notebook.

library(SparkR)

display(sql("SHOW STORAGE CREDENTIALS"))

Scala

Run the following command in a notebook.

display(spark.sql("SHOW STORAGE CREDENTIALS"))

View a storage credential

To view the properties of a storage credential, you can use Data Explorer or a SQL command.

Data Explorer

  1. Log in to a workspace that is linked to the metastore.
  2. Click Data.
  3. At the bottom of the screen, click Storage Credentials.
  4. Click the name of a storage credential to see its properties.

SQL

Run the following command in a notebook or the Databricks SQL editor. Replace <credential_name> with the name of the credential.

DESCRIBE STORAGE CREDENTIAL <credential_name>;

Python

Run the following command in a notebook. Replace <credential_name> with the name of the credential.

display(spark.sql("DESCRIBE STORAGE CREDENTIAL <credential_name>"))

R

Run the following command in a notebook. Replace <credential_name> with the name of the credential.

library(SparkR)

display(sql("DESCRIBE STORAGE CREDENTIAL <credential_name>"))

Scala

Run the following command in a notebook. Replace <credential_name> with the name of the credential.

display(spark.sql("DESCRIBE STORAGE CREDENTIAL <credential_name>"))

Rename a storage credential

To rename a storage credential, you can use Data Explorer or a SQL command.

Data Explorer

  1. Log in to a workspace that is linked to the metastore.
  2. Click Data.
  3. At the bottom of the screen, click Storage Credentials.
  4. Click the name of a storage credential to open the edit dialog.
  5. Rename the storage credential and save it.

SQL

Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:

  • <credential_name>: The name of the credential.
  • <new_credential_name>: A new name for the credential.
ALTER STORAGE CREDENTIAL <credential_name> RENAME TO <new_credential_name>;

Python

Run the following command in a notebook. Replace the placeholder values:

  • <credential_name>: The name of the credential.
  • <new_credential_name>: A new name for the credential.
spark.sql("ALTER STORAGE CREDENTIAL <credential_name> RENAME TO <new_credential_name>")

R

Run the following command in a notebook. Replace the placeholder values:

  • <credential_name>: The name of the credential.
  • <new_credential_name>: A new name for the credential.
library(SparkR)

sql("ALTER STORAGE CREDENTIAL <credential_name> RENAME TO <new_credential_name>")

Scala

Run the following command in a notebook. Replace the placeholder values:

  • <credential_name>: The name of the credential.
  • <new_credential_name>: A new name for the credential.
spark.sql("ALTER STORAGE CREDENTIAL <credential_name> RENAME TO <new_credential_name>")

Manage permissions for a storage credential

You can grant permissions directly on the storage credential, but Databricks recommends that you reference it in an external location and grant permissions to that instead. An external location combines a storage credential with a specific path, and authorizes access only to that path and its contents.
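
As an illustrative sketch of this recommendation (the names below are placeholders), you can revoke a permission that was granted directly on a credential and grant it on an external location that wraps the same path instead:

-- Move the grant from the credential itself to an external location scoped to one path.
REVOKE READ FILES ON STORAGE CREDENTIAL example_credential FROM `data-engineers`;
GRANT READ FILES ON EXTERNAL LOCATION example_location TO `data-engineers`;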

You can manage permissions for a storage credential using Data Explorer, SQL commands, or Terraform. You can grant and revoke the following permissions on a storage credential:

  • CREATE EXTERNAL TABLE
  • READ FILES
  • WRITE FILES

In the following examples, replace the placeholder values:

  • <principal>: The email address of the account-level user or the name of the account-level group to which to grant the permission.
  • <storage_credential_name>: The name of a storage credential.

To show grants on a storage credential, use a command like the following. You can optionally filter the results to show only the grants for the specified principal.

SQL

SHOW GRANTS [<principal>] ON STORAGE CREDENTIAL <storage_credential_name>;

Python

display(spark.sql("SHOW GRANTS [<principal>] ON STORAGE CREDENTIAL <storage_credential_name>"))

R

library(SparkR)
display(sql("SHOW GRANTS [<principal>] ON STORAGE CREDENTIAL <storage_credential_name>"))

Scala

display(spark.sql("SHOW GRANTS [<principal>] ON STORAGE CREDENTIAL <storage_credential_name>"))

To grant permission to create an external table using a storage credential directly:

SQL

GRANT CREATE EXTERNAL TABLE ON STORAGE CREDENTIAL <storage_credential_name> TO <principal>;

Python

spark.sql("GRANT CREATE EXTERNAL TABLE ON STORAGE CREDENTIAL <storage_credential_name> TO <principal>")

R

library(SparkR)
sql("GRANT CREATE EXTERNAL TABLE ON STORAGE CREDENTIAL <storage_credential_name> TO <principal>")

Scala

spark.sql("GRANT CREATE EXTERNAL TABLE ON STORAGE CREDENTIAL <storage_credential_name> TO <principal>")

To grant permission to select from an external table using a storage credential directly:

SQL

GRANT READ FILES ON STORAGE CREDENTIAL <storage_credential_name> TO <principal>;

Python

spark.sql("GRANT READ FILES ON STORAGE CREDENTIAL <storage_credential_name> TO <principal>")

R

library(SparkR)
sql("GRANT READ FILES ON STORAGE CREDENTIAL <storage_credential_name> TO <principal>")

Scala

spark.sql("GRANT READ FILES ON STORAGE CREDENTIAL <storage_credential_name> TO <principal>")

Note

If a group name contains a space, use back-ticks around it (not apostrophes).
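
For example, to grant READ FILES to a hypothetical group named data engineers:

GRANT READ FILES ON STORAGE CREDENTIAL <storage_credential_name> TO `data engineers`;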

Change the owner of a storage credential

A storage credential’s creator is its initial owner. To change the owner to a different account-level user or group, do the following:

SQL

Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:

  • <credential_name>: The name of the credential.
  • <principal>: The email address of an account-level user or the name of an account-level group.
ALTER STORAGE CREDENTIAL <credential_name> OWNER TO <principal>;

Python

Run the following command in a notebook. Replace the placeholder values:

  • <credential_name>: The name of the credential.
  • <principal>: The email address of an account-level user or the name of an account-level group.
spark.sql("ALTER STORAGE CREDENTIAL <credential_name> OWNER TO <principal>")

R

Run the following command in a notebook. Replace the placeholder values:

  • <credential_name>: The name of the credential.
  • <principal>: The email address of an account-level user or the name of an account-level group.
library(SparkR)

sql("ALTER STORAGE CREDENTIAL <credential_name> OWNER TO <principal>")

Scala

Run the following command in a notebook. Replace the placeholder values:

  • <credential_name>: The name of the credential.
  • <principal>: The email address of an account-level user or the name of an account-level group.
spark.sql("ALTER STORAGE CREDENTIAL <credential_name> OWNER TO <principal>")

Delete a storage credential

To delete a storage credential, you can use Data Explorer or a SQL command.

Data Explorer

  1. Log in to a workspace that is linked to the metastore.
  2. Click Data.
  3. At the bottom of the screen, click Storage Credentials.
  4. Click the name of a storage credential to open the edit dialog.
  5. Click the Delete button.

SQL

Run the following command in a notebook or the Databricks SQL editor. Replace <credential_name> with the name of the credential. Portions of the command that are in brackets are optional. By default, if the credential is used by an external location, it is not deleted.

  • IF EXISTS does not return an error if the credential does not exist.
DROP STORAGE CREDENTIAL [IF EXISTS] <credential_name>;

Python

Run the following command in a notebook. Replace <credential_name> with the name of the credential. Portions of the command that are in brackets are optional. By default, if the credential is used by an external location, it is not deleted.

  • IF EXISTS does not return an error if the credential does not exist.
spark.sql("DROP STORAGE CREDENTIAL [IF EXISTS] <credential_name>")

R

Run the following command in a notebook. Replace <credential_name> with the name of the credential. Portions of the command that are in brackets are optional. By default, if the credential is used by an external location, it is not deleted.

  • IF EXISTS does not return an error if the credential does not exist.
library(SparkR)

sql("DROP STORAGE CREDENTIAL [IF EXISTS] <credential_name>")

Scala

Run the following command in a notebook. Replace <credential_name> with the name of the credential. Portions of the command that are in brackets are optional. By default, if the credential is used by an external location, it is not deleted.

  • IF EXISTS does not return an error if the credential does not exist.
spark.sql("DROP STORAGE CREDENTIAL [IF EXISTS] <credential_name>")

Manage external locations

The following sections illustrate how to create and manage external locations.

Create an external location

You can create an external location using Data Explorer, a SQL command, or Terraform.

Run the following SQL command in a notebook or the Databricks SQL editor. Replace the placeholder values:

  • <location_name>: A name for the external location.

  • abfss://<container_name>@<storage_account>.dfs.core.windows.net/<path>: The path in your cloud tenant that this external location grants access to.

  • <storage_credential_name>: The name of the storage credential that contains details about a managed identity or service principal that is authorized to read from and write to the storage container path.

Note

  • Each cloud storage path can be associated with only one external location. If you attempt to create a second external location that references the same path, the command fails.
  • External locations only support Azure Data Lake Storage Gen2 storage.

SQL

CREATE EXTERNAL LOCATION <location_name>
URL 'abfss://<container_name>@<storage_account>.dfs.core.windows.net/<path>'
WITH ([STORAGE] CREDENTIAL <storage_credential_name>);

Python

spark.sql("CREATE EXTERNAL LOCATION <location_name> "
  "URL 'abfss://<container_name>@<storage_account>.dfs.core.windows.net/<path>' "
  "WITH ([STORAGE] CREDENTIAL <storage_credential_name>)")

R

library(SparkR)
sql(paste("CREATE EXTERNAL LOCATION <location_name> ",
  "URL 'abfss://<container_name>@<storage_account>.dfs.core.windows.net/<path>' ",
  "WITH ([STORAGE] CREDENTIAL <storage_credential_name>)",
  sep = ""))

Scala

spark.sql("CREATE EXTERNAL LOCATION <location_name> " +
  "URL 'abfss://<container_name>@<storage_account>.dfs.core.windows.net/<path>' " +
  "WITH ([STORAGE] CREDENTIAL <storage_credential_name>)")

Describe an external location

To see the properties of an external location, you can use Data Explorer or a SQL command.

Data Explorer

  1. Log in to a workspace that is linked to the metastore.
  2. Click Data.
  3. At the bottom of the screen, click External Locations.
  4. Click the name of an external location to see its properties.

SQL

Run the following command in a notebook or the Databricks SQL editor. Replace <location_name> with the name of the external location.

DESCRIBE EXTERNAL LOCATION <location_name>;

Python

Run the following command in a notebook. Replace <location_name> with the name of the external location.

display(spark.sql("DESCRIBE EXTERNAL LOCATION <location_name>"))

R

Run the following command in a notebook. Replace <location_name> with the name of the external location.

library(SparkR)

display(sql("DESCRIBE EXTERNAL LOCATION <location_name>"))

Scala

Run the following command in a notebook. Replace <location_name> with the name of the external location.

display(spark.sql("DESCRIBE EXTERNAL LOCATION <location_name>"))

Modify an external location

An external location’s owner can rename the location, change its URL, and change its storage credential.

To rename an external location, do the following:

SQL

Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:

  • <location_name>: The name of the location.
  • <new_location_name>: A new name for the location.
ALTER EXTERNAL LOCATION <location_name> RENAME TO <new_location_name>;

Python

Run the following command in a notebook. Replace the placeholder values:

  • <location_name>: The name of the location.
  • <new_location_name>: A new name for the location.
spark.sql("ALTER EXTERNAL LOCATION <location_name> RENAME TO <new_location_name>")

R

Run the following command in a notebook. Replace the placeholder values:

  • <location_name>: The name of the location.
  • <new_location_name>: A new name for the location.
library(SparkR)

sql("ALTER EXTERNAL LOCATION <location_name> RENAME TO <new_location_name>")

Scala

Run the following command in a notebook. Replace the placeholder values:

  • <location_name>: The name of the location.
  • <new_location_name>: A new name for the location.
spark.sql("ALTER EXTERNAL LOCATION <location_name> RENAME TO <new_location_name>")

To change the URI that an external location points to in your cloud tenant, do the following:

SQL

Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:

  • <location_name>: The name of the external location.
  • <url>: The new storage URL the location should authorize access to in your cloud tenant.
ALTER EXTERNAL LOCATION <location_name> SET URL '<url>' [FORCE];

Python

Run the following command in a notebook. Replace the placeholder values:

  • <location_name>: The name of the external location.
  • <url>: The new storage URL the location should authorize access to in your cloud tenant.
spark.sql("ALTER EXTERNAL LOCATION location_name SET URL `<url>` [FORCE]")

R

Run the following command in a notebook. Replace the placeholder values:

  • <location_name>: The name of the external location.
  • <url>: The new storage URL the location should authorize access to in your cloud tenant.
library(SparkR)

sql("ALTER EXTERNAL LOCATION location_name SET URL `<url>` [FORCE]")

Scala

Run the following command in a notebook. Replace the placeholder values:

  • <location_name>: The name of the external location.
  • <url>: The new storage URL the location should authorize access to in your cloud tenant.
spark.sql("ALTER EXTERNAL LOCATION location_name SET URL `<url>` [FORCE]")

The FORCE option changes the URL even if external tables depend upon the external location.
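
For example, with a placeholder location name and URL, forcing the change looks like the following. Note that FORCE is written without the brackets, which only mark it as optional:

ALTER EXTERNAL LOCATION example_location SET URL 'abfss://new-container@examplestorage.dfs.core.windows.net/data' FORCE;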

To change the storage credential that an external location uses, do the following:

SQL

Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:

  • <location_name>: The name of the external location.
  • <credential_name>: The name of the storage credential that grants access to the location’s URL in your cloud tenant.
ALTER EXTERNAL LOCATION <location_name> SET STORAGE CREDENTIAL <credential_name>;

Python

Run the following command in a notebook. Replace the placeholder values:

  • <location_name>: The name of the external location.
  • <credential_name>: The name of the storage credential that grants access to the location’s URL in your cloud tenant.
spark.sql("ALTER EXTERNAL LOCATION <location_name> SET STORAGE CREDENTIAL <credential_name>")

R

Run the following command in a notebook. Replace the placeholder values:

  • <location_name>: The name of the external location.
  • <credential_name>: The name of the storage credential that grants access to the location’s URL in your cloud tenant.
library(SparkR)

sql("ALTER EXTERNAL LOCATION <location_name> SET STORAGE CREDENTIAL <credential_name>")

Scala

Run the following command in a notebook. Replace the placeholder values:

  • <location_name>: The name of the external location.
  • <credential_name>: The name of the storage credential that grants access to the location’s URL in your cloud tenant.
spark.sql("ALTER EXTERNAL LOCATION <location_name> SET STORAGE CREDENTIAL <credential_name>")

Manage permissions for an external location

You can grant and revoke the following permissions on an external location using Data Explorer, a SQL command, or Terraform:

  • CREATE EXTERNAL TABLE
  • READ FILES
  • WRITE FILES

In the following examples, replace the placeholder values:

  • <principal>: The email address of the account-level user or the name of the account-level group to which to grant the permission.

  • <location_name>: The name of the external location that authorizes reading from and writing to the storage container path in your cloud tenant.

To show grants on an external location, use a command like the following. You can optionally filter the results to show only the grants for the specified principal.

SQL

SHOW GRANTS [<principal>] ON EXTERNAL LOCATION <location_name>;

Python

display(spark.sql("SHOW GRANTS [<principal>] ON EXTERNAL LOCATION <location_name>"))

R

library(SparkR)

display(sql("SHOW GRANTS [<principal>] ON EXTERNAL LOCATION <location_name>"))

Scala

display(spark.sql("SHOW GRANTS [<principal>] ON EXTERNAL LOCATION <location_name>"))

To grant permission to use an external location to create a table:

SQL

GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION <location_name> TO <principal>;

Python

spark.sql("GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION <location_name> TO <principal>")

R

library(SparkR)

sql("GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION <location_name> TO <principal>")

Scala

spark.sql("GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION <location_name> TO <principal>")

To grant permission to read files from an external location:

SQL

GRANT READ FILES ON EXTERNAL LOCATION <location_name> TO <principal>;

Python

spark.sql("GRANT READ FILES ON EXTERNAL LOCATION <location_name> TO <principal>")

R

library(SparkR)

sql("GRANT READ FILES ON EXTERNAL LOCATION <location_name> TO <principal>")

Scala

spark.sql("GRANT READ FILES ON EXTERNAL LOCATION <location_name> TO <principal>")

Note

If a group name contains a space, use back-ticks around it (not apostrophes).
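
Once a principal has READ FILES on an external location, they can query files under its path directly. The following is a minimal sketch that assumes a CSV file exists under the location’s path; the path and file name are illustrative:

SELECT * FROM csv.`abfss://example-container@examplestorage.dfs.core.windows.net/data/example.csv`;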

Change the owner of an external location

An external location’s creator is its initial owner. To change the owner to a different account-level user or group, run the following command in a notebook or the Databricks SQL editor, or use Data Explorer. Replace the placeholder values:

  • <location_name>: The name of the external location.
  • <principal>: The email address of an account-level user or the name of an account-level group.
ALTER EXTERNAL LOCATION <location_name> OWNER TO <principal>;

Delete an external location

To delete an external location, do the following:

SQL

Run the following command in a notebook or the Databricks SQL editor. Items in brackets are optional. Replace <location_name> with the name of the external location.

DROP EXTERNAL LOCATION [IF EXISTS] <location_name>;

Python

Run the following command in a notebook. Items in brackets are optional. Replace <location_name> with the name of the external location.

spark.sql("DROP EXTERNAL LOCATION [IF EXISTS] <location_name>")

R

Run the following command in a notebook. Items in brackets are optional. Replace <location_name> with the name of the external location.

library(SparkR)

sql("DROP EXTERNAL LOCATION [IF EXISTS] <location_name>")

Scala

Run the following command in a notebook. Items in brackets are optional. Replace <location_name> with the name of the external location.

spark.sql("DROP EXTERNAL LOCATION [IF EXISTS] <location_name>")

Next steps