Manage external locations and storage credentials
This article introduces external locations and storage credentials and explains how to create and use them to manage access to your data.
What are external locations and storage credentials?
External locations and storage credentials allow Unity Catalog to read and write data on your cloud tenant on behalf of users. These objects are used for:
- Creating, reading from, and writing to external tables.
- Overriding the metastore’s default managed table storage location at the catalog or schema level.
- Creating a managed or external table from files stored on your cloud tenant.
- Inserting records into tables from files stored on your cloud tenant.
- Directly exploring data files stored on your cloud tenant.
A storage credential represents an authentication and authorization mechanism for accessing data stored on your cloud tenant, using either an Azure managed identity (strongly recommended) or a service principal. Each storage credential is subject to Unity Catalog access-control policies that control which users and groups can access the credential. If a user does not have access to a storage credential in Unity Catalog, the request fails and Unity Catalog does not attempt to authenticate to your cloud tenant on the user’s behalf. You can mark a storage credential as read-only to prevent users from writing to external locations that use the storage credential.
An external location is an object that combines a cloud storage path with a storage credential that authorizes access to that path. Each external location is subject to Unity Catalog access-control policies that control which users and groups can access it. If a user does not have access to an external location in Unity Catalog, the request fails and Unity Catalog does not attempt to authenticate to your cloud tenant on the user’s behalf. You can mark an external location as read-only to prevent users from writing to that location, which means that users cannot create tables or volumes (whether external or managed) in that location.
Note
Despite the term “external” in the name, external locations can be used not just to define storage locations for external tables and volumes, but also for managed tables and volumes. Specifically, they can be used to define storage locations for managed tables and volumes at the catalog and schema levels, overriding the metastore root storage location. See CREATE CATALOG and CREATE SCHEMA.
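For example, the following is a minimal SQL sketch of overriding the default managed storage location at the catalog and schema levels. The catalog name, schema name, and storage path are hypothetical, and the path must fall within an external location that you have adequate privileges on:
-- Hypothetical catalog, schema, and storage paths
CREATE CATALOG finance_catalog
MANAGED LOCATION 'abfss://finance@mystorageaccount.dfs.core.windows.net/managed';
CREATE SCHEMA finance_catalog.reporting
MANAGED LOCATION 'abfss://finance@mystorageaccount.dfs.core.windows.net/managed/reporting';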
Databricks recommends using external locations rather than using storage credentials directly.
Requirements
- To create storage credentials, you must be an Azure Databricks account admin. The account admin who creates the storage credential can delegate ownership to another user or group to manage permissions on it.
- To create external locations, you must be a metastore admin or a user with the CREATE EXTERNAL LOCATION privilege.
- External locations must use Azure Data Lake Storage Gen2 storage accounts that have a hierarchical namespace.
Manage storage credentials
The following sections show how to create and manage storage credentials.
Create a storage credential
You can use either an Azure managed identity or a service principal as the identity that authorizes access to your storage container. Managed identities are strongly recommended. They have the benefit of allowing Unity Catalog to access storage accounts protected by network rules, which isn’t possible using service principals, and they remove the need to manage and rotate secrets.
Create a storage credential using a managed identity
Create an Azure Databricks access connector and assign it permissions to the storage container that you would like to access, using the instructions in Configure a managed identity for Unity Catalog.
An Azure Databricks access connector is a first-party Azure resource that lets you connect managed identities to an Azure Databricks account.
Make a note of the access connector’s resource ID.
Log in to your Unity Catalog-enabled Azure Databricks workspace as a user who has the account admin role on the Azure Databricks account.
Click Catalog.
At the bottom of the screen, click Storage Credentials.
Click +Add > Add a storage credential.
Enter a name for the credential, and enter the access connector’s resource ID in the format:
/subscriptions/12f34567-8ace-9c10-111c-aea8eba12345c/resourceGroups/<resource-group>/providers/Microsoft.Databricks/accessConnectors/<connector-name>
(Optional) If you created the access connector using a user-assigned managed identity, enter the resource ID of the managed identity in the User-assigned managed identity ID field, in the format:
/subscriptions/<subscription-id>/resourceGroups/<resource-group-name>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<managed-identity-name>
(Optional) If you want users to have read-only access to the external locations that use this storage credential, select Read only. For more information, see Mark an external location or storage credential as read-only.
Click Save.
Create an external location that references this storage credential.
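For example, the following is a minimal SQL sketch that references the new credential; the location, path, and credential names are hypothetical, and the full syntax is described in Create an external location below:
-- Hypothetical location, container path, and credential names
CREATE EXTERNAL LOCATION my_location
URL 'abfss://my-container@mystorageaccount.dfs.core.windows.net/path'
WITH (STORAGE CREDENTIAL my_managed_identity_credential);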
Create a storage credential using a service principal
First, create a service principal and grant it access to your storage account following Access storage with Azure Active Directory.
You cannot add a service principal storage credential using Catalog Explorer. Instead, use the Storage Credentials API. For example:
curl -X POST -n \
  https://<databricks-instance>/api/2.1/unity-catalog/storage-credentials \
  -d '{
    "name": "<storage-credential-name>",
    "read_only": true,
    "azure_service_principal": {
      "directory_id": "<directory-id>",
      "application_id": "<application-id>",
      "client_secret": "<client-secret>"
    },
    "skip_validation": false
  }'
You can also create a storage credential by using the Databricks Terraform provider and databricks_storage_credential.
List storage credentials
To view the list of all storage credentials in a metastore, you can use Catalog Explorer or a SQL command.
Catalog Explorer
- Log in to a workspace that is linked to the metastore.
- Click Catalog.
- At the bottom of the screen, click Storage Credentials.
SQL
Run the following command in a notebook or the Databricks SQL editor.
SHOW STORAGE CREDENTIALS;
Python
Run the following command in a notebook.
display(spark.sql("SHOW STORAGE CREDENTIALS"))
R
Run the following command in a notebook.
library(SparkR)
display(sql("SHOW STORAGE CREDENTIALS"))
Scala
Run the following command in a notebook.
display(spark.sql("SHOW STORAGE CREDENTIALS"))
View a storage credential
To view the properties of a storage credential, you can use Catalog Explorer or a SQL command.
Catalog Explorer
- Log in to a workspace that is linked to the metastore.
- Click Catalog.
- At the bottom of the screen, click Storage Credentials.
- Click the name of a storage credential to see its properties.
SQL
Run the following command in a notebook or the Databricks SQL editor. Replace <credential-name>
with the name of the credential.
DESCRIBE STORAGE CREDENTIAL <credential-name>;
Python
Run the following command in a notebook. Replace <credential-name>
with the name of the credential.
display(spark.sql("DESCRIBE STORAGE CREDENTIAL <credential-name>"))
R
Run the following command in a notebook. Replace <credential-name>
with the name of the credential.
library(SparkR)
display(sql("DESCRIBE STORAGE CREDENTIAL <credential-name>"))
Scala
Run the following command in a notebook. Replace <credential-name>
with the name of the credential.
display(spark.sql("DESCRIBE STORAGE CREDENTIAL <credential-name>"))
Rename a storage credential
To rename a storage credential, you can use Catalog Explorer or a SQL command.
Catalog Explorer
- Log in to a workspace that is linked to the metastore.
- Click Catalog.
- At the bottom of the screen, click Storage Credentials.
- Click the name of a storage credential to open the edit dialog.
- Rename the storage credential and save it.
SQL
Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:
- <credential-name>: The name of the credential.
- <new-credential-name>: A new name for the credential.
ALTER STORAGE CREDENTIAL <credential-name> RENAME TO <new-credential-name>;
Python
Run the following command in a notebook. Replace the placeholder values:
- <credential-name>: The name of the credential.
- <new-credential-name>: A new name for the credential.
spark.sql("ALTER STORAGE CREDENTIAL <credential-name> RENAME TO <new-credential-name>")
R
Run the following command in a notebook. Replace the placeholder values:
- <credential-name>: The name of the credential.
- <new-credential-name>: A new name for the credential.
library(SparkR)
sql("ALTER STORAGE CREDENTIAL <credential-name> RENAME TO <new-credential-name>")
Scala
Run the following command in a notebook. Replace the placeholder values:
- <credential-name>: The name of the credential.
- <new-credential-name>: A new name for the credential.
spark.sql("ALTER STORAGE CREDENTIAL <credential-name> RENAME TO <new-credential-name>")
Manage permissions for a storage credential
You can grant permissions directly on the storage credential, but Databricks strongly recommends that you reference it in an external location and grant permissions to that instead. An external location combines a storage credential with a specific path, and authorizes access only to that path and its contents.
You can manage permissions for a storage credential using Catalog Explorer, the Databricks CLI, SQL commands in a notebook or Databricks SQL query, or Terraform. You can grant and revoke the following permissions on a storage credential:
- CREATE EXTERNAL TABLE
- READ FILES
- WRITE FILES
In the following examples, replace the placeholder values:
- <principal>: The email address of the account-level user or the name of the account-level group to whom to grant the permission.
- <storage-credential-name>: The name of a storage credential.
To show grants on a storage credential, use a command like the following. You can optionally filter the results to show only the grants for the specified principal.
SQL
SHOW GRANTS [<principal>] ON STORAGE CREDENTIAL <storage-credential-name>;
Python
display(spark.sql("SHOW GRANTS [<principal>] ON STORAGE CREDENTIAL <storage-credential-name>"))
R
library(SparkR)
display(sql("SHOW GRANTS [<principal>] ON STORAGE CREDENTIAL <storage-credential-name>"))
Scala
display(spark.sql("SHOW GRANTS [<principal>] ON STORAGE CREDENTIAL <storage-credential-name>"))
To grant permission to create an external table using a storage credential directly:
SQL
GRANT CREATE EXTERNAL TABLE ON STORAGE CREDENTIAL <storage-credential-name> TO <principal>;
Python
spark.sql("GRANT CREATE EXTERNAL TABLE ON STORAGE CREDENTIAL <storage-credential-name> TO <principal>")
R
library(SparkR)
sql("GRANT CREATE EXTERNAL TABLE ON STORAGE CREDENTIAL <storage-credential-name> TO <principal>")
Scala
spark.sql("GRANT CREATE EXTERNAL TABLE ON STORAGE CREDENTIAL <storage-credential-name> TO <principal>")
To grant permission to read files directly using a storage credential:
SQL
GRANT READ FILES ON STORAGE CREDENTIAL <storage-credential-name> TO <principal>;
Python
spark.sql("GRANT READ FILES ON STORAGE CREDENTIAL <storage-credential-name> TO <principal>")
R
library(SparkR)
sql("GRANT READ FILES ON STORAGE CREDENTIAL <storage-credential-name> TO <principal>")
Scala
spark.sql("GRANT READ FILES ON STORAGE CREDENTIAL <storage-credential-name> TO <principal>")
Note
If a group name contains a space, use back-ticks around it (not apostrophes).
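For example, assuming a hypothetical account-level group named finance team and a credential named my_azure_cred:
GRANT READ FILES ON STORAGE CREDENTIAL my_azure_cred TO `finance team`;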
Change the owner of a storage credential
A storage credential’s creator is its initial owner. To change the owner to a different account-level user or group, do the following:
SQL
Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:
- <credential-name>: The name of the credential.
- <principal>: The email address of an account-level user or the name of an account-level group.
ALTER STORAGE CREDENTIAL <credential-name> OWNER TO <principal>;
Python
Run the following command in a notebook. Replace the placeholder values:
- <credential-name>: The name of the credential.
- <principal>: The email address of an account-level user or the name of an account-level group.
spark.sql("ALTER STORAGE CREDENTIAL <credential-name> OWNER TO <principal>")
R
Run the following command in a notebook. Replace the placeholder values:
- <credential-name>: The name of the credential.
- <principal>: The email address of an account-level user or the name of an account-level group.
library(SparkR)
sql("ALTER STORAGE CREDENTIAL <credential-name> OWNER TO <principal>")
Scala
Run the following command in a notebook. Replace the placeholder values:
- <credential-name>: The name of the credential.
- <principal>: The email address of an account-level user or the name of an account-level group.
spark.sql("ALTER STORAGE CREDENTIAL <credential-name> OWNER TO <principal>")
Delete a storage credential
To delete (drop) a storage credential, you must be its owner. To delete a storage credential, you can use Catalog Explorer or a SQL command.
Catalog Explorer
- Log in to a workspace that is linked to the metastore.
- Click Catalog.
- At the bottom of the screen, click Storage Credentials.
- Click the name of a storage credential to open the edit dialog.
- Click the Delete button.
SQL
Run the following command in a notebook or the Databricks SQL editor. Replace <credential-name> with the name of the credential. Portions of the command that are in brackets are optional: IF EXISTS does not return an error if the credential does not exist. By default, if the credential is used by an external location, it is not deleted.
DROP STORAGE CREDENTIAL [IF EXISTS] <credential-name>;
Python
Run the following command in a notebook. Replace <credential-name> with the name of the credential. Portions of the command that are in brackets are optional: IF EXISTS does not return an error if the credential does not exist. By default, if the credential is used by an external location, it is not deleted.
spark.sql("DROP STORAGE CREDENTIAL [IF EXISTS] <credential-name>")
R
Run the following command in a notebook. Replace <credential-name> with the name of the credential. Portions of the command that are in brackets are optional: IF EXISTS does not return an error if the credential does not exist. By default, if the credential is used by an external location, it is not deleted.
library(SparkR)
sql("DROP STORAGE CREDENTIAL [IF EXISTS] <credential-name>")
Scala
Run the following command in a notebook. Replace <credential-name> with the name of the credential. Portions of the command that are in brackets are optional: IF EXISTS does not return an error if the credential does not exist. By default, if the credential is used by an external location, it is not deleted.
spark.sql("DROP STORAGE CREDENTIAL [IF EXISTS] <credential-name>")
Manage external locations
The following sections show how to create and manage external locations.
Note
When you define a volume, you can no longer use external locations in Catalog Explorer or cloud URIs to access paths that overlap the volume location.
Create an external location
You can create an external location using Catalog Explorer, the Databricks CLI, SQL commands in a notebook or Databricks SQL query, or Terraform. To mark an external location as read-only, you must use Catalog Explorer.
This section describes how to create an external location using SQL.
Run the following SQL command in a notebook or the Databricks SQL editor. Replace the placeholder values:
- <location-name>: A name for the external location.
- <bucket-path>: The path in your cloud tenant that this external location grants access to. In the example below, this is expressed as abfss://<container-name>@<storage-account>.dfs.core.windows.net/<path>.
- <storage-credential-name>: The name of the storage credential that contains the managed identity or service principal that is authorized to read from and write to the storage container path.
Note
- Each cloud storage path can be associated with only one external location. If you attempt to create a second external location that references the same path, the command fails.
- External locations only support Azure Data Lake Storage Gen2 storage.
SQL
CREATE EXTERNAL LOCATION <location-name>
URL 'abfss://<container-name>@<storage-account>.dfs.core.windows.net/<path>'
WITH ([STORAGE] CREDENTIAL <storage-credential-name>);
Python
spark.sql("CREATE EXTERNAL LOCATION <location-name> "
"URL 'abfss://<container-name>@<storage-account>.dfs.core.windows.net/<path>' "
"WITH ([STORAGE] CREDENTIAL <storage-credential-name>)")
R
library(SparkR)
sql(paste("CREATE EXTERNAL LOCATION <location-name> ",
"URL 'abfss://<container-name>@<storage-account>.dfs.core.windows.net/<path>' ",
"WITH ([STORAGE] CREDENTIAL <storage-credential-name>)",
sep = ""))
Scala
spark.sql("CREATE EXTERNAL LOCATION <location-name> " +
"URL 'abfss://<container-name>@<storage-account>.dfs.core.windows.net/<path>' " +
"WITH ([STORAGE] CREDENTIAL <storage-credential-name>)")
Describe an external location
To see the properties of an external location, you can use Catalog Explorer or a SQL command.
Catalog Explorer
- Log in to a workspace that is linked to the metastore.
- Click Catalog.
- At the bottom of the screen, click External Locations.
- Click the name of an external location to see its properties.
SQL
Run the following command in a notebook or the Databricks SQL editor. Replace <location-name> with the name of the external location.
DESCRIBE EXTERNAL LOCATION <location-name>;
Python
Run the following command in a notebook. Replace <location-name> with the name of the external location.
display(spark.sql("DESCRIBE EXTERNAL LOCATION <location-name>"))
R
Run the following command in a notebook. Replace <location-name> with the name of the external location.
library(SparkR)
display(sql("DESCRIBE EXTERNAL LOCATION <location-name>"))
Scala
Run the following command in a notebook. Replace <location-name> with the name of the external location.
display(spark.sql("DESCRIBE EXTERNAL LOCATION <location-name>"))
Modify an external location
An external location’s owner can rename, change the URI, and change the storage credential of the external location.
See also Manage Unity Catalog external locations in Catalog Explorer.
To rename an external location, do the following:
SQL
Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:
- <location-name>: The name of the location.
- <new-location-name>: A new name for the location.
ALTER EXTERNAL LOCATION <location-name> RENAME TO <new-location-name>;
Python
Run the following command in a notebook. Replace the placeholder values:
- <location-name>: The name of the location.
- <new-location-name>: A new name for the location.
spark.sql("ALTER EXTERNAL LOCATION <location-name> RENAME TO <new-location-name>")
R
Run the following command in a notebook. Replace the placeholder values:
- <location-name>: The name of the location.
- <new-location-name>: A new name for the location.
library(SparkR)
sql("ALTER EXTERNAL LOCATION <location-name> RENAME TO <new-location-name>")
Scala
Run the following command in a notebook. Replace the placeholder values:
- <location-name>: The name of the location.
- <new-location-name>: A new name for the location.
spark.sql("ALTER EXTERNAL LOCATION <location-name> RENAME TO <new-location-name>")
To change the URI that an external location points to in your cloud tenant, do the following:
SQL
Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:
- <location-name>: The name of the external location.
- <url>: The new storage URL the location should authorize access to in your cloud tenant.
ALTER EXTERNAL LOCATION <location-name> SET URL '<url>' [FORCE];
Python
Run the following command in a notebook. Replace the placeholder values:
- <location-name>: The name of the external location.
- <url>: The new storage URL the location should authorize access to in your cloud tenant.
spark.sql("ALTER EXTERNAL LOCATION location_name SET URL '<url>' [FORCE]")
R
Run the following command in a notebook. Replace the placeholder values:
- <location-name>: The name of the external location.
- <url>: The new storage URL the location should authorize access to in your cloud tenant.
library(SparkR)
sql("ALTER EXTERNAL LOCATION location_name SET URL '<url>' [FORCE]")
Scala
Run the following command in a notebook. Replace the placeholder values:
- <location-name>: The name of the external location.
- <url>: The new storage URL the location should authorize access to in your cloud tenant.
spark.sql("ALTER EXTERNAL LOCATION location_name SET URL '<url>' [FORCE]")
The FORCE option changes the URL even if external tables depend upon the external location.
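For example, the following is a sketch with a hypothetical location name and URL:
ALTER EXTERNAL LOCATION sales_data SET URL 'abfss://sales@mystorageaccount.dfs.core.windows.net/raw_v2' FORCE;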
To change the storage credential that an external location uses, do the following:
SQL
Run the following command in a notebook or the Databricks SQL editor. Replace the placeholder values:
- <location-name>: The name of the external location.
- <credential-name>: The name of the storage credential that grants access to the location’s URL in your cloud tenant.
ALTER EXTERNAL LOCATION <location-name> SET STORAGE CREDENTIAL <credential-name>;
Python
Run the following command in a notebook. Replace the placeholder values:
- <location-name>: The name of the external location.
- <credential-name>: The name of the storage credential that grants access to the location’s URL in your cloud tenant.
spark.sql("ALTER EXTERNAL LOCATION <location-name> SET STORAGE CREDENTIAL <credential-name>")
R
Run the following command in a notebook. Replace the placeholder values:
- <location-name>: The name of the external location.
- <credential-name>: The name of the storage credential that grants access to the location’s URL in your cloud tenant.
library(SparkR)
sql("ALTER EXTERNAL LOCATION <location-name> SET STORAGE CREDENTIAL <credential-name>")
Scala
Run the following command in a notebook. Replace the placeholder values:
- <location-name>: The name of the external location.
- <credential-name>: The name of the storage credential that grants access to the location’s URL in your cloud tenant.
spark.sql("ALTER EXTERNAL LOCATION <location-name> SET STORAGE CREDENTIAL <credential-name>")
Manage permissions for an external location
You can grant and revoke the following permissions on an external location using Catalog Explorer, the Databricks CLI, SQL commands in a notebook or Databricks SQL query, or Terraform:
- CREATE EXTERNAL TABLE
- READ FILES
- WRITE FILES
In the following examples, replace the placeholder values:
- <location-name>: The name of the external location that authorizes reading from and writing to the storage container path in your cloud tenant.
- <principal>: The email address of an account-level user or the name of an account-level group.
To show grants on an external location, use a command like the following. You can optionally filter the results to show only the grants for the specified principal.
SQL
SHOW GRANTS [<principal>] ON EXTERNAL LOCATION <location-name>;
Python
display(spark.sql("SHOW GRANTS [<principal>] ON EXTERNAL LOCATION <location-name>"))
R
library(SparkR)
display(sql("SHOW GRANTS [<principal>] ON EXTERNAL LOCATION <location-name>"))
Scala
display(spark.sql("SHOW GRANTS [<principal>] ON EXTERNAL LOCATION <location-name>"))
To grant permission to use an external location to create a table:
SQL
GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION <location-name> TO <principal>;
Python
spark.sql("GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION <location-name> TO <principal>")
R
library(SparkR)
sql("GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION <location-name> TO <principal>")
Scala
spark.sql("GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION <location-name> TO <principal>")
To grant permission to read files from an external location:
SQL
GRANT READ FILES ON EXTERNAL LOCATION <location-name> TO <principal>;
Python
spark.sql("GRANT READ FILES ON EXTERNAL LOCATION <location-name> TO <principal>")
R
library(SparkR)
sql("GRANT READ FILES ON EXTERNAL LOCATION <location-name> TO <principal>")
Scala
spark.sql("GRANT READ FILES ON EXTERNAL LOCATION <location-name> TO <principal>")
Note
If a group name contains a space, use back-ticks around it (not apostrophes).
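For example, assuming a hypothetical account-level group named finance team and an external location named sales_data:
GRANT READ FILES ON EXTERNAL LOCATION sales_data TO `finance team`;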
Change the owner of an external location
An external location’s creator is its initial owner. To change the owner to a different account-level user or group, run the following command in a notebook or the Databricks SQL editor or use Catalog Explorer. Replace the placeholder values:
- <location-name>: The name of the external location.
- <principal>: The email address of an account-level user or the name of an account-level group.
ALTER EXTERNAL LOCATION <location-name> OWNER TO <principal>;
Delete an external location
To delete (drop) an external location, you must be its owner. To delete an external location, do the following:
SQL
Run the following command in a notebook or the Databricks SQL editor. Items in brackets are optional. Replace <location-name>
with the name of the external location.
DROP EXTERNAL LOCATION [IF EXISTS] <location-name>;
Python
Run the following command in a notebook. Items in brackets are optional. Replace <location-name>
with the name of the external location.
spark.sql("DROP EXTERNAL LOCATION [IF EXISTS] <location-name>")
R
Run the following command in a notebook. Items in brackets are optional. Replace <location-name>
with the name of the external location.
library(SparkR)
sql("DROP EXTERNAL LOCATION [IF EXISTS] <location-name>")
Scala
Run the following command in a notebook. Items in brackets are optional. Replace <location-name>
with the name of the external location.
spark.sql("DROP EXTERNAL LOCATION [IF EXISTS] <location-name>")
Mark an external location or storage credential as read-only
If you want users to have read-only access to an external location, you can use Catalog Explorer to mark the external location as read-only.
Likewise, if you want users to have read-only access to all external locations that are referenced by a specific storage credential, you can use Catalog Explorer to mark that storage credential as read-only.
Making storage credentials and external locations read-only:
- Prevents users from writing to files in those external locations, regardless of any write permissions granted by the Azure managed identity that underlies the storage credential, and regardless of the Unity Catalog permissions granted on that external location.
- Prevents users from creating tables or volumes (whether external or managed) in those external locations.
- Enables the system to validate the external location or storage credential properly at creation time.
You should mark storage credentials and external locations as read-only when you create them, but you can also add the option after you’ve created them:
- In Catalog Explorer, find the storage credential or external location, click the kebab menu (also known as the three-dot menu) on the object row, and select Edit.
- On the edit dialog, select the Read only option.