Create an external location to connect cloud storage to Azure Databricks
This article describes how to configure an external location in Unity Catalog to connect cloud storage to Azure Databricks.
External locations associate Unity Catalog storage credentials with cloud object storage containers. External locations are used to define managed storage locations for catalogs and schemas, and to define locations for external tables and external volumes.
You can create an external location that references storage in an Azure Data Lake Storage Gen2 storage container or Cloudflare R2 bucket.
You can create an external location using Catalog Explorer, the Databricks CLI, SQL commands in a notebook or Databricks SQL query, or Terraform.
Note
When you define a volume, cloud URI access to data under the volume path is governed by the permissions of the volume.
Before you begin
Prerequisites:
You must create the Azure Data Lake Storage Gen2 storage container or Cloudflare R2 bucket that you want to use as an external location before you create the external location object in Azure Databricks.
Azure Data Lake Storage Gen2 storage accounts that you use as external locations must have a hierarchical namespace.
You must have a storage credential defined in Azure Databricks that gives access to the cloud storage location path. See Create a storage credential for connecting to Azure Data Lake Storage Gen2 and Create a storage credential for connecting to Cloudflare R2.
Permissions requirements:
- You must have the CREATE EXTERNAL LOCATION privilege on both the metastore and the storage credential referenced in the external location. Metastore admins have CREATE EXTERNAL LOCATION on the metastore by default.
Create an external location using Catalog Explorer
You can create an external location manually using Catalog Explorer.
Permissions and prerequisites: see Before you begin.
To create the external location:
Log in to a workspace that is attached to the metastore.
In the sidebar, click Catalog.
Click the + Add button and select Add an external location.
Enter an External location name.
Optionally copy the container path from an existing mount point (Azure Data Lake Storage Gen2 containers only).
If you aren’t copying from an existing mount point, use the URL field to enter the storage container or R2 bucket path that you want to use as the external location.
For example, abfss://my-container-name@my-storage-account.dfs.core.windows.net/<path> or r2://my-bucket@my-account-id.r2.cloudflarestorage.com/<path>.
Select the storage credential that grants access to the external location.
(Optional) If you want users to have read-only access to the external location, click Advanced Options and select Read only. For more information, see Mark an external location as read-only.
Click Create.
(Optional) Bind the external location to specific workspaces.
By default, any privileged user can use the external location on any workspace attached to the metastore. If you want to allow access only from specific workspaces, go to the Workspaces tab and assign workspaces. See (Optional) Assign an external location to specific workspaces.
Grant permission to use the external location.
For anyone to use the external location, you must grant permissions:
- To use the external location to add a managed storage location to a metastore, catalog, or schema, grant the CREATE MANAGED LOCATION privilege.
- To create external tables or volumes, grant CREATE EXTERNAL TABLE or CREATE EXTERNAL VOLUME.
To use Catalog Explorer to grant permissions:
- Click the external location name to open the details pane.
- On the Permissions tab, click Grant.
- On the Grant on <external location> dialog, select users, groups, or service principals in the Principals field, and select the privilege you want to grant.
- Click Grant.
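The same grants can also be issued as SQL rather than through Catalog Explorer. The following is a minimal sketch, not a definitive implementation: it only builds GRANT statements as strings, using the privilege names that appear in this article. The helper names (`quote_identifier`, `grant_statement`), the location name `my-location`, and the group name `data-engineers` are hypothetical.

```python
def quote_identifier(name: str) -> str:
    """Backtick-quote a name, doubling any backtick it already contains."""
    return "`" + name.replace("`", "``") + "`"


def grant_statement(privilege: str, location: str, principal: str) -> str:
    """Build a GRANT statement for a privilege on an external location."""
    return (
        f"GRANT {privilege} ON EXTERNAL LOCATION {quote_identifier(location)} "
        f"TO {quote_identifier(principal)}"
    )


# Hypothetical names: an external location and an account-level group.
for privilege in ("CREATE EXTERNAL TABLE", "CREATE EXTERNAL VOLUME"):
    print(grant_statement(privilege, "my-location", "data-engineers"))
```

In a notebook, each generated string could be passed to `spark.sql(...)`, assuming the session user is allowed to grant privileges on the external location.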
Create an external location using SQL
To create an external location using SQL, run the following command in a notebook or the SQL query editor. Replace the placeholder values.
Permissions and prerequisites: see Before you begin.
- <location-name>: A name for the external location. If location_name includes special characters, such as hyphens (-), it must be surrounded by backticks (` `). See Names.
- <bucket-path>: The path in your cloud tenant that this external location grants access to. For example, abfss://my-container-name@my-storage-account.dfs.core.windows.net/<path> or r2://my-bucket@my-account-id.r2.cloudflarestorage.com/<path>.
- <storage-credential-name>: The name of the storage credential that authorizes reading from and writing to the storage container or bucket path. If the storage credential name includes special characters, such as hyphens (-), it must be surrounded by backticks (` `).
CREATE EXTERNAL LOCATION [IF NOT EXISTS] `<location-name>`
URL '<bucket-path>'
WITH ([STORAGE] CREDENTIAL `<storage-credential-name>`)
[COMMENT '<comment-string>'];
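If you generate this statement from a Python notebook, the quoting rules above are easy to get wrong. The sketch below shows one way to fill in the placeholders, assuming the helper names (`quote_identifier`, `quote_string`, `create_external_location_sql`) and the example values, all of which are hypothetical. It checks only the two URL schemes mentioned in this article and does not verify that the path or credential exists.

```python
def quote_identifier(name: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def quote_string(value: str) -> str:
    """Single-quote a SQL string literal, escaping backslashes and quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def create_external_location_sql(name, url, credential, comment=None):
    """Build a CREATE EXTERNAL LOCATION statement from its parts."""
    # Only the two URL schemes mentioned in this article are accepted here.
    if not url.startswith(("abfss://", "r2://")):
        raise ValueError(f"unexpected URL scheme: {url}")
    statement = (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS {quote_identifier(name)} "
        f"URL {quote_string(url)} "
        f"WITH (STORAGE CREDENTIAL {quote_identifier(credential)})"
    )
    if comment is not None:
        statement += f" COMMENT {quote_string(comment)}"
    return statement


# Hypothetical values; the hyphens are why the names need backticks.
statement = create_external_location_sql(
    "my-location",
    "abfss://my-container-name@my-storage-account.dfs.core.windows.net/landing",
    "my-credential",
    comment="Landing zone",
)
print(statement)
```

In a Databricks notebook, the resulting string could then be run with `spark.sql(statement)`.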
If you want to limit external location access to specific workspaces in your account, also known as workspace binding or external location isolation, see (Optional) Assign an external location to specific workspaces.
(Optional) Assign an external location to specific workspaces
Important
This feature is in Public Preview.
By default, an external location is accessible from all of the workspaces in the metastore. This means that if a user has been granted a privilege (such as READ FILES) on that external location, they can exercise that privilege from any workspace attached to the metastore. If you use workspaces to isolate user data access, you might want to allow access to an external location only from specific workspaces. This feature is known as workspace binding or external location isolation.
Typical use cases for binding an external location to specific workspaces include:
- Ensuring that data engineers who have the CREATE EXTERNAL TABLE privilege on an external location that contains production data can create external tables on that location only in a production workspace.
- Ensuring that data engineers who have the READ FILES privilege on an external location that contains sensitive data can only use specific workspaces to access that data.
For more information about how to restrict other types of data access by workspace, see Workspace-catalog binding example.
Bind an external location to one or more workspaces
To assign an external location to specific workspaces, you can use Catalog Explorer or the Unity Catalog REST API.
Permissions required: Metastore admin or external location owner.
Note
Metastore admins can see all external locations in a metastore using Catalog Explorer, and external location owners can see all external locations that they own in a metastore, regardless of whether the external location is assigned to the current workspace. External locations that are not assigned to the workspace appear grayed out.
Catalog Explorer
Log in to a workspace that is linked to the metastore.
In the sidebar, click Catalog.
At the bottom of the screen, click External Data > External Locations.
Select the external location and go to the Workspaces tab.
On the Workspaces tab, clear the All workspaces have access checkbox.
If your external location is already bound to one or more workspaces, this checkbox is already cleared.
Click Assign to workspaces and enter or find the workspaces you want to assign.
To revoke access, go to the Workspaces tab, select the workspace, and click Revoke. To allow access from all workspaces, select the All workspaces have access checkbox.
API
There are two APIs and two steps required to assign an external location to a workspace. In the following examples, replace <workspace-url> with your workspace instance name. To learn how to get the workspace instance name and workspace ID, see Get identifiers for workspace objects. To learn about getting access tokens, see Authentication for Azure Databricks automation - overview.
Use the external locations API to set the external location's isolation mode to ISOLATED:
curl -L -X PATCH 'https://<workspace-url>/api/2.1/unity-catalog/external-locations/<my-location>' \
-H 'Authorization: Bearer <my-token>' \
-H 'Content-Type: application/json' \
--data-raw '{
"isolation_mode": "ISOLATED"
}'
The default isolation mode is OPEN to all workspaces attached to the metastore. See External Locations in the REST API reference.
Use the update bindings API to assign the workspaces to the external location:
curl -L -X PATCH 'https://<workspace-url>/api/2.1/unity-catalog/bindings/external-locations/<my-location>' \
-H 'Authorization: Bearer <my-token>' \
-H 'Content-Type: application/json' \
--data-raw '{
"add": [{"workspace_id": <workspace-id>}...],
"remove": [{"workspace_id": <workspace-id>}...]
}'
Use the "add" and "remove" properties to add or remove workspace bindings.
Note
Read-only binding (BINDING_TYPE_READ_ONLY) is not available for external locations. Therefore there is no reason to set binding_type for the external locations binding.
To list all workspace assignments for an external location, use the list bindings API:
curl -L -X GET 'https://<workspace-url>/api/2.1/unity-catalog/bindings/external-locations/<my-location>' \
-H 'Authorization: Bearer <my-token>'
See Workspace Bindings in the REST API reference.
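The two PATCH calls described above can also be prepared from Python. The following sketch uses only the standard library and builds the requests without sending them. The function names (`isolation_request`, `bindings_request`), the workspace URL, the token, and the workspace IDs are hypothetical; the endpoint paths and JSON fields are the ones shown in the curl examples in this section.

```python
import json
import urllib.request
from urllib.parse import quote

API_ROOT = "/api/2.1/unity-catalog"


def _patch(url: str, token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON PATCH request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def isolation_request(workspace_url, location, token):
    """Step 1: set the external location's isolation mode to ISOLATED."""
    url = f"https://{workspace_url}{API_ROOT}/external-locations/{quote(location, safe='')}"
    return _patch(url, token, {"isolation_mode": "ISOLATED"})


def bindings_request(workspace_url, location, token, add=(), remove=()):
    """Step 2: add or remove workspace bindings for the external location."""
    url = f"https://{workspace_url}{API_ROOT}/bindings/external-locations/{quote(location, safe='')}"
    payload = {
        "add": [{"workspace_id": workspace_id} for workspace_id in add],
        "remove": [{"workspace_id": workspace_id} for workspace_id in remove],
    }
    return _patch(url, token, payload)


# Hypothetical values for illustration only.
step1 = isolation_request("adb-123.4.azuredatabricks.net", "my-location", "my-token")
step2 = bindings_request("adb-123.4.azuredatabricks.net", "my-location", "my-token", add=[111])
print(step1.get_method(), step1.full_url)
print(step2.get_method(), step2.full_url, step2.data.decode("utf-8"))
```

To send a prepared request, pass it to `urllib.request.urlopen(...)`. Keeping construction separate from sending makes it possible to inspect the URL and payload before changing any bindings.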
Unbind an external location from a workspace
Instructions for revoking workspace access to an external location using Catalog Explorer or the bindings API are included in Bind an external location to one or more workspaces.
Next steps
- Grant other users permission to use external locations. See Manage external locations.
- Define managed storage locations using external locations. See Specify a managed storage location in Unity Catalog.
- Define external tables using external locations. See Create an external table.
- Define external volumes using external locations. See Create and work with volumes.