Configure data access for ingestion
This article describes how admin users can configure access to data in an Azure Data Lake Storage Gen2 (ADLS Gen2) container so that Azure Databricks users can load that data into a table in Azure Databricks.
This article describes the following ways to configure secure access to source data:
(Recommended) Create a Unity Catalog volume.
Create a Unity Catalog external location with a storage credential.
Launch a compute resource that uses a service principal.
Generate temporary credentials (a Blob SAS token).
Before you begin
Before you configure access to data in ADLS Gen2, make sure you have the following:
Data in a container in your Azure storage account. To create a container, see Create a container in the Azure storage documentation.
To access data using a Unity Catalog volume (recommended), the READ VOLUME privilege on the volume. For more information, see What are Unity Catalog volumes? and Unity Catalog privileges and securable objects.
To access data using a Unity Catalog external location, the READ FILES privilege on the external location. For more information, see Create an external location to connect cloud storage to Azure Databricks.
To access data using a compute resource with a service principal, Azure Databricks workspace admin permissions.
To access data using temporary credentials:
- Azure Databricks workspace admin permissions.
- Permissions in your Azure account to create Blob SAS tokens, which serve as the temporary credentials.
A Databricks SQL warehouse. To create a SQL warehouse, see Create a SQL warehouse.
Familiarity with the Databricks SQL user interface.
Configure access to cloud storage
Use one of the following methods to configure access to ADLS Gen2:
(Recommended) Create a Unity Catalog volume. For more information, see What are Unity Catalog volumes?.
Configure a Unity Catalog external location with a storage credential. For more information about external locations, see Create an external location to connect cloud storage to Azure Databricks.
Configure a compute resource to use a service principal. For more information, see Configure a service principal.
Generate temporary credentials (a Blob SAS token) to share with other Azure Databricks users. For more information, see Generate temporary credentials for ingestion.
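As a sketch of the first two approaches, the following Databricks SQL statements create a Unity Catalog volume and an external location and grant the privileges described in the prerequisites. All names here (the catalog, schema, volume, external location, storage credential, storage account, container, and the data_engineers group) are hypothetical placeholders; substitute your own, and note that the storage credential must already exist.

```sql
-- Recommended: a Unity Catalog volume backed by the ADLS Gen2 container.
-- Names below are illustrative placeholders.
CREATE EXTERNAL VOLUME main.ingest.landing_vol
  LOCATION 'abfss://mycontainer@mystorageaccount.dfs.core.windows.net/landing';

-- Grant the READ VOLUME privilege users need to load from the volume.
GRANT READ VOLUME ON VOLUME main.ingest.landing_vol TO `data_engineers`;

-- Alternative: an external location that references an existing storage credential.
CREATE EXTERNAL LOCATION my_adls_location
  URL 'abfss://mycontainer@mystorageaccount.dfs.core.windows.net/landing'
  WITH (STORAGE CREDENTIAL my_cred);

-- Grant the READ FILES privilege users need to load from the external location.
GRANT READ FILES ON EXTERNAL LOCATION my_adls_location TO `data_engineers`;
```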
Clean up
You can clean up the associated resources in your cloud account and Azure Databricks if you no longer want to keep them.
Delete the ADLS Gen2 storage account
- Open the Azure portal for your Azure account, typically at https://portal.azure.com.
- Browse to and open your storage account.
- Click Delete.
- Enter the name of the storage account, and then click Delete.
Stop the SQL warehouse
If you are not using the SQL warehouse for other tasks, stop it to avoid additional costs.
- In the SQL persona, on the sidebar, click SQL Warehouses.
- Next to the name of the SQL warehouse, click Stop.
- When prompted, click Stop again.
Next steps
After you complete the steps in this article, users can run the COPY INTO command to load the data from the ADLS Gen2 container into your Azure Databricks workspace.
To load data using a Unity Catalog volume or external location, see Load data using COPY INTO with Unity Catalog volumes or external locations.
To load data using a SQL warehouse with a service principal, see Load data using COPY INTO with a service principal.
To load data using temporary credentials (a Blob SAS token), see Load data using COPY INTO with temporary credentials.
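For example, with a Unity Catalog volume in place, a COPY INTO statement along the following lines loads CSV files from the volume into a Delta table. The table name, volume path, and options are illustrative placeholders; the empty CREATE TABLE creates a placeholder table whose schema is inferred during the load.

```sql
-- Create a placeholder table; COPY INTO with mergeSchema fills in the schema.
-- The table name and volume path below are hypothetical.
CREATE TABLE IF NOT EXISTS main.ingest.sales_raw;

-- Load CSV files from the Unity Catalog volume path into the table.
COPY INTO main.ingest.sales_raw
  FROM '/Volumes/main/ingest/landing_vol/sales/'
  FILEFORMAT = CSV
  FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
  COPY_OPTIONS ('mergeSchema' = 'true');
```

COPY INTO is idempotent: files already loaded into the target table are skipped on subsequent runs.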