Access storage using a service principal & Microsoft Entra ID (Azure Active Directory)
Note
This article describes legacy patterns for configuring access to Azure Data Lake Storage Gen2.
Databricks recommends using Azure managed identities as Unity Catalog storage credentials to connect to Azure Data Lake Storage Gen2 instead of service principals. Managed identities have the benefit of allowing Unity Catalog to access storage accounts protected by network rules, which isn’t possible using service principals, and they remove the need to manage and rotate secrets. For more information, see Use Azure managed identities in Unity Catalog to access storage.
Registering an application with Microsoft Entra ID creates a service principal you can use to provide access to Azure storage accounts.
You can then configure access for these service principals by using them as storage credentials in Unity Catalog or as credentials stored with secrets.
Register a Microsoft Entra ID application
Registering a Microsoft Entra ID (formerly Azure Active Directory) application and assigning appropriate permissions will create a service principal that can access Azure Data Lake Storage Gen2 or Blob Storage resources.
To register a Microsoft Entra ID application, you must have the Application Administrator role or the Application.ReadWrite.All permission in Microsoft Entra ID.
- In the Azure portal, go to the Microsoft Entra ID service.
- Under Manage, click App Registrations.
- Click + New registration. Enter a name for the application and click Register.
- Click Certificates & Secrets.
- Click + New client secret.
- Add a description for the secret and click Add.
- Copy and save the value for the new secret.
- In the application registration overview, copy and save the Application (client) ID and Directory (tenant) ID.
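The three saved values (client secret, Application (client) ID, and Directory (tenant) ID) are what a client needs to authenticate as the service principal. As a minimal sketch, assuming the Hadoop ABFS driver's OAuth client-credentials properties are used from a notebook, the settings for one storage account could be assembled as follows. The helper name `build_oauth_conf` and all argument values are hypothetical, and the secret should be read from a secret scope rather than written as a literal.

```python
from typing import Dict


def build_oauth_conf(storage_account: str, client_id: str,
                     client_secret: str, tenant_id: str) -> Dict[str, str]:
    """Return ABFS OAuth settings for one storage account.

    Hypothetical helper for illustration; the property names are the
    Hadoop ABFS OAuth client-credentials keys.
    """
    host = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        # Application (client) ID copied from the registration overview.
        f"fs.azure.account.oauth2.client.id.{host}": client_id,
        # Value of the client secret created under Certificates & Secrets.
        f"fs.azure.account.oauth2.client.secret.{host}": client_secret,
        # Token endpoint built from the Directory (tenant) ID.
        f"fs.azure.account.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


# In a Databricks notebook (assumed environment), the settings could be
# applied to the session like this:
#   conf = build_oauth_conf("<storage-account>", "<client-id>",
#                           dbutils.secrets.get(scope="<scope>", key="<key>"),
#                           "<tenant-id>")
#   for key, value in conf.items():
#       spark.conf.set(key, value)
```

Keeping the dictionary construction separate from `spark.conf.set` makes it possible to inspect or test the settings outside a cluster.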
Assign roles
You control access to storage resources by assigning roles to a Microsoft Entra ID application registration associated with the storage account. The steps below assign the Storage Blob Data Contributor role; you might need to assign other roles depending on your requirements.
To assign roles on a storage account, you must have the Owner or User Access Administrator Azure RBAC role on the storage account.
- In the Azure portal, go to the Storage accounts service.
- Select an Azure storage account to use with this application registration.
- Click Access Control (IAM).
- Click + Add and select Add role assignment from the dropdown menu.
- Set the Select field to the Microsoft Entra ID application name and set Role to Storage Blob Data Contributor.
- Click Save.
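The portal steps above can also be scripted. The following sketch, offered as one possible approach rather than the documented procedure, builds the arguments for the Azure CLI `az role assignment create` command scoped to a single storage account. The function names and every identifier passed to them are hypothetical placeholders; the script only constructs and prints the command, it does not run it.

```python
import shlex
from typing import List


def storage_account_scope(subscription_id: str, resource_group: str,
                          storage_account: str) -> str:
    """Build the Azure resource ID used as the role assignment scope."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{storage_account}"
    )


def role_assignment_command(client_id: str, role: str, scope: str) -> List[str]:
    """Return the az CLI argument list that assigns `role` at `scope`."""
    return [
        "az", "role", "assignment", "create",
        "--assignee", client_id,  # Application (client) ID of the registration
        "--role", role,
        "--scope", scope,
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own identifiers.
    scope = storage_account_scope("<subscription-id>", "<resource-group>",
                                  "<storage-account>")
    cmd = role_assignment_command("<client-id>",
                                  "Storage Blob Data Contributor", scope)
    # Print a shell-quoted command for review before running it manually.
    print(shlex.join(cmd))
```

Running the printed command requires the same Owner or User Access Administrator role that the portal steps require.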
To enable file event access on the storage account using the service principal, you must have the Owner or User Access Administrator Azure RBAC role on the Azure resource group that your Azure Data Lake Storage Gen2 account is in.
- Follow the steps above and assign the Storage Queue Data Contributor and the Storage Account Contributor roles to your service principal.
- Navigate to the Azure resource group that your Azure Data Lake Storage Gen2 account is in.
- Go to Access Control (IAM), click + Add, and select Add role assignment.
- Select the EventGrid EventSubscription Contributor role and click Next.
- Under Assign access to, select Service Principal.
- Click +Select Members, select your service principal, and click Review and Assign.
Alternatively, you can limit access by granting only the Storage Queue Data Contributor role to the service principal and granting no roles on your resource group. In this case, Azure Databricks cannot configure file events on your behalf.
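The two file event options differ only in which roles are granted at which scope. The sketch below summarizes that choice as data, which can be handy when reviewing or scripting role assignments. The function name `file_event_roles` and the dictionary keys are hypothetical; the role names are the ones listed in this section, and the Storage Blob Data Contributor role used for data access is intentionally left out.

```python
from typing import Dict, List


def file_event_roles(databricks_configures_events: bool) -> Dict[str, List[str]]:
    """Return the file event roles to grant the service principal, per scope.

    Hypothetical helper: keys name the scope, values list the role names.
    """
    plan: Dict[str, List[str]] = {
        # Always needed for file event access.
        "storage_account": ["Storage Queue Data Contributor"],
        "resource_group": [],
    }
    if databricks_configures_events:
        # Extra roles that let Azure Databricks set up file events itself.
        plan["storage_account"].append("Storage Account Contributor")
        plan["resource_group"].append("EventGrid EventSubscription Contributor")
    return plan


if __name__ == "__main__":
    for managed in (True, False):
        print(managed, file_event_roles(managed))
```

With `databricks_configures_events=False`, the plan matches the limited-access alternative: one role on the storage account and none on the resource group.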