Use Azure managed identities in Unity Catalog to access storage

Important

This feature is in Public Preview.

This article describes how to use Azure managed identities for connecting to storage containers on behalf of Unity Catalog users.

What are Azure managed identities?

Unity Catalog can be configured to use an Azure managed identity to access storage containers on behalf of Unity Catalog users. Managed identities provide an identity for applications to use when they connect to resources that support Azure Active Directory (Azure AD) authentication.

You can use managed identities in Unity Catalog to support two primary use cases:

  • As an identity to connect to the metastore’s root storage account (where managed tables are stored).
  • As an identity to connect to other external storage accounts (either for file-based access or for external tables).

Configuring Unity Catalog with a managed identity has the following benefits over configuring Unity Catalog with a service principal:

  • You can connect to an Azure Data Lake Storage Gen2 account that is protected by a storage firewall.
  • Managed identities do not require you to maintain credentials or rotate secrets.

Configure a managed identity for Unity Catalog

To configure a managed identity for your Unity Catalog metastore and other external storage, first you must create an access connector for Azure Databricks, to which the system assigns a managed identity. Then you grant the managed identity access to your Azure Data Lake Storage Gen2 account.

Note

Azure Databricks supports only system-assigned managed identities. You cannot use user-assigned managed identities.

Requirements

  • There must be at least one Azure Databricks workspace in your Azure tenant.
  • You must be a Contributor or Owner of an Azure resource group in the same region as the storage account that you want to connect to.
  • You must be an Owner or a user with the User Access Administrator Azure RBAC role on the storage account.

Step 1: Create an access connector for Azure Databricks

The Access Connector for Azure Databricks is a first-party Azure resource that lets you connect managed identities to an Azure Databricks account. Azure Databricks account admins can delegate the managed identity assigned to the access connector to Azure Databricks resources like Unity Catalog metastores.


Each access connector for Azure Databricks has one system-assigned managed identity, so you must create a separate access connector for each managed identity.

  1. Log in to the Azure Portal as a Contributor or Owner of a resource group.

    Note

    You cannot manage access connectors as a service principal.

    The resource group should be in the same region as the storage account that you want to connect to.

  2. Click + Create or Create a new resource.

  3. Search for Access Connector for Azure Databricks and select it.

  4. Click Create.

  5. On the Basics tab, accept, select, or enter values for the following fields:

    • Subscription: This is the Azure subscription that the access connector will be created in. The default is the Azure subscription you are currently using. It can be any subscription in the tenant.
    • Resource group: This should be a resource group in the same region as the storage account that you will connect to.
    • Name: Enter a name that indicates the purpose of the connector.
    • Region: This should be the same region as the storage account that you will connect to.
  6. Click Review + create.

  7. When you see the Validation Passed message, click Create.

    When the deployment succeeds, the access connector is deployed with a system-assigned managed identity.

  8. When the deployment is complete, click Go to resource.

  9. Make note of the Resource ID.

    The resource ID is in the format:

    /subscriptions/12f34567-8ace-9c10-111c-aea8eba12345c/resourceGroups/<resource_group>/providers/Microsoft.Databricks/accessConnectors/<connector-name>
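Because the same resource ID is needed again in later steps, it can help to assemble and sanity-check it in a script. The following is a minimal sketch: `connector_resource_id` and `is_connector_resource_id` are hypothetical helper names, not part of any Azure or Azure Databricks tooling, and the check only verifies the overall shape of the string, not that the connector exists.

```shell
# Hypothetical helpers (not part of the Azure CLI or Azure Databricks).

# Build an access connector resource ID from its parts.
# $1 = subscription ID, $2 = resource group, $3 = connector name
connector_resource_id() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Databricks/accessConnectors/%s\n' \
    "$1" "$2" "$3"
}

# Loose shape check: succeeds if $1 looks like an access connector resource ID.
is_connector_resource_id() {
  case "$1" in
    /subscriptions/?*/resourceGroups/?*/providers/Microsoft.Databricks/accessConnectors/?*)
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```

For example, `is_connector_resource_id "$(connector_resource_id <subscription-id> <resource_group> <connector-name>)"` succeeds, while a truncated ID that stops at the resource group fails the check.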
    

Step 2: Grant the managed identity access to the storage account

  1. Log in to your Azure Data Lake Storage Gen2 account as an Owner or a user with the User Access Administrator Azure RBAC role on the storage account.
  2. Go to Access Control (IAM), click + Add, and select Add role assignment.
  3. Select the Storage Blob Data Contributor role and click Next.
  4. Under Assign access to, select Managed identity.
  5. Click + Select members, and select All system-assigned managed identities.
  6. Search for your connector name, select it, and click Review + assign.
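The portal steps above can also be approximated with the Azure CLI. The following is a minimal sketch, assuming the Azure CLI is installed and signed in, and that you already know the principal (object) ID of the connector's managed identity and the resource ID of the storage account. `run` and `grant_blob_contributor` are hypothetical helper names introduced for illustration. When `DRY_RUN=1` is set, the command is printed for inspection rather than executed.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
# The printed form is for inspection only; quoting is not preserved.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Assign the Storage Blob Data Contributor role to a managed identity.
# $1 = principal (object) ID of the access connector's managed identity
# $2 = resource ID of the storage account (the role assignment scope)
grant_blob_contributor() {
  run az role assignment create \
    --assignee-object-id "$1" \
    --assignee-principal-type ServicePrincipal \
    --role "Storage Blob Data Contributor" \
    --scope "$2"
}
```

A cautious way to use it is to set `DRY_RUN=1`, call `grant_blob_contributor "<principal-id>" "<storage-account-resource-id>"`, review the printed command, and only then run it without `DRY_RUN`.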

Use a managed identity to access storage managed by a Unity Catalog metastore

This section describes how to give the managed identity access to the root storage account, used for managed storage, when you create a Unity Catalog metastore.

To learn how to upgrade an existing Unity Catalog metastore to use a managed identity, see Upgrade your existing Unity Catalog metastore to use a managed identity to access its root storage.

  1. As an Azure Databricks account admin, log in to the Azure Databricks account console.

  2. Click Data.

  3. Click Create Metastore.

  4. Enter values for the following fields:

    • Name for the metastore.

    • Region where the metastore will be deployed.

      For best performance, co-locate the access connector, workspaces, metastore and cloud storage location in the same cloud region.

    • ADLS Gen 2 path: enter the path to the storage container that you will use as root storage for the metastore.

      The abfss:// prefix is added automatically.

    • Access Connector ID: enter the Azure Databricks access connector’s resource ID in the format:

      /subscriptions/12f34567-8ace-9c10-111c-aea8eba12345c/resourceGroups/<resource_group>/providers/Microsoft.Databricks/accessConnectors/<connector-name>
      
  5. Click Create.

    If the request fails, retry using a different metastore name.

  6. When prompted, select workspaces to link to the metastore.
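As an illustration of the value entered in the ADLS Gen 2 path field, the sketch below composes a path from a container name, a storage account name, and an optional directory. It assumes the field expects the standard `abfss` URI form with the `abfss://` prefix left off, since the prefix is added automatically. `adls_root_path` is a hypothetical helper name.

```shell
# Hypothetical helper: compose the ADLS Gen2 path without the abfss:// prefix.
# $1 = container name, $2 = storage account name, $3 = optional directory
adls_root_path() {
  printf '%s@%s.dfs.core.windows.net/%s\n' "$1" "$2" "${3:-}"
}

# The full URI, as it would appear once the prefix has been added:
# printf 'abfss://%s\n' "$(adls_root_path <container> <storage-account> <path>)"
```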

Use a managed identity to access external storage managed in Unity Catalog

Unity Catalog gives you the ability to access data outside the metastore’s root storage container using storage credentials and external locations. Storage credentials store the managed identity, and external locations define a path to storage along with a reference to the storage credential. You can use this approach to grant and control access to existing data in cloud storage and to register external tables in Unity Catalog.

A storage credential can hold a managed identity or service principal. Using a managed identity has the benefit of allowing Unity Catalog to access storage accounts protected by network rules, which isn’t possible using service principals, and it removes the need to manage and rotate secrets.

To create a storage credential using a managed identity and assign that storage credential to an external location, follow the instructions in Manage external locations and storage credentials.

Azure Data Lake Storage Gen2 provides a model to secure access to your storage account. When network rules are configured, only applications requesting data over the specified set of networks or through the specified set of Azure resources can access a storage account. You can enable a Unity Catalog metastore to access data in your storage account by adding the system-assigned managed identity to the network rules. See Configure Azure Storage firewalls and virtual networks.

Requirements

Step 1: Enable your managed identity to access Azure Storage

This step is necessary only if “Allow Azure services on the trusted services list to access this storage account” is disabled for your Azure Storage account. If that configuration is enabled:

  • Any access connector for Azure Databricks in the same tenant as the storage account can access the storage account.
  • Any Azure trusted service can access the storage account. See Grant access to trusted Azure services.

The instructions below include a step in which you disable this configuration. You can use the Azure Portal or the Azure CLI.

Use the Azure Portal

  1. Log in to the Azure Portal, find and select the Azure Storage account, and go to the Networking tab.

  2. Set Public Network Access to Enabled from selected virtual networks and IP addresses.

    As an option, you can instead set Public Network Access to Disabled. The managed identity can be used to bypass the check on public network access.

  3. Under Resource instances, select a Resource type of Microsoft.Databricks/accessConnectors and select your Azure Databricks access connector.

  4. Under Exceptions, clear the Allow Azure services on the trusted services list to access this storage account checkbox.

Use the Azure CLI

  1. Install the Azure CLI and sign in.

  2. Add a network rule to the storage account:

    az storage account network-rule add \
    --subscription <subscription id of the resource group> \
    --resource-id <resource Id of the access connector for Azure Databricks> \
    --tenant-id <tenant Id> \
    -g <name of the Azure Storage resource group> \
    --account-name <name of the Azure Storage resource>
    

    Add the resource ID in the format:

    /subscriptions/12f34567-8ace-9c10-111c-aea8eba12345c/resourceGroups/<resource_group>/providers/Microsoft.Databricks/accessConnectors/<connector-name>
    
  3. After you create the network rule, go to your Azure Storage account in the Azure Portal and view the managed identity in the Networking tab under Resource instances, resource type Microsoft.Databricks/accessConnectors.

  4. Under Exceptions, clear the Allow Azure services on the trusted services list to access this storage account checkbox.

  5. Optionally, set Public Network Access to Disabled. The managed identity can be used to bypass the check on public network access.

    The standard approach is to keep this value set to Enabled from selected virtual networks and IP addresses.
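As a command-line alternative to checking the Networking tab, the sketch below tests whether a given access connector resource ID appears in a list of resource IDs read from standard input. `has_resource_rule` is a hypothetical helper name. The `az` command in the comment is an assumption about how you might produce that list. Check its output shape in your own environment before relying on it.

```shell
# Hypothetical helper: read resource IDs (one per line) on stdin and succeed
# if $1 is among them. The comparison is a whole-line, fixed-string match and
# ignores case, because Azure resource IDs are not case-sensitive.
has_resource_rule() {
  grep -Fxiq -- "$1"
}

# Assumed usage (placeholders must be filled in):
#   az storage account network-rule list \
#     -g <name of the Azure Storage resource group> \
#     --account-name <name of the Azure Storage resource> \
#     --query "resourceAccessRules[].resourceId" --output tsv \
#     | has_resource_rule "<resource Id of the access connector>"
```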

Step 2: Enable your Azure Databricks workspace to access Azure Storage

Follow the instructions in Securely Accessing Azure Data Sources from Azure Databricks to secure connectivity from your Azure Databricks workspace to Azure Storage.

Upgrade your existing Unity Catalog metastore to use a managed identity to access its root storage

If you have a Unity Catalog metastore that was created using a service principal and you would like to upgrade it to use a managed identity, you can update it using an API call.
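The API calls in steps 5 through 8 below can be chained in a small script. The following is a rough sketch rather than a supported tool. It assumes `curl` and `jq` are installed, that a `.netrc` entry exists for the workspace, and that the responses expose the credential ID as `id` and the metastore ID as `metastore_id`. All function names are hypothetical, and the payload builders do not escape special characters in their arguments.

```shell
# Hypothetical helpers; values are inserted as-is, without JSON escaping.

# JSON body for creating a storage credential backed by a managed identity.
# $1 = credential name, $2 = access connector resource ID
credential_payload() {
  printf '{"name": "%s", "azure_managed_identity": {"access_connector_id": "%s"}}\n' "$1" "$2"
}

# JSON body for pointing the metastore at a new root storage credential.
# $1 = storage credential ID
metastore_patch_payload() {
  printf '{"storage_root_credential_id": "%s"}\n' "$1"
}

# Create the credential, look up the metastore, then update the metastore.
# $1 = workspace URL without https://, $2 = credential name,
# $3 = access connector resource ID
upgrade_metastore() {
  api="https://$1/api/2.0/unity-catalog"

  # Assumption: the create response carries the credential ID in ".id".
  cred_id=$(curl -sS -n -X POST --header 'Content-Type: application/json' \
    "$api/storage-credentials" \
    --data "$(credential_payload "$2" "$3")" | jq -r '.id')
  [ -n "$cred_id" ] && [ "$cred_id" != "null" ] || return 1

  # Assumption: the summary response carries the ID in ".metastore_id".
  metastore_id=$(curl -sS -n -X GET "$api/metastore_summary" | jq -r '.metastore_id')
  [ -n "$metastore_id" ] && [ "$metastore_id" != "null" ] || return 1

  curl -sS -n -X PATCH --header 'Content-Type: application/json' \
    "$api/metastores/$metastore_id" \
    --data "$(metastore_patch_payload "$cred_id")"
}
```

Nothing runs until you call `upgrade_metastore "<databricks-instance>" "<credential-name>" "<access_connector_id>"`. Running the individual commands by hand, as described in the steps, remains the safer route the first time through.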

  1. Create an Access Connector for Azure Databricks and assign it permissions to the storage container that is being used for your Unity Catalog metastore root storage, using the instructions in Configure a managed identity for Unity Catalog.

    Make note of the access connector’s resource ID.

  2. As an account admin, log in to an Azure Databricks workspace that is assigned to the metastore.

    You do not have to be a workspace admin.

    Make a note of the workspace URL, which is the first portion of the URL, after https:// and inclusive of azuredatabricks.net.

  3. Generate a personal access token.

  4. Add the personal access token to the .netrc file in your home directory. This improves security by preventing the personal access token from appearing in your shell’s command history. See Store tokens in a .netrc file and use them in curl.

  5. Run the following cURL command to recreate the storage credential.

    Replace the placeholder values:

    • <databricks-instance>: The workspace URL of the workspace where the personal access token was generated.
    • <credential-name>: A name for the storage credential.
    • <access_connector_id>: Resource ID for the Azure Databricks access connector in the format /subscriptions/12f34567-8ace-9c10-111c-aea8eba12345c/resourceGroups/<resource_group>/providers/Microsoft.Databricks/accessConnectors/<connector-name>

    curl -n -X POST --header 'Content-Type: application/json' https://<databricks-instance>/api/2.0/unity-catalog/storage-credentials --data "{
      \"name\": \"<credential-name>\",
      \"azure_managed_identity\": {
        \"access_connector_id\": \"<access_connector_id>\"
      }
    }"
    
  6. Make a note of the storage credential ID in the response.

  7. Run the following cURL command to retrieve the metastore_id, where <databricks-instance> is the workspace URL of the workspace where the personal access token was generated.

    curl -n -X GET --header 'Content-Type: application/json' https://<databricks-instance>/api/2.0/unity-catalog/metastore_summary
    
  8. Run the following cURL command to update the metastore with the new root storage credential.

    Replace the placeholder values:

    • <databricks-instance>: The workspace URL of the workspace where the personal access token was generated.
    • <metastore-id>: The metastore ID that you retrieved in the previous step.
    • <storage-credential-id>: The storage credential ID that you noted from the response when you created the storage credential.

    curl -n -X PATCH --header 'Content-Type: application/json' https://<databricks-instance>/api/2.0/unity-catalog/metastores/<metastore-id> --data \
    "{\"storage_root_credential_id\": \"<storage-credential-id>\"}"