Configure a firewall for serverless compute access

Note

If you configured storage firewalls using subnet IDs from Azure Databricks documentation before October 31, 2023, Databricks recommends that you update your workspaces by following the steps in this article or by using a private endpoint. If you choose not to update existing workspaces, they continue to work without changes.

This article describes how to configure an Azure storage firewall for serverless compute using the Azure Databricks account console UI. You can also use the Network Connectivity Configurations API.

To configure a private endpoint for serverless compute access, see Configure private connectivity from serverless compute.

Overview of firewall enablement for serverless compute

Serverless network connectivity is managed with network connectivity configurations (NCCs). Account admins create NCCs in the account console, and an NCC can be attached to one or more workspaces.

An NCC contains a list of network identities for an Azure resource type as default rules. When an NCC is attached to a workspace, serverless compute in that workspace uses one of those networks to connect to the Azure resource. You can allowlist those networks on your Azure resource firewall.

NCC firewall enablement is only supported from serverless SQL warehouses for data sources that you manage. It is not supported from other compute resources in the serverless compute plane or for the workspace root storage (root DBFS).

For more information on NCCs, see What is a network connectivity configuration (NCC)?.

Cost implications of cross-region storage access

For cross-region traffic from Azure Databricks serverless SQL warehouses (for example, when the workspace is in the East US region and the ADLS storage account is in West Europe), Azure Databricks routes the traffic through an Azure NAT Gateway service.

Important

There are currently no charges to use this feature. In a later release, you might be charged for usage. To avoid potential charges, Databricks recommends that you create your workspace in the same region as your storage.
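To spot cross-region access before you configure the firewall, you can compare the storage account's location with your workspace's region. The following is a minimal sketch, assuming the Azure CLI is signed in and that you supply the workspace region yourself (for example, `eastus`). The functions `normalize_region` and `same_region` are hypothetical helpers, not part of any Azure or Databricks tool.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for detecting cross-region storage access.

# Normalize a region name so that "East US" and "eastus" compare equal.
normalize_region() {
  printf '%s' "$1" | tr -d '[:space:]' | tr '[:upper:]' '[:lower:]'
}

# Return 0 if the storage account is in the workspace region, 1 otherwise.
# Usage: same_region <workspace-region> <resource-group> <storage-account>
same_region() {
  local ws_region="$1" rg="$2" account="$3" sa_region
  sa_region="$(az storage account show --resource-group "$rg" --name "$account" \
    --query location --output tsv)" || return 1
  if [[ "$(normalize_region "$ws_region")" == "$(normalize_region "$sa_region")" ]]; then
    return 0
  fi
  echo "Cross-region access: workspace in ${ws_region}, storage in ${sa_region}" >&2
  return 1
}
```

For example, `same_region eastus mystorage-rg myaccount` fails and prints a warning if `myaccount` is outside East US.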

Requirements

  • Your workspace must be on the Premium plan.

  • You must be an Azure Databricks account admin.

  • Each NCC can be attached to up to 50 workspaces.

  • Each Azure Databricks account can have up to 10 NCCs per region.

  • You must have WRITE access to your Azure storage account’s network rules.
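The two NCC limits above combine into a quick planning check: for N workspaces in one region you need at least ceil(N / 50) NCCs, and that number cannot exceed 10. The sketch below hard-codes those two limits from the list; `min_nccs` is a hypothetical helper, and its result is only a lower bound, because Databricks also recommends separate NCCs for workspaces with different connectivity needs.

```shell
#!/usr/bin/env bash
# Limits taken from the requirements list.
MAX_WORKSPACES_PER_NCC=50
MAX_NCCS_PER_REGION=10

# Print the minimum number of NCCs for N workspaces in one region, or fail
# if that number exceeds the per-region NCC limit.
# Usage: min_nccs <workspace-count>
min_nccs() {
  local workspaces="$1" needed
  # Integer ceiling division: ceil(N / 50).
  needed=$(( (workspaces + MAX_WORKSPACES_PER_NCC - 1) / MAX_WORKSPACES_PER_NCC ))
  if (( needed > MAX_NCCS_PER_REGION )); then
    echo "Need ${needed} NCCs, but the limit is ${MAX_NCCS_PER_REGION} per region" >&2
    return 1
  fi
  echo "$needed"
}
```

For example, `min_nccs 120` prints `3`, and `min_nccs 501` fails because it would need 11 NCCs.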

Step 1: Create a network connectivity configuration and copy subnet IDs

Databricks recommends sharing NCCs among workspaces in the same business unit and among workspaces that share the same region and connectivity properties. For example, if some workspaces use a storage firewall and others use the alternative approach of Private Link, use separate NCCs for those use cases.

  1. As an account admin, go to the account console.
  2. In the sidebar, click Cloud Resources.
  3. Click Network Connectivity Configuration.
  4. Click Add Network Connectivity Configurations.
  5. Type a name for the NCC.
  6. Choose the region. This must match your workspace region.
  7. Click Add.
  8. In the list of NCCs, click your new NCC.
  9. In Default Rules under Network identities, click View all.
  10. In the dialog, click the Copy subnets button and save the list of subnets.
  11. Click Close.
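If you use the Network Connectivity Configurations API instead of the console, the following sketch creates an NCC and writes its default-rule subnet IDs to a file, one per line. It assumes that you are signed in to the Azure CLI as an account admin, that `az rest` can obtain a Microsoft Entra ID token for Azure Databricks through the AzureDatabricks application ID shown, that the account host is `https://accounts.azuredatabricks.net`, and that the response fields are `network_connectivity_config_id` and `egress_config.default_rules.azure_service_endpoint_rule.subnets`. Verify these against the API reference before relying on them. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed values: account host and the AzureDatabricks Entra ID application ID.
ACCOUNTS_HOST="https://accounts.azuredatabricks.net"
DATABRICKS_RESOURCE="2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"

# Create an NCC and print its ID.
# Usage: create_ncc <account-id> <ncc-name> <region>
create_ncc() {
  local account_id="$1" name="$2" region="$3"
  # Guard for this sketch only: keeps the hand-built JSON body valid.
  if [[ ! "$name" =~ ^[A-Za-z0-9_-]+$ || ! "$region" =~ ^[a-z0-9]+$ ]]; then
    echo "Unexpected characters in NCC name or region" >&2
    return 1
  fi
  az rest --method post \
    --url "${ACCOUNTS_HOST}/api/2.0/accounts/${account_id}/network-connectivity-configs" \
    --resource "${DATABRICKS_RESOURCE}" \
    --headers "Content-Type=application/json" \
    --body "{\"name\": \"${name}\", \"region\": \"${region}\"}" \
    --query network_connectivity_config_id --output tsv
}

# Write the NCC's default-rule subnet IDs to a file, one per line.
# Usage: save_ncc_subnets <account-id> <ncc-id> <output-file>
save_ncc_subnets() {
  local account_id="$1" ncc_id="$2" out="$3"
  az rest --method get \
    --url "${ACCOUNTS_HOST}/api/2.0/accounts/${account_id}/network-connectivity-configs/${ncc_id}" \
    --resource "${DATABRICKS_RESOURCE}" \
    --query "egress_config.default_rules.azure_service_endpoint_rule.subnets[]" \
    --output tsv > "$out"
}
```

A saved file with one subnet ID per line can feed the network rule loop in Step 4 directly.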

Step 2: Attach an NCC to workspaces

You can attach an NCC to up to 50 workspaces in the same region as the NCC.

To use the API to attach an NCC to a workspace, see the Account Workspaces API.

  1. In the account console sidebar, click Workspaces.
  2. Click your workspace’s name.
  3. Click Update workspace.
  4. In the Network Connectivity Config field, select your NCC. If it’s not visible, confirm that you’ve selected the same region for both the workspace and the NCC.
  5. Click Update.
  6. Wait 10 minutes for the change to take effect.
  7. Restart any running serverless SQL warehouses in the workspace.
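These steps can also be scripted against the Account Workspaces API and the SQL Warehouses API. The sketch below assumes that the Azure CLI is signed in, that `az rest` requests a token for the AzureDatabricks application ID shown, that the account host is `https://accounts.azuredatabricks.net`, and that the workspace update body accepts a `network_connectivity_config_id` field; verify these against the API reference. The helpers `attach_ncc` and `restart_warehouse` are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed values: account host and the AzureDatabricks Entra ID application ID.
ACCOUNTS_HOST="https://accounts.azuredatabricks.net"
DATABRICKS_RESOURCE="2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"

# Attach an NCC to a workspace through the Account Workspaces API.
# Usage: attach_ncc <account-id> <workspace-id> <ncc-id>
attach_ncc() {
  local account_id="$1" workspace_id="$2" ncc_id="$3"
  az rest --method patch \
    --url "${ACCOUNTS_HOST}/api/2.0/accounts/${account_id}/workspaces/${workspace_id}" \
    --resource "${DATABRICKS_RESOURCE}" \
    --headers "Content-Type=application/json" \
    --body "{\"network_connectivity_config_id\": \"${ncc_id}\"}" > /dev/null
}

# Stop a SQL warehouse, wait until it reports STOPPED, then start it again.
# Usage: restart_warehouse <workspace-url> <warehouse-id>
restart_warehouse() {
  local host="$1" wh="$2" state="" i
  az rest --method post --url "${host}/api/2.0/sql/warehouses/${wh}/stop" \
    --resource "${DATABRICKS_RESOURCE}" > /dev/null || return 1
  # Poll for up to about 10 minutes (60 checks, 10 seconds apart).
  for (( i = 0; i < 60; i++ )); do
    state="$(az rest --method get --url "${host}/api/2.0/sql/warehouses/${wh}" \
      --resource "${DATABRICKS_RESOURCE}" --query state --output tsv)" || return 1
    [[ "$state" == "STOPPED" ]] && break
    sleep 10
  done
  [[ "$state" == "STOPPED" ]] || { echo "Warehouse ${wh} did not stop" >&2; return 1; }
  az rest --method post --url "${host}/api/2.0/sql/warehouses/${wh}/start" \
    --resource "${DATABRICKS_RESOURCE}" > /dev/null
}
```

A typical sequence is `attach_ncc <account-id> <workspace-id> <ncc-id>`, then `sleep 600` for the 10-minute wait, then `restart_warehouse <workspace-url> <warehouse-id>` for each running serverless SQL warehouse.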

Step 3: Lock down your storage account

If you haven’t already limited access to the Azure storage account to allow-listed networks only, do so now. Creating a storage firewall also affects connectivity from the classic compute plane to your resources, so you must also add network rules that allow your classic compute resources to connect to your storage accounts.

  1. Go to the Azure portal.
  2. Navigate to your storage account for the data source.
  3. In the left nav, click Networking.
  4. In the field Public network access, check the value. By default, the value is Enabled from all networks. Change this to Enabled from selected virtual networks and IP addresses.
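From the Azure CLI, the portal setting Enabled from selected virtual networks and IP addresses corresponds to keeping public network access enabled while setting the network rule set's default action to `Deny`. A minimal sketch, assuming the Azure CLI is signed in with WRITE access to the storage account; `lock_down_storage` is a hypothetical helper. Because the change blocks every network that is not allow-listed, add the rules for your classic compute resources before running it.

```shell
#!/usr/bin/env bash
# Deny traffic by default while keeping public network access enabled, which
# the portal shows as "Enabled from selected virtual networks and IP addresses".
# Prints the resulting default action (expected: Deny).
# Usage: lock_down_storage <subscription> <resource-group> <storage-account>
lock_down_storage() {
  local sub="$1" rg="$2" account="$3"
  az storage account update --subscription "$sub" --resource-group "$rg" \
    --name "$account" --public-network-access Enabled --default-action Deny \
    --query networkRuleSet.defaultAction --output tsv
}
```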

Step 4: Add Azure storage account network rules

  1. Add one Azure storage account network rule for each subnet. You can do this using the Azure CLI, PowerShell, Terraform, or other automation tools. You cannot perform this step in the Azure portal UI.

    The following example uses the Azure CLI:

    az storage account network-rule add --subscription "<sub>" \
        --resource-group "<res>" --account-name "<account>" --subnet "<subnet>"
    
    • Replace <sub> with the name or ID of the Azure subscription that contains the storage account.
    • Replace <res> with the resource group of your storage account.
    • Replace <account> with the name of your storage account.
    • Replace <subnet> with the ARM resource ID (resourceId) of the serverless SQL warehouse subnet.

    After running all the commands, you can use the Azure portal to view your storage account and confirm that the Virtual Networks table contains an entry for each new subnet. However, you cannot make network rule changes in the Azure portal.

    Tip

    Ignore any “Insufficient permissions” message in the endpoint status column or the warning below the network list. They indicate only that you do not have permission to read the Azure Databricks subnets; they do not prevent the Azure Databricks serverless subnet from reaching your Azure storage.

    Example new entries in Virtual Networks list

  2. Repeat this command once for every subnet. You can optionally automate the network rule creation process. See Automate your network rule creation.

  3. To confirm that your storage account uses these settings from the Azure portal, navigate to Networking in your storage account.

    Confirm that Public network access is set to Enabled from selected virtual networks and IP addresses and that the allowed networks are listed in the Virtual Networks section.
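To run the same checks from a script, you can compare the subnets you saved in Step 1 with the storage account's virtual network rules and read back the default action. A sketch, assuming the saved subnet IDs are in a text file with one ID per line and that `az storage account network-rule list` reports each rule's subnet as `virtualNetworkResourceId`; both function names are hypothetical. IDs are compared in lowercase because Azure resource IDs are case-insensitive and can come back with different casing.

```shell
#!/usr/bin/env bash
# Print (in lowercase) each expected subnet ID that has no matching network rule.
# Usage: missing_subnet_rules <resource-group> <storage-account> <subnet-file>
missing_subnet_rules() {
  local rg="$1" account="$2" expected_file="$3" configured
  configured="$(az storage account network-rule list --resource-group "$rg" \
    --account-name "$account" \
    --query "virtualNetworkRules[].virtualNetworkResourceId" --output tsv)" || return 1
  # comm -23 keeps lines that appear only in the first (expected) list.
  LC_ALL=C comm -23 \
    <(tr -d '\r' < "$expected_file" | tr '[:upper:]' '[:lower:]' | sed '/^$/d' | LC_ALL=C sort -u) \
    <(printf '%s\n' "$configured" | tr '[:upper:]' '[:lower:]' | sed '/^$/d' | LC_ALL=C sort -u)
}

# Succeed only if the storage account denies traffic that no rule allows.
# Usage: default_action_is_deny <resource-group> <storage-account>
default_action_is_deny() {
  local action
  action="$(az storage account show --resource-group "$1" --name "$2" \
    --query networkRuleSet.defaultAction --output tsv)" || return 1
  [[ "$action" == "Deny" ]]
}
```

Empty output from `missing_subnet_rules` means every saved subnet has a rule.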

Automate your network rule creation

You can automate network rule creation for your storage account using the Azure CLI or PowerShell.

This Azure CLI example uses two subnets in a list that you can use with a loop to run the command for each subnet. In this example, mystorage-rg is the resource group, and myaccount is the storage account.

#!/bin/bash
set -euo pipefail  # stop at the first failed command
# Subnet resource IDs copied from the NCC in Step 1.
SUBNETS=(/subscriptions/8453a5d5-9e9e-40c7-87a4-0ab4cc197f48/resourceGroups/prod-azure-eastusc3-nephos2/providers/Microsoft.Network/virtualNetworks/kaas-vnet/subnets/worker-subnet /subscriptions/8453a5d5-9e9e-40c7-87a4-0ab4cc197f48/resourceGroups/prod-azure-eastusc3-nephos3/providers/Microsoft.Network/virtualNetworks/kaas-vnet/subnets/worker-subnet)
# Add one network rule per subnet. Quoting keeps each ID a single argument.
for SUBNET in "${SUBNETS[@]}"
do
  az storage account network-rule add --subscription 9999999-1ff3-43f4-b91e-d0ceb97111111 --resource-group mystorage-rg --account-name myaccount --subnet "${SUBNET}"
done

To use PowerShell, run the following command:

Add-AzStorageAccountNetworkRule -ResourceGroupName <resource group name> -Name <storage account name> -VirtualNetworkResourceId <subnets>

Replace:

  • <resource group name> with the resource group of your storage account.
  • <storage account name> with the name of your storage account.
  • <subnets> with a list of the subnet resource IDs separated by commas.