Enable Azure Private Link
This feature is in Public Preview.
This article summarizes the use of Azure Private Link to enable private connectivity between users and their Databricks workspaces, and also between clusters on the data plane and the core services on the control plane within the Databricks workspace infrastructure.
This article mentions the term data plane, which is the compute layer of the Azure Databricks platform. In the context of this article, data plane refers to the Classic data plane in your Azure subscription. By contrast, the Serverless data plane that supports serverless SQL warehouses (Public Preview) runs in the Azure subscription of Azure Databricks. To learn more, see Serverless compute.
Private Link provides private connectivity from Azure VNets and on-premises networks to Azure services without exposing the traffic to the public network. Azure Databricks supports the following Private Link connection types:
Front-end Private Link, also known as user to workspace: A front-end Private Link connection allows users to connect to the Azure Databricks web application, REST API, and Databricks Connect API over a VNet interface endpoint. The front-end connection is also used by JDBC/ODBC and Power BI integrations. The network traffic for a front-end Private Link connection between a transit VNet and the workspace control plane traverses the Microsoft backbone network.
Back-end Private Link, also known as data plane to control plane: Databricks Runtime clusters in a customer-managed VNet (the data plane) connect to an Azure Databricks workspace’s core services (the control plane) in the Azure Databricks cloud account. This enables private connectivity from the clusters to the secure cluster connectivity relay endpoint and REST API endpoint.
Web auth private connections: To support private front-end connections, special configuration is required to support the single sign-on (SSO) login callbacks to the Azure Databricks web application. A special type of private connection with sub-resource type browser_authentication hosts a private connection from the transit VNet that allows Azure Active Directory to redirect users after login to the correct control plane instance. One of these connections is shared for all workspaces in the region.
Databricks strongly recommends creating a workspace called a private web auth workspace for each region to host the web auth private network settings. This workspace must not be deleted. All other workspaces in that region must refer to the Web auth private connection configuration of this workspace. This solves the problem of deleting a workspace potentially affecting other workspaces in that region. You can omit the private web auth workspace for non-production deployments and other deployments where you are certain you will not need to delete your one workspace that you use for web authentication.
If you implement Private Link for both front-end and back-end connections, you can optionally mandate private connectivity for the workspace, which means Azure Databricks rejects any connections over the public network. If you do not implement both the front-end and back-end connection types, you cannot enforce this requirement.
Be sure that you understand the Private Link configuration options before deploying your workspace and other cloud resources. You can set up Private Link connectivity with a new workspace, but you cannot update the workspace fields associated with Private Link front-end and back-end connections on an existing workspace.
The following table describes important terminology.
| Term | Description |
|---|---|
| Azure Private Link | An Azure technology that provides private connectivity from Azure VNets and on-premises networks to Azure services without exposing the traffic to the public network. |
| Azure Private Link service | A service that can be the destination for a Private Link connection. Each Azure Databricks control plane instance publishes an Azure Private Link service. |
| Azure private endpoint | An Azure private endpoint enables a private connection between a VNet and a Private Link service. For front-end and back-end connectivity, the target of an Azure private endpoint is the Azure Databricks control plane. |
For general information about private endpoints, see the Microsoft article What is a private endpoint?.
There are two types of Private Link deployment that Azure Databricks supports, and you must choose one:
- Standard deployment (recommended): For improved security, Databricks recommends you use a separate private endpoint for your front-end connection from a separate transit VNet. You can implement both front-end and back-end Private Link connections or just the back-end connection. Use a separate VNet to encapsulate user access, separate from the VNet that you use for your compute resources in the Classic data plane. Create separate Private Link endpoints for back-end and front-end access. Follow the instructions in Enable Azure Private Link as a standard deployment.
- Simplified deployment: Some organizations cannot use the standard deployment for various network policy reasons, such as disallowing more than one private endpoint or discouraging separate transit VNets. You can alternatively use the Private Link simplified deployment. No separate VNet separates user access from the VNet that you use for your compute resources in the Classic data plane. Instead, a transit subnet in the data plane VNet is used for user access. There is only a single Private Link endpoint. Typically both front-end and back-end connectivity are configured. You can optionally only configure the front-end connection. You cannot choose to use only the back-end connections in this deployment type. Follow the instructions in Enable Azure Private Link as a simplified deployment.
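The connection combinations described for the two deployment types can be written down as a small check. The following is a minimal sketch only: the `Deployment` enum, the `plan_private_endpoints` function, and the location strings are hypothetical names chosen for illustration and are not part of any Azure or Azure Databricks API. The web auth private connection is not modeled.

```python
from enum import Enum


class Deployment(Enum):
    STANDARD = "standard"
    SIMPLIFIED = "simplified"


def plan_private_endpoints(deployment, front_end, back_end):
    """Return (purpose, location) pairs for the private endpoints to create.

    Hypothetical helper that encodes only the rules stated in this article.
    """
    if not (front_end or back_end):
        raise ValueError("Select at least one Private Link connection type.")

    if deployment is Deployment.SIMPLIFIED:
        # The simplified deployment cannot be back-end only.
        if back_end and not front_end:
            raise ValueError(
                "The simplified deployment does not support a back-end-only connection."
            )
        # A single endpoint in the transit subnet serves every connection.
        purpose = "front-end and back-end" if back_end else "front-end"
        return [(purpose, "transit subnet in the workspace VNet")]

    # Standard deployment: separate endpoints for back-end and front-end.
    endpoints = []
    if back_end:
        endpoints.append(("back-end", "private endpoint subnet in the workspace VNet"))
    if front_end:
        endpoints.append(("front-end", "separate transit VNet"))
    return endpoints
```

For example, `plan_private_endpoints(Deployment.STANDARD, True, True)` returns two entries, one per endpoint, while the same call with `Deployment.SIMPLIFIED` returns a single shared endpoint.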
- Your Azure Databricks workspace must be on the Premium tier.
Azure Databricks workspace
- Your Azure Databricks workspace must use VNet injection to add any Private Link connection (even a front-end-only connection).
- If you implement the back-end Private Link connection, your Azure Databricks workspace must use secure cluster connectivity (SCC / No Public IP / NPIP).
You cannot update an existing workspace to change certain workspace attributes that relate to Private Link:
- You cannot update a workspace with the default (Databricks-managed) VNet and change it to use VNet injection.
- You cannot update a workspace that does not use secure cluster connectivity to enable secure cluster connectivity.
- You cannot update a workspace to modify the Private Link options Required Network Access or Required NSG Rules.
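Because these attributes cannot be changed after the workspace exists, it can help to check a planned configuration before deployment. The sketch below is illustrative only: `WorkspacePlan`, its field names, and `find_violations` are hypothetical and do not correspond to any Azure Databricks API or ARM template property. It encodes only the requirements stated in this article.

```python
from dataclasses import dataclass


@dataclass
class WorkspacePlan:
    # Hypothetical description of a workspace you intend to create.
    premium_tier: bool
    vnet_injection: bool
    secure_cluster_connectivity: bool
    front_end_private_link: bool
    back_end_private_link: bool
    require_private_access_only: bool = False


def find_violations(plan):
    """Return a list of human-readable problems with the planned workspace."""
    violations = []
    uses_private_link = plan.front_end_private_link or plan.back_end_private_link

    if uses_private_link and not plan.premium_tier:
        violations.append("Private Link requires a Premium tier workspace.")
    if uses_private_link and not plan.vnet_injection:
        violations.append("Any Private Link connection requires VNet injection.")
    if plan.back_end_private_link and not plan.secure_cluster_connectivity:
        violations.append("Back-end Private Link requires secure cluster connectivity.")
    if plan.require_private_access_only and not (
        plan.front_end_private_link and plan.back_end_private_link
    ):
        violations.append(
            "Mandating private connectivity requires both front-end and back-end Private Link."
        )
    return violations
```

An empty list means the plan is consistent with the requirements listed here. It does not guarantee that the deployment succeeds.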
You need a VNet that satisfies the requirements of VNet injection.
- As discussed in that article, you need to define two subnets (referred to in the UI as the public subnet and the private subnet). The VNet and subnet IP ranges that you use for Azure Databricks define the maximum number of cluster nodes that you can use at one time. Choose these values carefully.
- To implement front-end Private Link, back-end Private Link, or both, your workspace VNet needs a third subnet that contains the Private Link endpoint. Its IP address range must not overlap with the range of your other workspace subnets. This article refers to this third subnet as the private endpoint subnet. Examples and screenshots assume the subnet name private-link. This can be as small as CIDR range /27. Do not define any NSG rules for a subnet that contains private endpoints.
- If you use the UI to create objects, you need to create the network and subnets manually before creating the Azure Databricks workspace. If you want to use a template, the template that Azure Databricks provides creates a VNet and appropriate subnets for you, including the two regular subnets plus another for private endpoints.
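The subnet sizing and overlap rules above can be checked with Python's standard `ipaddress` module before you create any resources. This is a rough sketch under stated assumptions: it assumes that Azure reserves five addresses in every subnet and that each cluster node consumes one address in each of the two workspace subnets. The function names and the example address ranges are hypothetical.

```python
import ipaddress

# Assumption: Azure reserves 5 IP addresses in every subnet.
AZURE_RESERVED_PER_SUBNET = 5


def usable_addresses(cidr):
    """Addresses left in a subnet after the assumed Azure reservation."""
    net = ipaddress.ip_network(cidr)
    return max(net.num_addresses - AZURE_RESERVED_PER_SUBNET, 0)


def max_cluster_nodes(public_cidr, private_cidr):
    """Upper bound on concurrent nodes, assuming one address per node per subnet."""
    return min(usable_addresses(public_cidr), usable_addresses(private_cidr))


def check_private_endpoint_subnet(vnet_cidr, workspace_cidrs, pe_cidr):
    """Return a list of problems with the proposed private endpoint subnet."""
    vnet = ipaddress.ip_network(vnet_cidr)
    pe = ipaddress.ip_network(pe_cidr)
    problems = []
    if not pe.subnet_of(vnet):
        problems.append(f"{pe} is not inside the workspace VNet {vnet}.")
    if pe.prefixlen > 27:
        problems.append(f"{pe} is smaller than the /27 mentioned in this article.")
    for cidr in workspace_cidrs:
        other = ipaddress.ip_network(cidr)
        if pe.overlaps(other):
            problems.append(f"{pe} overlaps workspace subnet {other}.")
    return problems


# Example ranges are illustrative only.
print(max_cluster_nodes("10.0.0.0/26", "10.0.0.64/26"))  # 59
print(check_private_endpoint_subnet(
    "10.0.0.0/24", ["10.0.0.0/26", "10.0.0.64/26"], "10.0.0.128/27"
))  # []
```

Treat the node count as an upper bound for planning rather than an exact limit. Confirm the actual address requirements in the VNet injection article before choosing ranges.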
(For front-end Private Link) For users to access the workspace from your on-premises network, you must add private connectivity from that network to your Azure network. Add this connectivity before configuring Private Link.
The details vary based on whether you use the Private Link standard deployment or the simplified deployment.
- For the standard deployment, create a transit VNet or use an existing one. A transit VNet is sometimes called a bastion VNet or hub VNet. This VNet must be reachable from the on-premises user environment using ExpressRoute or a VPN gateway connection. For front-end Private Link, Databricks recommends creating a separate VNet for your connectivity to the control plane, rather than sharing the workspace VNet. Note that the transit VNet and its subnet can be in the same region, zone, and resource group as your workspace VNet and its subnets, but they do not have to match. Create a resource group for the separate transit VNet and use a different private DNS zone for that private endpoint. If you use two separate private endpoints, you cannot share the DNS zone.
- For the simplified deployment, you create a transit subnet in your workspace VNet. In this deployment, the transit subnet does not have a separate private endpoint. The transit subnet in the workspace VNet uses a single private endpoint for both back-end and front-end connections.
Azure user permissions
As an Azure user, you must have read/write permissions sufficient to:
- Provision a new Azure Databricks workspace.
- Create Azure Private Link endpoints in your workspace VNet and also (for front-end usage) your transit VNet.
If the user who created the private endpoint for the transit VNet does not have owner/contributor permissions for the workspace, then a separate user with owner/contributor permissions for the workspace must manually approve the private endpoint creation request.