First, set up an Azure NAT gateway:
- Create a NAT gateway:
  - Navigate to the Azure portal.
  - Go to Create a resource > Networking > NAT gateway.
  - Configure the NAT gateway with a public IP address.
- Attach the NAT gateway to subnets:
  - In the NAT gateway configuration, attach it to the subnets used by your Databricks clusters. In a VNet-injected workspace these are the host (public) and container (private) subnets; make sure both are included.
Next, configure the Databricks cluster network:
- Create a virtual network (VNet):
  - Ensure your Databricks workspace is deployed into your own VNet (VNet injection).
- Configure subnets:
  - Make sure the subnets used by the Databricks clusters are the ones attached to the NAT gateway.
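As a rough sketch of the subnet setup described above: a VNet-injected workspace uses two subnets delegated to `Microsoft.Databricks/workspaces`, and the NAT gateway can be attached when each subnet is created. The resource names, subnet names (`databricks-host`, `databricks-container`), and address prefixes are hypothetical placeholders, and the `run` helper is only an illustration. The script prints the commands unless `APPLY=1` is set.

```shell
#!/bin/sh
set -eu

# Hypothetical placeholder values; replace them with your own.
RG="${RG:-my-resource-group}"
VNET="${VNET:-my-databricks-vnet}"
NAT="${NAT:-my-nat-gateway}"

# Print each command by default; set APPLY=1 to run it against Azure.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Create one delegated subnet and attach the NAT gateway in the same call.
# $1 = subnet name, $2 = address prefix (both assumed values).
create_subnet() {
  run az network vnet subnet create \
    --resource-group "$RG" \
    --vnet-name "$VNET" \
    --name "$1" \
    --address-prefixes "$2" \
    --delegations Microsoft.Databricks/workspaces \
    --nat-gateway "$NAT"
}

create_subnet databricks-host 10.0.1.0/24
create_subnet databricks-container 10.0.2.0/24
```

If the subnets already exist, the `az network vnet subnet update ... --nat-gateway` form is enough and nothing needs to be recreated.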
Then, enable Secure Cluster Connectivity (SCC, also called No Public IP), so that cluster nodes have no public IP addresses and their outbound traffic leaves through the NAT gateway.
Then, check the NSGs associated with your subnets and make sure their outbound rules allow traffic to the internet; that traffic then leaves through the NAT gateway.
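If an explicit rule is needed, it might look like the sketch below. It assumes only HTTPS (port 443) has to be allowed, and the NSG name, rule name, and priority are hypothetical placeholders. The default `AllowInternetOutBound` rule already permits this traffic unless a higher-priority deny rule overrides it, so check the existing rules first. The script prints the command unless `APPLY=1` is set.

```shell
#!/bin/sh
set -eu

# Hypothetical placeholder values; replace them with your own.
RG="${RG:-my-resource-group}"
NSG="${NSG:-my-databricks-nsg}"

# Print each command by default; set APPLY=1 to run it against Azure.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Allow outbound HTTPS from the VNet to the internet.
# Rule name, priority, and port are assumptions for illustration.
allow_outbound_https() {
  run az network nsg rule create \
    --resource-group "$RG" \
    --nsg-name "$NSG" \
    --name AllowInternetOutbound443 \
    --priority 200 \
    --direction Outbound \
    --access Allow \
    --protocol Tcp \
    --source-address-prefixes VirtualNetwork \
    --destination-address-prefixes Internet \
    --destination-port-ranges 443
}

allow_outbound_https
```

Widen the port range only if your clusters need to reach other services.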
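As a further step, you can enable Azure Private Link so that traffic between your clusters and the Databricks control plane stays off the public internet. Note that SCC, not Private Link, is what prevents cluster nodes from getting their own public IPs.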
Example configuration steps:
- Create a NAT gateway:
    az network nat gateway create --resource-group <your_resource_group> --name <your_nat_gateway_name> --location <your_location> --public-ip-addresses <your_public_ip_address>
- Attach the NAT gateway to a subnet (repeat for each Databricks subnet):
    az network vnet subnet update --resource-group <your_resource_group> --vnet-name <your_vnet_name> --name <your_subnet_name> --nat-gateway <your_nat_gateway_name>
- Enable SCC in Databricks:
  - In the Azure portal, open the networking settings of your Azure Databricks workspace.
  - Enable Secure Cluster Connectivity (No Public IP).
- Update NSG rules:
  - Ensure outbound rules allow traffic to the internet through the NAT gateway.
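Two pieces around these steps can be sketched the same way: a NAT gateway needs a Standard-SKU static public IP to exist before it is created, and afterwards each subnet can be checked for its NAT gateway association. The resource and subnet names below are hypothetical placeholders. The script prints the commands unless `APPLY=1` is set.

```shell
#!/bin/sh
set -eu

# Hypothetical placeholder values; replace them with your own.
RG="${RG:-my-resource-group}"
VNET="${VNET:-my-databricks-vnet}"
PIP="${PIP:-my-nat-public-ip}"

# Print each command by default; set APPLY=1 to run it against Azure.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Prerequisite: the public IP that the NAT gateway will use.
create_public_ip() {
  run az network public-ip create \
    --resource-group "$RG" \
    --name "$PIP" \
    --sku Standard \
    --allocation-method Static
}

# Check: show the NAT gateway ID attached to a subnet ($1 = subnet name).
show_subnet_nat() {
  run az network vnet subnet show \
    --resource-group "$RG" \
    --vnet-name "$VNET" \
    --name "$1" \
    --query natGateway.id \
    --output tsv
}

create_public_ip
# In a real run, the NAT gateway create and subnet update commands
# from the steps above go here, before the checks.
show_subnet_nat databricks-host
show_subnet_nat databricks-container
```

An empty result from the check means the subnet has no NAT gateway attached yet.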
Diagram:
Azure Virtual Network (VNet)
|
|-- Subnet A (host / public subnet)
|     |-- NAT Gateway (Public IP)
|
|-- Subnet B (container / private subnet)
      |-- NAT Gateway (Public IP)