How to make Azure Databricks cluster outbound connectivity consistent with a single public outgoing IP address?

kvdv 0 Reputation points
2023-12-06T09:53:57.1566667+00:00

I've set up an Azure Databricks service whose outbound connectivity should go through an Azure Firewall, which in turn makes sure that all outbound traffic is routed through a single public IP address.

As suggested by an auto-generated Microsoft solution, I have done the following:

  • Used a VNet injected workspace, as suggested in this article (referenced by Microsoft):

https://learn.microsoft.com/en-gb/azure/databricks/administration-guide/cloud-configurations/azure/vnet-inject

  • And applied this article (referenced by Microsoft):

https://kb.databricks.com/cloud/azure-vnet-single-ip

This works! When I stop and start the Databricks cluster, the same public IP address is still used.

However, when I stop and start the firewall, the Databricks cluster occasionally cannot get outbound connectivity at all. Sometimes it works, sometimes it doesn't, and I can't find a reason or any inconsistency in the way things are invoked.

The method I'm using to stop and start the firewall (and Databricks cluster) is:

Yesterday this worked perfectly; today no luck, and nothing has changed in the Azure setup.

The output of the firewall start and stop scripts is 99% the same every time, with the only difference being new ETags. Sometimes when I manually stop and start the firewall it suddenly starts working, and the next time it doesn't. So basically: inconsistent behaviour without any infrastructure changes.
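One way to confirm that the ETags really are the only difference between a working run and a failing run is to compare the two outputs mechanically. The following is only a sketch: it assumes the script outputs have been saved as JSON, and the helper names `strip_keys` and `diff_paths` are hypothetical, not part of any Azure tooling.

```python
def strip_keys(value, ignored=("etag",)):
    """Recursively drop keys whose lower-cased name is in `ignored`."""
    if isinstance(value, dict):
        return {
            key: strip_keys(item, ignored)
            for key, item in value.items()
            if key.lower() not in ignored
        }
    if isinstance(value, list):
        return [strip_keys(item, ignored) for item in value]
    return value


def diff_paths(a, b, path="$"):
    """Return the JSON-style paths at which two parsed documents differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        found = []
        for key in sorted(set(a) | set(b)):
            if key not in a or key not in b:
                # Key present on one side only.
                found.append(f"{path}.{key}")
            else:
                found.extend(diff_paths(a[key], b[key], f"{path}.{key}"))
        return found
    if isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        found = []
        for index, (left, right) in enumerate(zip(a, b)):
            found.extend(diff_paths(left, right, f"{path}[{index}]"))
        return found
    # Scalars, mismatched types, or lists of different length.
    return [] if a == b else [path]
```

Loading the two saved outputs with `json.load` and calling `diff_paths(strip_keys(good_run), strip_keys(bad_run))` returns an empty list if nothing but the ETags changed; any path it does return is a place to look more closely.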

Things I've tried to fix the problem:

  • One method:
    • Stopping the firewall "manually" via the automation runbook
    • Waiting for x amount of minutes to make sure it's actually stopped
    • Starting the firewall manually via the automation runbook
    • Starting the Databricks cluster either manually or via the workflow
    • Checking for outbound connectivity with a simple HTTP request
  • Another method:
    • Stopping and starting the firewall as described above
    • Running a Databricks notebook, which invokes the cluster start
    • Checking for outbound connectivity with a simple HTTP request

As said: sometimes it works, sometimes it doesn't. When it doesn't, no outbound connectivity is possible at all.
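A minimal sketch of the kind of connectivity check described above, which could be run from a notebook on the cluster. It is not the exact check used here: the echo service URL (`https://api.ipify.org`), the function names, and the retry settings are all assumptions for illustration. Polling repeatedly, rather than sending a single request, helps distinguish "the firewall is not ready yet" from "connectivity never comes back".

```python
import time
import urllib.request


def fetch_public_ip(url="https://api.ipify.org", timeout=5.0):
    """Return the public IP reported by an echo service, or None on failure.

    The URL is an assumption; any endpoint that returns the caller's IP works.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8").strip()
    except OSError:
        # URLError and socket timeouts are both subclasses of OSError.
        return None


def wait_for_outbound(expected_ip, attempts=10, delay_s=30.0,
                      fetch=fetch_public_ip, sleep=time.sleep):
    """Poll until an outbound request succeeds or the attempts run out.

    Returns (matches_expected_ip, list_of_results_per_attempt).
    """
    results = []
    for attempt in range(1, attempts + 1):
        ip = fetch()
        results.append(ip)
        print(f"attempt {attempt}: {ip if ip is not None else 'no connectivity'}")
        if ip is not None:
            # Connectivity is back; report whether the egress IP is the expected one.
            return ip == expected_ip, results
        if attempt < attempts:
            sleep(delay_s)
    return False, results
```

Calling `wait_for_outbound("<firewall public IP>")` after the cluster has started logs every attempt, so the record shows whether connectivity recovers after a delay or stays down for the whole window.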

Any suggestions as to where I could investigate the possible cause of this are much appreciated.

Azure Firewall
An Azure network security service that is used to protect Azure Virtual Network resources.
Azure Virtual Network
An Azure networking service that is used to provision private networks and optionally to connect to on-premises datacenters.
Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.