Hello @George Alexiou,
Welcome to the MS Q&A platform.
Even with Unity Catalog enabled, Databricks clusters still connect to the workspace's Hive metastore. Unity Catalog stores the metadata for the tables it governs, but the legacy Hive metastore (shown in Unity Catalog-enabled workspaces as the hive_metastore catalog) remains available, so clusters still connect to it to retrieve table metadata. In addition, the MySQL database behind it is used to store other metadata besides the Hive metastore itself.
If the customer wants to block all connections to the internet, they can consider setting up a private endpoint for the Hive metastore. A private endpoint is a network interface that uses a private IP address from the customer's virtual network (VNet). With a private endpoint for the Hive metastore, traffic between the clusters and the metastore stays within the customer's VNet and does not go over the public internet.
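One way to check that metastore traffic stays inside the VNet is to confirm, from a cluster running in that VNet, that the metastore hostname resolves only to private addresses (a private endpoint is normally paired with a private DNS zone so the name resolves to the endpoint's private IP). The Python sketch below does this with the standard library only. METASTORE_HOST is a hypothetical placeholder, not the real metastore FQDN, and a private address on its own does not prove that no other route to the internet exists.

```python
import ipaddress
import socket

# Hypothetical placeholder: replace with the FQDN of the metastore database
# that the clusters actually connect to.
METASTORE_HOST = "example-metastore.mysql.database.azure.com"


def resolved_addresses(host: str) -> set:
    """Return every IP address that DNS returns for `host`."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def all_private(addresses) -> bool:
    """True only if at least one address was given and all of them are private."""
    addresses = list(addresses)
    return bool(addresses) and all(
        ipaddress.ip_address(addr).is_private for addr in addresses
    )
```

For example, running all_private(resolved_addresses(METASTORE_HOST)) in a notebook on such a cluster should return True once the private endpoint and its DNS zone are in place; False means the name still resolves to at least one public address.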
Below is the documentation for setting up a private endpoint for the Hive metastore:
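The other option is to use Secure Cluster Connectivity (No Public IP / NPIP). With this option, Databricks cluster nodes are not assigned public IP addresses and the customer's virtual network has no open inbound ports; each cluster instead opens an outbound connection to the Databricks control plane through the secure cluster connectivity relay. SCC on its own does not block all outbound internet access, so to control egress it is usually combined with deploying the workspace in a customer-managed virtual network (VNet injection), where outbound traffic can be routed through the customer's own firewall or NAT gateway.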
https://learn.microsoft.com/en-us/azure/databricks/security/network/secure-cluster-connectivity
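For deployments done with an ARM template, SCC and VNet injection are set through the workspace's custom parameters. The Python sketch below only builds and prints the properties object of a Microsoft.Databricks/workspaces resource; the property names (enableNoPublicIp, customVirtualNetworkId, customPublicSubnetName, customPrivateSubnetName) are my reading of the ARM template reference and should be verified against it, and all resource IDs and subnet names are hypothetical placeholders.

```python
import json


def databricks_workspace_properties(
    managed_rg_id: str,
    vnet_id: str,
    public_subnet: str,
    private_subnet: str,
) -> dict:
    """Build the `properties` object for a Databricks workspace ARM resource
    with secure cluster connectivity (NPIP) and VNet injection enabled."""
    return {
        "managedResourceGroupId": managed_rg_id,
        "parameters": {
            # Secure cluster connectivity: no public IPs on cluster nodes.
            "enableNoPublicIp": {"value": True},
            # VNet injection: place the clusters in a customer-managed VNet.
            "customVirtualNetworkId": {"value": vnet_id},
            "customPublicSubnetName": {"value": public_subnet},
            "customPrivateSubnetName": {"value": private_subnet},
        },
    }


# Hypothetical placeholder values; replace them with real resource IDs and names.
props = databricks_workspace_properties(
    "/subscriptions/<sub-id>/resourceGroups/<managed-rg>",
    "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/"
    "Microsoft.Network/virtualNetworks/<vnet>",
    "<public-subnet>",
    "<private-subnet>",
)
print(json.dumps(props, indent=2))
```

The printed object would go under the workspace resource's properties key in the template, next to the usual name, location and sku fields.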
I hope this helps. Please let us know if you have any further questions.
If this answers your question, please consider accepting the answer by clicking Accept answer and up-voting, as it helps the community find answers to similar questions.