Virtual Network settings on Storage account interaction with Databricks Table Monitoring

Asger Lillie Larsen 0 Reputation points
2024-07-25T11:14:24.92+00:00

I have set up my Databricks Unity Catalog on an Azure Data Lake Storage account that uses my company's virtual network to allow access. I have all privileges on my account, so I am able to create, alter, or delete catalogs, schemas, and tables using a Databricks general-purpose cluster. I can do these things either through the Databricks UI or simply by running SQL statements from a notebook.

I'm trying to use the Databricks table monitoring feature to track the quality of the data in my catalog tables, but when I try to create the metric tables, I am met with an error suggesting there is a problem with the configuration of the warehouse cluster (serverless compute).

I know the problem is the access between the storage account and the serverless compute, because if I select the 'Enabled from all networks' option under Networking in the storage account, I am able to set up monitoring and the metric tables are created.

Unfortunately, I am required to use the company's virtual network for security reasons, so allowing access for all networks is not an option.

Keep in mind that everything else I've tested so far works fine in Databricks with the virtual network setting. I am able to create, alter, and drop tables in Unity Catalog; only monitoring the tables I've created fails.

One difference I have noticed is that the monitoring feature automatically uses a warehouse cluster instead of the general-purpose cluster that I use for everything else. I have enabled the 'Unity Catalog' setting on the warehouse cluster, but it still won't create the metric tables. I have gone through the proposed checklist in this post (metric-tables-not-created-automatically), and everything appears to be set up correctly.

Is there a specific configuration of the warehouse cluster or the storage account that needs to be enabled before I can set up monitoring for my tables?

Tags: Azure Storage Accounts, Azure, Azure Databricks

1 answer

Sort by: Most helpful
  1. Luis Arias 7,126 Reputation points
    2024-07-25T20:23:41.38+00:00

    Hi Asger Lillie Larsen,

    I understand that you want to isolate the communication between your Data Lake Storage account and Azure Databricks serverless compute. In that case, you need to use network connectivity configurations (NCCs), following this guide:

    https://learn.microsoft.com/en-us/azure/databricks/security/network/serverless-network-security/serverless-firewall#--step-4-add-azure-storage-account-network-rules

    If you use an Azure Storage firewall to protect access to Azure storage data sources, you must configure the firewall to allow access from the serverless compute nodes. See "Configure a firewall for serverless compute access": https://learn.microsoft.com/en-us/azure/databricks/admin/sql/serverless
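    The reason the company VNet rule is not enough is that serverless compute does not run inside your virtual network, so its traffic does not come from an allowed subnet. The storage-side change the guide describes amounts to adding the serverless subnets listed by the NCC as extra virtual network rules, next to the company VNet rule, while keeping the default action at Deny (unlike 'Enabled from all networks'). The Python below is a minimal, hypothetical sketch of that merge, not part of any Databricks or Azure SDK. The NCC response shape (`egress_config.default_rules.azure_service_endpoint_rule.subnets`) is an assumption to verify against the NCC you actually create, the `networkAcls` keys follow the ARM storage account network rule set, and all IDs are placeholders.

```python
"""Hypothetical helper: merge serverless subnets from a Databricks NCC into a
storage account's existing firewall rules without opening it to all networks."""

from typing import Any


def serverless_subnets(ncc: dict[str, Any]) -> list[str]:
    """Return subnet resource IDs from an NCC response.

    ASSUMPTION: the subnets live under
    egress_config.default_rules.azure_service_endpoint_rule.subnets.
    """
    rule = (
        ncc.get("egress_config", {})
        .get("default_rules", {})
        .get("azure_service_endpoint_rule", {})
    )
    return list(rule.get("subnets", []))


def merge_network_acls(existing: dict[str, Any], subnet_ids: list[str]) -> dict[str, Any]:
    """Return a new networkAcls dict that also allows the given subnets.

    Existing VNet and IP rules are kept, duplicate subnets are skipped
    (resource IDs compared case-insensitively), and defaultAction stays "Deny".
    The input dict is not modified.
    """
    rules = [dict(r) for r in existing.get("virtualNetworkRules", [])]
    known = {r["id"].lower() for r in rules}
    for subnet_id in subnet_ids:
        if subnet_id.lower() not in known:
            rules.append({"id": subnet_id, "action": "Allow"})
            known.add(subnet_id.lower())
    return {
        **existing,
        "defaultAction": "Deny",
        "virtualNetworkRules": rules,
        "ipRules": list(existing.get("ipRules", [])),
    }


# Placeholder values; substitute your real subnet IDs.
company_acls = {
    "defaultAction": "Deny",
    "bypass": "AzureServices",
    "virtualNetworkRules": [{"id": "<company-subnet-id>", "action": "Allow"}],
    "ipRules": [],
}
ncc_response = {
    "egress_config": {
        "default_rules": {
            "azure_service_endpoint_rule": {
                "subnets": ["<serverless-subnet-id-1>", "<serverless-subnet-id-2>"],
                "target_services": ["AZURE_BLOB_STORAGE"],
            }
        }
    }
}

updated = merge_network_acls(company_acls, serverless_subnets(ncc_response))
for rule in updated["virtualNetworkRules"]:
    print(rule["id"])
```

    The merge keeps the original company rule first and only appends new subnets, so running it twice leaves the rule list unchanged. The resulting rules would still need to be applied to the storage account (for example through the Azure portal, as the linked guide describes).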

    Regards,

    Luis