can't observe "hive_metastore" external table

Martin 100 Reputation points
2024-09-10T07:51:45.22+00:00

Hello Microsoft Team,

I haven't used the Data Catalogue of Azure Databricks for some time, and it turns out that a lot of things have changed there. I am testing something: I created an Azure Data Lake Storage account and saved some Parquet files in Delta format there. I just wanted to create an external table based on those files in order to observe the table, and I ran into a number of issues:

First, I wanted to create a mount point and, based on it, create the external table. It turned out that mounting is no longer advisable and is deprecated. I first tried to mount the storage using the "wasbs" protocol. It worked, but as far as I can see, I cannot create an external table based on it. So I tried to mount it using the "abfss" protocol, but it keeps telling me "Unsupported Azure Scheme: abfss". Are my observations right so far?
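For reference, this is roughly what I tried (a sketch; the storage account, container, mount point, scope and key names are placeholders, not my real ones):

# wasbs mount with an account key read from a secret scope -- this worked:
dbutils.fs.mount(
    source="wasbs://<container>@<storage_account>.blob.core.windows.net",
    mount_point="/mnt/comments",
    extra_configs={
        "fs.azure.account.key.<storage_account>.blob.core.windows.net":
            dbutils.secrets.get(scope="<scope_name>", key="<your_secret_key_name>")
    },
)

# The same call with source="abfss://<container>@<storage_account>.dfs.core.windows.net/"
# is what fails with "Unsupported Azure Scheme: abfss"; as far as I can tell,
# abfss mounts expect OAuth 2.0 (service principal) settings in extra_configs
# instead of an account key.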

Second, I don't have access to Microsoft Entra ID in my Azure Portal, since it is a corporate tenant, and I don't want to lose time or ask the responsible colleagues to invest their time in this, too. BUT it is kind of hard to use the Unity Catalogue without the right permissions (the ability to create Service Principals and Managed Identities in Microsoft Entra ID). So it is hard to create external locations and storage credentials. It is also impossible to set up the security key using the OAuth2 method. Are these observations also right?

Third, because of all that, I decided to use the hive_metastore and create an external table there, referring to the data lake directly with the URL.
So, I first set up the Data Lake key using the method shown in the image (using the actual key of course; I am just masking it for the screenshot):

[Image: Azure Key setup]
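In notebook terms, the setup was essentially this (a sketch; the scope and secret key names are placeholders):

# Set the storage account access key for direct abfss:// access,
# reading the key itself from a secret scope:
spark.conf.set(
    "fs.azure.account.key.cdctestpymongo.dfs.core.windows.net",
    dbutils.secrets.get(scope="<scope_name>", key="<your_secret_key_name>"),
)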

And after that I ran the code that can be found in this image:

[Image: CREATE EXTERNAL TABLE statement]
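The statement was along these lines (a sketch; the container and folder names are placeholders, the table name matches the query below):

# Create an external Delta table pointing directly at the data lake URL:
spark.sql("""
    CREATE TABLE IF NOT EXISTS hive_metastore.test.external_comments
    USING DELTA
    LOCATION 'abfss://<container>@cdctestpymongo.dfs.core.windows.net/<folder>'
""")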

So far so good. The table is created and I am able to query it with code like:

%sql
select * from hive_metastore.test.external_comments

And I get the results. But when I go to the Data Catalogue and try to observe the table there, I see this:

[Image: Data Catalogue error for the table]

As far as I understand, this is because Table Access Control is not enabled. BUT I have enabled it here in the Workspace:

[Image: Table Access Control setting in the Workspace]

And it is said that I should enable it at the cluster level, too. But I cannot find the place where I should do this. Could you please help me and shed more light on this?

I first used a single node cluster, but after that I tried with a one node / one worker Shared cluster. No difference, still the same result. And it is stated that Shared clusters have TAC enabled by default.

Many thanks for your help in advance!

BR,

Martin


Accepted answer
  NIKHILA NETHIKUNTA 4,600 Reputation points Microsoft External Staff
    2024-09-12T07:02:32.55+00:00

    @Martin
    I'm glad that you were able to resolve your issue, and thank you for posting your solution so that others experiencing the same thing can easily reference it! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others", I'll repost your solution in case you'd like to accept the answer.

    Ask: You decided to use the hive_metastore to create an external table by directly referring to the data lake URL. After setting up the Data Lake Key, you encountered errors when trying to observe the table in the Data Catalog. You suspect this is due to Table Access Control (TAC) not being enabled at the cluster level, despite enabling it in the Workspace and using a Shared cluster, which should have TAC enabled by default. You are seeking guidance on how to resolve these issues, particularly with enabling TAC at the cluster level and addressing the mounting and permission challenges.

    Solution: The TAC issue was resolved by creating the table with a Shared cluster instead of a single node cluster. However, the main problem was related to the Data Lake access key. It was fixed by defining the key in the cluster's Spark configuration as follows:

    fs.azure.account.key.cdctestpymongo.dfs.core.windows.net {{secrets/<scope_name>/<your_secret_key_name>}}
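
    For completeness: on clusters where table access control is not enabled by default, the legacy way to switch it on is also through the cluster's Spark config (a sketch of the legacy table-ACL settings; please verify against the current Azure Databricks documentation):

        spark.databricks.acl.dfAclsEnabled true
        spark.databricks.repl.allowedLanguages python,sql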
    

    If I missed anything please let me know and I'd be happy to add it to my answer, or feel free to comment below with any additional information.

    If you have any other questions, please let me know. Thank you again for your time and patience throughout this issue.


    Please don’t forget to Accept Answer and Yes for "was this answer helpful" wherever the information provided helps you; this can be beneficial to other community members.

