Use Unity Catalog OSS in Azure Synapse Spark (instead of Hive metastore)

Martin B 126 Reputation points
2024-10-11T15:49:36.6033333+00:00

Hi, some months ago Databricks announced that it had open-sourced Unity Catalog.
This is the GitHub project: https://github.com/unitycatalog/unitycatalog

I was wondering if it is possible to use Unity Catalog in Azure Synapse Spark instead of Hive metastore.

This is the official quickstart, but I can't work out what it would mean in a Synapse setting. Would I need to run the catalog on a separate server/VM in Azure, or can I run Unity Catalog on my Synapse Spark cluster itself?

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Accepted answer
  1. PRADEEPCHEEKATLA 90,651 Reputation points Moderator
    2024-10-22T11:13:13.25+00:00

    @Martin B - Thanks for the question and using MS Q&A platform.

    Currently, using Unity Catalog OSS in Azure Synapse Spark (instead of the Hive metastore) is not supported in Azure Synapse Analytics.

    We would appreciate it if you could share this feedback on our feedback channel, which is open for the user community to upvote and comment on. This allows our product teams to prioritize your request against our existing feature backlog and gives insight into the potential impact of implementing the suggested feature.

    Hope this helps. Do let us know if you have any further queries.

    If this answers your query, please click Accept Answer and Yes for "Was this answer helpful".

    1 person found this answer helpful.

Additional answers
  1. Amira Bedhiafi 33,866 Reputation points Volunteer Moderator
    2024-10-11T21:52:12.6266667+00:00

    Unity Catalog, which was initially developed by Databricks, is now available as open source and offers a centralized governance model for data lakes. However, Azure Synapse Spark currently integrates primarily with Azure Data Lake Storage and Azure SQL Database, and relies on the Hive Metastore for metadata management.

    Whether you can use Unity Catalog in Azure Synapse Spark instead of the Hive metastore depends on how much integration Synapse offers with external catalog systems like Unity Catalog.

    Running Unity Catalog in Azure Synapse Spark

    Here’s what to consider:

    1. Native Synapse Support: Azure Synapse Analytics does not currently support Unity Catalog natively. It primarily integrates with the Hive Metastore and Azure Data Lake Storage for metadata. The managed Unity Catalog is deeply integrated with the Databricks workspace, and using Unity Catalog outside the Databricks platform (especially in Synapse Spark) is not a standard feature yet.
    2. Separate Server/VM Requirement: If you still want to attempt using Unity Catalog in Azure Synapse Spark, you will likely need to run Unity Catalog on a separate server or VM. This is because Synapse Spark clusters do not currently support Unity Catalog natively.
    3. Synapse Spark Cluster Limitation: The managed Unity Catalog is built into Databricks, while the open-source project runs as a standalone server that exposes a REST API. Synapse has no built-in integration with either, so using Unity Catalog from Azure Synapse would involve custom configuration, which likely means:
      • Setting up a standalone Unity Catalog service on a separate server.
      • Creating a connector or an integration to enable Synapse Spark to read metadata from Unity Catalog.
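    As a rough illustration of what "reading metadata from Unity Catalog" could look like, here is a minimal Python sketch that lists catalogs over the server's REST API. It assumes the API is served under `/api/2.1/unity-catalog` and that the list response has a `catalogs` array, which is how the open-source project describes it as far as I know; the host name `uc-vm.example.internal` and the helper names are hypothetical. This only checks that a separately hosted server is reachable and is not a Synapse integration.

```python
import json
from urllib.parse import urljoin
from urllib.request import Request, urlopen

# Assumption: base path of the Unity Catalog OSS REST API.
UC_API_PREFIX = "api/2.1/unity-catalog/"


def catalogs_url(server_url: str) -> str:
    """Build the list-catalogs endpoint URL for a Unity Catalog server."""
    base = server_url.rstrip("/") + "/"
    return urljoin(base, UC_API_PREFIX + "catalogs")


def parse_catalog_names(payload: str) -> list[str]:
    """Extract catalog names from a list-catalogs JSON response body."""
    body = json.loads(payload)
    return [catalog["name"] for catalog in body.get("catalogs", [])]


def list_catalogs(server_url: str, token: str = "", timeout: float = 10.0) -> list[str]:
    """Query the server and return catalog names (needs network access)."""
    # Send a bearer token only if one was provided.
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    request = Request(catalogs_url(server_url), headers=headers)
    with urlopen(request, timeout=timeout) as response:
        return parse_catalog_names(response.read().decode("utf-8"))
```

    Calling `list_catalogs("http://uc-vm.example.internal:8080")` from a notebook would only work if the Spark pool's network can reach the VM, so a managed virtual network or firewall rules may also need attention.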

    Recommendations

    • Hive Metastore: Since Synapse supports Hive metastore natively, it's more straightforward to use it for now, unless Microsoft releases specific integration options for Unity Catalog in Synapse.
    • Separate VM for Unity Catalog: If you want to experiment with Unity Catalog in Synapse, you will need to host it separately and handle integration manually, as there are no built-in connectors or configurations for this in Synapse yet.

    You cannot directly run Unity Catalog on Synapse Spark clusters. Instead, you would need a separate VM to host Unity Catalog, and Synapse Spark would need to connect to that. The integration process will require additional steps since this is not an out-of-the-box feature for Azure Synapse.
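    For anyone who wants to experiment anyway, the sketch below builds the Spark settings that the Unity Catalog OSS Spark connector expects, as far as I recall from the project's documentation (`io.unitycatalog.spark.UCSingleCatalog` plus `uri` and `token` properties). The server URL, catalog name, and function names are hypothetical. Whether a Synapse Spark pool accepts this is untested and, per the accepted answer, unsupported: the connector library would also have to be installed on the pool and be compatible with its Spark and Delta versions.

```python
import json


def unity_catalog_spark_conf(catalog_name: str, server_url: str, token: str = "") -> dict:
    """Return Spark conf entries pointing a named catalog at a Unity Catalog server."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Assumption: catalog implementation class from the OSS Spark connector.
        prefix: "io.unitycatalog.spark.UCSingleCatalog",
        f"{prefix}.uri": server_url,
        f"{prefix}.token": token,
        "spark.sql.defaultCatalog": catalog_name,
    }


def as_session_config_json(conf: dict) -> str:
    """Render the settings as a JSON body with a top-level "conf" object."""
    return json.dumps({"conf": conf}, indent=2)


if __name__ == "__main__":
    # Hypothetical VM address for a separately hosted Unity Catalog server.
    conf = unity_catalog_spark_conf("unity", "http://uc-vm.example.internal:8080")
    print(as_session_config_json(conf))
```

    The printed JSON has the shape a notebook session configuration cell (`%%configure`) takes, which keeps the experiment scoped to one session rather than changing the pool's default metastore.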

    Going forward, keep an eye on changes in the Azure Synapse Spark ecosystem and on custom connector implementations that might let you integrate Unity Catalog in the future.


  2. Ahmed Shimail Gillani 0 Reputation points Microsoft Employee
    2024-12-02T06:50:53.15+00:00

    I think there is another option: create a separate Azure Databricks workspace with Unity Catalog, then register the Azure Synapse workspace as a foreign catalog inside Azure Databricks.



