Data model tool to connect to Databricks or Data lake

Darshan MS 61 Reputation points
2022-07-04T11:57:56.67+00:00

Hi Everyone,

For data modeling documentation (dimensional/ER diagrams), is there any tool available that can connect to Databricks/a data lake, read the table structures directly, and also update the model whenever columns are added to or deleted from a table?
In the process, it should not remove the relationships drawn between tables when columns and/or tables are updated (added/deleted). Version control over the model using Git etc. would also be helpful.

I ask because, as I understand it, PK and FK details are not maintained on data lake/Databricks table entities. Please suggest any modeling tools that fit this use case.

Thanks,

Azure Databricks

Accepted answer
  1. PRADEEPCHEEKATLA 90,596 Reputation points
    2022-07-05T06:01:26.24+00:00

    Hello @Darshan MS ,

    Thanks for the question and using MS Q&A platform.

This article, Working with Entity Relationship (ER) Diagrams on Databricks, walks through connecting ER-diagramming tools to Databricks, with a focus on generating an Entity-Relationship (ER) diagram.

    Hope this helps. Please let us know if you have any further queries.

    ------------------------------

    • Please don't forget to click the Accept Answer or upvote button whenever the information provided helps you. Original posters help the community find answers faster by identifying the correct answer. Here is how
    • Want a reminder to come back and check responses? Here is how to subscribe to a notification
    • If you are interested in joining the VM program and help shape the future of Q&A: Here is how you can be part of Q&A Volunteer Moderators
    1 person found this answer helpful.

1 additional answer

  1. Darshan MS 61 Reputation points
    2022-07-08T14:34:22.543+00:00

    Yes, I am able to connect DBeaver to Databricks using JDBC and the link provided (sorry for the delay in updating; I had to try the trial version of DBeaver Enterprise). Additional links I followed are https://docs.databricks.com/dev-tools/dbeaver.html and https://databricks.com/spark/jdbc-drivers-download (the download in the primary link points to ODBC drivers). The connection URL may need some changes depending on your Databricks setup; the format is described in the second link I shared.
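    For reference, a sketch of what the JDBC connection URL generally looks like; the hostname, HTTP path, and token are placeholders you must take from your own cluster's JDBC/ODBC settings, and older Simba driver versions use the `jdbc:spark://` scheme instead of `jdbc:databricks://`:

    ```text
    jdbc:databricks://<server-hostname>:443/default;transportMode=http;ssl=1;httpPath=<http-path>;AuthMech=3;UID=token;PWD=<personal-access-token>
    ```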

    Observations:

    I was able to connect to the Databricks schemas and tables very neatly, and an ER diagram can be built from selected schemas/tables as well.

    If PK/FK or other constraints are not defined on a Databricks Delta table, we cannot add them to the table from the ER diagram properties: the tool will try to persist them and fail, since that ALTER is not supported and only CHECK constraints are supported as of now.
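    To illustrate the constraint limitation above, a sketch in Databricks SQL (the table and column names here are hypothetical):

    ```sql
    -- Adding a CHECK constraint to a Delta table is supported:
    ALTER TABLE sales ADD CONSTRAINT positive_amount CHECK (amount > 0);

    -- Adding PRIMARY KEY / FOREIGN KEY constraints this way was not
    -- supported at the time of writing, so a statement like this fails:
    ALTER TABLE sales ADD CONSTRAINT pk_sales PRIMARY KEY (sale_id);
    ```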

    Virtual relationships can be created and they look good. However, when exporting to PDF via the print option, the details of the virtual relationships are not included (we can see which table is connected to which, but column-level analysis has to be done visually or within the tool).

    Q1: Is there any way to also export the relationships made between tables, like the ER export? And am I exporting the right way?

    If multiple people need to work on the same file, changing the project location is not possible under the Projects tab, i.e., C:\Users\UserProfile\AppData\Roaming\DBeaverData\workspace6, and we cannot change the path.

    Q2: I did not understand how to enable, in the tool, the suggestion from the earlier feedback: "Then you can use the repos feature where you will be integrating GIT." A follow-up question on the same subject: if multiple folks need to work with the tool, do we require multiple licenses? (Is a license transferrable on request? This may be specific to the DBeaver team.)

