Databricks Access Best Practices for Multi-Engineer Team on Hybrid Ingestion Project

Anonymous
2025-08-05T10:02:18.8133333+00:00

In our healthcare project, seven data engineers collaborate on Databricks notebooks across three pipeline stages: historical load, catch-up (CDC batch from Kafka), and real-time streaming ingestion.

Key architecture details:

  • All pipelines are triggered by ADF, which executes Databricks notebooks.
  • Each notebook runs on a job-scoped cluster created dynamically from a Databricks cluster pool (classified as small, medium, or large based on the job).
  • Shared interactive clusters are not used.

Our intent is to define a role-based access model where:

  • Data engineers can develop, run, and debug notebooks collaboratively.
  • Cluster and job configurations are centrally managed by DevOps.
  • Engineers cannot create personal clusters or jobs independently.

Questions:

  • Is it critical to use Unity Catalog? Is there any extra cost with UC? How can we manage without UC?
  • What is the best practice for managing notebook-level access across the team in such a setup?
  • Should we restrict job and cluster creation to DevOps/Leads only, given that ADF handles job execution?
  • How can we enforce Git-based collaboration using Repos while preventing accidental overwrites?

Any recommendations for RBAC setup and workspace folder organization to support this pipeline design?

Azure Databricks

An Apache Spark-based analytics platform optimized for Azure.

1 answer

  1. Venkat Reddy Navari 5,840 Reputation points Microsoft External Staff Moderator
    2025-08-05T11:03:39.82+00:00

    Hi Janice Chi

    Is Unity Catalog critical? What are the pros, cons, and costs?

    You can implement access controls without Unity Catalog (UC), but for a team of this size and in a regulated environment like healthcare, we strongly recommend using it.

    Benefits of Unity Catalog:

    • Centralized governance of data assets (catalogs, schemas, tables, views, volumes) across workspaces.
    • Fine-grained access controls (table, column, view level).
    • Lineage, audit logging, and data masking capabilities.
    • Better support for scalable role-based access using Microsoft Entra ID (formerly Azure AD) groups.
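
    As a rough illustration of the fine-grained access point above, the sketch below generates Unity Catalog GRANT statements for a group. The catalog, schema, table, and group names are hypothetical placeholders, and the helper functions are not part of any Databricks library; in a notebook, each generated statement could be passed to spark.sql.

```python
from typing import List

# Sketch only: every catalog, schema, table, and group name here is hypothetical.


def quote(identifier: str) -> str:
    """Backtick-quote an identifier, escaping any embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"


def build_grants(catalog: str, schema: str, table: str,
                 group: str, privilege: str = "SELECT") -> List[str]:
    """Return the GRANT statements a group needs to reach one table."""
    principal = quote(group)
    cat = quote(catalog)
    sch = f"{cat}.{quote(schema)}"
    tbl = f"{sch}.{quote(table)}"
    return [
        # Access to a table also requires USE privileges on its parents.
        f"GRANT USE CATALOG ON CATALOG {cat} TO {principal}",
        f"GRANT USE SCHEMA ON SCHEMA {sch} TO {principal}",
        f"GRANT {privilege} ON TABLE {tbl} TO {principal}",
    ]


statements = build_grants("healthcare", "bronze", "cdc_events", "data-engineers")
for statement in statements:
    print(statement)
```

    Granting to groups rather than individual users keeps the model manageable as the team changes.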

    Cost: Unity Catalog itself has no separate charge, but on Azure Databricks it requires a Premium tier workspace.

    More info: Unity Catalog best practices – Microsoft Learn

    Access control in Unity Catalog

    If you decide not to use Unity Catalog, you'll need to manage permissions via:

    • Workspace folder-level ACLs,
    • External governance tools like Microsoft Purview (formerly Azure Purview),
    • Manual handling of cluster/job permissions which can become harder to scale.

    Should DevOps own cluster/job management?

    Yes. Since ADF triggers the notebooks on job-scoped clusters, it makes sense to restrict cluster and job creation to DevOps or leads.

    Best Practice:

    • Use Cluster Policies to limit the types of clusters users can launch.
    • Grant engineers only the Can View or Can Manage Run permission on jobs, not Can Manage.
    • Remove the unrestricted cluster creation entitlement from all non-admin users, and restrict access to the Personal Compute policy.

    This helps standardize your environment, reduces errors, and aligns with least privilege access.
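
    A minimal sketch of what such a cluster policy could look like, assuming one policy per pool size. The pool IDs, policy names, and worker limits are hypothetical placeholders; the attribute names (cluster_type, instance_pool_id, autoscale.max_workers) follow the cluster policy definition format, but please verify them against the current documentation before use.

```python
import json

# Sketch only: pool IDs, policy names, and worker limits are hypothetical.


def job_pool_policy(pool_id: str, max_workers: int) -> dict:
    """Policy definition: job clusters only, pinned to one instance pool."""
    return {
        # Block all-purpose (interactive) clusters under this policy.
        "cluster_type": {"type": "fixed", "value": "job"},
        # Force the cluster onto the given pool and hide the field in the UI.
        "instance_pool_id": {"type": "fixed", "value": pool_id, "hidden": True},
        # Cap autoscaling so a job cannot exceed its size class.
        "autoscale.max_workers": {"type": "range", "maxValue": max_workers},
    }


SIZES = {
    "small": ("<small-pool-id>", 2),
    "medium": ("<medium-pool-id>", 8),
    "large": ("<large-pool-id>", 16),
}

# The policy API expects the definition as a JSON string, not a nested object.
payloads = {
    size: {
        "name": f"adf-job-{size}",
        "definition": json.dumps(job_pool_policy(pool_id, max_workers)),
    }
    for size, (pool_id, max_workers) in SIZES.items()
}

print(json.dumps(payloads["medium"], indent=2))
```

    DevOps could keep definitions like these in source control and apply them through the cluster policies API or Terraform, so that engineers never edit cluster settings by hand.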

    How to collaborate with Git and avoid overwrites

    To enable safe collaboration across multiple engineers:

    • Use Databricks Repos with Git integration (GitHub or Azure Repos).
    • Organize work around feature branches.
    • Use pull requests and enforce branch protections to avoid accidental overwrites.
    • Consider using notebooks in source format (.py, .sql) to make Git diffs cleaner and version control more robust.

    More info: https://learn.microsoft.com/en-us/azure/databricks/repos/
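
    For reference, a notebook stored in source format is an ordinary .py file: the first line marks it as a notebook and "# COMMAND ----------" lines separate cells, so a pull request shows plain line-by-line diffs. The sketch below is a hypothetical two-cell notebook; the helper function is made up for illustration.

```python
# Databricks notebook source
# Hypothetical helper shared by the pipeline notebooks (illustration only).


def normalize_column_name(name: str) -> str:
    """Trim, lowercase, and snake_case a raw source column name."""
    return name.strip().lower().replace(" ", "_")


# COMMAND ----------

# Second cell: use the helper defined in the first cell.
print(normalize_column_name(" Patient ID "))
```

    Because the file is also valid Python, it can be linted and unit-tested in CI before a pull request is merged.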

    Folder organization and RBAC setup

    Organize your workspace into logical folders by pipeline stage:

    /Shared
      /HistoricalLoad
      /CatchUpCDC
      /RealtimeStreaming
      /CommonLibs
      /DevOps
    

    RBAC Mapping by Folder:

    • Data Engineers: Can Edit on specific pipeline folders.
    • Leads: Can Manage on shared folders.
    • DevOps: Full access to /DevOps and jobs/clusters.
    • Use workspace ACLs to enforce the folder permissions; Unity Catalog privileges govern the data objects (catalogs, schemas, tables), not workspace folders.
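
    The mapping above could be kept as data and turned into request bodies for the workspace Permissions API, roughly as sketched below. The group names are hypothetical, the read-only access on /Shared/CommonLibs is an assumption not stated above, and the code only builds the payloads without sending them.

```python
import json

# Sketch only: group names are hypothetical, and CAN_READ on CommonLibs
# is an assumption added for illustration.
FOLDER_ACLS = {
    "/Shared/HistoricalLoad": {"data-engineers": "CAN_EDIT", "leads": "CAN_MANAGE"},
    "/Shared/CatchUpCDC": {"data-engineers": "CAN_EDIT", "leads": "CAN_MANAGE"},
    "/Shared/RealtimeStreaming": {"data-engineers": "CAN_EDIT", "leads": "CAN_MANAGE"},
    "/Shared/CommonLibs": {"data-engineers": "CAN_READ", "leads": "CAN_MANAGE"},
    "/Shared/DevOps": {"devops": "CAN_MANAGE"},
}


def to_permissions_body(acl: dict) -> dict:
    """Build an access_control_list body from a {group: level} mapping."""
    return {
        "access_control_list": [
            {"group_name": group, "permission_level": level}
            for group, level in sorted(acl.items())
        ]
    }


bodies = {path: to_permissions_body(acl) for path, acl in FOLDER_ACLS.items()}
print(json.dumps(bodies["/Shared/CatchUpCDC"], indent=2))
```

    Each body would be sent against the directory's object ID (looked up from its path first), which keeps the whole RBAC model reviewable in one file.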

    Hope this helps. If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.
