Data lake security and access

Anshal 2,251 Reputation points
2024-02-11T13:15:10.2033333+00:00

Hi friends, we can't use Unity Catalog for Databricks for budget reasons, so I want to understand how security roles are defined in Databricks. Do the roles that we set in ADLS, or the Active Directory roles used in ADF, cascade to Databricks? How are access levels actually defined, and how do they work?

Tags: Azure Databricks, Azure Data Factory

Accepted answer
  PRADEEPCHEEKATLA 90,641 Reputation points Moderator
    2024-02-12T04:45:29.1666667+00:00

    @Anshal - Thanks for the question and using MS Q&A platform.

    In Databricks, access to data stored in ADLS (Azure Data Lake Storage) is controlled through the use of access control lists (ACLs) and role-based access control (RBAC).

    ACLs are used to grant or deny access to specific files or directories in ADLS. You can set ACLs on files and directories using the Azure portal, Azure Storage Explorer, or the Azure Storage REST API. ACLs can be set for individual users, groups, or service principals.
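To make the ACL mechanics above concrete, here is a small illustration of how an ADLS Gen2 POSIX-style ACL entry is structured. The entry format (`[default:]<type>:<id>:<permissions>`, e.g. `user:<object-id>:r-x`) is the one documented for ADLS Gen2; the `parse_acl_entry` helper itself is hypothetical and exists only to show what each part of an entry means:

```python
def parse_acl_entry(entry: str) -> dict:
    """Split an ADLS Gen2 ACL entry into its scope, type, id, and permission bits."""
    parts = entry.split(":")
    default = parts[0] == "default"
    if default:
        parts = parts[1:]
    entry_type, entry_id, perms = parts
    return {
        "default": default,          # "default" entries apply to new children of a directory
        "type": entry_type,          # user, group, mask, or other
        "id": entry_id,              # AAD object ID; empty means the owning user/group
        "read": perms[0] == "r",
        "write": perms[1] == "w",
        "execute": perms[2] == "x",  # on a directory: permission to list/traverse it
    }

# A named user granted read and execute (traverse) but not write:
acl = parse_acl_entry("user:aaaabbbb-0000-cccc-1111-dddd2222eeee:r-x")
print(acl["read"], acl["write"], acl["execute"])  # True False True
```

Note that to reach a file, the identity also needs execute permission on every directory above it, which is why ACL-only setups often fail with opaque 403 errors when a parent directory is missed.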

    RBAC is used to grant or deny access to ADLS at a higher level, such as at the storage account or container level. RBAC roles are defined in Azure Active Directory (AAD) and can be assigned to users, groups, or service principals. There are several built-in RBAC roles in ADLS, such as Storage Blob Data Contributor and Storage Blob Data Reader, which provide different levels of access to the data stored in ADLS.
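As a rough sketch of what those built-in roles allow: the role names below are real Azure roles, but the `can` helper is hypothetical and far coarser than the actual Azure action definitions; it is only meant to show the relative capability of each role.

```python
# Simplified capability map for the common ADLS data-plane roles (illustration only).
BUILT_IN_ROLES = {
    "Storage Blob Data Reader":      {"read"},
    "Storage Blob Data Contributor": {"read", "write", "delete"},
    "Storage Blob Data Owner":       {"read", "write", "delete", "manage_acls"},
}

def can(role: str, action: str) -> bool:
    """Return whether a built-in role permits a (coarse-grained) data action."""
    return action in BUILT_IN_ROLES.get(role, set())

print(can("Storage Blob Data Reader", "write"))       # False
print(can("Storage Blob Data Contributor", "write"))  # True
```

In practice the assignment scope matters as much as the role: a role granted at the storage-account level applies to every container, while one granted at the container level applies only there.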

    When it comes to Databricks, two layers are involved. Databricks workspace access control grants or denies access to workspace objects such as notebooks, clusters, and jobs. Access to the data itself, whether through a DBFS mount or a direct abfss:// path, is evaluated by ADLS against the identity (for example, a service principal) that the cluster or mount is configured with, using the ACLs and RBAC roles described above.
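The DBFS-mount route can be sketched as follows, using the documented OAuth mount pattern for a service principal. Everything below (storage account, container, secret scope and key names, tenant and client IDs, mount point) is a placeholder, not a value from this thread; this is a Databricks-notebook config sketch, not standalone code:

```python
# Hedged sketch: mount an ADLS Gen2 container into DBFS with a service principal.
# Runs only inside a Databricks notebook (uses dbutils and the ABFS OAuth driver).
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-client-id>",
    # Keep the client secret in a Databricks secret scope, never in the notebook:
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="my-scope", key="sp-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://mycontainer@mystorageaccount.dfs.core.windows.net/",
    mount_point="/mnt/mydata",
    extra_configs=configs,
)
```

One design consequence worth noting: a mount is shared by the whole workspace, so everyone who can attach to a cluster effectively inherits the mounting service principal's storage permissions, regardless of their own AAD roles.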

    The RBAC roles that are defined in ADLS can be used to control access to data stored in ADLS from Databricks. For example, you can assign the Storage Blob Data Contributor role to a Databricks service principal to grant it write access to data stored in ADLS. Similarly, you can assign the Storage Blob Data Reader role to a Databricks service principal to grant it read access to data stored in ADLS.
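For direct (non-mounted) access, the documented per-storage-account Spark configuration looks roughly like this. Again, the storage account, tenant, secret scope, and path names are placeholders, and this config sketch only runs inside a Databricks notebook:

```python
# Hedged sketch: read ADLS Gen2 directly over abfss:// once the service
# principal holds, e.g., the Storage Blob Data Reader role on the account.
storage_account = "mystorageaccount"
suffix = f"{storage_account}.dfs.core.windows.net"

spark.conf.set(f"fs.azure.account.auth.type.{suffix}", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{suffix}",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{suffix}",
               "<application-client-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{suffix}",
               dbutils.secrets.get(scope="my-scope", key="sp-secret"))
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{suffix}",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

# With only Storage Blob Data Reader assigned, this read succeeds,
# while any attempt to write back to the account is denied by storage.
df = spark.read.parquet(f"abfss://mycontainer@{suffix}/raw/")
```

Because the configuration is set per storage account, a single cluster can use different service principals for different accounts, which is one way to separate read-only and read-write data without Unity Catalog.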

    In summary, access to data stored in ADLS from Databricks is controlled by the combination of ADLS ACLs and RBAC roles, evaluated against the identity Databricks uses to reach storage, plus Databricks workspace access control, which governs who can use the notebooks, clusters, and jobs that carry those credentials. ADLS roles do not cascade into Databricks automatically; they apply to whichever identity Databricks presents to storage.

    For more details, refer to Access control model in Azure Data Lake Storage Gen2 and Databricks Access control overview.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for "Was this answer helpful". And, if you have any further query do let us know.

    1 person found this answer helpful.

0 additional answers
