Access Control using Service Principal in Databricks

Kumari s · 2023-03-20

In order for our Databricks workspace to access different data sets, we need a way to grant access control at either an individual or role level. These data sets are saved as files in Data Lake Gen2 and can be organized to align with our access-rights needs, either by assigning a storage account per dataset or by grouping multiple datasets in one container under a storage account. Our architectural mandate requires access to be granted via a service principal, but that gives all Databricks workspace users the same access rights to all storage accounts and datasets. We therefore need to explore alternative ways to grant access to storage accounts from Databricks via a service principal while also maintaining access control for individual users or roles. Is it possible to achieve this level of control at the container level? I tried to access storage accounts with a service principal from our Databricks workspace, but this approach gave all users access to the storage accounts, which is not ideal. Can you offer any guidance?


Accepted answer
  PRADEEPCHEEKATLA (Moderator) · 2023-03-21


    Kumari s - Thanks for the question and for using the MS Q&A platform.

    Azure Data Lake Storage Gen2 supports the following authorization mechanisms:

    • Shared Key authorization
    • Shared access signature (SAS) authorization
    • Role-based access control (Azure RBAC)
    • Attribute-based access control (Azure ABAC)
    • Access control lists (ACL)

    For more details, refer to Access control model in Azure Data Lake Storage Gen2.

    Yes, it is possible to achieve granular access control for individual users or roles at the container level in Azure Data Lake Gen2 while still using a service principal to access the storage accounts from Databricks.

    One approach is to use ADLS Gen2 access control lists (ACLs) to grant access at the container level. ACLs let you specify permissions for individual users or groups on specific files and folders within a container. You can create a service principal and grant it the necessary permissions to access the storage account, and then use ACLs to grant access to specific containers, files, or folders within the storage account. This way, you control who has access to each dataset and ensure that access is granted only to the necessary users or roles.
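
    For example, here is a minimal sketch using the azure-storage-file-datalake Python SDK; the account, container, folder name, and object IDs are placeholders you would replace with your own values:

    ```python
    # Minimal sketch: grant folder-level access with an ADLS Gen2 ACL.
    # Assumes `pip install azure-storage-file-datalake azure-identity` and an
    # identity allowed to manage ACLs (e.g., one with Storage Blob Data Owner).
    # All <angle-bracket> values are placeholders.
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url="https://<storage-account>.dfs.core.windows.net",
        credential=DefaultAzureCredential(),
    )

    directory = (
        service.get_file_system_client("<container>")
        .get_directory_client("dataset-a")  # hypothetical folder holding one dataset
    )

    # Replace the folder's ACL: the owning user keeps full access, and one
    # Azure AD user (or a group, via `group:<object-id>:r-x`) gets read+execute.
    # A full ACL must always include the base user::/group::/other:: entries.
    directory.set_access_control(
        acl="user::rwx,group::r-x,other::---,user:<user-object-id>:r-x"
    )
    ```

    Newer versions of the SDK also expose update_access_control_recursive, which is handy for applying an entry to files that already exist under the folder.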

    Here are the general steps to achieve this:

    1. Create a service principal in Azure Active Directory and grant it the necessary permissions to access the storage account.
    2. Create an Azure Data Lake Storage Gen2 account and set up containers and folders to organize the datasets.
    3. Use the Azure portal or Azure CLI to set up Access Control Lists (ACLs) for the containers and folders, granting access to individual users or roles.
    4. In the Databricks workspace, create a secret scope for the service principal credentials.
    5. Use the secret scope to access the service principal credentials in your notebooks and jobs.
    6. Use dbutils.fs in Databricks to read and write data to the Data Lake Gen2 account (steps 4-6 are sketched in the notebook example after this list).
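
    To make steps 4-6 concrete, here is a rough notebook sketch; the scope and key names, the account and container, and the tenant/application IDs are all placeholders:

    ```python
    # Rough sketch of steps 4-6 in a Databricks notebook; every <angle-bracket>
    # value is a placeholder for your own scope, key, account, and IDs.

    # Step 5: read the service principal's client secret from a secret scope.
    client_secret = dbutils.secrets.get(scope="<scope-name>", key="<client-secret-key>")

    account = "<storage-account>"

    # Configure the ABFS driver to authenticate as the service principal
    # using the OAuth 2.0 client-credentials flow.
    spark.conf.set(f"fs.azure.account.auth.type.{account}.dfs.core.windows.net", "OAuth")
    spark.conf.set(
        f"fs.azure.account.oauth.provider.type.{account}.dfs.core.windows.net",
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    )
    spark.conf.set(f"fs.azure.account.oauth2.client.id.{account}.dfs.core.windows.net", "<application-id>")
    spark.conf.set(f"fs.azure.account.oauth2.client.secret.{account}.dfs.core.windows.net", client_secret)
    spark.conf.set(
        f"fs.azure.account.oauth2.client.endpoint.{account}.dfs.core.windows.net",
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
    )

    # Step 6: list and read data through dbutils.fs and Spark.
    path = f"abfss://<container>@{account}.dfs.core.windows.net/dataset-a"
    display(dbutils.fs.ls(path))
    df = spark.read.format("csv").option("header", "true").load(f"{path}/data.csv")
    ```

    The scope in step 4 can be created with the Databricks CLI (databricks secrets create-scope) or backed by Azure Key Vault, so the secret itself never appears in notebook code.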

    By using this approach, you can ensure that each user or role only has access to the specific containers, files, or folders they need, while still allowing the Databricks workspace to access the data via the service principal.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for "Was this answer helpful".

