Creating an HDInsight Spark 4.0 cluster with a managed identity and a Data Lake Storage Gen2 storage account

LHIND, CARSTEN 1 Reputation point
2022-08-16T12:48:09.72+00:00

Hi!

I am trying to create an HDI cluster with an ADLS Gen2 storage account as the primary storage account.

I have created multiple containers inside my storage account, and I want to restrict the managed identity's access to the other containers.

Therefore I don't want to grant Storage Blob Data Owner at the storage account level to the managed identity, which is used only by the HDInsight cluster.

Is there a way to grant fewer rights on the storage account? I am granting the managed identity Storage Blob Data Owner at the container level, but not at the storage account level.

Best regards,
Carsten


2 answers

  1. SaiKishor-MSFT 17,236 Reputation points
    2022-08-16T22:52:50.36+00:00

    @LHIND, CARSTEN Thank you for reaching out to Microsoft Q&A. I understand that you want to limit access to your Storage Account to the managed identity. Please correct me otherwise.

    The following roles permit a security principal to access data in a storage account.

    • Storage Blob Data Owner: Full access to Blob storage containers and data. This access permits the security principal to set the owner of an item, and to modify the ACLs of all items.
    • Storage Blob Data Contributor: Read, write, and delete access to Blob storage containers and blobs. This access does not permit the security principal to set the ownership of an item, but it can modify the ACL of items that are owned by the security principal.
    • Storage Blob Data Reader: Read and list Blob storage containers and blobs.

    Note: Roles such as Owner, Contributor, Reader, and Storage Account Contributor permit a security principal to manage a storage account, but do not provide access to the data within that account. However, these roles (excluding Reader) can obtain access to the storage keys, which can be used in various client tools to access the data.
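
    Each of the data roles above is assigned at a scope, and the scope can be a single container rather than the whole storage account. The sketch below only builds the two scope strings so the difference is visible; the subscription, resource group, account, and container names are hypothetical placeholders, and this thread does not confirm whether HDInsight cluster creation accepts a container-scoped assignment.

```python
def account_scope(subscription_id: str, resource_group: str, account: str) -> str:
    """Build the ARM resource ID of a storage account (account-level scope)."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account}"
    )


def container_scope(subscription_id: str, resource_group: str,
                    account: str, container: str) -> str:
    """Build the ARM resource ID of one container (container-level scope)."""
    return (
        account_scope(subscription_id, resource_group, account)
        + f"/blobServices/default/containers/{container}"
    )


if __name__ == "__main__":
    # All four values are hypothetical placeholders.
    print(container_scope("my-sub-id", "my-rg", "mystorageacct", "hdi-container"))
```

    A string built this way can be passed as the `--scope` argument of `az role assignment create`, together with `--assignee` and `--role "Storage Blob Data Owner"`.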

    You can also use ACLs with ADLS Gen2 in combination with Azure Roles for the security principal. You can check out the Permissions table: Combining Azure RBAC and ACL to see how this can work. Hope this helps.
    Please let us know if you have any more questions and we will be glad to assist you further. Thank you!



  2. PRADEEPCHEEKATLA 90,231 Reputation points
    2022-08-24T10:19:42.537+00:00

    Hello @LHIND, CARSTEN ,

    Apologies for the delay in response.

    Azure HDInsight uses managed identities to secure cluster access to files in Azure Data Lake Storage Gen2. Managed identities are a feature of Azure Active Directory that provides Azure services with a set of automatically managed credentials. These credentials can be used to authenticate to any service that supports Active Directory authentication. Using managed identities doesn't require you to store credentials in code or configuration files.

    Data Lake Storage Gen2 supports the following authorization mechanisms:

    • Shared Key authorization
    • Shared access signature (SAS) authorization
    • Role-based access control (Azure RBAC)
    • Access control lists (ACL)

    Shared Key and SAS authorization grant access to a user (or application) without requiring them to have an identity in Azure Active Directory (Azure AD). With these two forms of authorization, Azure RBAC and ACLs have no effect.

    Azure RBAC and ACL both require the user (or application) to have an identity in Azure AD. Azure RBAC lets you grant "coarse-grain" access to storage account data, such as read or write access to all of the data in a storage account, while ACLs let you grant "fine-grained" access, such as write access to a specific directory or file.
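
    As a rough illustration of how the coarse-grained and fine-grained layers combine for an Azure AD identity: role assignments are evaluated first, and ACLs are consulted only when the role assignments do not already authorize the operation. The sketch below is a simplified model of that order, not the service's implementation; the operation names and the two permission sets are hypothetical.

```python
def is_authorized(operation: str, rbac_allowed: set, acl_allowed: set) -> bool:
    """Simplified model: RBAC is checked first, ACLs only as a fallback."""
    if operation in rbac_allowed:
        # A role assignment already covers the operation; ACLs are not consulted.
        return True
    # No role covers it, so the decision falls to the ACL on the file or directory.
    return operation in acl_allowed


if __name__ == "__main__":
    # Hypothetical identity: a role that allows reads, plus an ACL that allows writes.
    print(is_authorized("read", {"read"}, {"write"}))
    print(is_authorized("write", {"read"}, {"write"}))
    print(is_authorized("delete", {"read"}, {"write"}))
```

    One consequence of this order is that an ACL cannot take away access that a role assignment has already granted.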

    ACLs give you the ability to apply a "finer grain" level of access to directories and files. An ACL is a permission construct that contains a series of ACL entries. Each ACL entry associates a security principal with an access level.

    Note: You can associate a security principal with an access level for files and directories. Each association is captured as an entry in an access control list (ACL). Each file and directory in your storage account has an access control list. When a security principal attempts an operation on a file or directory, an ACL check determines whether that security principal (user, group, service principal, or managed identity) has the correct permission level to perform the operation.
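
    To make the idea of ACL entries concrete, here is a minimal sketch that parses an access ACL written in the short POSIX-style form (for example `user::rwx,user:<object-id>:r-x,group::r-x,mask::r-x,other::---`) and checks one named-user entry against the mask. It is an illustration under stated assumptions: the object ID `1111` is a hypothetical placeholder, `default:` entries are not handled, and the real ACL check also considers the owning user, groups, and `other`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AclEntry:
    kind: str       # "user", "group", "mask", or "other"
    principal: str  # object ID, or "" for owning user/group, mask, and other
    perms: str      # three characters such as "r-x"


def parse_acl(text: str) -> list:
    """Parse a comma-separated access ACL string into entries."""
    entries = []
    for part in text.split(","):
        kind, principal, perms = part.strip().split(":")
        entries.append(AclEntry(kind, principal, perms))
    return entries


def named_user_allows(entries: list, principal: str, needed: str) -> bool:
    """Return True if the named-user entry, limited by the mask, grants `needed`."""
    # If no mask entry is present, treat the mask as unrestricted.
    mask = next((e.perms for e in entries if e.kind == "mask"), "rwx")
    for entry in entries:
        if entry.kind == "user" and entry.principal == principal:
            return all(p in entry.perms and p in mask for p in needed)
    # No entry names this principal.
    return False


if __name__ == "__main__":
    acl = parse_acl("user::rwx,user:1111:rwx,group::r-x,mask::r-x,other::---")
    print(named_user_allows(acl, "1111", "r"))
    print(named_user_allows(acl, "1111", "w"))
```

    In this example the named user holds `rwx`, but the mask `r-x` limits the effective permission, so the write check fails while the read check succeeds.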


    For more details, see Access control lists (ACLs) in Azure Data Lake Storage Gen2 and Managed identities in Azure HDInsight

    Hope this helps. Please let us know if you have any further queries.

