Редактиране

Споделяне чрез


Use Python to manage ACLs in Azure Data Lake Storage

This article shows you how to use the Python to get, set, and update the access control lists of directories and files.

ACL inheritance is already available for new child items that are created under a parent directory. But you can also add, update, and remove ACLs recursively on the existing child items of a parent directory without having to make these changes individually for each child item.

Package (Python Package Index) | Samples | Recursive ACL samples | API reference | Gen1 to Gen2 mapping | Give Feedback

Prerequisites

  • Azure subscription - create one for free.
  • Azure storage account that has hierarchical namespace (HNS) enabled. Follow these instructions to create one.
  • Python 3.8+
  • Azure CLI version 2.6.0 or higher.
  • One of the following security permissions:
    • A provisioned Microsoft Entra ID security principal that has been assigned the Storage Blob Data Owner role, scoped to the target container, storage account, parent resource group, or subscription.
    • Owning user of the target container or directory to which you plan to apply ACL settings. To set ACLs recursively, this includes all child items in the target container or directory.
    • Storage account key.

Set up your project

This section walks you through preparing a project to work with the Azure Data Lake Storage client library for Python.

From your project directory, install packages for the Azure Data Lake Storage and Azure Identity client libraries using the pip install command. The azure-identity package is needed for passwordless connections to Azure services.

pip install azure-storage-file-datalake azure-identity

Then open your code file and add the necessary import statements. In this example, we add the following to our .py file:

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

Connect to the account

To run the code examples in this article, you need to create a DataLakeServiceClient instance that represents the storage account. You can authorize the client object with Microsoft Entra ID credentials or with an account key.

You can use the Azure identity client library for Python to authenticate your application with Microsoft Entra ID.

Note

If you're using Microsoft Entra ID to authorize access, then make sure that your security principal has been assigned the Storage Blob Data Owner role. To learn more about how ACL permissions are applied and the effects of changing them, see Access control model in Azure Data Lake Storage.

First, assign one of the following Azure role-based access control (Azure RBAC) roles to your security principal:

Role ACL setting capability
Storage Blob Data Owner All directories and files in the account.
Storage Blob Data Contributor Only directories and files owned by the security principal.

Next, create a DataLakeServiceClient instance and pass in a new instance of the DefaultAzureCredential class.

def get_service_client_token_credential(self, account_name) -> DataLakeServiceClient:
    account_url = f"https://{account_name}.dfs.core.windows.net"
    token_credential = DefaultAzureCredential()

    service_client = DataLakeServiceClient(account_url, credential=token_credential)

    return service_client

To learn more about using DefaultAzureCredential to authorize access to data, see Overview: Authenticate Python apps to Azure using the Azure SDK.

Set ACLs

When you set an ACL, you replace the entire ACL including all of its entries. If you want to change the permission level of a security principal or add a new security principal to the ACL without affecting other existing entries, you should update the ACL instead. To update an ACL instead of replace it, see the Update ACLs section of this article.

This section shows you how to:

  • Set the ACL of a directory
  • Set the ACL of a file

Set the ACL of a directory

Get the access control list (ACL) of a directory by calling the DataLakeDirectoryClient.get_access_control method and set the ACL by calling the DataLakeDirectoryClient.set_access_control method.

This example gets and sets the ACL of a directory named my-directory. The string rwxr-xrw- gives the owning user read, write, and execute permissions, gives the owning group only read and execute permissions, and gives all others read and write permission.

def manage_directory_permissions():
    try:
        file_system_client = service_client.get_file_system_client(file_system="my-file-system")

        directory_client = file_system_client.get_directory_client("my-directory")
        
        acl_props = directory_client.get_access_control()
        
        print(acl_props['permissions'])
        
        new_dir_permissions = "rwxr-xrw-"
        
        directory_client.set_access_control(permissions=new_dir_permissions)
        
        acl_props = directory_client.get_access_control()
        
        print(acl_props['permissions'])
    
    except Exception as e:
     print(e)

You can also get and set the ACL of the root directory of a container. To get the root directory, call the FileSystemClient._get_root_directory_client method.

Set the ACL of a file

Get the access control list (ACL) of a file by calling the DataLakeFileClient.get_access_control method and set the ACL by calling the DataLakeFileClient.set_access_control method.

This example gets and sets the ACL of a file named my-file.txt. The string rwxr-xrw- gives the owning user read, write, and execute permissions, gives the owning group only read and execute permissions, and gives all others read and write permission.

def manage_file_permissions():
    try:
        file_system_client = service_client.get_file_system_client(file_system="my-file-system")

        directory_client = file_system_client.get_directory_client("my-directory")
        
        file_client = directory_client.get_file_client("uploaded-file.txt")

        acl_props = file_client.get_access_control()
        
        print(acl_props['permissions'])
        
        new_file_permissions = "rwxr-xrw-"
        
        file_client.set_access_control(permissions=new_file_permissions)
        
        acl_props = file_client.get_access_control()
        
        print(acl_props['permissions'])

    except Exception as e:
     print(e)

Set ACLs recursively

When you set an ACL, you replace the entire ACL including all of its entries. If you want to change the permission level of a security principal or add a new security principal to the ACL without affecting other existing entries, you should update the ACL instead. To update an ACL instead of replace it, see the Update ACLs recursively section of this article.

Set ACLs recursively by calling the DataLakeDirectoryClient.set_access_control_recursive method.

If you want to set a default ACL entry, then add the string default: to the beginning of each ACL entry string.

This example sets the ACL of a directory named my-parent-directory.

This method accepts a boolean parameter named is_default_scope that specifies whether to set the default ACL. If that parameter is True, the list of ACL entries are preceded with the string default:. The entries in this example grant the following permissions: read, write, and execute permissions for the owning user, read and execute permissions for the owning group, and read permissions for all others. The last ACL entry in this example gives a specific user with the object ID xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx read permissions.

def set_permission_recursively(is_default_scope):
    
    try:
        file_system_client = service_client.get_file_system_client(file_system="my-container")

        directory_client = file_system_client.get_directory_client("my-parent-directory")

        acl = 'user::rwx,group::r-x,other::r--,user:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx:r--'   

        if is_default_scope:
           acl = 'default:user::rwx,default:group::r-x,default:other::r--,default:user:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx:r--'

        directory_client.set_access_control_recursive(acl=acl)
        
        acl_props = directory_client.get_access_control()
        
        print(acl_props['permissions'])

    except Exception as e:
     print(e)

To see an example that processes ACLs recursively in batches by specifying a batch size, see the Python sample.

Update ACLs recursively

When you update an ACL, you modify the ACL instead of replacing the ACL. For example, you can add a new security principal to the ACL without affecting other security principals listed in the ACL. To replace the ACL instead of update it, see the Set ACLs section of this article.

To update an ACL recursively, create a new ACL object with the ACL entry that you want to update, and then use that object in update ACL operation. Do not get the existing ACL, just provide ACL entries to be updated. Update an ACL recursively by calling the DataLakeDirectoryClient.update_access_control_recursive method. If you want to update a default ACL entry, then add the string default: to the beginning of each ACL entry string.

This example updates an ACL entry with write permission.

This example sets the ACL of a directory named my-parent-directory. This method accepts a boolean parameter named is_default_scope that specifies whether to update the default ACL. if that parameter is True, the updated ACL entry is preceded with the string default:.

def update_permission_recursively(is_default_scope):
    
    try:
        file_system_client = service_client.get_file_system_client(file_system="my-container")

        directory_client = file_system_client.get_directory_client("my-parent-directory")
              
        acl = 'user:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx:rwx'   

        if is_default_scope:
           acl = 'default:user:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx:rwx'

        directory_client.update_access_control_recursive(acl=acl)

        acl_props = directory_client.get_access_control()
        
        print(acl_props['permissions'])

    except Exception as e:
     print(e)

To see an example that processes ACLs recursively in batches by specifying a batch size, see the Python sample.

Remove ACL entries recursively

You can remove one or more ACL entries. To remove ACL entries recursively, create a new ACL object for ACL entry to be removed, and then use that object in remove ACL operation. Do not get the existing ACL, just provide the ACL entries to be removed.

Remove ACL entries by calling the DataLakeDirectoryClient.remove_access_control_recursive method. If you want to remove a default ACL entry, then add the string default: to the beginning of the ACL entry string.

This example removes an ACL entry from the ACL of the directory named my-parent-directory. This method accepts a boolean parameter named is_default_scope that specifies whether to remove the entry from the default ACL. If that parameter is True, the updated ACL entry is preceded with the string default:.

def remove_permission_recursively(is_default_scope):

    try:
        file_system_client = service_client.get_file_system_client(file_system="my-container")

        directory_client = file_system_client.get_directory_client("my-parent-directory")

        acl = 'user:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'

        if is_default_scope:
           acl = 'default:user:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'

        directory_client.remove_access_control_recursive(acl=acl)

    except Exception as e:
     print(e)

To see an example that processes ACLs recursively in batches by specifying a batch size, see the Python sample.

Recover from failures

You might encounter runtime or permission errors. For runtime errors, restart the process from the beginning. Permission errors can occur if the security principal doesn't have sufficient permission to modify the ACL of a directory or file that is in the directory hierarchy being modified. Address the permission issue, and then choose to either resume the process from the point of failure by using a continuation token, or restart the process from beginning. You don't have to use the continuation token if you prefer to restart from the beginning. You can reapply ACL entries without any negative impact.

This example returns a continuation token in the event of a failure. The application can call this example method again after the error has been addressed, and pass in the continuation token. If this example method is called for the first time, the application can pass in a value of None for the continuation token parameter.

def resume_set_acl_recursive(continuation_token):
    
    try:
        file_system_client = service_client.get_file_system_client(file_system="my-container")

        directory_client = file_system_client.get_directory_client("my-parent-directory")
              
        acl = 'user::rwx,group::rwx,other::rwx,user:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx:r--'

        acl_change_result = directory_client.set_access_control_recursive(acl=acl, continuation=continuation_token)

        continuation_token = acl_change_result.continuation

        return continuation_token
        
    except Exception as e:
     print(e) 
     return continuation_token

To see an example that processes ACLs recursively in batches by specifying a batch size, see the Python sample.

If you want the process to complete uninterrupted by permission errors, you can specify that.

To ensure that the process completes uninterrupted, don't pass a continuation token into the DataLakeDirectoryClient.set_access_control_recursive method.

This example sets ACL entries recursively. If this code encounters a permission error, it records that failure and continues execution. This example prints the number of failures to the console.

def continue_on_failure():
    
    try:
        file_system_client = service_client.get_file_system_client(file_system="my-container")

        directory_client = file_system_client.get_directory_client("my-parent-directory")
              
        acl = 'user::rwx,group::rwx,other::rwx,user:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx:r--'

        acl_change_result = directory_client.set_access_control_recursive(acl=acl)

        print("Summary: {} directories and {} files were updated successfully, {} failures were counted."
          .format(acl_change_result.counters.directories_successful, acl_change_result.counters.files_successful,
                  acl_change_result.counters.failure_count))
        
    except Exception as e:
     print(e)

To see an example that processes ACLs recursively in batches by specifying a batch size, see the Python sample.

Best practices

This section provides you some best practice guidelines for setting ACLs recursively.

Handling runtime errors

A runtime error can occur for many reasons (For example: an outage or a client connectivity issue). If you encounter a runtime error, restart the recursive ACL process. ACLs can be reapplied to items without causing a negative impact.

Handling permission errors (403)

If you encounter an access control exception while running a recursive ACL process, your AD security principal might not have sufficient permission to apply an ACL to one or more of the child items in the directory hierarchy. When a permission error occurs, the process stops and a continuation token is provided. Fix the permission issue, and then use the continuation token to process the remaining dataset. The directories and files that have already been successfully processed won't have to be processed again. You can also choose to restart the recursive ACL process. ACLs can be reapplied to items without causing a negative impact.

Credentials

We recommend that you provision a Microsoft Entra security principal that has been assigned the Storage Blob Data Owner role in the scope of the target storage account or container.

Performance

To reduce latency, we recommend that you run the recursive ACL process in an Azure Virtual Machine (VM) that is located in the same region as your storage account.

ACL limits

The maximum number of ACLs that you can apply to a directory or file is 32 access ACLs and 32 default ACLs. For more information, see Access control in Azure Data Lake Storage Gen2.

See also