Uredi

Deli z drugimi prek


Use Python to manage files and folders in Microsoft OneLake

This article shows how you can use the Azure Storage Python SDK to manage files and directories in OneLake. This walkthrough covers the same content as Use Python to manage directories and files in ADLS Gen2 and highlights the differences when connecting to OneLake.

Prerequisites

Before starting your project, make sure you have the following prerequisites:

  • A workspace in your Fabric tenant with Contributor permissions.
  • A lakehouse in the workspace. Optionally, have data preloaded to read using Python.

Set up your project

From your project directory, install packages for the Azure Data Lake Storage and Azure Identity client libraries. OneLake supports the same SDKs as Azure Data Lake Storage (ADLS) Gen2 and supports Microsoft Entra authentication, which is provided by the azure-identity package.

pip install azure-storage-file-datalake azure-identity

Next, add the necessary import statements to your code file:

import os
from azure.storage.filedatalake import (
    DataLakeServiceClient,
    DataLakeDirectoryClient,
    FileSystemClient
)
from azure.identity import DefaultAzureCredential

Authorize access to OneLake

The following example creates a service client connected to OneLake that you can use to create filesystem clients for other operations. To authenticate to OneLake, this example uses the DefaultAzureCredential to automatically detect credentials and obtain the correct authentication token. Common methods of providing credentials for the Azure SDK include using the 'az login' command in the Azure Command Line Interface or the 'Connect-AzAccount' cmdlet from Azure PowerShell.

def get_service_client_token_credential(self, account_name) -> DataLakeServiceClient:
    account_url = f"https://{account_name}.dfs.fabric.microsoft.com"
    token_credential = DefaultAzureCredential()

    service_client = DataLakeServiceClient(account_url, credential=token_credential)

    return service_client

To learn more about using DefaultAzureCredential to authorize access to data, see Overview: Authenticate Python apps to Azure using the Azure SDK.

Working with directories

To work with a directory in OneLake, create a filesystem client and directory client. You can use this directory client to perform various operations, including renaming, moving, or listing paths (as seen in the following example). You can also create a directory client when creating a directory, using the FileSystemClient.create_directory method.

def create_file_system_client(self, service_client, file_system_name: str) : DataLakeServiceClient) -> FileSystemClient:
    file_system_client = service_client.get_file_system_client(file_system = file_system_name)
    return file_system_client

def create_directory_client(self, file_system_client : FileSystemClient, path: str) -> DataLakeDirectoryClient: directory_client 
    directory_client = file_system_client.GetDirectoryClient(path)
    return directory_client


def list_directory_contents(self, file_system_client: FileSystemClient, directory_name: str):
    paths = file_system_client.get_paths(path=directory_name)

    for path in paths:
        print(path.name + '\n')

Upload a file

You can upload content to a new or existing file by using the DataLakeFileClient.upload_data method.

def upload_file_to_directory(self, directory_client: DataLakeDirectoryClient, local_path: str, file_name: str):
    file_client = directory_client.get_file_client(file_name)

    with open(file=os.path.join(local_path, file_name), mode="rb") as data:
        file_client.upload_data(dataW, overwrite=True)

Sample

The following code sample lists the directory contents of any folder in OneLake.

#Install the correct packages first in the same folder as this file. 
#pip install azure-storage-file-datalake azure-identity

from azure.storage.filedatalake import (
    DataLakeServiceClient,
    DataLakeDirectoryClient,
    FileSystemClient
)
from azure.identity import DefaultAzureCredential

# Set your account, workspace, and item path here
ACCOUNT_NAME = "onelake"
WORKSPACE_NAME = "<myWorkspace>"
DATA_PATH = "<myLakehouse>.Lakehouse/Files/<path>"

def main():
    #Create a service client using the default Azure credential

    account_url = f"https://{ACCOUNT_NAME}.dfs.fabric.microsoft.com"
    token_credential = DefaultAzureCredential()
    service_client = DataLakeServiceClient(account_url, credential=token_credential)

    #Create a file system client for the workspace
    file_system_client = service_client.get_file_system_client(WORKSPACE_NAME)
    
    #List a directory within the filesystem
    paths = file_system_client.get_paths(path=DATA_PATH)

    for path in paths:
        print(path.name + '\n')

if __name__ == "__main__":
    main()

To run this sample, save the preceding code into a file listOneLakeDirectory.py and run the following command in the same directory. Remember to replace the workspace and path with your own values in the example.

python listOneLakeDirectory.py