Редактиране

Споделяне чрез


Use .NET to manage directories and files in Azure Data Lake Storage

This article shows you how to use .NET to create and manage directories and files in storage accounts that have a hierarchical namespace.

To learn about how to get, set, and update the access control lists (ACL) of directories and files, see Use .NET to manage ACLs in Azure Data Lake Storage.

Package (NuGet) | Samples | API reference | Gen1 to Gen2 mapping | Give Feedback

Prerequisites

  • An Azure subscription. See Get Azure free trial.

  • A storage account that has hierarchical namespace enabled. Follow these instructions to create one.

Set up your project

To get started, install the Azure.Storage.Files.DataLake NuGet package.

For more information about how to install NuGet packages, see Install and manage packages in Visual Studio using the NuGet Package Manager.

Then, add these using statements to the top of your code file.

using Azure;
using Azure.Storage.Files.DataLake;
using Azure.Storage.Files.DataLake.Models;
using Azure.Storage;
using System.IO;

Note

Multi-protocol access on Data Lake Storage enables applications to use both Blob APIs and Data Lake Storage Gen2 APIs to work with data in storage accounts with hierarchical namespace (HNS) enabled. When working with capabilities unique to Data Lake Storage Gen2, such as directory operations and ACLs, use the Data Lake Storage Gen2 APIs, as shown in this article.

When choosing which APIs to use in a given scenario, consider the workload and the needs of your application, along with the known issues and impact of HNS on workloads and applications.

Authorize access and connect to data resources

To work with the code examples in this article, you need to create an authorized DataLakeServiceClient instance that represents the storage account. You can authorize a DataLakeServiceClient object using Microsoft Entra ID, an account access key, or a shared access signature (SAS).

You can use the Azure identity client library for .NET to authenticate your application with Microsoft Entra ID.

Create a DataLakeServiceClient instance and pass in a new instance of the DefaultAzureCredential class.

public static DataLakeServiceClient GetDataLakeServiceClient(string accountName)
{
    string dfsUri = $"https://{accountName}.dfs.core.windows.net";

    DataLakeServiceClient dataLakeServiceClient = new DataLakeServiceClient(
        new Uri(dfsUri),
        new DefaultAzureCredential());

    return dataLakeServiceClient;
}

To learn more about using DefaultAzureCredential to authorize access to data, see How to authenticate .NET applications with Azure services.

Create a container

A container acts as a file system for your files. You can create a container by using the following method:

The following code example creates a container and returns a DataLakeFileSystemClient object for later use:

public async Task<DataLakeFileSystemClient> CreateFileSystem(
    DataLakeServiceClient serviceClient,
    string fileSystemName)
{
    return await serviceClient.CreateFileSystemAsync(fileSystemName);
}

Create a directory

You can create a directory reference in the container by using the following method:

The following code example adds a directory to a container, then adds a subdirectory and returns a DataLakeDirectoryClient object for later use:

public async Task<DataLakeDirectoryClient> CreateDirectory(
    DataLakeFileSystemClient fileSystemClient,
    string directoryName,
    string subdirectoryName)
{
    DataLakeDirectoryClient directoryClient =
        await fileSystemClient.CreateDirectoryAsync(directoryName);

    return await directoryClient.CreateSubDirectoryAsync(subdirectoryName);
}

Rename or move a directory

You can rename or move a directory by using the following method:

Pass the path of the desired directory as a parameter. The following code example shows how to rename a subdirectory:

public async Task<DataLakeDirectoryClient> RenameDirectory(
    DataLakeFileSystemClient fileSystemClient,
    string directoryPath,
    string subdirectoryName,
    string subdirectoryNameNew)
{
    DataLakeDirectoryClient directoryClient =
        fileSystemClient.GetDirectoryClient(string.Join('/', directoryPath, subdirectoryName));

    return await directoryClient.RenameAsync(string.Join('/', directoryPath, subdirectoryNameNew));
}

The following code example shows how to move a subdirectory from one directory to a different directory:

public async Task<DataLakeDirectoryClient> MoveDirectory(
    DataLakeFileSystemClient fileSystemClient,
    string directoryPathFrom,
    string directoryPathTo,
    string subdirectoryName)
{
    DataLakeDirectoryClient directoryClient =
         fileSystemClient.GetDirectoryClient(string.Join('/', directoryPathFrom, subdirectoryName));

    return await directoryClient.RenameAsync(string.Join('/', directoryPathTo, subdirectoryName));
}

Upload a file to a directory

You can upload content to a new or existing file by using the following method:

The following code example shows how to upload a local file to a directory using the UploadAsync method:

public async Task UploadFile(
    DataLakeDirectoryClient directoryClient,
    string fileName,
    string localPath)
{
    DataLakeFileClient fileClient = 
        directoryClient.GetFileClient(fileName);

    FileStream fileStream = File.OpenRead(localPath);

    await fileClient.UploadAsync(content: fileStream, overwrite: true);
}

You can use this method to create and upload content to a new file, or you can set the overwrite parameter to true to overwrite an existing file.

Append data to a file

You can upload data to be appended to a file by using the following method:

The following code example shows how to append data to the end of a file using these steps:

public async Task AppendDataToFile(
    DataLakeDirectoryClient directoryClient,
    string fileName,
    Stream stream)
{
    DataLakeFileClient fileClient = 
        directoryClient.GetFileClient(fileName);

    long fileSize = fileClient.GetProperties().Value.ContentLength;

    await fileClient.AppendAsync(stream, offset: fileSize);

    await fileClient.FlushAsync(position: fileSize + stream.Length);
}

Download from a directory

The following code example shows how to download a file from a directory to a local file using these steps:

  • Create a DataLakeFileClient instance to represent the file that you want to download.
  • Use the DataLakeFileClient.ReadAsync method, then parse the return value to obtain a Stream object. Use any .NET file processing API to save bytes from the stream to a file.

This example uses a BinaryReader and a FileStream to save bytes to a file.

public async Task DownloadFile(
    DataLakeDirectoryClient directoryClient,
    string fileName,
    string localPath)
{
    DataLakeFileClient fileClient =
        directoryClient.GetFileClient(fileName);

    Response<FileDownloadInfo> downloadResponse = await fileClient.ReadAsync();

    BinaryReader reader = new BinaryReader(downloadResponse.Value.Content);

    FileStream fileStream = File.OpenWrite(localPath);

    int bufferSize = 4096;

    byte[] buffer = new byte[bufferSize];

    int count;

    while ((count = reader.Read(buffer, 0, buffer.Length)) != 0)
    {
        fileStream.Write(buffer, 0, count);
    }

    await fileStream.FlushAsync();

    fileStream.Close();
}

List directory contents

You can list directory contents by using the following method and enumerating the result:

Enumerating the paths in the result may make multiple requests to the service while fetching the values.

The following code example prints the names of each file that is located in a directory:

public async Task ListFilesInDirectory(
    DataLakeFileSystemClient fileSystemClient,
    string directoryName)
{
    IAsyncEnumerator<PathItem> enumerator =
        fileSystemClient.GetPathsAsync(directoryName).GetAsyncEnumerator();

    await enumerator.MoveNextAsync();

    PathItem item = enumerator.Current;

    while (item != null)
    {
        Console.WriteLine(item.Name);

        if (!await enumerator.MoveNextAsync())
        {
            break;
        }

        item = enumerator.Current;
    }

}

Delete a directory

You can delete a directory by using the following method:

The following code example shows how to delete a directory:

public async Task DeleteDirectory(
    DataLakeFileSystemClient fileSystemClient,
    string directoryName)
{
    DataLakeDirectoryClient directoryClient =
        fileSystemClient.GetDirectoryClient(directoryName);

    await directoryClient.DeleteAsync();
}

Restore a soft-deleted directory

You can use the Azure Storage client libraries to restore a soft-deleted directory. Use the following method to list deleted paths for a DataLakeFileSystemClient instance:

Use the following method to restore a soft-deleted directory:

The following code example shows how to list deleted paths and restore a soft-deleted directory:

public async Task RestoreDirectory(
    DataLakeFileSystemClient fileSystemClient,
    string directoryName)
{
    DataLakeDirectoryClient directoryClient =
        fileSystemClient.GetDirectoryClient(directoryName);

    // List deleted paths
    List<PathDeletedItem> deletedItems = new List<PathDeletedItem>();
    await foreach (PathDeletedItem deletedItem in fileSystemClient.GetDeletedPathsAsync(directoryName))
    {
        deletedItems.Add(deletedItem);
    }

    // Restore deleted directory
    Response<DataLakePathClient> restoreResponse = await fileSystemClient.UndeletePathAsync(
        deletedItems[0].Path,
        deletedItems[0].DeletionId);
}

If you rename the directory that contains the soft-deleted items, those items become disconnected from the directory. If you want to restore those items, you have to revert the name of the directory back to its original name or create a separate directory that uses the original directory name. Otherwise, you receive an error when you attempt to restore those soft-deleted items.

Create a user delegation SAS for a directory

To work with the code examples in this section, add the following using directive:

using Azure.Storage.Sas;

The following code example shows how to generate a user delegation SAS for a directory when a hierarchical namespace is enabled for the storage account:

async static Task<Uri> GetUserDelegationSasDirectory(DataLakeDirectoryClient directoryClient)
{
    try
    {
        // Get service endpoint from the directory URI.
        DataLakeUriBuilder dataLakeServiceUri = new DataLakeUriBuilder(directoryClient.Uri)
        {
            FileSystemName = null,
            DirectoryOrFilePath = null
        };

        // Get service client.
        DataLakeServiceClient dataLakeServiceClient =
            new DataLakeServiceClient(dataLakeServiceUri.ToUri(),
                                      new DefaultAzureCredential());

        // Get a user delegation key that's valid for seven days.
        // You can use the key to generate any number of shared access signatures 
        // over the lifetime of the key.
        Azure.Storage.Files.DataLake.Models.UserDelegationKey userDelegationKey =
            await dataLakeServiceClient.GetUserDelegationKeyAsync(DateTimeOffset.UtcNow,
                                                                  DateTimeOffset.UtcNow.AddDays(7));

        // Create a SAS token that's valid for seven days.
        DataLakeSasBuilder sasBuilder = new DataLakeSasBuilder()
        {
            // Specify the file system name and path, and indicate that
            // the client object points to a directory.
            FileSystemName = directoryClient.FileSystemName,
            Resource = "d",
            IsDirectory = true,
            Path = directoryClient.Path,
            ExpiresOn = DateTimeOffset.UtcNow.AddDays(7)
        };

        // Specify racwl permissions for the SAS.
        sasBuilder.SetPermissions(
            DataLakeSasPermissions.Read |
            DataLakeSasPermissions.Add |
            DataLakeSasPermissions.Create |
            DataLakeSasPermissions.Write |
            DataLakeSasPermissions.List
            );

        // Construct the full URI, including the SAS token.
        DataLakeUriBuilder fullUri = new DataLakeUriBuilder(directoryClient.Uri)
        {
            Sas = sasBuilder.ToSasQueryParameters(userDelegationKey,
                                                  dataLakeServiceClient.AccountName)
        };

        Console.WriteLine("Directory user delegation SAS URI: {0}", fullUri);
        Console.WriteLine();
        return fullUri.ToUri();
    }
    catch (Exception e)
    {
        Console.WriteLine(e.Message);
        throw;
    }
}

The following example tests the user delegation SAS created in the previous example from a simulated client application. If the SAS is valid, the client application is able to list file paths for this directory. If the SAS is invalid (for example, the SAS is expired), the Storage service returns error code 403 (Forbidden).

private static async Task ListFilesPathsWithDirectorySasAsync(Uri sasUri)
{
    // Try performing an operation using the directory SAS provided.

    // Create a directory client object for listing operations.
    DataLakeDirectoryClient dataLakeDirectoryClient = new DataLakeDirectoryClient(sasUri);

    // List file paths in the directory.
    try
    {
        // Call the listing operation and return pages of the specified size.
        var resultSegment = dataLakeDirectoryClient.GetPathsAsync(false, false).AsPages();

        // Enumerate the file paths returned with each page.
        await foreach (Page<PathItem> pathPage in resultSegment)
        {
            foreach (PathItem pathItem in pathPage.Values)
            {
                Console.WriteLine("File name: {0}", pathItem.Name);
            }
            Console.WriteLine();
        }

        Console.WriteLine();
        Console.WriteLine("Directory listing operation succeeded for SAS {0}", sasUri);
    }
    catch (RequestFailedException e)
    {
        // Check for a 403 (Forbidden) error. If the SAS is invalid, 
        // Azure Storage returns this error.
        if (e.Status == 403)
        {
            Console.WriteLine("Directory listing operation failed for SAS {0}", sasUri);
            Console.WriteLine("Additional error information: " + e.Message);
            Console.WriteLine();
        }
        else
        {
            Console.WriteLine(e.Message);
            Console.ReadLine();
            throw;
        }
    }
}

To learn more about creating a user delegation SAS, see Create a user delegation SAS with .NET.

Create a service SAS for a directory

In a storage account with a hierarchical namespace enabled, you can create a service SAS for a directory. To create the service SAS, make sure you have installed version 12.5.0 or later of the Azure.Storage.Files.DataLake package.

The following example shows how to create a service SAS for a directory:

private static Uri GetServiceSasUriForDirectory(DataLakeDirectoryClient directoryClient,
                                          string storedPolicyName = null)
{
    if (directoryClient.CanGenerateSasUri)
    {
        // Create a SAS token that's valid for one hour.
        DataLakeSasBuilder sasBuilder = new DataLakeSasBuilder()
        {
            // Specify the file system name, the path, and indicate that
            // the client object points to a directory.
            FileSystemName = directoryClient.FileSystemName,
            Resource = "d",
            IsDirectory = true,
            Path = directoryClient.Path,
        };

        // If no stored access policy is specified, create the policy
        // by specifying expiry and permissions.
        if (storedPolicyName == null)
        {
            sasBuilder.ExpiresOn = DateTimeOffset.UtcNow.AddHours(1);
            sasBuilder.SetPermissions(DataLakeSasPermissions.Read |
                DataLakeSasPermissions.Write |
                DataLakeSasPermissions.List);
        }
        else
        {
            sasBuilder.Identifier = storedPolicyName;
        }

        // Get the SAS URI for the specified directory.
        Uri sasUri = directoryClient.GenerateSasUri(sasBuilder);
        Console.WriteLine("SAS URI for ADLS directory is: {0}", sasUri);
        Console.WriteLine();

        return sasUri;
    }
    else
    {
        Console.WriteLine(@"DataLakeDirectoryClient must be authorized with Shared Key 
                          credentials to create a service SAS.");
        return null;
    }
}

To learn more about creating a service SAS, see Create a service SAS with .NET.

See also