Download large amounts of random data from Azure storage

This tutorial is part three of a series. It shows you how to download large amounts of data from Azure storage.

In part three of the series, you learn how to:

  • Update the application
  • Run the application
  • Validate the number of connections

Prerequisites

To complete this tutorial, you must have completed the previous Storage tutorial: Upload large amounts of random data in parallel to Azure storage.

Remote into your virtual machine

To create a remote desktop session with the virtual machine, use the following command on your local machine. Replace <publicIpAddress> with the public IP address of your virtual machine. When prompted, enter the credentials you used when you created the virtual machine.

mstsc /v:<publicIpAddress>

Update the application

In the previous tutorial, you only uploaded files to the storage account. Open D:\git\storage-dotnet-perf-scale-app\Program.cs in a text editor. Replace the Main method with the following sample. This example comments out the upload task and uncomments the download task. The task that deletes the content in the storage account is still commented out in this sample; uncomment it if you want the application to delete the containers when it completes.

// Main must be declared async Task (C# 7.1 or later) because it awaits DownloadFilesAsync.
public static async Task Main(string[] args)
{
    Console.WriteLine("Azure Blob storage performance and scalability sample");
    // Set threading and default connection limit to 100 to 
    // ensure multiple threads and connections can be opened.
    // This is in addition to parallelism with the storage 
    // client library that is defined in the functions below.
    ThreadPool.SetMinThreads(100, 4);
    ServicePointManager.DefaultConnectionLimit = 100; // (Or More)

    bool exception = false;
    try
    {
        // Call the UploadFilesAsync function.
        // await UploadFilesAsync();

        // Uncomment the following line to enable downloading of files from the storage account.
        // This is commented out initially to support the tutorial at 
        // https://learn.microsoft.com/azure/storage/blobs/storage-blob-scalable-app-download-files
        await DownloadFilesAsync();
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
        exception = true;
    }
    finally
    {
        // The following function deletes the containers and all the blobs they contain.
        // This is commented out initially as the tutorial at 
        // https://learn.microsoft.com/azure/storage/blobs/storage-blob-scalable-app-download-files
        // has you upload only for one tutorial and download for the other.
        if (!exception)
        {
            // await DeleteExistingContainersAsync();
        }
        Console.WriteLine("Press any key to exit the application");
        Console.ReadKey();
    }
}
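
The thread pool setting used in Main can be checked in isolation. The following is a minimal sketch, not part of the sample application: it reads the thread pool minimums before and after calling ThreadPool.SetMinThreads and reports whether the call succeeded. The class name ThreadPoolSketch and the helper TrySetMinThreads are hypothetical names used only for illustration.

```csharp
using System;
using System.Threading;

public static class ThreadPoolSketch
{
    // Hypothetical helper: requests new thread pool minimums and
    // returns whether the runtime accepted them.
    public static bool TrySetMinThreads(int workerThreads, int completionPortThreads)
    {
        return ThreadPool.SetMinThreads(workerThreads, completionPortThreads);
    }

    public static void Main()
    {
        // Read the minimums that are in effect before the change.
        ThreadPool.GetMinThreads(out int workersBefore, out int ioBefore);
        Console.WriteLine($"Before: worker={workersBefore}, I/O={ioBefore}");

        // Same values that the tutorial's Main method uses.
        bool accepted = TrySetMinThreads(100, 4);

        // Read the minimums again to confirm what is in effect now.
        ThreadPool.GetMinThreads(out int workersAfter, out int ioAfter);
        Console.WriteLine($"Accepted: {accepted}");
        Console.WriteLine($"After: worker={workersAfter}, I/O={ioAfter}");
    }
}
```

SetMinThreads returns false when a requested value is negative or larger than the pool's maximum, and in that case it leaves the minimums unchanged, so checking the return value is an inexpensive safeguard.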

After you update the application, you need to build it again. Open a Command Prompt and navigate to D:\git\storage-dotnet-perf-scale-app. Rebuild the application by running dotnet build, as shown in the following example:

dotnet build

Run the application

Now that the application has been rebuilt, run it with the updated code. If a Command Prompt isn't already open, open one and navigate to D:\git\storage-dotnet-perf-scale-app.

Type dotnet run to run the application.

dotnet run

The application reads the containers located in the storage account specified in the storageconnectionstring. It iterates through the blobs in each container using the GetBlobs method and downloads them to the local machine using the DownloadToAsync method.

The DownloadFilesAsync task is shown in the following example:

private static async Task DownloadFilesAsync()
{
    BlobServiceClient blobServiceClient = GetBlobServiceClient();

    // Path to the directory that receives the downloaded files
    string downloadPath = Directory.GetCurrentDirectory() + "\\download\\";
    Directory.CreateDirectory(downloadPath);
    Console.WriteLine($"Created directory {downloadPath}");

    // Specify the StorageTransferOptions
    var options = new StorageTransferOptions
    {
        // Set the maximum number of workers that 
        // may be used in a parallel transfer.
        MaximumConcurrency = 8,

        // Set the maximum length of a transfer to 50MB.
        MaximumTransferSize = 50 * 1024 * 1024
    };

    List<BlobContainerClient> containers = new List<BlobContainerClient>();

    foreach (BlobContainerItem container in blobServiceClient.GetBlobContainers())
    {
        containers.Add(blobServiceClient.GetBlobContainerClient(container.Name));
    }

    // Start a timer to measure how long it takes to download all the files.
    Stopwatch timer = Stopwatch.StartNew();

    // Download the blobs
    try
    {
        int count = 0;

        // Create a queue of tasks that will each download one blob.
        var tasks = new Queue<Task<Response>>();

        foreach (BlobContainerClient container in containers)
        {                     
            // Iterate through the files
            foreach (BlobItem blobItem in container.GetBlobs())
            {
                string fileName = downloadPath + blobItem.Name;
                Console.WriteLine($"Downloading {blobItem.Name} to {downloadPath}");

                BlobClient blob = container.GetBlobClient(blobItem.Name);

                // Add the download task to the queue
                tasks.Enqueue(blob.DownloadToAsync(fileName, default, options));
                count++;
            }
        }

        // Wait for all the download tasks, which are already running, to complete.
        await Task.WhenAll(tasks);

        // Report the elapsed time.
        timer.Stop();
        Console.WriteLine($"Downloaded {count} files in {timer.Elapsed.TotalSeconds} seconds");
    }
    catch (RequestFailedException ex)
    {
        Console.WriteLine($"Azure request failed: {ex.Message}");
    }
    catch (DirectoryNotFoundException ex)
    {
        Console.WriteLine($"Error parsing files in the directory: {ex.Message}");
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Exception: {ex.Message}");
    }
}
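
The pattern that DownloadFilesAsync relies on, starting every task first and then awaiting them together with Task.WhenAll, can be tried without a storage account. The following is a minimal sketch under stated assumptions: FakeDownloadAsync is a hypothetical stand-in for DownloadToAsync that only waits for a fixed delay, and the class name WhenAllSketch and the file names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class WhenAllSketch
{
    // Hypothetical stand-in for a blob download: waits, then returns the name.
    public static async Task<string> FakeDownloadAsync(string name, int delayMs)
    {
        await Task.Delay(delayMs);
        return name;
    }

    // Starts every simulated download immediately, then awaits all of them.
    public static async Task<string[]> DownloadAllAsync(IEnumerable<string> names, int delayMs)
    {
        var tasks = new List<Task<string>>();
        foreach (string name in names)
        {
            // The task begins running as soon as the method is called.
            tasks.Add(FakeDownloadAsync(name, delayMs));
        }

        // Task.WhenAll completes when every task in the list has completed.
        return await Task.WhenAll(tasks);
    }

    public static async Task Main()
    {
        Stopwatch timer = Stopwatch.StartNew();
        string[] done = await DownloadAllAsync(new[] { "a.bin", "b.bin", "c.bin" }, 200);
        timer.Stop();
        Console.WriteLine($"Completed {done.Length} simulated downloads in {timer.ElapsedMilliseconds} ms");
    }
}
```

Because the simulated downloads overlap, the elapsed time should be close to the delay of a single task rather than the sum of all three. Real downloads are additionally limited by network bandwidth and by the connection limits configured in Main.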

Validate the connections

While the files are being downloaded, you can verify the number of concurrent connections to your storage account. Open a console window and type netstat -a | find /c "blob:https". This command shows the number of connections that are currently open. In the following example, 289 connections were open while files were being downloaded from the storage account.

C:\>netstat -a | find /c "blob:https"
289

C:\>
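
A similar count can be taken from code. The following is a rough sketch, not part of the sample application, that uses IPGlobalProperties.GetActiveTcpConnections to count established TCP connections to a given remote port. It is only an approximation of the netstat command above: it counts every established connection to port 443 rather than only those whose host name matches "blob", and it ignores connections in other states. The class name ConnectionCounter and the method CountEstablished are hypothetical.

```csharp
using System;
using System.Linq;
using System.Net.NetworkInformation;

public static class ConnectionCounter
{
    // Counts TCP connections in the Established state whose remote
    // endpoint uses the given port (443 for HTTPS).
    public static int CountEstablished(int remotePort)
    {
        return IPGlobalProperties.GetIPGlobalProperties()
            .GetActiveTcpConnections()
            .Count(c => c.State == TcpState.Established
                        && c.RemoteEndPoint.Port == remotePort);
    }

    public static void Main()
    {
        Console.WriteLine($"Established HTTPS connections: {CountEstablished(443)}");
    }
}
```

Run the sketch in a second console window while the download is in progress. Because the filter is broader in one respect and narrower in another, expect the number to differ from the netstat result.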

Next steps

In part three of the series, you learned about downloading large amounts of data from a storage account, including how to:

  • Update the application
  • Run the application
  • Validate the number of connections

Go to part four of the series to verify throughput and latency metrics in the portal.