Download large amounts of random data from Azure storage
This tutorial is part three of a series. It shows you how to download large amounts of random data from Azure storage.
In part three of the series, you learn how to:
- Update the application
- Run the application
- Validate the number of connections
Prerequisites
To complete this tutorial, you must have completed the previous Storage tutorial: Upload large amounts of random data in parallel to Azure storage.
Remote into your virtual machine
To create a remote desktop session with the virtual machine, use the following command on your local machine. Replace <publicIpAddress> with the public IP address of your virtual machine. When prompted, enter the credentials you used when creating the virtual machine.
mstsc /v:<publicIpAddress>
Update the application
In the previous tutorial, you only uploaded files to the storage account. Open D:\git\storage-dotnet-perf-scale-app\Program.cs in a text editor and replace the Main method with the following sample. This example comments out the upload task and uncomments both the download task and the task that deletes the content in the storage account when the download is complete.
public static async Task Main(string[] args)
{
    Console.WriteLine("Azure Blob storage performance and scalability sample");

    // Set threading and default connection limit to 100 to
    // ensure multiple threads and connections can be opened.
    // This is in addition to parallelism with the storage
    // client library that is defined in the functions below.
    ThreadPool.SetMinThreads(100, 4);
    ServicePointManager.DefaultConnectionLimit = 100; // (Or More)

    bool exception = false;
    try
    {
        // The upload task from the previous tutorial is now commented out.
        // await UploadFilesAsync();

        // Download the files from the storage account.
        await DownloadFilesAsync();
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
        exception = true;
    }
    finally
    {
        // Delete the containers, and all of the blobs they contain,
        // once the download completes without an exception.
        if (!exception)
        {
            await DeleteExistingContainersAsync();
        }

        Console.WriteLine("Press any key to exit the application");
        Console.ReadKey();
    }
}
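The DeleteExistingContainersAsync helper is defined in the sample repository. As a minimal sketch of what it does (not necessarily the repository's exact implementation), it lists every container in the storage account and deletes each one, which also removes all of the blobs they contain:

private static async Task DeleteExistingContainersAsync()
{
    BlobServiceClient blobServiceClient = GetBlobServiceClient();

    // Deleting a container also deletes every blob stored in it.
    foreach (BlobContainerItem container in blobServiceClient.GetBlobContainers())
    {
        Console.WriteLine($"Deleting container {container.Name}");
        await blobServiceClient.DeleteBlobContainerAsync(container.Name);
    }
}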
After you update the application, you need to build it again. Open a Command Prompt and navigate to D:\git\storage-dotnet-perf-scale-app. Rebuild the application by running dotnet build, as shown in the following example:
dotnet build
Run the application
Now that the application has been rebuilt, it's time to run it with the updated code. If not already open, open a Command Prompt and navigate to D:\git\storage-dotnet-perf-scale-app. Type dotnet run to run the application.
dotnet run
The application reads the containers located in the storage account specified in the storageconnectionstring. It iterates through the blobs using the GetBlobs method and downloads them to the local machine using the DownloadToAsync method. The DownloadFilesAsync task is shown in the following example:
private static async Task DownloadFilesAsync()
{
    BlobServiceClient blobServiceClient = GetBlobServiceClient();

    // Path to the directory to download the files to
    string downloadPath = Directory.GetCurrentDirectory() + "\\download\\";
    Directory.CreateDirectory(downloadPath);
    Console.WriteLine($"Created directory {downloadPath}");

    // Specify the StorageTransferOptions
    var options = new StorageTransferOptions
    {
        // Set the maximum number of workers that
        // may be used in a parallel transfer.
        MaximumConcurrency = 8,

        // Set the maximum length of a transfer to 50MB.
        MaximumTransferSize = 50 * 1024 * 1024
    };

    List<BlobContainerClient> containers = new List<BlobContainerClient>();

    foreach (BlobContainerItem container in blobServiceClient.GetBlobContainers())
    {
        containers.Add(blobServiceClient.GetBlobContainerClient(container.Name));
    }

    // Start a timer to measure how long it takes to download all the files.
    Stopwatch timer = Stopwatch.StartNew();

    // Download the blobs
    try
    {
        int count = 0;

        // Create a queue of tasks that will each download one file.
        var tasks = new Queue<Task<Response>>();

        foreach (BlobContainerClient container in containers)
        {
            // Iterate through the blobs in the container
            foreach (BlobItem blobItem in container.GetBlobs())
            {
                string fileName = downloadPath + blobItem.Name;
                Console.WriteLine($"Downloading {blobItem.Name} to {downloadPath}");

                BlobClient blob = container.GetBlobClient(blobItem.Name);

                // Add the download task to the queue
                tasks.Enqueue(blob.DownloadToAsync(fileName, default, options));
                count++;
            }
        }

        // Run all the tasks asynchronously.
        await Task.WhenAll(tasks);

        // Report the elapsed time.
        timer.Stop();
        Console.WriteLine($"Downloaded {count} files in {timer.Elapsed.TotalSeconds} seconds");
    }
    catch (RequestFailedException ex)
    {
        Console.WriteLine($"Azure request failed: {ex.Message}");
    }
    catch (DirectoryNotFoundException ex)
    {
        Console.WriteLine($"Error parsing files in the directory: {ex.Message}");
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Exception: {ex.Message}");
    }
}
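The GetBlobServiceClient helper called at the top of this method is defined elsewhere in Program.cs. A minimal sketch, assuming the connection string is stored in the storageconnectionstring environment variable configured earlier in the series, might look like this:

private static BlobServiceClient GetBlobServiceClient()
{
    // Assumes the earlier tutorials in this series stored the connection
    // string in an environment variable named storageconnectionstring.
    string connectionString = Environment.GetEnvironmentVariable("storageconnectionstring");
    return new BlobServiceClient(connectionString);
}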
Validate the connections
While the files are being downloaded, you can verify the number of concurrent connections to your storage account. Open a console window and type netstat -a | find /c "blob:https". This command shows the number of connections that are currently open. As you can see from the following example, over 280 connections were open while files were being downloaded from the storage account.
C:\>netstat -a | find /c "blob:https"
289
C:\>
Next steps
In part three of the series, you learned about downloading large amounts of data from a storage account, including how to:
- Run the application
- Validate the number of connections
Go to part four of the series to verify throughput and latency metrics in the portal.