Azure Storage account upload throttling for millions of files

Manish Gupta 0 Reputation points
2023-12-14T08:09:16.9666667+00:00

I have a VM on which I have mounted 4 disks of 4 TB each (disk type StandardSSD_LRS). Each disk is full; there are 2 million+ files and around 16 TB of data in total.

I want to store all that data in an Azure storage account, in a single container. So I created a storage account and a SAS URL, and I am using this SAS URL to authorize access to the account.

This VM is used only to upload this data to the Azure storage account and for no other work. The storage account also does not contain any other data.

The issue is that after a certain amount of data has been uploaded, upload performance drops drastically. I have used both AzCopy and rclone, and in both cases I see the same trend. With AzCopy I also hit an additional OOM issue that I did not see with rclone.

How can I get consistent performance when uploading all this data? Or is there any other way I can upload this much data to Azure Blob Storage?

AzCopy command:

azcopy sync $SRC_ROOT $container_uri --recursive

Rclone command:

rclone copy $SRC_ROOT az:${storage_container_name} --config rclone.conf -v
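For context, the `az:` remote in the command above implies an rclone configuration for the Azure Blob backend. A minimal sketch of what that setup might look like, assuming SAS-based authorization (the account/container URL is a placeholder, and the `--transfers`/`--checkers` values are illustrative tuning knobs, not values from the question):

```shell
# Hedged sketch: a minimal rclone.conf defining an "az" remote for the
# azureblob backend, authorized via a SAS URL (placeholder values).
cat > rclone.conf <<'EOF'
[az]
type = azureblob
sas_url = https://<account>.blob.core.windows.net/<container>?<sas-token>
EOF

# Guarded invocation: runs only if rclone is installed. --transfers and
# --checkers are standard rclone parallelism flags; lowering them is one
# way to trade peak speed for steadier throughput.
if command -v rclone >/dev/null; then
  rclone copy "$SRC_ROOT" az:"${storage_container_name}" --config rclone.conf \
    --transfers 8 --checkers 16 -v
fi
```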

[Three screenshots attached, 2023-12-14]

Azure Storage Accounts
Globally unique resources that provide access to data management services and serve as the parent namespace for the services.

1 answer

  1. Anand Prakash Yadav 7,820 Reputation points Microsoft Vendor
    2023-12-15T09:18:43.0933333+00:00

    Hello Manish Gupta,

    Thank you for posting your query here!

    If you see a large file failing, try limiting the number of concurrent network connections or capping throughput, depending on your specific case. We suggest lowering performance drastically at first, observing whether that solves the initial problem, and then ramping performance back up until an overall balance is achieved.

    In cases of low-bandwidth or intermittent network conditions, you can try adjusting the following values/parameters:

    Set the environment variable AZCOPY_CONCURRENCY_VALUE to AUTO. This helps a lot in low-bandwidth cases, since it results in AzCopy using far fewer connections than normal.

    Set the environment variable AZCOPY_CONCURRENT_FILES to 1. Adjusting the concurrency for file transfers can be beneficial, especially when dealing with large or small files. Lowering the concurrency (e.g., to 1 or a small number) can reduce the chance of failures and provide better control over the transfer process.

    You can also give AzCopy an explicit speed cap by adding this to the command line: --cap-mbps 2

    --block-size-mb (command-line parameter): sets the chunk size used for uploads.
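    Putting the knobs above together, a hedged sketch of a tuned invocation might look like the following. The specific values are illustrative starting points, not prescriptions; $SRC_ROOT and $container_uri are the same placeholders used in the question.

```shell
# Hedged sketch: start conservative, then raise values once uploads stay stable.
export AZCOPY_CONCURRENCY_VALUE=AUTO   # let AzCopy use far fewer connections than normal
export AZCOPY_CONCURRENT_FILES=1       # serialize file-level concurrency while diagnosing

# Guarded invocation: runs only if azcopy is installed. --cap-mbps caps
# throughput; --block-size-mb sets the upload chunk size (illustrative values).
if command -v azcopy >/dev/null; then
  azcopy sync "$SRC_ROOT" "$container_uri" --recursive \
    --cap-mbps 2000 --block-size-mb 8
fi
```

    Once the transfer runs without failures at these settings, the environment variables and caps can be relaxed step by step to find the best stable throughput.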

    For further information: https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-optimize
    https://learn.microsoft.com/en-us/troubleshoot/azure/azure-storage/storage-use-azcopy-troubleshoot

    Also, you have several options for moving data into or out of Azure Storage. Which option you choose depends on the size of your dataset and your network bandwidth. For more information, see Choose an Azure solution for data transfer.

    Also check Azure Data Box: a physical device provided by Microsoft for offline data transfer. This method is suitable for large datasets and can significantly reduce the time it takes to transfer the data.

    If you want to upload larger files to a file share or Blob Storage, there is also the Azure Storage Data Movement Library.

    Upload large amounts of random data in parallel to Azure Storage | Microsoft Learn

    Note: There are still more tools you can use to transfer data from on-premises to the cloud. This article gives an overview of data transfer solutions for periodic transfers; periodic data transfer over the network can be categorized as recurring at regular intervals or as continuous data movement.
    Solutions for periodic data transfer

     

    Please let us know if you have any further queries. I’m happy to assist you further.


    Please do not forget to "Accept the answer” and “up-vote” wherever the information provided helps you, this can be beneficial to other community members.

    1 person found this answer helpful.
