Azure Data Lake Storage Gen2: Maximum number of files per second that we can create

Venkatesh Bandaru 61 Reputation points Microsoft Employee
2022-07-28T13:56:11.177+00:00
  1. How many files can I create per second in ADLS Gen2?
  2. Does it depend on file size?
  3. If it is a 1 KB file, how many files can I create?
  4. What does it mean that the default max request rate is 20,000/sec? Can I say we can create 20,000 files per second?
Azure Data Lake Storage
Azure Storage

Accepted answer
  1. SaiKishor-MSFT 17,336 Reputation points
    2022-08-02T22:50:42.273+00:00

    @Venkatesh Bandaru Thank you for reaching out to Microsoft Q&A. I understand that you are having questions regarding ADLS Gen2.

    In terms of how many files can be ingested per second, it depends on the file type, the type of data, the file size, etc.

    Please refer to this document for more information on best practices for using ADLS Gen2: https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-best-practices#consider-premium

    You can also consider using query acceleration if this is an analytical workload: https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-query-acceleration

    For an example of latency expectations, see Latency in Blob storage: https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-latency

    Regarding the request rate, it covers all requests made to the storage account per second, and that includes create requests. And once again, the request rate and bandwidth achieved by your storage account depend on the size of the objects stored, the access patterns used, and the type of workload your application performs. Hope this helps.
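To illustrate why a 20,000 requests/second target does not translate directly into 20,000 files/second: writing a file through the Data Lake Storage Gen2 (DFS) REST API generally takes a create call, one or more append calls, and a flush call. The Python sketch below is a back-of-the-envelope model under stated assumptions, not an official formula: the 20,000 figure is the one quoted in the question, the 4 MiB append chunk size is hypothetical, and the function names are illustrative. Reads, listings, and retries draw on the same request budget, so real numbers will be lower.

```python
import math

# Default per-account request-rate target quoted in the question (20,000 requests/s).
# Check the current scalability targets for your own account type and region.
ACCOUNT_REQUEST_LIMIT = 20_000

# Hypothetical append chunk size; real SDKs and tools choose their own.
CHUNK_SIZE_BYTES = 4 * 1024 * 1024


def requests_per_file(file_size_bytes: int, chunk_size_bytes: int = CHUNK_SIZE_BYTES) -> int:
    """Requests needed to write one file via the DFS API: create + appends + flush."""
    if file_size_bytes == 0:
        return 1  # a create call alone yields an empty file
    appends = math.ceil(file_size_bytes / chunk_size_bytes)
    return 1 + appends + 1  # create, one append per chunk, final flush


def max_files_per_second(file_size_bytes: int, request_limit: int = ACCOUNT_REQUEST_LIMIT) -> float:
    """Ceiling on file creations per second if the request rate were the only limit."""
    return request_limit / requests_per_file(file_size_bytes)


if __name__ == "__main__":
    print(round(max_files_per_second(1024)))               # 1 KB file
    print(round(max_files_per_second(100 * 1024 * 1024)))  # 100 MiB file
```

Under these assumptions this gives about 6,667 files/s for a 1 KB file and about 741 files/s for a 100 MiB file. With the Blob API, a small file can be written with a single Put Blob request, which would raise the small-file ceiling; either way, bandwidth and latency may become the limit before the request rate does.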

    Please let us know if you have any further questions and we will be glad to assist you further. Thank you!


4 additional answers

  1. risolis 8,741 Reputation points
    2022-07-30T22:52:44.747+00:00

    Hello @Venkatesh Bandaru

    Thank you for posting this concern here.

    I would like to make a best effort to give you the most accurate answer to each of your questions above, so let me paste them back below:

    How many files can I create per second in ADLS Gen2?

    Azure Storage is scalable by design whether you access via Data Lake Storage Gen2 or Blob storage interfaces. It is able to store and serve many exabytes of data. This amount of storage is available with throughput measured in gigabits per second (Gbps) at high levels of input/output operations per second (IOPS). Processing is executed at near-constant per-request latencies that are measured at the service, account, and file levels.

    Does it depend on file size?

    Yes, you are right on this one. :)
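One way to see the file-size dependence is to take the smaller of two ceilings: one set by the request rate and one set by the account's ingress bandwidth. In the minimal Python sketch below, the 20,000 requests/second default comes from the question; the 3 requests per file (create, a single append, flush) is an assumption that only holds for small files; and the 25 Gbps ingress value is a placeholder, not an official figure, so check the scalability targets for your account type and region. The function name is illustrative.

```python
def files_per_second_ceiling(
    file_size_bytes: int,
    ingress_gbps: float,
    request_limit: int = 20_000,
    requests_per_file: int = 3,
) -> tuple[float, str]:
    """Return (files/s ceiling, name of the limit that binds first).

    requests_per_file=3 assumes create + a single append + flush (small files only).
    ingress_gbps is the account's ingress target in gigabits per second.
    """
    request_bound = request_limit / requests_per_file
    ingress_bytes_per_second = ingress_gbps * 1e9 / 8  # gigabits/s -> bytes/s
    bandwidth_bound = ingress_bytes_per_second / file_size_bytes
    if request_bound <= bandwidth_bound:
        return request_bound, "request rate"
    return bandwidth_bound, "ingress bandwidth"


if __name__ == "__main__":
    PLACEHOLDER_INGRESS_GBPS = 25.0  # placeholder, not an official figure
    for size in (1024, 64 * 1024, 1024 * 1024):
        rate, limit = files_per_second_ceiling(size, PLACEHOLDER_INGRESS_GBPS)
        print(f"{size:>8} bytes: ~{rate:,.0f} files/s (bound by {limit})")
```

With these placeholder numbers, 1 KB and 64 KiB files are capped by the request rate (about 6,667 files/s), while 1 MiB files are already capped by bandwidth (about 2,980 files/s). Beyond that point, the ceiling drops in proportion to file size.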

    If it is a 1 KB file, how many files can I create?

    Well, this one depends too, since other factors come into play, such as whether encryption is enabled, throughput, and so on.

    What does it mean that the default max request rate is 20,000/sec? Can I say we can create 20,000 files per second?

    You can check this from the Monitoring blade by going to Metrics, where you can review capacity, performance, failures, and so on.
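If you export the storage account's Transactions metric from Metrics at a one-minute granularity with the Sum aggregation, each data point is a per-minute total, which you can compare with the per-second target. The Python sketch below does that arithmetic on hypothetical sample values (not real data); the function names are illustrative, and the 20,000 limit is the figure quoted in the question. Note that a one-minute average can hide shorter bursts that still exceed the limit.

```python
def per_second_rates(per_minute_totals: list[float]) -> list[float]:
    """Convert per-minute Transactions totals into average requests per second."""
    return [total / 60 for total in per_minute_totals]


def peak_utilization(per_minute_totals: list[float], request_limit: int = 20_000) -> float:
    """Busiest minute's average request rate as a fraction of the account limit."""
    return max(per_second_rates(per_minute_totals)) / request_limit


if __name__ == "__main__":
    sample = [300_000, 600_000, 900_000]  # hypothetical per-minute totals, not real data
    print(f"peak utilization: {peak_utilization(sample):.0%}")
```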

    You might need to keep in mind this info below as well:

    [Screenshot not available]

    Furthermore, this is what I was talking about previously:

    [Screenshot not available]

    I hope this was helpful in getting a better understanding of this.

    Cheers,

    Please "Accept the answer" if the information helped you. This will help us and others in the community as well.


  2. risolis 8,741 Reputation points
    2022-07-31T18:25:29.343+00:00

    Hello @Venkatesh Bandaru

    I hope you are doing fine.

    I wonder whether you were able to go through my previous answer.

    Cheers,


  3. Venkatesh Bandaru 61 Reputation points Microsoft Employee
    2022-08-01T17:01:04.75+00:00

    Hi @risolis

    Thanks for taking the time. I was out of office (OOF), so I couldn't reply sooner; sorry about that.

    I am still not clear on how many files I can create per second in ADLS Gen2. I need an approximate number to help me decide whether I need one storage account or more than one.

    Let's say I upload 100 MB files to ADLS Gen2: how many files can I upload in parallel to a single account?
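A rough way to reason about parallelism for large files is Little's law: the number of uploads in flight equals the completion rate multiplied by how long each upload takes. The Python sketch below assumes the account's ingress bandwidth is the binding limit (for files of this size the request rate is unlikely to be). Both the account ingress figure and the per-upload throughput are placeholders to replace with the published targets for your region and your own measurements; the function name is hypothetical.

```python
def parallel_uploads_to_saturate(
    file_size_bytes: int,
    account_ingress_gbps: float,
    per_upload_mbps: float,
) -> tuple[float, float, float]:
    """Estimate (files/s, seconds per upload, uploads in flight) at full account ingress.

    Little's law: in-flight uploads = completion rate * time each upload takes.
    Here that is (B / size) * (size / r) = B / r, so it does not depend on file size.
    """
    account_bytes_per_s = account_ingress_gbps * 1e9 / 8  # gigabits/s -> bytes/s
    per_upload_bytes_per_s = per_upload_mbps * 1e6 / 8    # megabits/s -> bytes/s
    files_per_second = account_bytes_per_s / file_size_bytes
    seconds_per_upload = file_size_bytes / per_upload_bytes_per_s
    # Equal to files_per_second * seconds_per_upload, computed directly to avoid rounding drift.
    in_flight = account_bytes_per_s / per_upload_bytes_per_s
    return files_per_second, seconds_per_upload, in_flight


if __name__ == "__main__":
    # 100 MiB files; 25 Gbps account ingress and 200 Mbps per upload are placeholders.
    fps, secs, n = parallel_uploads_to_saturate(100 * 1024 * 1024, 25.0, 200.0)
    print(f"{fps:.1f} files/s, {secs:.1f} s per upload, ~{n:.0f} uploads in flight")
```

With these placeholder inputs this gives about 29.8 files/s, about 4.2 s per upload, and about 125 uploads in flight. Adding parallel uploads beyond that point would not raise throughput on one account, so if the required files/s exceeds the estimate, that suggests spreading the load across more than one storage account. At roughly 30 files/s, even a few dozen requests per file stays far below 20,000 requests/s.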

    regards
    Venkatesh B


  4. risolis 8,741 Reputation points
    2022-08-02T03:46:42.017+00:00

    Hello @Venkatesh Bandaru

    Thank you for your reply.

    I want to share two articles that may make it easier to understand how this can be worked out:

    [Screenshot not available]

    Refer to: https://docs.streamsets.com/platform-datacollector/latest/datacollector/UserGuide/Destinations/ADLS-G2-D.html

    Also, please review this other as well:

    [Screenshot not available]

    Refer to: https://docs.streamsets.com/platform-datacollector/latest/datacollector/UserGuide/Origins/ADLS-G2.html#concept_jbn_md3_ldb

    As a friendly reminder, there are some variables that can come into play, such as network throughput, caching storage capacity, encryption, and so on.

    Looking forward to your feedback,

    Best Regards,

