Azure Data Lake Storage Gen2: BlobContainerClient.UploadBlob

Venkatesh Bandaru 61 Reputation points Microsoft Employee
2023-11-14T16:48:23.4633333+00:00

I am using BlobContainerClient.UploadBlob to upload files to an ADLS Gen2 storage account. In the scenario below, two machines upload files with the UploadBlob API, and each upload has to create the directories 2023/11/12/05. If both machines attempt the upload at the same time, will that cause any issue, or does the UploadBlob API handle this internally? I tried uploading from two machines at the same time and didn't see any issue, but I would like to cross-check with you. Could you please clarify?

Machine1:

2023/11/12/05/1.csv

Machine2:

2023/11/12/05/2.csv


Accepted answer
  1. Luis Arias 6,796 Reputation points
    2023-11-14T22:04:14.8+00:00

    Hi @Venkatesh Bandaru ,

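    Creating the directories and the file in a single upload is one of the many benefits of the hierarchical namespace (atomic directory manipulation). The parent directories are created as part of the upload itself, so two machines uploading different files under 2023/11/12/05 at the same time should not conflict with each other. https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-namespace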
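
    As a minimal sketch of the scenario in the question, the upload from one machine could look like the code below. The environment variable name, the container name, and the CSV content are placeholders assumed for illustration; only the blob path comes from the question.

    ```csharp
    using System;
    using System.IO;
    using System.Text;
    using Azure.Storage.Blobs;

    // Assumption: the connection string is stored in a (hypothetical)
    // environment variable, and "my-container" is a placeholder container name.
    string connectionString = Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING");
    var container = new BlobContainerClient(connectionString, "my-container");

    // The blob name carries the full path. There is no separate call here to
    // create 2023/11/12/05 before uploading.
    string blobName = "2023/11/12/05/1.csv"; // Machine2 would use 2023/11/12/05/2.csv

    // Placeholder content for illustration only.
    using var content = new MemoryStream(Encoding.UTF8.GetBytes("id,value\n1,42\n"));
    container.UploadBlob(blobName, content);
    ```

    Because each machine writes a different blob name, the two uploads only share the directory path, not the target file.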

    One error you may run into later involves files that already exist. You can resolve it by uploading with the overwrite parameter set to true, as the documentation describes:

    A RequestFailedException will be thrown if the blob already exists. To overwrite an existing block blob, get a BlobClient by calling GetBlobClient(String), and then call UploadAsync(Stream, Boolean, CancellationToken) with the override parameter set to true. https://learn.microsoft.com/en-us/dotnet/api/azure.storage.blobs.blobcontainerclient.uploadblob?view=azure-dotnet
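
    A hedged sketch of that fallback is shown below. It assumes the same placeholder environment variable and container name as any other example here, and it uses the async variants of the calls. Note that in the SDK signature the Boolean parameter is named `overwrite`.

    ```csharp
    using System;
    using System.IO;
    using System.Text;
    using Azure;
    using Azure.Storage.Blobs;
    using Azure.Storage.Blobs.Models;

    // Assumption: placeholder environment variable and container name.
    string connectionString = Environment.GetEnvironmentVariable("STORAGE_CONNECTION_STRING");
    var container = new BlobContainerClient(connectionString, "my-container");

    string blobName = "2023/11/12/05/1.csv";
    byte[] data = Encoding.UTF8.GetBytes("id,value\n1,42\n"); // placeholder content

    try
    {
        // Throws RequestFailedException if the blob already exists.
        using var first = new MemoryStream(data);
        await container.UploadBlobAsync(blobName, first);
    }
    catch (RequestFailedException ex) when (ex.ErrorCode == BlobErrorCode.BlobAlreadyExists)
    {
        // Fall back to BlobClient.UploadAsync with overwrite set to true.
        // A fresh stream is used because the first one may be partly consumed.
        using var retry = new MemoryStream(data);
        BlobClient blob = container.GetBlobClient(blobName);
        await blob.UploadAsync(retry, overwrite: true);
    }
    ```

    If overwriting is always acceptable, calling `UploadAsync` with `overwrite: true` directly avoids the exception path altogether. Whether silently replacing an existing file is desirable depends on the workload.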

    Finally, some extra background on ADLS: the Azure Blob File System (ABFS) driver follows Hadoop file system semantics, which is why it improves operations on large amounts of data.

    https://learn.microsoft.com/en-us/azure/hdinsight/overview-data-lake-storage-gen2

    https://www.databricks.com/glossary/hadoop-distributed-file-system-hdfs

    Cheers,

    Luis Arias


    If the information helped address your question, please Accept the answer.

