BlobModifiedWhileReading exception when writing to Azure Data Lake Storage from multiple k8s pods

Deole, Pushkar (Pushkar) 60 Reputation points
2024-02-28T06:32:51.48+00:00

Hello,

We are using Azure Data Lake Storage Gen2 to store data consumed and processed by our microservices. The data is written in JSONL format, i.e. JSON messages separated by a newline character. It is stored in a blob that is appended with new messages using the AppendBlobClient.appendBlock method of the Java SDK. The blob is only written and never read; however, before the blob is created there is a check:

BlobClient blobClient = blobContainerClient.getBlobClient(String.format("%s/%s", dirName, fileName));
blobClient.getAppendBlobClient().createIfNotExists();

The append blob is then appended to using the appendBlock method.

This runs fine with one microservice instance in one pod. When we create two or more pods, one of the pods gets the exception below and stops writing to the append blob. Can someone explain why this error is thrown when none of the pods is reading from the blob, only writing (appending) to it?

{"@timestamp":"2024-02-28T03:57:25.394Z","@version":"1","message":"LakeLocator.process: error","logger_name":"com.avaya.analytics.kafka.AdminTopicConsumer","thread_name":"Thread-5","level":"ERROR","level_value":40000,"stack_trace":"com.azure.storage.blob.models.BlobStorageException: Status code 409, "<?xml version="1.0" encoding="utf-8"?><Error><Code>BlobModifiedWhileReading</Code><Message>The blob has been modified while being read.\nRequestId:ac22d9bf-b01e-0067-3efa-695902000000\nTime:2024-02-28T03:57:25.3473746Z</Message></Error>"\n\tat java.base/java.lang.invoke.MethodHandle.invokeWithArguments(Unknown Source)\n\tat com.azure.core.implementation.http.rest.ResponseExceptionConstructorCache.invoke(ResponseExceptionConstructorCache.java:56)\n\tat com.azure.core.implementation.http.rest.RestProxyBase.instantiateUnexpectedException(RestProxyBase.java:356)\n\tat com.azure.core.implementation.http.rest.AsyncRestProxy.lambda$ensureExpectedStatus$1(AsyncRestProxy.java:128)\n\tat reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:113)\n\tat reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2400)\n\tat reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.request(FluxMapFuseable.java:171)\n\tat reactor.core.publisher.Operators$MultiSubscriptionSubscriber.set(Operators.java:2196)\n\tat reactor.core.publisher.Operators$MultiSubscriptionSubscriber.onSubscribe(Operators.java:2070)\n\tat reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onSubscribe(FluxMapFuseable.java:96)\n\tat reactor.core.publisher.MonoJust.subscribe(MonoJust.java:55)\n\tat reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:64)\n\tat reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:157)\n\tat reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:129)\n\tat reactor.core.publisher.FluxHide$SuppressFuseableSubscriber.onNext(FluxHide.java:137)\n\tat reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:129)

Azure Data Lake Storage

Accepted answer
  1. Bhargava-MSFT 30,896 Reputation points Microsoft Employee
    2024-03-07T00:28:24.0666667+00:00

    Another option is to use a partitioning scheme that distributes the data across multiple append blobs. For example, you can partition the data on a specific field or key and write each partition to a separate append blob. This spreads the load across several append blobs and avoids conflicts when multiple pods write to the same blob.
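As a rough illustration of the partitioning idea, the sketch below derives a separate append-blob path per partition key or per pod. It uses only the Java standard library; the class and method names (`PartitionedBlobNames`, `partitionFor`, `blobNameForKey`, `blobNameForPod`) are hypothetical and not part of the Azure SDK, and the CRC32-based hashing and the naming pattern are assumptions chosen for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical helper: derives one append-blob path per partition or per pod. */
final class PartitionedBlobNames {
    private PartitionedBlobNames() {}

    /** Maps a record key to a stable partition index in [0, partitions). */
    static int partitionFor(String key, int partitions) {
        if (partitions <= 0) {
            throw new IllegalArgumentException("partitions must be positive");
        }
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        // CRC32.getValue() is a non-negative long, so the remainder is non-negative too.
        return (int) (crc.getValue() % partitions);
    }

    /** Builds a path such as "dir/name-p03.jsonl" for the partition that owns the key. */
    static String blobNameForKey(String dirName, String fileName, String key, int partitions) {
        return withSuffix(dirName, fileName, String.format("p%02d", partitionFor(key, partitions)));
    }

    /** Builds a path such as "dir/name-pod-a.jsonl" so that each pod appends to its own blob. */
    static String blobNameForPod(String dirName, String fileName, String podName) {
        return withSuffix(dirName, fileName, podName);
    }

    /** Inserts "-suffix" before the file extension, if there is one. */
    private static String withSuffix(String dirName, String fileName, String suffix) {
        int dot = fileName.lastIndexOf('.');
        String stem = dot < 0 ? fileName : fileName.substring(0, dot);
        String ext = dot < 0 ? "" : fileName.substring(dot);
        return String.format("%s/%s-%s%s", dirName, stem, suffix, ext);
    }
}
```

The resulting path would take the place of `String.format("%s/%s", dirName, fileName)` in the `getBlobClient` call from the question. Key-based partitioning only removes contention if each partition is written by a single pod; the per-pod variant gives every blob exactly one writer, assuming the pod name is available to the service (for example through the `HOSTNAME` environment variable).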

