BlobModifiedWhileReading exception when writing to Azure Data Lake Storage from multiple k8s pods

Question

Anonymous

Hello,

We are using Azure Data Lake Storage Gen2 to store data consumed and processed by our microservices. The data is written in JSONL format, i.e. JSON messages separated by newline characters. It is stored in an append blob, and new messages are appended with the AppendBlobClient.appendBlock method of the Java SDK. The blob is only written, never read; however, before the first write we make sure the blob exists:

    BlobClient blobClient = blobContainerClient.getBlobClient(String.format("%s/%s", dirName, fileName));
    blobClient.getAppendBlobClient().createIfNotExists();

New messages are then appended with the appendBlock method. This runs fine with one microservice instance in one pod. When we run two or more pods, one of the pods gets the exception below and stops writing to the append blob.

Can someone explain why this error is thrown when none of the pods reads from the blob and all of them only write (append) to it?

{"@timestamp":"2024-02-28T03:57:25.394Z","@version":"1","message":"LakeLocator.process: error","logger_name":"com.avaya.analytics.kafka.AdminTopicConsumer","thread_name":"Thread-5","level":"ERROR","level_value":40000,"stack_trace":"com.azure.storage.blob.models.BlobStorageException: Status code 409, "<?xml version="1.0" encoding="utf-8"?><Error><Code>BlobModifiedWhileReading</Code><Message>The blob has been modified while being read.\nRequestId:ac22d9bf-b01e-0067-3efa-695902000000\nTime:2024-02-28T03:57:25.3473746Z</Message></Error>"\n\tat java.base/java.lang.invoke.MethodHandle.invokeWithArguments(Unknown Source)\n\tat com.azure.core.implementation.http.rest.ResponseExceptionConstructorCache.invoke(ResponseExceptionConstructorCache.java:56)\n\tat com.azure.core.implementation.http.rest.RestProxyBase.instantiateUnexpectedException(RestProxyBase.java:356)\n\tat com.azure.core.implementation.http.rest.AsyncRestProxy.lambda$ensureExpectedStatus$1(AsyncRestProxy.java:128)\n\tat reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:113)\n\tat reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2400)\n\tat reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.request(FluxMapFuseable.java:171)\n\tat reactor.core.publisher.Operators$MultiSubscriptionSubscriber.set(Operators.java:2196)\n\tat reactor.core.publisher.Operators$MultiSubscriptionSubscriber.onSubscribe(Operators.java:2070)\n\tat reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onSubscribe(FluxMapFuseable.java:96)\n\tat reactor.core.publisher.MonoJust.subscribe(MonoJust.java:55)\n\tat reactor.core.publisher.InternalMonoOperator.subscribe(InternalMonoOperator.java:64)\n\tat reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:157)\n\tat reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:129)\n\tat reactor.core.publisher.FluxHide$SuppressFuseableSubscriber.onNext(FluxHide.java:137)\n\tat reactor.core.publisher.FluxMapFuseable$MapFuseableSubscriber.onNext(FluxMapFuseable.java:129)

Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator

2024-02-28T18:59:04.9233333+00:00

Hello Deole, Pushkar (Pushkar),

The error "The blob has been modified while being read" can occur when multiple instances try to append to the same blob simultaneously.

In your case, it seems that multiple pods are writing to the same append blob at the same time, which is causing the error. An append blob is designed to allow multiple writers to append data to the same blob, but it is not designed to allow multiple writers to write to the same block of the blob at the same time.

To avoid this error, you can use a distributed lock so that only one pod writes to the append blob at any given time. For example, you can use Azure Blob Storage's lease mechanism: acquire a lease on the append blob before writing to it, and release the lease once the write is complete.

https://learn.microsoft.com/en-us/rest/api/storageservices/lease-blob?tabs=microsoft-entra-id
https://learn.microsoft.com/en-us/dotnet/api/azure.storage.blobs.specialized.blobleaseclient?view=azure-dotnet
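As a rough illustration of the acquire, append, release pattern described above, here is a minimal, self-contained Java sketch. It does not call the Azure SDK: FakeLeasedAppendBlob and LeaseGuardedWriter are hypothetical in-memory stand-ins invented for this example, and the retry count and backoff are arbitrary assumptions. In real code the lease would presumably be obtained through the SDK's lease client and the lease ID sent along with each append request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.UUID;

/** Hypothetical in-memory stand-in for a leasable append blob (not an Azure SDK type). */
class FakeLeasedAppendBlob {
    private final List<String> lines = new ArrayList<>();
    private String activeLeaseId; // null while no writer holds the lease

    /** Returns a new lease ID if the blob is free, or empty if another writer holds the lease. */
    synchronized Optional<String> tryAcquireLease() {
        if (activeLeaseId != null) {
            return Optional.empty();
        }
        activeLeaseId = UUID.randomUUID().toString();
        return Optional.of(activeLeaseId);
    }

    /** Appends one JSONL line; rejected unless the caller presents the active lease ID. */
    synchronized void append(String leaseId, String jsonLine) {
        if (!leaseId.equals(activeLeaseId)) {
            throw new IllegalStateException("lease not held by this writer");
        }
        lines.add(jsonLine);
    }

    /** Releases the lease if the given ID is the active one. */
    synchronized void releaseLease(String leaseId) {
        if (leaseId.equals(activeLeaseId)) {
            activeLeaseId = null;
        }
    }

    synchronized List<String> contents() {
        return new ArrayList<>(lines);
    }
}

class LeaseGuardedWriter {
    /**
     * Tries up to maxAttempts times to take the lease, appends one line, and
     * always releases the lease afterwards. Returns false if the lease could
     * not be obtained (or the thread was interrupted while waiting).
     */
    static boolean appendWithLease(FakeLeasedAppendBlob blob, String jsonLine, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<String> lease = blob.tryAcquireLease();
            if (lease.isPresent()) {
                try {
                    blob.append(lease.get(), jsonLine);
                    return true;
                } finally {
                    blob.releaseLease(lease.get()); // release even if the append throws
                }
            }
            try {
                Thread.sleep(10L * attempt); // simple linear backoff (assumed values)
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

A pod would call something like LeaseGuardedWriter.appendWithLease(blob, jsonLine, 5) for each message or batch. Releasing in a finally block matters because a blob lease that is never released blocks other writers until it expires (a finite lease lasts 15 to 60 seconds) or is explicitly broken.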

I hope this helps. Please let me know if you have any further questions.

Bhargava-MSFT 31,261 Reputation points Microsoft Employee Moderator

2024-03-04T20:35:56.8833333+00:00

Hello Deole, Pushkar (Pushkar),

I am checking whether you have had a chance to review the response above. Please let me know if you have any further questions.

Anonymous

2024-03-05T04:44:52.21+00:00

Hello,

Yes, I reviewed the answer. It seems the distributed lock mechanism would hamper performance and may not give us the concurrency and throughput we expect from adding more pods to the topology.

Is there an alternative that ensures multiple pods don't write to the same block while still achieving the required concurrency and throughput?

Accepted answer

Answer 1

Bhargava-MSFT 31,261 Microsoft Employee Moderator

Another option is to use a partitioning scheme that distributes the data across multiple append blobs. For example, you can partition the data by a specific field or key and write each partition to a separate append blob. This spreads the load across several append blobs and avoids the conflicts that arise when multiple pods write to the same block.
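To make the partitioning idea concrete, here is a small Java sketch that maps a message key to one of N append blob paths. Everything in it is an assumption made for illustration: the class name AppendBlobPartitioner, the "-part-0003.jsonl" naming scheme, and the choice of CRC32 as the hash are not taken from the SDK or from your code. Only the "dirName/fileName" path shape mirrors the snippet in the question.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.CRC32;

/** Hypothetical helper that picks an append blob path per partition key. */
class AppendBlobPartitioner {

    /** Maps a key to a partition index in [0, partitionCount) using a stable CRC32 hash. */
    static int partitionFor(String key, int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        // getValue() is a non-negative long, so the remainder is a valid index.
        return (int) (crc.getValue() % partitionCount);
    }

    /** Builds a blob path such as "events/messages-part-0003.jsonl" (naming scheme is assumed). */
    static String blobPathFor(String dirName, String baseName, String key, int partitionCount) {
        int partition = partitionFor(key, partitionCount);
        return String.format(Locale.ROOT, "%s/%s-part-%04d.jsonl", dirName, baseName, partition);
    }
}
```

The resulting path can be passed to blobContainerClient.getBlobClient(...) in place of the single shared name. Note that this only removes contention if each partition ends up being written by one pod at a time, for example when each pod owns a disjoint set of keys; using the pod's own name as the key is one simple way to get there.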
