Stream Analytics + ADLS output: Please ensure that blob file is not being modified by another process at the same time. Error: InvalidBlockList

Jona 335 Reputation points
2023-05-28T19:59:06.5933333+00:00

Suddenly, the Stream Analytics Job stopped writing to ADLS Storage

I have an Event Hub namespace that feeds a Stream Analytics job, which in turn writes its output to ADLS storage. The data is then queried from Synapse serverless SQL.

Everything was fine in the dev & test:

I sent messages to Event Hub using Generate Data (Preview)


I ran my queries on Synapse


The output files on ADLS are in JSON Lines format: every message is stored as one line in a blob, and the blob grows as new messages are appended to the same file.
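For illustration, the JSON Lines layout described above can be read one line at a time. This is a minimal sketch, not part of the original setup: the `amount` field and the truncated third line are hypothetical, and whether a reader can ever observe a half-written line on a real blob is an assumption, not something shown in this thread.

```python
import json
from typing import Any, Iterable, Iterator


def parse_json_lines(lines: Iterable[str]) -> Iterator[Any]:
    """Yield one parsed value per non-empty line; skip lines that are not valid JSON."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # Defensive choice for this sketch: ignore a malformed or truncated line.
            continue


# Hypothetical sample content; field names are made up for the example.
sample = [
    '{"event": "sporadic_contribution", "amount": 10}',
    '{"event": "sporadic_contribution", "amount": 25}',
    '{"event": "sporadic_contr',  # hypothetical truncated trailing line
]
records = list(parse_json_lines(sample))
```

Skipping bad lines keeps a reader tolerant of a file that is still growing, at the cost of silently dropping data, so a real pipeline would probably want to log what it skips.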


Then I started sending messages to Event Hub via the Azure portal and, a second later, running my query to see "my real-time data". At some point, new messages stopped showing up in the files stored in Blob storage. This was strange, so I went into the Stream Analytics job to debug, and everything seemed to be in place.

Looking at the activity log of the Stream Analytics job, I noticed something unusual.


First Occurred: 5/28/2023 12:49:48 AM UTC | Resource Name: sporadic-contribution-adls-output-json | Message: Unable to upload blobs to storage because of invalid blob block ids. Please ensure that blob file 'event=sporadic_contribution/date=2023-05-28/hour=00/0_62b04999def54281be2fb6de7f556b63_1.json' is not being modified by another process at the same time. Blob storage error code: InvalidBlockList
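The blob name in the error follows a `key=value` folder layout. As a small sketch (the helper name `parse_partitions` is hypothetical), the partition values can be pulled out of such a path like this, which may help when matching a failing blob to the folder a Synapse query reads:

```python
def parse_partitions(blob_path: str) -> dict:
    """Return the key=value folder segments of a blob path as a dict."""
    parts = {}
    # The last segment is the file name, so only the folders are inspected.
    for segment in blob_path.split("/")[:-1]:
        key, sep, value = segment.partition("=")
        if sep:
            parts[key] = value
    return parts


# Path copied from the error message above.
path = (
    "event=sporadic_contribution/date=2023-05-28/hour=00/"
    "0_62b04999def54281be2fb6de7f556b63_1.json"
)
partitions = parse_partitions(path)
```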

I wonder whether this error is related to the fact that I run my SQL queries while the blob is being updated with new entries coming from Event Hub.

  1. Is there a race condition? There should not be. Besides, the SQL queries only read the blob; they do not write to it.
  2. While new entries arrive from Event Hub and are pushed to the blob (appended to the same file) via Stream Analytics, does the blob enter an inconsistent state that conflicts with my SQL queries?
  3. Could sending messages to Event Hub rapidly cause concurrency problems on ADLS, since all messages are written to the same file in a partition?

Best regards


1 answer

  1. VasimTamboli 4,410 Reputation points
    2023-05-30T14:31:19.9066667+00:00

    The error message you received, "Unable to upload blobs to storage because of invalid blob block ids. Please ensure that blob file 'event=sporadic_contribution/date=2023-05-28/hour=00/0_62b04999def54281be2fb6de7f556b63_1.json' is not being modified by another process at the same time," suggests that multiple processes or applications might be attempting to modify the same blob file simultaneously.
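To make the suspected mechanism concrete, here is a toy, in-memory model of a block blob. It is a sketch under stated assumptions, not the real service and not a claim about what happened in this job: the class `ToyBlockBlob` and its methods are hypothetical, and the model assumes that committing a block list discards any staged blocks left out of that list. Under that assumption, a second writer that commits afterwards refers to block ids that no longer exist, while a reader only ever sees committed content.

```python
class InvalidBlockList(Exception):
    """Raised when a commit references a block id the blob does not hold."""


class ToyBlockBlob:
    """Hypothetical in-memory stand-in for a block blob (not the Azure SDK)."""

    def __init__(self):
        self.committed = {}    # block id -> bytes, in committed order
        self.uncommitted = {}  # block id -> bytes, staged but not yet committed

    def stage_block(self, block_id, data):
        self.uncommitted[block_id] = data

    def commit_block_list(self, block_ids):
        known = {**self.committed, **self.uncommitted}
        missing = [b for b in block_ids if b not in known]
        if missing:
            raise InvalidBlockList(f"unknown block ids: {missing}")
        self.committed = {b: known[b] for b in block_ids}
        # Assumption of this model: staged blocks left out of the list are dropped.
        self.uncommitted = {}

    def read(self):
        # Readers only see committed blocks and never change the blob.
        return b"".join(self.committed.values())


blob = ToyBlockBlob()
blob.stage_block("a1", b'{"n": 1}\n')  # writer A stages a block
blob.stage_block("b1", b'{"n": 2}\n')  # writer B stages a block
blob.commit_block_list(["a1"])         # A commits first; b1 is discarded
try:
    blob.commit_block_list(["a1", "b1"])  # B commits with a stale block id
    outcome = "committed"
except InvalidBlockList:
    outcome = "InvalidBlockList"
content_seen_by_reader = blob.read()
```

In this model only a second writer can trigger the failure; calling `read()` at any point leaves both dictionaries untouched, which matches the intuition that read-only SQL queries should not be the cause.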

    To troubleshoot this issue, please consider the following steps:

    Ensure exclusive access: Make sure that there are no other processes or applications that are concurrently writing to or modifying the same blob file. Check if any other components or services in your architecture are interacting with the same blob file.
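One way to look for a second writer is to inspect the committed and uncommitted blocks of the affected blob. The sketch below assumes the `azure-storage-blob` v12 Python package, whose `BlobClient.get_block_list(block_list_type="all")` returns a pair of block lists; the helper `summarize_block_list` and the stand-in client are hypothetical, and the stand-in is used here so the example runs without a storage account.

```python
from collections import namedtuple


def summarize_block_list(blob_client):
    """Return the committed and uncommitted block ids of a block blob."""
    committed, uncommitted = blob_client.get_block_list(block_list_type="all")
    return {
        "committed": [block.id for block in committed],
        "uncommitted": [block.id for block in uncommitted],
    }


# With a real account one would (assumption) build the client roughly like this:
#   from azure.storage.blob import BlobClient
#   client = BlobClient.from_connection_string(conn_str, container_name, blob_name)
# The stand-in below only mimics the return shape for demonstration.
FakeBlock = namedtuple("FakeBlock", ["id", "size"])


class FakeBlobClient:
    def get_block_list(self, block_list_type="committed"):
        return ([FakeBlock("YTE=", 9)], [FakeBlock("YjE=", 9)])


summary = summarize_block_list(FakeBlobClient())
```

Uncommitted blocks that do not come from the Stream Analytics job would hint at another process writing to the same blob; an empty uncommitted list does not rule it out.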

    Check Stream Analytics output settings: Review your Stream Analytics output settings to ensure that they are configured correctly and are not causing any conflicts. Verify that the output is set to append data to the existing blob file as intended.

    Consider data partitioning: If you are sending messages to Event Hub rapidly and all the messages are being written to the same file in a single partition, it may result in concurrency issues. To mitigate this, you can consider implementing data partitioning in your Stream Analytics job. By partitioning the data, you can distribute the load across multiple blobs or partitions and reduce the chances of conflicts.
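The idea of spreading writes over several blobs can be sketched outside Stream Analytics as a plain routing function. This is an illustration only: the key `contributor_id`, the `partition=N.json` file naming, and the helper names are hypothetical and do not reflect how Stream Analytics names its output files.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a stable partition number using a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % partition_count


def route(records, partition_count):
    """Group records by the blob path they would be written to."""
    files = defaultdict(list)
    for record in records:
        p = partition_for(record["contributor_id"], partition_count)
        path = (
            f"event={record['event']}/date={record['date']}/"
            f"hour={record['hour']}/partition={p}.json"
        )
        files[path].append(record)
    return dict(files)


# Hypothetical records; only the folder layout mirrors the thread.
records = [
    {"event": "sporadic_contribution", "date": "2023-05-28", "hour": "00",
     "contributor_id": f"user-{i % 5}"}
    for i in range(20)
]
routed = route(records, partition_count=4)
```

Because the hash is deterministic, all records with the same key land in the same blob, so each blob has a single logical stream of writes instead of every message competing for one file.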

    Check for overlapping triggers: If you have any other triggers or processes that operate on the same blob file, ensure that there are no overlapping or conflicting schedules. Verify that the processes are not attempting to modify the blob file simultaneously.

    By reviewing these points and ensuring exclusive access to the blob file, you can resolve the "InvalidBlockList" error and prevent concurrency issues when writing to the blob storage from Stream Analytics.
