How to prevent data loss in Event Hubs with Azure Functions

Davi Sena 41 Reputation points

I am using Event Hubs with Azure Functions.

Q1: Is there any way to configure the system so that a batch is marked with a new checkpoint if and only if processing of that batch was successful (no errors)?

Q2: If the answer to Q1 is yes, is it possible that while one Azure Function is processing a batch, another function could be processing the same batch?

Q3: If the answer to Q1 is yes, why would anyone set batchCheckpointFrequency to anything other than 1? From my understanding, batchCheckpointFrequency > 1 only makes sense as a way to prevent loss of data, but the procedure described in Q1 would already prevent loss of data.

Azure Functions
An Azure service that provides an event-driven serverless compute platform.
Azure Event Hubs
An Azure real-time data ingestion service.

Accepted answer
  1. MartinJaffer-MSFT 26,041 Reputation points

    Hello @Davi Sena ,
    Thanks for the question and using MS Q&A platform.

    Did you come to this from one of the Architecture Center articles?
    That article has links to the legacy "Event Processor Host SDK" as opposed to the new "Event Processor (Client) SDK". The new one should answer your questions.

    Q1: Yes.
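To make the "checkpoint only on success" idea concrete, here is a minimal in-memory sketch. It does not use the Event Hubs SDK or the Functions runtime; `InMemoryCheckpointStore` and `process_partition` are hypothetical names invented for illustration, and the list of strings stands in for one partition. The point is only the ordering: the checkpoint is advanced after the handler returns without error, so a failed batch is read again on the next attempt.

```python
from typing import Callable, Dict, List


class InMemoryCheckpointStore:
    """Hypothetical stand-in for a durable checkpoint store (e.g. Blob storage)."""

    def __init__(self) -> None:
        self._next_offset: Dict[str, int] = {}

    def get(self, partition_id: str) -> int:
        # Offset of the first event that has not been checkpointed yet.
        return self._next_offset.get(partition_id, 0)

    def update(self, partition_id: str, next_offset: int) -> None:
        self._next_offset[partition_id] = next_offset


def process_partition(
    partition_id: str,
    events: List[str],
    store: InMemoryCheckpointStore,
    handler: Callable[[List[str]], None],
    batch_size: int = 2,
) -> bool:
    """Process batches starting at the last checkpoint.

    The checkpoint moves forward only after a batch succeeds.
    Returns True if every batch succeeded, False if one failed.
    """
    offset = store.get(partition_id)
    while offset < len(events):
        batch = events[offset:offset + batch_size]
        try:
            handler(batch)
        except Exception:
            # No checkpoint is written, so this batch will be read again.
            return False
        offset += len(batch)
        store.update(partition_id, offset)
    return True
```

If the handler fails on a batch, a later call to `process_partition` resumes from the last successful checkpoint. Nothing is skipped, but a handler that partially applied the failed batch will see those events again.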

    Q2: Yes. There are two categories in which this can happen, and you only really need to worry about one of them. EventProcessor clients work together to distribute partitions among themselves, so when you add instances to an already-established pool, a new instance that steals a partition may read an event that another instance is still processing. The excerpt below, from the SDK documentation, explains this in more detail.

    If more than one EventProcessorClient is configured to process an Event Hub, belongs to the same consumer group, and make use of the same Blob Storage container, those processors will collaborate using Blob storage to share responsibility for processing the partitions of the Event Hub. Each EventProcessorClient will claim ownership of partitions until each had an equal share; the processors will ensure that each partition belongs to only a single processor. As processors are added or removed from the group, the partitions will be redistributed to keep the work even.

    An important call-out is that Event Hubs has an at-least-once delivery guarantee; it is highly recommended to ensure that your processing is resilient to event duplication in whatever way is appropriate for your application scenarios.

    This can be observed when a processor is starting up, as it will attempt to claim ownership of partitions by taking those that do not currently have owners. In the case where a processor isn’t able to reach its fair share by claiming unowned partitions, it will attempt to steal ownership from other processors. During this time, the new owner will begin reading from the last recorded checkpoint. At the same time, the old owner may be dispatching the events that it last read to the handler for processing; it will not understand that ownership has changed until it attempts to read the next set of events from the Event Hubs service.

    As a result, you are likely to see some duplicate events being processed when EventProcessorClients join or leave the consumer group, which will subside when the processors have reached a stable state with respect to load balancing. The duration of that window will differ depending on the configuration of your processor and your checkpointing strategy.
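One way to be resilient to the duplicates described in this excerpt is to make the handler idempotent. The sketch below is an assumption-laden illustration, not SDK code: `IdempotentHandler` is a hypothetical class, and it remembers applied events in memory, keyed by partition id and sequence number (sequence numbers are unique within a partition). A real system would need to keep that record somewhere durable and update it together with the side effect.

```python
from typing import Dict, Set, Tuple


class IdempotentHandler:
    """Applies each event at most once, keyed by (partition id, sequence number)."""

    def __init__(self) -> None:
        self._applied: Set[Tuple[str, int]] = set()
        self.totals: Dict[str, int] = {}

    def handle(self, partition_id: str, sequence_number: int, key: str, amount: int) -> bool:
        """Return True if the event was applied, False if it was a duplicate."""
        event_id = (partition_id, sequence_number)
        if event_id in self._applied:
            # Duplicate delivery, e.g. old and new partition owners overlapping.
            return False
        self.totals[key] = self.totals.get(key, 0) + amount
        self._applied.add(event_id)
        return True
```

With this shape, an event dispatched by the old owner and then re-read by the new owner from the last checkpoint changes the totals only once.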

    The other category is one you do not have to worry about, because it is by design.
    A message is placed into exactly one partition, and each consumer group gets its own checkpoint in that partition. This means two event processors in different consumer groups can read the same message, because both are reading the same partition. However, consumer groups are meant to separate functionality, with each group doing a different job.
    For example, suppose the messages contain reservations for a concert with limited seating. You want to do two things: send out confirmation replies, and add the person to your marketing/sales/advertising list. The two activities take different amounts of time, so you could assign each task to its own consumer group. That way, when the seating limit is reached, you can tell the front-end app that tickets are sold out without waiting for the other task to finish, and the other task can keep reading messages at its own pace.
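The per-consumer-group checkpoint can be pictured as a position keyed by both consumer group and partition. This is a toy model with hypothetical names (`GroupCheckpoints`, `read_new`), not how the service stores checkpoints; it only shows that progress in one group does not affect what another group reads.

```python
from typing import Dict, List, Tuple


class GroupCheckpoints:
    """Tracks a separate read position per (consumer group, partition)."""

    def __init__(self) -> None:
        self._positions: Dict[Tuple[str, str], int] = {}

    def read_new(self, consumer_group: str, partition_id: str,
                 partition_events: List[str]) -> List[str]:
        """Return events this group has not read yet, then advance its position."""
        key = (consumer_group, partition_id)
        start = self._positions.get(key, 0)
        new_events = partition_events[start:]
        self._positions[key] = len(partition_events)
        return new_events
```

In the concert example, a "confirmations" group and a "marketing" group would each see every reservation in the partition, each at its own pace.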

    This excerpt from the latest Event Processor SDK documentation should answer Q3.

    When the checkpoint is performed to mark an event as processed, an entry in checkpoint store is added or updated with the event's offset and sequence number. Users should decide the frequency of updating the checkpoint. Updating after each successfully processed event can have performance and cost implications as it triggers a write operation to the underlying checkpoint store. Also, checkpointing every single event is indicative of a queued messaging pattern for which a Service Bus queue might be a better option than an event hub. The idea behind Event Hubs is that you get "at least once" delivery at great scale. By making your downstream systems idempotent, it's easy to recover from failures or restarts that result in the same events being received multiple times.
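The trade-off in this excerpt can be put in numbers with a small simulation. The function below is a hypothetical sketch (`simulate_checkpointing` is not part of any SDK): it counts how many checkpoint writes occur before a crash and how many already-processed batches would be delivered again afterwards, for a given checkpoint interval.

```python
from typing import Tuple


def simulate_checkpointing(checkpoint_every: int, crash_after_batch: int) -> Tuple[int, int]:
    """Return (checkpoint_writes, batches_replayed_after_crash).

    checkpoint_every: write a checkpoint after every N successful batches.
    crash_after_batch: number of batches processed successfully before a crash.
    """
    if checkpoint_every < 1:
        raise ValueError("checkpoint_every must be at least 1")
    writes = 0
    last_checkpointed = 0  # batches covered by the most recent checkpoint
    for done in range(1, crash_after_batch + 1):
        if done % checkpoint_every == 0:
            writes += 1
            last_checkpointed = done
    # Batches processed after the last checkpoint are delivered again on restart.
    replayed = crash_after_batch - last_checkpointed
    return writes, replayed
```

For a crash after 9 batches, an interval of 1 gives 9 writes and 0 replayed batches, while an interval of 5 gives 1 write and 4 replayed batches. In neither case is a batch lost; a larger interval trades fewer writes to the checkpoint store for more duplicates to absorb after a restart.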

    Please do let me know if you have any queries.



