Capture Event Hub without duplication

Davi Sena 41 Reputation points
2022-09-29T17:33:20.83+00:00

What is the best way to capture events from Event Hubs without duplication? Since Event Hubs delivers at least once, the same event can arrive multiple times.
As you can imagine, I don't want to store the same event twice. I just want to store each event once.

What is the best way to do this?

Azure Event Hubs
An Azure real-time data ingestion service.
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
Azure Stream Analytics
An Azure real-time analytics service designed for mission-critical workloads.

Accepted answer
  1. Bruno Lucas 4,411 Reputation points MVP
    2022-09-30T03:38:51.017+00:00

    Hi @Davi Sena,

    Regarding duplication: no solution can guarantee that duplicates never occur, because Event Hubs uses at-least-once delivery. The official documentation explains why:
    https://learn.microsoft.com/en-us/azure/architecture/serverless/event-hubs-functions/resilient-design#duplicate-events
    https://learn.microsoft.com/en-us/azure/architecture/serverless/event-hubs-functions/resilient-design#deduplication-techniques
    https://learn.microsoft.com/en-us/azure/azure-functions/functions-idempotent

    You can tune the consumer to reduce the chance of duplication. As the article linked below mentions, you can reduce the batch size and the Azure Functions parallelism, but those changes slow down processing and still do not guarantee that duplicates disappear:

    https://medium.com/@jeffhollan/in-order-event-processing-with-azure-functions-bb661eb55428

    In other words, it is impossible to avoid duplicates 100% of the time, and changing settings will compromise performance. The best approach is to have a field with a unique value, such as a primary key or correlation ID, in each event, and to add code that detects duplicates and prevents them from being stored.
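
A minimal sketch of that idea, assuming each event carries a unique ID and that the destination store can enforce a primary key. It uses SQLite from the Python standard library as a stand-in for the real store; the table name `events` and the helpers `open_store` and `store_once` are hypothetical names chosen for illustration, not part of any Azure SDK.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the store and create the table if needed (hypothetical schema)."""
    conn = sqlite3.connect(path)
    # The primary key on event_id is what makes repeated writes harmless.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "event_id TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def store_once(conn, event_id, body):
    """Store the event unless its ID already exists.

    Returns True if the row was inserted, False if it was a duplicate.
    """
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO events (event_id, body) VALUES (?, ?)",
            (event_id, body),
        )
    # rowcount is 0 when the insert was ignored because the ID exists.
    return cur.rowcount == 1
```

Calling `store_once` twice with the same ID stores the event the first time and skips it the second time, so a redelivered event does not create a second row. Because the check and the write are a single statement, the handler stays idempotent even if a batch is replayed after a failed checkpoint.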

    If your scenario is too complicated to implement duplicate detection and you don't need to process a massive number of messages in a short period of time, check Azure Service Bus, which has built-in duplicate detection: https://learn.microsoft.com/en-us/azure/service-bus-messaging/duplicate-detection
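
To illustrate the concept behind that feature: Service Bus remembers the message IDs it has accepted during a configurable time window and drops any message whose ID it has already seen in that window. The sketch below only imitates that behavior in plain Python; it is not the Service Bus API, and the class name `WindowedDeduplicator` and its method are hypothetical. It assumes timestamps are passed in non-decreasing order.

```python
from collections import OrderedDict


class WindowedDeduplicator:
    """Remember message IDs for a fixed window and flag repeats."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        # message_id -> time first seen, kept in arrival order
        self._seen = OrderedDict()

    def is_duplicate(self, message_id, now):
        # Forget IDs that have aged out of the detection window.
        while self._seen:
            oldest_id, seen_at = next(iter(self._seen.items()))
            if now - seen_at < self.window_seconds:
                break
            self._seen.popitem(last=False)

        if message_id in self._seen:
            return True
        self._seen[message_id] = now
        return False
```

With a 600-second window, a message ID repeated after 30 seconds is reported as a duplicate, while the same ID arriving after 700 seconds is accepted again because the first entry has expired. The window bounds memory use, which is also why this kind of detection cannot catch duplicates that arrive later than the window.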

    Hope this helps!
