About Disaster Recovery on Event Hubs + Functions

Jona 475 Reputation points
2024-06-06T17:32:30.5533333+00:00

Hi,

I'm facing an architectural challenge. I have:

  • A Blob Storage where files will be landing
  • A Function app where one function runs every time a file is uploaded. It uses a Blob trigger with Event Grid as the notification source
  • An Event Hubs namespace where a destination event hub resides.

I'm already familiar with the disaster recovery options and with geo-redundancy/replication. However, I need to implement some logic in a PoC to meet requirements from the architecture team.

The requirement:

  • We have a stream of files with unpredictable workloads and peaks. Some files arrive at a few KB and others at dozens of MB. The largest file that has arrived so far is 150 MB
  • We have only 5 TUs in our Standard Event Hubs namespace
  • During the PoC, we will "shut down" Event Hubs

So, I've implemented (or I have in mind):

  1. A Function that takes the landed file and checks its size. If needed, the file is split into smaller pieces to fit the 1 MB publication limit on Event Hubs. Those pieces are stored in a different container for another Function to send to Event Hubs
  2. The Function sending the pieces to Event Hubs must check that Event Hubs is up and running. I know that this can be mitigated with DR and redundancy features. However, I've been asked to check the availability of Event Hubs. How can this be achieved?
  3. If Event Hubs is not up, in order not to queue up many Function executions, I want the files/pieces to be uploaded to another container instead. That way I can terminate the Function execution. If a Function has an output binding (Event Hubs) and the output service is not up, how can I terminate the Function? What should the return value be?
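Steps 1 and 3 above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: `split_into_pieces`, `send_or_divert`, and the `send`/`divert` callables are hypothetical names, and the 900 KiB piece size is an assumed safety margin under the 1 MB limit. The sketch assumes the pieces are sent with an explicit client call inside the function rather than through a declarative output binding, so that the code has a place to catch the failure and return normally after diverting.

```python
from typing import Callable, Iterable, Iterator

# Assumption: stay under the 1 MB publication limit with some headroom
# for event properties and protocol overhead.
MAX_PIECE_BYTES = 900 * 1024


def split_into_pieces(data: bytes, piece_size: int = MAX_PIECE_BYTES) -> Iterator[bytes]:
    """Yield consecutive slices of `data`, each at most `piece_size` bytes."""
    if piece_size <= 0:
        raise ValueError("piece_size must be positive")
    for offset in range(0, len(data), piece_size):
        yield data[offset:offset + piece_size]


def send_or_divert(
    pieces: Iterable[bytes],
    send: Callable[[int, bytes], None],
    divert: Callable[[int, bytes], None],
) -> int:
    """Send pieces in order; after the first failed send, divert the rest.

    `send` stands in for the call that publishes to the event hub and
    `divert` for the upload to the fallback container (both hypothetical).
    Returns the number of pieces that were sent successfully.
    """
    sent = 0
    hub_available = True
    for index, piece in enumerate(pieces):
        if hub_available:
            try:
                send(index, piece)
                sent += 1
                continue
            except Exception:
                # Real code should catch the specific errors raised by the
                # client library instead of a blanket Exception.
                hub_available = False
        # The piece that failed and every later piece go to the fallback.
        divert(index, piece)
    return sent
```

With this shape the function does not need a special return value: once the remaining pieces are parked in the fallback container, it can simply finish, and a later run can replay the parked pieces.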

Any opinions would be appreciated.

Regards

Jona

Azure Functions
An Azure service that provides an event-driven serverless compute platform.
Azure Event Hubs
An Azure real-time data ingestion service.

3 answers

  1. Pieter de Bruin 321 Reputation points Microsoft Employee
    2024-06-14T08:18:13.51+00:00

    Hi Jona,

    Let me see if I understand this correctly: you want to split large files into smaller pieces to pass to Event Hubs. Do you have a good reason to send the blobs themselves to Event Hubs? In most cases, Event Grid is used for low-latency triggering of a function when a new blob is received, which I think is what you already do.

    https://learn.microsoft.com/azure/azure-functions/functions-event-grid-blob-trigger

    Next, if you want to pass this event on, you can publish a message to Event Hubs: https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-event-hubs-output

    The component that receives the message can then get the blob directly from storage. Passing on a message with a pointer to a blob is much more reliable than cutting, sending, and stitching together 150 messages of 1 MB each.
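As a rough illustration of the "pointer to a blob" idea, the event body can carry only the blob's location and a few attributes instead of the file content. The sketch below uses only the Python standard library; the `BlobPointer` class and its field names are assumptions made for the example, not a schema defined by Event Hubs or Event Grid.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BlobPointer:
    """Hypothetical message shape: where the blob lives, not its bytes."""
    container: str
    blob_name: str
    size_bytes: int
    etag: str


def to_event_body(pointer: BlobPointer) -> bytes:
    """Serialize the pointer to compact UTF-8 JSON for use as an event body."""
    return json.dumps(asdict(pointer), separators=(",", ":")).encode("utf-8")


def from_event_body(body: bytes) -> BlobPointer:
    """Rebuild the pointer on the consumer side before downloading the blob."""
    return BlobPointer(**json.loads(body.decode("utf-8")))
```

A 150 MB file then produces a single small message, and the consumer reads the blob from storage using the container and blob name in the pointer.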

    If you want to increase the reliability of your solution, check out https://learn.microsoft.com/azure/azure-functions/performance-reliability and https://learn.microsoft.com/azure/azure-functions/functions-bindings-error-pages. In short, make sure you can retry a failed operation. You can also add queues or Durable Functions if individual events are important.
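"Make sure you can retry a failed operation" could look something like the sketch below: a generic retry helper with exponential backoff. The `retry` function and its parameters are hypothetical and written for illustration; this is not the built-in retry mechanism of Azure Functions, which is described in the linked pages.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    operation: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation` until it succeeds or `attempts` calls have failed.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between tries.
    The last failure is re-raised so the caller can fall back, for example
    by parking the work in a queue or another container.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            # Real code should retry only on errors known to be transient.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the backoff can be tested without real waiting. For the operation to be safely retryable, sending the same message twice must be harmless to the consumer.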

    Hope that helps,

    Pieter


  2. Jona 475 Reputation points
    2024-06-17T16:34:16.58+00:00

    To explain myself better, this is the design we will try. The main non-functional aspect is that we can control the stream or the file size when files arrive at the Landing container, and Event Hubs will be provisioned with the Standard SKU at 1 TU of capacity.

    [Image: Screenshot 2024-06-17 123255]

    Regards


  3. Jona 475 Reputation points
    2024-06-23T06:30:46.3733333+00:00

    Just wondering if somebody can give an opinion ...

    Regards
