How does the Azure Event Grid retry policy work when the consumer is down for a few hours?

suvra jyoti 151 Reputation points
2024-05-14T15:48:52.36+00:00

I have Azure Event Grid delivering events to a consumer (a webhook endpoint). Let's say the consumer is down for 6 hours and comes back up after that. The documented retry schedule (see https://learn.microsoft.com/en-us/azure/event-grid/delivery-and-retry) is:

  • 10 seconds
  • 30 seconds
  • 1 minute
  • 5 minutes
  • 10 minutes
  • 30 minutes
  • 1 hour
  • 3 hours
  • 6 hours
  • Every 12 hours up to 24 hours

Will the events be delivered only at the 12th hour, or can they be delivered before that as well? Based on the documentation, it looks like delivery can happen before the 12th hour, depending on how Event Grid skips retries (for hours) and applies exponential backoff. Please confirm how this works.

Azure Event Grid

1 answer

  1. MayankBargali-MSFT 69,761 Reputation points
    2024-05-17T10:53:08.45+00:00

    @suvra jyoti Thanks for reaching out.

    "Will the events be delivered only at the 12th hour, or can they be delivered before that as well?"

    Event Grid follows the retry schedule on a best-effort basis. It adds a small randomization to all retry steps and may opportunistically skip certain retries if an endpoint is consistently unhealthy, down for a long period, or appears to be overwhelmed.

    In your case, if an event was sent while the consumer was down and the consumer stayed down for 6 hours, Event Grid will still try to deliver that event multiple times (at 10 seconds, 30 seconds, and so on) following the documented schedule, although it may skip a few retries in between because your endpoint was unhealthy. After the retry at the 6th hour, the next retry for that particular event will be at the 12th hour.

    If other events arrive while your service is down, the retries are calculated per event, based on that event's own retry schedule (and some retries may be skipped because your service has been down for a long time), not based on how long your service has been down.
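
To make the per-event behavior described above concrete, here is a minimal Python sketch. It is not Event Grid code and does not call any Azure API. It assumes the schedule entries are offsets from each event's first failed delivery attempt (which is how the answer reads them), and it ignores the randomization and skipped retries mentioned above. The names `RETRY_OFFSETS` and `first_retry_after_recovery` are hypothetical and exist only for illustration.

```python
from datetime import timedelta

# Documented retry schedule, read here as offsets from an event's first
# failed delivery attempt (an assumption, not confirmed Event Grid internals).
RETRY_OFFSETS = [
    timedelta(seconds=10),
    timedelta(seconds=30),
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(minutes=10),
    timedelta(minutes=30),
    timedelta(hours=1),
    timedelta(hours=3),
    timedelta(hours=6),
    timedelta(hours=12),
    timedelta(hours=24),
]


def first_retry_after_recovery(event_time, recovery_time):
    """Return the earliest scheduled retry at or after recovery_time.

    Both arguments are timedeltas measured from the start of the outage.
    Returns None if every scheduled retry falls before the recovery.
    Jitter and opportunistically skipped retries are not modeled.
    """
    for offset in RETRY_OFFSETS:
        attempt = event_time + offset
        if attempt >= recovery_time:
            return attempt
    return None


# Consumer recovers just after the 6-hour mark.
recovery = timedelta(hours=6, minutes=1)

# Event published when the outage began: the 6-hour retry just missed,
# so the next scheduled attempt is at the 12th hour.
print(first_retry_after_recovery(timedelta(0), recovery))  # 12:00:00

# Event published 4 hours into the outage: its own 3-hour retry lands at 7h.
print(first_retry_after_recovery(timedelta(hours=4), recovery))  # 7:00:00

# Event published 5.5 hours into the outage: its 1-hour retry lands at 6h30m.
print(first_retry_after_recovery(timedelta(hours=5, minutes=30), recovery))  # 6:30:00
```

Under these assumptions, only events whose first attempt happened right at the start of the outage wait until the 12th hour. Events published later in the outage follow their own schedule and can be delivered soon after the consumer recovers.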