Offline TTL and loss of power technical question

Johan Karlsson 31 Reputation points
2021-03-29T06:07:22.547+00:00

I am looking for more information regarding how TTL is handled on the Edge Hub in different scenarios. By default, the TTL is 7200 seconds (2 hours). Consider the following scenarios and please correct me if I'm wrong here:

Sunshine case

  1. System is online
  2. A message appears on the Edge Hub at timestamp 1.
  3. The message is forwarded upstream immediately

Offline case
  1. System is offline
  2. A message appears on the Edge Hub at timestamp 1
  3. The system remains offline for 1 hour (half the TTL)
  4. When online, all received messages are forwarded to the IoT Hub

Power outage case
  1. System is offline
  2. A message appears on the Edge Hub at timestamp 1
  3. The site loses power for 4 hours (TTL has long passed)
  4. System boots up again and is now online
  5. What happens to the messages in this scenario?

Say, for instance, that the retry interval of the Edge Hub is 2, the TTL is 3, and my message is sent at time 1 and is not successfully delivered. We then suffer a power outage, and the hub comes back up when the time is 10. Does the hub handle this scenario, or do I have to handle it myself?

I had a quick look at the edge module source, and it looks like the Java EdgeModule uses System.currentTimeMillis(); if the message's timestamp is older than that, a callback is fired with message status MESSAGE_EXPIRED. So I guess I should either increase the TTL or handle this callback?
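
To make the arithmetic concrete, here is a minimal sketch of a wall-clock expiry check of the kind described above. It assumes that expiry is evaluated as "current time minus enqueue time exceeds the TTL"; the function name `is_expired` and its parameters are hypothetical and are not taken from the edge module or edgeHub source.

```python
def is_expired(enqueued_at: float, ttl: float, now: float) -> bool:
    """Return True when more than `ttl` time units have passed since `enqueued_at`.

    Assumption: expiry is judged against wall-clock time, so time spent
    powered off still counts towards the TTL.
    """
    return now - enqueued_at > ttl


# Numbers from the example above: sent at 1, TTL 3, hub back up at 10.
print(is_expired(enqueued_at=1, ttl=3, now=10))

# Offline case: one hour offline against the default 7200-second TTL.
print(is_expired(enqueued_at=0, ttl=7200, now=3600))
```

Under that assumption, the first message counts as expired when the hub comes back (its TTL ran out at time 4), while the message in the one-hour offline case is still within its TTL.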

Hope you can shed some light on this! Loving the offline features so far; they have saved us a ton of work.

Update:

So I realized that looking at the Java source is only half the puzzle, since the messages should be handled by the edgeHub container. Looking at the source for that, it looks like the same logic appears again. How these two interact is beyond me.

Azure IoT Edge

Accepted answer
  1. Sander van de Velde | MVP 32,726 Reputation points MVP
    2021-04-07T20:23:00.097+00:00

    Hello @Johan Karlsson , @QuantumCache ,

    We have used Azure IoT Edge in numerous projects in different situations, even in the jungle of Malaysia.

    From the very start, we implemented this heartbeat module, together with the Azure Stream Analytics LAG query, so we could become aware of any irregularities.
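
As an illustration of the idea only (this is neither the heartbeat module nor the Stream Analytics query mentioned above), the sketch below compares each heartbeat timestamp with the previous one, which is what a LAG-style query expresses, and reports spacings that are longer than expected. The function name `find_heartbeat_gaps`, the one-minute interval, and the five-second tolerance are all assumptions.

```python
from datetime import datetime, timedelta


def find_heartbeat_gaps(timestamps, expected_interval, tolerance=timedelta(seconds=5)):
    """Return (previous, current) pairs spaced further apart than expected."""
    ordered = sorted(timestamps)
    gaps = []
    # Pair every heartbeat with its predecessor, like LAG over a time-ordered stream.
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > expected_interval + tolerance:
            gaps.append((previous, current))
    return gaps


# Hypothetical data: heartbeats every minute, with minutes 3 to 5 missing.
start = datetime(2021, 4, 7, 12, 0, 0)
beats = [start + timedelta(minutes=m) for m in (0, 1, 2, 6, 7)]
gaps = find_heartbeat_gaps(beats, expected_interval=timedelta(minutes=1))
for previous, current in gaps:
    print(f"No heartbeat between {previous} and {current}")
```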

    Personally, I prefer the current way of guaranteed delivery. If we experienced missing messages, it was almost always due to external factors (failing hardware, network problems, or misconfiguration).


1 additional answer

  1. QuantumCache 20,266 Reputation points
    2021-04-07T19:16:20.88+00:00

    Hello @Johan Karlsson , the scenario below is a very good question and may be relevant to real-world industry problems.

    Power outage case

    1. System is offline
    2. A message appears on the Edge Hub at timestamp 1
    3. The site loses power for 4 hours (TTL has long passed)
    4. System boots up again and is now online
    5. What happens to the messages in this scenario?

    Below is the response from the product team, which briefly addresses the initial query and offers a suggestion.

    If we don't want messages to be dropped due to TTL, the built-in way is to set it to some large number. Once edgeHub ACKs the incoming message, the downstream sender is no longer responsible; if TTL is a concern, it needs to be increased. Once edgeHub receives a message, it guarantees delivery, so the downstream sender doesn't need to worry about reprocessing.
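
For reference, the TTL is part of the edgeHub module twin's desired properties, under `storeAndForwardConfiguration.timeToLiveSecs`. The sketch below builds only that fragment and prints it as JSON. The seven-day value is an arbitrary example of "some large number", and the rest of the deployment manifest (schema version, routes, and so on) is deliberately left out.

```python
import json

# Assumption: seven days is an arbitrary example of a "large" TTL.
SEVEN_DAYS_SECS = 7 * 24 * 60 * 60

# Fragment of the $edgeHub "properties.desired" section only; a real
# deployment manifest needs further properties that are not shown here.
edge_hub_desired = {
    "storeAndForwardConfiguration": {
        "timeToLiveSecs": SEVEN_DAYS_SECS,
    }
}

print(json.dumps(edge_hub_desired, indent=2))
```

A longer TTL means that more messages may accumulate on the device while it is offline, so local storage should be sized accordingly.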

    One suggestion is to implement your own message-order validation by putting some kind of sequential ID in the message headers and having the backend processor ensure that there are no gaps. If a gap is found, the backend would trigger a direct method back down to the message sender to resend the missing messages.
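
A minimal sketch of the gap check on the backend side could look like the following. It assumes that the sender stamps every message with an integer sequence ID that increases by one; the function name `find_missing_sequence_ids` is hypothetical, and the direct method call that would request a resend is not shown.

```python
def find_missing_sequence_ids(received_ids):
    """Return the sequence IDs missing between the lowest and highest ID received."""
    seen = set(received_ids)
    if not seen:
        return []
    return [i for i in range(min(seen), max(seen) + 1) if i not in seen]


# Hypothetical batch in which messages 4, 5 and 8 never arrived.
missing = find_missing_sequence_ids([1, 2, 3, 6, 7, 9])
print(missing)  # [4, 5, 8]
```

The backend would then pass the missing IDs to the sender, which needs to keep its own copy of recent messages in order to resend them.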

    I will let you know if I find more content on this to help you with...

    We may also take suggestions from real-world industry experts and MVPs such as Sander.

    Cc: @Sander van de Velde | MVP , could you please share your experience on this scenario, how to handle lengthy TTL during power outages?

