Azure Sphere: IoTHubDeviceClient_LL_SendEventAsync() queuing behavior?

Question

We have a customer developing an Azure Sphere based solution and they are working through some failure scenarios specifically as they pertain to the WiFi network. I think other Azure Sphere developers will also have these questions so I wanted to capture them here in this public forum for future reference.

What is the theoretical limit to how many messages or how much data can be queued up when sending messages to the IoT Hub using IoTHubDeviceClient_LL_SendEventAsync()? Assume that the data connection to the IoT Hub is down when attempting to send these messages.
If there are pending messages in the queue from calling IoTHubDeviceClient_LL_SendEventAsync(), is there any action that the application could take that would force the queue to flush? FYI, this would be undesirable. For example, if the application has messages pending and then calls SetupAzureClient() to re-establish the IoT connection, are the messages preserved in the queue, or do they get flushed?

Thank you for the help

Accepted Answer

Hello @WillessBrian-1191, we have contacted the Azure PG team about this question, and their response is below. Hopefully it helps; please confirm!

Messages are allocated dynamically on the heap; the queue is not pre-allocated.
The practical limit therefore depends on how much heap memory is left after the application's other allocations.
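
To make the heap dependency concrete, here is a rough back-of-the-envelope sketch in C. The function name and the per-message overhead parameter are my own assumptions, not anything from the SDK; the real overhead per queued message would need to be measured on the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rough upper bound on how many messages fit in a given heap budget.
 * per_message_overhead is an ASSUMED figure for the bookkeeping the SDK
 * keeps per queued message; measure it on the target instead of guessing. */
size_t estimate_max_queued(size_t heap_budget_bytes,
                           size_t payload_bytes,
                           size_t per_message_overhead)
{
    /* Precondition: the per-message size must not overflow size_t. */
    assert(payload_bytes <= SIZE_MAX - per_message_overhead);

    size_t per_message = payload_bytes + per_message_overhead;
    if (per_message == 0) {
        return 0; /* avoid division by zero */
    }
    return heap_budget_bytes / per_message;
}
```

For example, a 64 KiB budget with 200-byte payloads and an assumed 56 bytes of overhead gives 65536 / 256 = 256 messages. Treat the result as an optimistic ceiling, since heap fragmentation and other allocations will lower it.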

As an aside, this particular issue is purely an IoT C SDK question which can be asked on the Issues tab on the GitHub repo: azure-iot-sdk-c.

We will let you know if there are any further updates on this thread. Please let us know if you need further help in this matter.

Answer

Hi Brian (I believe it is Brian),
We have a GitHub issue here with a similar question. The short of it is that the SDK has no built-in way to flush the queue or control the queue size. The user can do that with some application code if they desire (more details in the GH issue). Hope that answers that for ya!
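
As an illustration of what "some application code" might look like, here is a minimal sketch in C that caps how many messages the application hands to the SDK at once. The pending_tracker type and its functions are hypothetical names of my own, not part of azure-iot-sdk-c; the idea is to reserve a slot before calling IoTHubDeviceClient_LL_SendEventAsync() and release it from the send confirmation callback passed to that call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Counts messages handed to the SDK that have not been confirmed yet. */
typedef struct {
    size_t in_flight;     /* messages awaiting a send confirmation */
    size_t max_in_flight; /* application-chosen cap */
} pending_tracker;

void tracker_init(pending_tracker *t, size_t max_in_flight)
{
    assert(t != NULL);
    t->in_flight = 0;
    t->max_in_flight = max_in_flight;
}

/* Call before IoTHubDeviceClient_LL_SendEventAsync(); only send when
 * this returns true. */
bool tracker_try_reserve(pending_tracker *t)
{
    assert(t != NULL);
    if (t->in_flight >= t->max_in_flight) {
        return false; /* cap reached: drop, overwrite, or store elsewhere */
    }
    t->in_flight++;
    return true;
}

/* Call from the send confirmation callback (whatever the result), and
 * also if the send call itself returns an error after a reservation. */
void tracker_release(pending_tracker *t)
{
    assert(t != NULL);
    if (t->in_flight > 0) {
        t->in_flight--;
    }
}
```

This does not flush or inspect the SDK's own queue; it only keeps the application from growing that queue without bound while the connection is down. What to do with readings that are refused (drop, keep the newest, write to storage) is a design choice for the application.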

Answer

Hello @WillessBrian-1191, thanks for your contribution to Microsoft's Q&A forum.

There are basically two questions to focus on here.

1) Throttle limits

The best place to start is Reference - IoT Hub quotas and throttling, which lists the current quotas and throttling limits. Please bookmark it for future reference.

You may also want to read the 'IoT Hub throttling and you' blog post.

Another useful GitHub link that summarizes the IoT Hub limits in a nutshell is azure-docs/includes/iot-hub-limits.md

2) Queue management

A) timeToLive

By default, the client SDKs and the Edge Hub use QoS level 1 (at least once). When the Edge Hub sends an acknowledgement of the message, it has already saved the message to disk. The Edge Hub keeps track of the message offset for each receiver and will resume from this point. This works because the client SDK by default also uses QoS 1 and must acknowledge processing of the message (message completion).

Putting these acknowledgements together with the persistent storage of messages gets you at-least-once delivery. Messages are also queued per endpoint and are guaranteed to be delivered in order. If the Edge Hub fails, the module is expected to keep retrying the send. Be aware that messages can be removed from the queue if their TTL expires, which means that all of these retries must complete before the TTL elapses. The TTL value is configurable via the Edge Hub's twin and defaults to 2 hours. For example: "timeToLiveSecs": 7200

"$edgeHub": {
    "properties.desired": {
        "schemaVersion": "1.0",
        "routes": {
            "route": "FROM /messages/* INTO $upstream"
        },
        "storeAndForwardConfiguration": {
            "timeToLiveSecs": 7200
        }
    }
}
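
To show what the TTL means for retries, here is a small C sketch of the expiry check a module could apply to its own pending messages. The function name is hypothetical, and whether the Edge Hub treats a message as expired exactly at the TTL or just after it is an assumption on my part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true once a message has been queued for ttl_secs or longer.
 * Timestamps are seconds from a clock ASSUMED to be monotonic. */
bool message_expired(uint64_t enqueued_at_secs,
                     uint64_t now_secs,
                     uint64_t ttl_secs)
{
    assert(now_secs >= enqueued_at_secs);
    return (now_secs - enqueued_at_secs) >= ttl_secs;
}
```

With the default of 7200 seconds, a message queued at time 0 is still eligible for delivery at 7199 seconds and is treated as expired from 7200 seconds on, so retries that start later than that have nothing left to send.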

Also refer to Azure/azure-iot-sdk-csharp

B) Storage management

If you want the IoT Edge hub to store messages in your device's local storage and retrieve them later, you can configure the environment variables and the create options in the Runtime Settings section of the Azure portal, or you can configure the local storage directly in the deployment manifest. Please refer to Link module storage to device storage.

Also see: Understand extended offline capabilities for IoT Edge devices, modules, and child devices

C) Retry

The SDKs provide three retry policies:

Exponential back-off with jitter: This default retry policy tends to be aggressive at the start and slow down over time until it reaches a maximum delay. The design is based on Retry guidance from Azure Architecture Center.
Custom retry: For some SDK languages, you can design a custom retry policy that is better suited for your scenario and then inject it into the RetryPolicy. Custom retry isn't available on the C SDK, and it is not currently supported on the Python SDK. The Python SDK reconnects as-needed.
No retry: You can set retry policy to "no retry," which disables the retry logic. The SDK tries to connect once and send a message once, assuming the connection is established. This policy is typically used in scenarios with bandwidth or cost concerns. If you choose this option, messages that fail to send are lost and can't be recovered.

For more info please visit: Manage connectivity and reliable messaging by using Azure IoT Hub device SDKs

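// Define and set the default retry policy (types from Microsoft.Azure.Devices.Client).
// Arguments: retry count, minimum back-off, maximum back-off, delta back-off.
IRetryPolicy retryPolicy = new ExponentialBackoff(int.MaxValue, TimeSpan.FromMilliseconds(100), TimeSpan.FromSeconds(10), TimeSpan.FromMilliseconds(100));
// deviceClient is assumed to be an existing DeviceClient instance.
deviceClient.SetRetryPolicy(retryPolicy);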
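
Since the original question is about the C SDK, here is a rough C sketch of how an exponential back-off delay with jitter can be computed. This is a common formulation and not necessarily the exact calculation any SDK performs internally; the function name and the jitter_permille parameter are my own. The min, max, and delta parameters play the same roles as the back-off arguments in the C# snippet above.

```c
#include <assert.h>
#include <stdint.h>

/* Delay in milliseconds before retry number `attempt` (0-based).
 * jitter_permille is a caller-supplied random value in [0, 1000] that
 * scales delta_ms between 80% and 120%. */
uint32_t backoff_delay_ms(uint32_t attempt,
                          uint32_t min_ms,
                          uint32_t max_ms,
                          uint32_t delta_ms,
                          uint32_t jitter_permille)
{
    assert(jitter_permille <= 1000);
    assert(min_ms <= max_ms);

    if (attempt > 20) {
        attempt = 20; /* cap the exponent so the 64-bit math cannot overflow */
    }
    uint64_t factor = ((uint64_t)1 << attempt) - 1; /* 2^attempt - 1 */

    /* Scale delta by 0.8 .. 1.2 using integer arithmetic. */
    uint64_t scale_permille = 800 + (uint64_t)jitter_permille * 400 / 1000;
    uint64_t jittered_delta = (uint64_t)delta_ms * scale_permille / 1000;

    uint64_t delay = (uint64_t)min_ms + factor * jittered_delta;
    return delay > max_ms ? max_ms : (uint32_t)delay;
}
```

With min 100 ms, max 10 s, delta 100 ms, and a neutral jitter of 500, the delay is 100 ms for attempt 0, 800 ms for attempt 3, and clamps to 10000 ms by attempt 10. Passing the random value in as a parameter keeps the function deterministic and easy to test; the caller would draw it from whatever random source the platform provides.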

A similar GitHub issue worth reading, dated 01/2019: When a consuming module is offline #23265

Please let us know if you need further help in this matter; we are happy to help you!
