IoT Hub does not allow near real time messaging

We are using IoT Hub in a scenario where near real time communication is required.
We define near real time communication as less than 2 seconds for a message loop (C2D -> D2C).
We have observed that, even though we are running emulated devices on a virtual server located in an Azure data center, some messages still require 10 seconds or more for delivery. Most of the messages are completed in 400ms.
What is causing the huge difference in delivery time, and what can be changed?
Is IoT Hub usable for any near real time communication?
The Microsoft comment: "Avoid making any assumptions about the maximum latency of any IoT Hub operation" (https://learn.microsoft.com/en-us/azure/iot-hub/iot-hub-devguide-quotas-throttling#latency) does not sound promising...
Thanks
Martin
IoT Hub and VM are both in "West Europe".
We have logged additional data, and that data still shows delays, but not 10s delays. But I would like to understand why almost all IoT method calls are completed within 300ms, and when not, the time is typically around 2-3 seconds. Why do we not see more widely varying delays?
Hey @SatishBoddu-MSFT , any insights from your side? I think the answer to the question "should you use IoT Hub for near real-time messaging" is simply no, but these regular delays are higher than what I've seen before.
Hello @Matthijs van der Veer We are investigating this internally with the support team. I will post the final outcome here.
1/2
@Martin Jørgensen thank you for your great questions and feedback! I would like to understand your scenario better; please provide more details about:
1) What do MessageSentTicks and MessageResponseTicks mean? Are you counting the time from sending a D2C and C2D message until you receive a confirmation that the message was delivered? How do you validate that a message was delivered, and how do you measure it? Example: "Receive delivery feedback"
2) In the original post you mentioned that you are measuring C2D messages, but then mentioned "why almost all IoT method calls are completed within 300ms, and when not, the time is typically around 2-3 seconds." I want to be sure that we are not referring to Direct Methods, as their behaviour is substantially different from C2D messages.
2/2
Please also take into consideration the following doc: Trace Azure IoT device-to-cloud messages with distributed tracing (preview)
If the intention is to measure IoTHub performance we should look at these:
Hi @António Sérgio Azevedo ,
I have an open support ticket at the moment.
A full round trip for us is:
1.1 Device received method and responds to method immediately
The cloud will know that the command was accepted (received) from the direct method response.
When the async D2C message is received, all data will be correlated via a transaction ID, which the cloud generates for the direct method.
When running the tests, both "cloud" and device are running on the same virtual server in the data center of the IoT Hub.
The "MessageSentTicks" is the "System.Environment.Ticks" for direct method response.
The "MessageResponseTicks" is the "System.Environment.Ticks" for async D2C message.
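The correlation described above can be sketched roughly as follows. This is only an illustration, not the test application from the support ticket: the class and method names are hypothetical, and the tick values are assumed to be milliseconds. Because the D2C message can arrive before the direct method response, the sketch accepts the two events in either order.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RoundTrip:
    """One command round trip, keyed by the cloud-generated transaction ID."""
    transaction_id: str
    sent_ticks: Optional[int] = None      # tick count at direct method response
    response_ticks: Optional[int] = None  # tick count at async D2C arrival

    def delay_ms(self) -> Optional[int]:
        # The delay is only defined once both events have been seen.
        if self.sent_ticks is None or self.response_ticks is None:
            return None
        return self.response_ticks - self.sent_ticks


class Correlator:
    """Hypothetical helper that pairs the two events of a round trip."""

    def __init__(self) -> None:
        self._pending: Dict[str, RoundTrip] = {}

    def _record(self, transaction_id: str) -> RoundTrip:
        return self._pending.setdefault(transaction_id, RoundTrip(transaction_id))

    def _finish(self, trip: RoundTrip) -> Optional[int]:
        delay = trip.delay_ms()
        if delay is not None:
            # Both events seen: drop the entry and report the delay.
            del self._pending[trip.transaction_id]
        return delay

    def on_method_response(self, transaction_id: str, ticks_ms: int) -> Optional[int]:
        trip = self._record(transaction_id)
        trip.sent_ticks = ticks_ms
        return self._finish(trip)

    def on_d2c_message(self, transaction_id: str, ticks_ms: int) -> Optional[int]:
        trip = self._record(transaction_id)
        trip.response_ticks = ticks_ms
        return self._finish(trip)
```

A negative delay simply means the D2C message was observed before the direct method response.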
The test system was running for 30+ days during the Christmas holiday, and at some point the delay was very high for a long time...
@Martin Jørgensen could you please share the support ticket number?
1) Has the "MessageSentTicks" (the "System.Environment.Ticks" for the direct method response) ever taken 10 seconds or more?
2) Since the device is processing the command before sending an async response via D2C, I would argue that the latency could be somewhere other than IoT Hub.
Can you provide a quick sketch of how the "light" and "button" communicate? That is, when are C2D, D2C, and Direct Methods used?
Thanks.
Support ticket: 120120325001340
The support ticket also includes source code for the client/cloud test applications.
1) Yes... And sometimes the async D2C message arrives before the direct method response is received. But that might be okay, since there is no guarantee regarding the order in which direct methods and D2C messages are handled.
2) That could be the case. But the virtual server is only running these 2 applications. An improvement to the "Client" application would be to add a timestamp of when the D2C message was queued...
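The queued-timestamp idea could look something like the sketch below. The payload field names and both functions are hypothetical, not part of the actual client application. Comparing the tick values directly is only valid here because the "cloud" and the device run on the same virtual server and so share one clock.

```python
import json


def build_d2c_payload(transaction_id: str, queued_ticks_ms: int, status: str) -> str:
    """Device side: stamp the message with the tick count at which it was queued."""
    return json.dumps({
        "transactionId": transaction_id,  # hypothetical field name
        "queuedTicks": queued_ticks_ms,   # hypothetical field name
        "status": status,
    })


def split_delay(payload: str, method_response_ticks_ms: int, received_ticks_ms: int) -> dict:
    """Cloud side: separate device-side time from time spent in transit."""
    queued = json.loads(payload)["queuedTicks"]
    return {
        # Time between the direct method response and queuing the D2C message.
        "device_ms": queued - method_response_ticks_ms,
        # Time between queuing on the device and arrival at the cloud application.
        "transit_ms": received_ticks_ms - queued,
    }
```

With this split, a long total delay can be attributed either to the device application or to the path through the service.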
Regarding the scenario (in reality it is not bulb/switch)..:
(Now the APP knows that the bulb is online and has accepted the command)
The direct method + async D2C combination was selected to make it possible to have long running commands (>10 seconds, like running motor), and still be able to report, when command has completed successfully.
Like:
Send command => "Accepted"
Run command (motor) => "Started"
Command complete => "Success"
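The three-step lifecycle above could be tracked on the cloud side roughly like this. It is a minimal sketch with hypothetical names; the extra SENT state and the "only move forward" rule are assumptions added so that a late or out-of-order report cannot move a command backwards.

```python
from enum import IntEnum


class CommandState(IntEnum):
    SENT = 0      # assumed initial state before any report arrives
    ACCEPTED = 1  # direct method response received
    STARTED = 2   # device reports the command is running
    SUCCESS = 3   # device reports the command has completed


def advance(current: CommandState, reported: CommandState) -> CommandState:
    """Return the new state, ignoring reports that are older than the current one."""
    return max(current, reported)
```

For example, if "Started" is observed before "Accepted", the command stays in STARTED when the delayed "Accepted" arrives.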
I am connecting with the Engineer and will update this thread when there is an accepted resolution for the support ticket.
This will definitely give us more reliable values and remove the device's CPU and memory issues from the picture. To me this looks like a good approach! I would like to give you more confidence that the 10-second delay is not on the IoT Hub side, but let's continue the research over the support ticket and update the community with the results :)!
Hello @Martin Jørgensen ,
Thank you so much for your time in the Azure Support Ticket. I am now posting our conclusions and would appreciate if you can verify this as the answer or add any other comments for further explanation.
When analyzing the logs you provided more carefully, we found that two messages in a row never both took more than 1 second to be delivered, which is acceptable behavior. The strategy to overcome the occasional slow message is to send the message again if no ack is received after 1 second (or after whatever retry interval you define from your own benchmarks). We already document that there are no guarantees around message delivery latency (reference: IoT Hub quotas and throttling#Latency | Microsoft Learn), and as expected the retry period (example: 1 second if no ack) will vary depending on many factors, including the device's network connection and the device's processing. Note that a very aggressive retry policy may cause throttling, so there needs to be a balance, and delay expectations should be set clearly in the customer experience design.
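A minimal sketch of the "resend if no ack after 1 second" strategy is shown below. It does not use any IoT Hub SDK call: `send` and `wait_for_ack` are hypothetical callables that the application would supply. The growing timeout and the attempt limit are assumptions added to keep the retry policy from becoming aggressive enough to be throttled.

```python
from typing import Callable


def send_with_retry(
    send: Callable[[], None],
    wait_for_ack: Callable[[float], bool],
    ack_timeout_s: float = 1.0,   # initial wait, as in the 1-second example
    max_attempts: int = 3,        # assumed limit to avoid hammering the service
    backoff_factor: float = 2.0,  # assumed growth of the wait between attempts
) -> int:
    """Send a message and resend it while no ack arrives in time.

    Returns the number of attempts used; raises TimeoutError if none was acked.
    """
    timeout = ack_timeout_s
    for attempt in range(1, max_attempts + 1):
        send()
        if wait_for_ack(timeout):
            return attempt
        timeout *= backoff_factor  # wait longer before the next resend
    raise TimeoutError(f"no ack after {max_attempts} attempts")
```

Since a resend can cause the same command to arrive twice, the receiver would need to ignore duplicates, for example by checking the transaction ID.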
Thank you so much for these great questions and we hope we have provided you the right tools to proceed with your development.
I accept the answer that you cannot guarantee any message delivery latency. In real life, it is possible to observe 10s+ delays, and you are right that 99% of the time the delay is very short. The delays observed did not depend on any network connection outside Microsoft, because all tests were done inside the same Microsoft data center. If delivery time/delay is critical to a solution, IoT Hub is not optimal.
@Martin Jørgensen yes you are correct, IoT Hub does not have a 100% Service Level Agreement (SLA) yet: https://azure.microsoft.com/en-us/support/legal/sla/iot-hub/v1_2/
"For IoT Hub, we promise that at least 99.9% of the time deployed IoT hubs will be able to send messages to and receive messages from registered devices and the Service will able to perform create, read, update, and delete operations on IoT hubs."
In a 30-day billing period the SLA is met even if we have downtime/delay of around 43 minutes.
Let me know if you have further questions.
Thank you!
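The figure of around 43 minutes follows directly from the 99.9% commitment. A small sketch of the arithmetic (the function name is hypothetical):

```python
def allowed_downtime_minutes(sla_percent: float, days: int = 30) -> float:
    """Minutes per billing period that may be unavailable while still meeting the SLA."""
    total_minutes = days * 24 * 60  # 43,200 minutes in 30 days
    return total_minutes * (100.0 - sla_percent) / 100.0


# 0.1% of 43,200 minutes is roughly 43.2 minutes.
print(round(allowed_downtime_minutes(99.9), 1))
```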
I understand.
This question was related to "real time messaging".
And all the tests show that delivery delays of 10 seconds or more are not unusual.
But most of the time, it is less than 200ms.
In a "real time scenario", like sending a command from an APP to a device attached to IoT Hub, a 10-second delay is not always acceptable, even though it happens less than 1% of the time.
Thanks
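One way to put numbers on "most of the time" versus "less than 1% of the time" is to summarize the logged round-trip delays by percentile and by the share that exceeds the 2-second target. The sketch below uses the nearest-rank percentile; the function names and the sample data are purely illustrative and are not the measurements from this thread.

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def fraction_over(samples_ms: Sequence[float], threshold_ms: float) -> float:
    """Share of samples that took longer than the threshold."""
    return sum(1 for s in samples_ms if s > threshold_ms) / len(samples_ms)


# Illustrative data only: 99 fast round trips and one 10-second outlier.
delays = [200] * 99 + [10_000]
print(percentile(delays, 50), percentile(delays, 100), fraction_over(delays, 2_000))
```

A summary like this makes it clear that the median can look excellent while the tail still breaks a 2-second requirement.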
2 additional answers
Well, if the internet connection is lost, the delivery times will be even larger ;-)
My impression is that the IoT Hub is built for scalability and reliability. Though, the waiting times of 10 seconds seem strange.
Did you look at the partition setting already? Under the covers, the number of parallel processes is set with this setting:
If you rely on sub-second response times, please consider an Edge solution where the cloud logic you use to make decisions, is put on an edge device. This takes out at least the internet component of your roundtrip.
Update: Check for the nearest region for the IoT Hub (lowest latency)
At the moment the IoT Hub only handles a few messages per second, so it should not be a load issue.
We are using "S1 - Standard" and 4 partitions.
The test setup I have made is using the IoT Hub directly, without use of Azure function triggers etc.
Our complete solution involving several EventHubs/ServiceBusses etc. shows additional latency.
Often the latency in each "message" component in Azure is about 1 second!!! And other times it is perhaps 100ms.
And we have observed even higher delays in "message" components, but Microsoft replies that this is just because we are using a "shared resource", and someone else in the Microsoft infrastructure is putting pressure on the "message" components.
Sub-second latency is definitely not something you can rely on (except perhaps with single-tenant deployments; we have not tested that)...
Our solution is based on a customer initiating an operation on a device, like turning on a light. That function is difficult to implement as an edge solution...
And it does make a difference, if the light turns on in 1 second or 10 seconds.
Also check out if you are actually running on the 'nearest' IoTHub...
Why is edge working in your case regarding customer actions?
Edge computing is a very good solution, especially in this case. If your button/light is connected to the IoTHub (by some direct-connected IoT device) you should definitely look into the transparent gateway option of IoT Edge.
There, child devices push their messages through the edge to the cloud while based on these messages commands (direct methods) can be sent back to these or other devices.
So just move over your cloud logic to the edge and have a much better latency.
I recommend getting in contact with the Global Blackbelt team of Microsoft through your regional Microsoft office to discuss your requirements and architecture.
See this blog for an example of a transparent gateway implementation.
In our system, the "button" and "light" are not on the same network...
My question is regarding the "expected"/"acceptable" delay for IoT Hub, and if IoT Hub is suitable for (near) real time control.
For us edge computing does not solve the problem.
I will leave the question open until someone can confirm that 10 seconds or more is what to expect from IoT Hub (when the test is performed on the "internal Microsoft" network).
This is not expected to happen on multiple messages sent in a row, though we need to plan for latency as explained in the answer provided.