Examine the IoT Lambda architecture


The following list includes three distinct purposes for storing telemetry readings that are generated by IoT devices:

  • To be analyzed for anomalies - for preventive maintenance.
  • For visualization by a remote human operator - to help in decision making.
  • To be archived - for later analysis.

These scenarios have conflicting storage requirements. However, conflicting goals don't need to be a bad thing. Conflicting goals for data storage lead us to hybrid systems, which can be flexible and powerful.

The following sections describe the hybrid nature of the IoT lambda architecture.

Data paths

The apparent conflict with Azure IoT data is as follows. Telemetry data arrives continuously and in large volumes, and it needs to be analyzed quickly, because the goal of that analysis is preventive maintenance. At the same time, the data should be stored, both to archive it and to run deeper analysis over longer time periods. The deeper analysis is used to detect longer-term trends or failure patterns that might be difficult to detect in a shorter real-time sample.

One of the easiest ways to handle this duality on the device side is to send two messages:

  • The first message contains only the telemetry data that needs to be analyzed in real time.
  • The second message contains the telemetry and all the other data that might be needed for deeper analysis or archiving.
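The two-message approach can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not device SDK code: the function name `build_messages`, the `HOT_FIELDS` tuple, and the field names (`vibration`, `temperature`, `log`) are all assumptions made for the example.

```python
import json
from datetime import datetime, timezone

# Hypothetical choice: which fields need real-time analysis.
HOT_FIELDS = ("vibration",)

def build_messages(reading: dict, device_id: str) -> tuple:
    """Build a hot-path and a cold-path JSON payload from one sensor reading."""
    envelope = {
        "deviceId": device_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    # Hot message: only the values that must be analyzed in real time.
    hot = {**envelope, **{k: reading[k] for k in HOT_FIELDS if k in reading}}
    # Cold message: the same envelope plus every field of the reading.
    cold = {**envelope, **reading}
    return json.dumps(hot), json.dumps(cold)

hot_json, cold_json = build_messages(
    {"vibration": 0.42, "temperature": 71.5, "log": "pump started"}, "device-001"
)
```

Building both payloads from the same reading and the same timestamp keeps the shared fields identical in the two messages.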

Azure IoT Hub routes these two messages to different resources. In data analysis, it's common to describe data with the familiar terms hot, warm, cool, and cold:

  • Hot clearly means a real-time approach is needed.
  • Warm can have the same meaning, though perhaps the data is "near" real time, or at least, recent.
  • Cool means the flow of data is slow.
  • Cold means that the data is stored and not "flowing."

Understand lambda architecture

The lambda architecture for an Azure IoT solution enables multiple data paths. However, for the sake of explanation, let's work with two paths: hot and cold.

In the following diagram:

  • The Fast Path - Real Time Processing (hot path) is the streaming telemetry routed into real-time analysis. This path is also the right one for triggering warnings and alerts.
  • The Slow Path - Batch Processing (cold path) is the batch processing path for telemetry data storage.

Diagram that shows the lambda architecture for an IoT solution that includes hot and cold storage paths.
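To make the split between the two paths concrete, the following sketch simulates the routing step in plain Python. It is only a local simulation: in a real solution, routing is configured on the IoT hub rather than written as application code. The `route` function, the `path` property, and the endpoint names are hypothetical.

```python
from collections import defaultdict

def route(messages, routes, fallback="cold"):
    """Group messages by endpoint, based on a 'path' application property.

    Messages with a missing or unknown path go to the fallback route,
    so nothing is dropped before it reaches storage.
    """
    endpoints = defaultdict(list)
    for message in messages:
        path = message.get("properties", {}).get("path", fallback)
        endpoint = routes.get(path, routes[fallback])
        endpoints[endpoint].append(message)
    return dict(endpoints)

# Hypothetical endpoint names for the hot and cold paths.
routes = {"hot": "real-time-analysis", "cold": "archive-storage"}
messages = [
    {"properties": {"path": "hot"}, "body": {"vibration": 0.42}},
    {"properties": {"path": "cold"}, "body": {"vibration": 0.42, "log": "ok"}},
]
delivered = route(messages, routes)
```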

The hot path

In this scenario, the IoT remote device sends out specific telemetry. This telemetry is sent in its own message, which IoT Hub routes for instant analysis and visualization. The analysis can be done by a human operator, for example by using Azure Data Explorer. This approach is described in this module.

Alternatively, the analysis could be handled by Azure Machine Learning models, via Azure Stream Analytics. This scenario is more complex and involves coding.
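As a rough idea of what automated hot-path analysis does, here is a small rolling-window detector written in plain Python. It is a stand-in for illustration only, not Azure Stream Analytics or an Azure Machine Learning model. The class name, window size, minimum sample count, and threshold are all assumed values.

```python
from collections import deque
from statistics import mean, stdev

class RollingAnomalyDetector:
    """Flag readings that deviate strongly from a window of recent readings."""

    def __init__(self, window=20, threshold=3.0, min_samples=5):
        self.values = deque(maxlen=window)  # oldest readings drop off automatically
        self.threshold = threshold          # allowed deviation, in standard deviations
        self.min_samples = min_samples      # readings needed before judging

    def check(self, value):
        """Return True if value is anomalous, then add it to the window."""
        is_anomaly = False
        if len(self.values) >= self.min_samples:
            mu = mean(self.values)
            sigma = stdev(self.values)
            if sigma > 0 and abs(value - mu) > self.threshold * sigma:
                is_anomaly = True
        self.values.append(value)
        return is_anomaly

detector = RollingAnomalyDetector(window=10)
readings = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 9.0]
flags = [detector.check(value) for value in readings]
```

In this sample, only the final reading of 9.0 is flagged, which is the kind of event that would trigger a warning or alert on the hot path.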

The cold path

In this scenario, the IoT remote device also sends out all telemetry and logging data. IoT Hub directs these messages down a route to an Azure storage account. There are various storage resources available in Azure, and the next units describe these options.
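The shape of the cold path can be illustrated with a local sketch that appends each message to a per-device, per-day file. The local file system stands in for an Azure storage account here, and the `archive_message` function, the folder layout, and the field names `deviceId` and `timestamp` are assumptions made for the example.

```python
import json
from pathlib import Path

def archive_message(message: dict, root: Path) -> Path:
    """Append one message as a JSON line to a per-device, per-day file."""
    # Assumes an ISO 8601 timestamp, whose first 10 characters are the date.
    day = message["timestamp"][:10]
    target = root / message["deviceId"] / f"{day}.jsonl"
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(message) + "\n")
    return target
```

Partitioning by device and day is one common layout choice: it keeps files small and lets a later batch job read only the time range it needs.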

Issues with lambda architecture

As with most hybrid systems, the lambda architecture has issues. One of the main issues in an IoT solution is the duplication of data and code. The more duplication there is, the greater the chance of an unwanted divergence between the duplicate copies. Developers of the IoT device sensor code need to ensure that the telemetry data sent in the two messages is identical, where appropriate. There may also be code duplication in the analysis apps for the hot and cold paths. Duplication needs to be handled carefully, though it is a nearly unavoidable consequence of a hybrid system.
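One way to guard against divergence is a simple consistency check that can run in device-side tests. The sketch below assumes that both messages are flat dictionaries and that every field in the hot message should also appear, with the same value, in the cold message. The function name `find_divergence` is hypothetical.

```python
def find_divergence(hot: dict, cold: dict) -> dict:
    """Return the hot-message fields that are missing or different in the cold message.

    Each entry maps a field name to a (hot_value, cold_value) pair;
    cold_value is None when the field is absent from the cold message.
    """
    return {
        key: (value, cold.get(key))
        for key, value in hot.items()
        if key not in cold or cold[key] != value
    }

hot = {"deviceId": "device-001", "vibration": 0.42}
cold = {"deviceId": "device-001", "vibration": 0.40, "temperature": 71.5}
diverged = find_divergence(hot, cold)  # only "vibration" differs
```

An empty result means the two messages agree on every shared field. Extra fields in the cold message are expected and are not reported.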