March 2012
Volume 27 Number 03
Microsoft StreamInsight - Building the Internet of Things
By Torsten Grabs | March 2012
There’s a lot of buzz about the “Internet of Things” (IoT) lately, and for good reason. Ericsson CEO Hans Vestberg estimates 50 billion devices will be connected to the Web by 2020 (bit.ly/yciS7r [PDF download]). To put this number in perspective, right now there are about 1.5 billion Web-connected PCs and fewer than 1 billion Web-connected phones—50 billion is about seven devices for every human being on the planet! Market research firm IDC in turn estimates that more than 16 billion devices will be connected to the Internet by 2015 (see Figure 1). Admittedly, some more conservative projections exist as well, but by anyone’s numbers this represents a huge shift in the role of the Internet—from providing information and entertainment to people to supplying the connectivity for an emerging class of device-enabled applications.
Figure 1 Estimating the “Embedded Internet,” from IDC
The reason these large numbers are plausible is that powerful drivers—opportunity and necessity—are propelling a rapid increase in such solutions. As a recent issue of The Economist (economist.com/node/17388368) notes “… the bandwagon is not just rolling for the benefit of technology companies and ambitious politicians. It has gained momentum because there is a real need for such systems.” Figure 2 shows the growth of this type of solution by application area. For example, the mandated implementation of smart energy systems in Europe is being driven by necessity. We can’t build the energy-generation capability required if we don’t also manage energy use. On the opportunity side, the simple, ubiquitous vending machine is a good example. Once that device is connected, service personnel can be dispatched when needed rather than on some sub-optimal schedule. Prices can even be dynamically altered if there’s increased local demand or if the contents are nearing expiration. Power outages can be reported to trigger replacement of perishable items immediately. In other words, connectivity enables an entirely new and more efficient business model.
Figure 2 Application Growth by Segment
Characterizing this shift by referencing the number of connected devices is certainly dramatic, but it’s also somewhat misleading—this isn’t about full employment for the hundreds of thousands of traditional embedded programmers. These devices are just the endpoints for complex solutions that integrate other devices into all aspects of the Internet, including analytics, the cloud, Web applications, PC and mobile interfaces, and much more.
The way to look at this is that everybody currently building Web-based applications will be faced with integrating devices and helping to develop new businesses and new business models as a result. In other words, even if you’re not an embedded developer and don’t work in a shop that builds embedded devices, this is a very compelling opportunity for you to evaluate. Your current Microsoft skills will enable you to succeed in the IoT.
Running Analytics over Device Data
“Data has become the new currency,” said Kevin Dallas, general manager of the Windows Embedded team in a recent interview (bit.ly/wb1Td8). Compared with current Internet applications, the IoT is about information generation, management and access. Let’s compare the data characteristics of a typical Internet application today with an IoT application. You and perhaps several million others pay your bills online using a popular mechanism that’s shared by many financial institutions. You log on several times a month, view some pages and submit payment information. All of this is drawn from a traditional database with queries run when you start interacting with the system. The pages you download can be large, but the interactions are very sparse, though they generate valuable information (payment information, personal information updates and so forth) that needs to be retained indefinitely.
Contrast this with an energy management system where there might be 50 million buildings (commercial and residential) providing input. The input is generated by multiple local endpoints inside, for example, a house, with a single aggregated view posted to the back end. That data includes instantaneous usage information that becomes the basis for pricing and billing, and possibly mandatory controls, back to the building. This system involves very frequent interactions with relatively low-value data that’s not necessarily interesting once you compute the current state of the system and possibly the trend data for that endpoint. However, the system needs to be able to respond instantly to situations that potentially threaten it, such as demand surges that can cause grid overload and power outages. In that case, broadcasting the information might reduce energy consumption immediately. Such a system needs to continuously analyze the incoming data and compare trends over time to identify patterns that indicate a higher risk of power outages.
Figure 3 illustrates a typical architecture for IoT applications. The bottom of the stack shows various assets or devices that are instrumented with different kinds of sensors depending on the application. The sensors generally produce continuous data feeds that the application needs to process and analyze quickly. Depending on its capabilities, the device itself might be able to do some of the processing locally. This is called local analytics, and tools like the .NET Micro Framework can help you perform that local processing before the device passes on the data. IoT applications use Internet protocols to pass along the device data so that global analytics can be performed on the data. The results from global analytics, such as the overall health of a power grid, are of interest to end users who manage operations or to business decision makers. Analytics may also drive closed systems that take action automatically based on the conditions revealed in the incoming data. These approaches are particularly powerful if assets can receive feedback from the global analytics, for example, to effect a change in behavior or to improve operations. The global analytics driving these processes need to be calculated continuously and the results made available as quickly as possible. Moreover, the analytics frequently refer to time and to timestamps provided with the sensor data. Hence, just placing this kind of data in a database and running periodic queries on it is not the right approach. Fortunately, Microsoft StreamInsight allows a different approach.
Figure 3 Typical Architecture for Internet of Things Applications
Microsoft StreamInsight
Microsoft StreamInsight is designed to provide timely reactions to continuously arriving data without writing the data to disk for analysis and querying. Many IoT applications need to analyze incoming data in near-real time—right after the data is acquired from the sources. Think of that smart grid application we mentioned that needs to react quickly to a surge in electricity demand to rebalance the power grid. Many IoT applications have the same needs: data processing that requires continuous analysis and compelling latency. The analysis must be continuous because the data sources unceasingly produce new data. And many scenarios need to identify and react quickly to conditions that become apparent only from the analysis of the incoming data, so they require low-latency analysis with results available almost immediately. These requirements make it impractical to store the data in a relational database before performing the analysis.
We call these applications event-driven applications, and the IoT is just one scenario where such capabilities are highly useful. StreamInsight is a powerful platform for building these highly scalable, low-latency, event-driven applications. It’s been part of Microsoft SQL Server since version 2008 R2. StreamInsight complements SQL Server with event-driven processing and rich, expressive time-based analytics. With StreamInsight, business insight is delivered at the speed the data is produced, as opposed to the speed at which traditional database reports are processed.
Analytical results that are available for human consumption right away or that enable an application to react to events automatically help businesses get a timelier and more relevant view into their operations, or even automate parts of their operations. They can also react more quickly to critical situations, opportunities or trends emerging from the sensor or device data.
To write StreamInsight applications, developers use familiar tools such as the Microsoft .NET Framework, LINQ and Microsoft Visual Studio. Figure 4 depicts the developer and runtime experience of a StreamInsight application and introduces some of the key concepts.
Figure 4 StreamInsight Application Development and Runtime
A Simple IoT Application
Let’s look more in depth at a potential IoT application; then we’ll build it out. For our end-to-end example, we’ll focus on a simple scenario that uses motion sensors to monitor rotating equipment such as turbines or windmills. This is vital because too much vibration could point to a critical condition in which the equipment is likely to break down and cause significant damage if not stopped quickly. To detect this condition reliably, each piece of equipment has multiple sensors tracking motion. A surge in motion from a single sensor might just indicate unreliable data readings from that sensor, but abnormally high motion from multiple sensors at the same time is considered a critical condition. For a large turbine, for example, you’d probably want to raise an alarm or even shut down the equipment automatically. Besides checking continuously for such conditions, we also want to give operators a dashboard that provides a near-real-time view of equipment status.
To build out this scenario, we need to address the following requirements and technical challenges:
- What data does the device need to capture?
- What sensors do we use to measure it?
- How does the device communicate its sensor readings to the Internet?
- How do we collect the device data in one place for analytics?
- How can we continuously analyze the incoming data and react quickly to critical conditions?
- How do we correlate sensor readings in time across multiple devices so we can check for global conditions?
Let’s take a look at how we address these requirements and implement the end-to-end scenario.
An IoT Application: Implementation Highlights
Here are some of the key steps to implement an IoT application as outlined in the previous section. We’ll discuss the device first, jump to the visualization of the output and then move to the analytics across devices that populate the dashboard.
The Device To build a sensor device, we started with the Netduino Plus, a popular small board with 128K SRAM that runs the .NET Micro Framework. We added a common hobbyist Wi-Fi radio called the WiFly GSX Breakout and mounted the actual sensors, including a three-axis accelerometer, on a custom PCB board. We programmed the device to send an update of the sensor readings every second to a Web service that acts as the hub to collect and process the data from all devices.
We’re using a RESTful connection with the Web service—just an HTTP POST containing comma-separated name-value pairs. Of course, you can do this from any kind of device that supports HTTP. We opted to use the .NET Micro Framework so that the entire application—including the devices, the Web service, the StreamInsight adapters, the Silverlight dashboard and so forth—could all be written using a single programming model (.NET) and tool chain (Visual Studio). Clearly, if you have .NET skills, you don’t need to hire new staff or farm out parts of your IoT project to an external embedded shop; you have the skills to do it all. For example, setting up the accelerometer requires only a few lines of code to access the AnalogInput class and call the Read method:
this.analogInputX = new AnalogInput(pinX);
this.analogInputY = new AnalogInput(pinY);
this.analogInputZ = new AnalogInput(pinZ);
...
rawZ = analogInputZ.Read();
rawY = analogInputY.Read();
rawX = analogInputX.Read();
Once the sensor input is read and the HTTP message content formatted, all that’s required to send it is included in Figure 5.
Figure 5 Submitting Sensor Data
protected void submitSensorData(string uri, string payload)
{
// Message format
StringBuilder sb = new StringBuilder(256);
sb.Append(
"POST /Website/Services/DataService.aspx?method=SaveDeviceData HTTP/1.1\n");
sb.Append("User-Agent: NetduinoPlus\n");
sb.Append("Host: 192.168.1.101\n");
sb.Append("Connection: Keep-Alive\n");
sb.Append("Content-Length: ");
sb.Append(payload.Length.ToString());
sb.Append("\n");
sb.Append(payload);
sb.Append("\n");
try
{
HttpResponse response = webServer.SendRequest(uri, 80, request);
}
catch
{
...
}
}
On the server side, we implement the method SaveDeviceData, to which the devices POST their messages. We split the message string and parse the MAC address, the timestamp and payload data, such as the motion readings from the accelerometer. We use all of this to populate a DeviceData object (see Figure 6) that we pass to StreamInsight for subsequent analysis.
Figure 6 Populating the DeviceData Object
private int SaveDeviceData()
{
...
List<string> data = record.Split(',').ToList();
DeviceData deviceData = new DeviceData();
deviceData.MAC = NormalizeMAC(data[0].Trim());
deviceData.DateTime = DateTime.UtcNow;
...
deviceData.Motion = Convert.ToDecimal(data[2].Substring(data[2].IndexOf(":") + 1));
...
// Communicate each new device data record to StreamInsight
DeviceDataStreaming streaming = (DeviceDataStreaming)
HttpContext.Current.Application[Global.StreamingIdentifier];
streaming.TrackDeviceData(deviceData);
...
}
The DashboardNow we want to build a dashboard that lets the equipment operator view the current status of the sensors on the equipment. For ease of presentation, we’ll focus on just a single piece of equipment. Figure 7 shows an example of such a dashboard. Let’s start on the left and look at the different views of the sensor data.
Figure 7 Dashboard for Equipment Monitoring
Moving Averages View: The data grid on the bottom left shows the sensor readings from the device with values for light, temperature and motion, as well as the device ID and a timestamp. As you can see from the timestamps, these values are updated every second. But, instead of displaying the raw sensor values, the dashboard shows moving averages over 10 seconds worth of sensor data. This means every second the values are updated with the average of the last 10 seconds worth of data. Using a moving average is a common, simple technique to protect against the effect of outliers and bad data that occur occasionally with low-cost sensors.
Trendline View: On the lower right, the dashboard shows the trendlines for the sensors. The trendline view is driven by the moving averages shown in the data grid on the left.
Alarm View: The view at the upper right displays a data grid for alarms. If a critical condition is detected, an alarm is raised that shows the time and additional information such as severity and status.
Analytics Now let’s take a look behind the scenes and discuss the analytics that are processing the incoming sensor data and calculating the results the dashboard visualizes. We use StreamInsight to do the analytics. The following class represents the device data, including the MAC address, a timestamp and the sensor values:
public class DeviceData
{
public string MAC { get; set; }
public DateTime DateTime { get; set; }
public decimal? Light { get; set; }
public decimal? Temperature { get; set; }
public decimal? Motion { get; set; }
}
This class defines the shape of a single event, but we want to start reasoning about many events. To do this, we define an Observable data source for StreamInsight. This is simply a data source that implements the System.IObservable interface:
public class DeviceDataObservable : IObservable<DeviceData>
{
...
}
Once you have a .NET Framework sequence like an Enumerable or an Observable like this, you can start writing StreamInsight queries over these collections. Let’s take a quick look at some of the key queries. The first one takes the Observable as an input and produces a stream of StreamInsight point events, using the DateTime field in the device data as the timestamp for the StreamInsight event. We take this stream as input in the next LINQ statement and group the data by the MAC address. For each group, we then apply a hopping window (a time-based subset of events) with a window size of 10 seconds and let the window recalculate every second. Within each window, we calculate the averages for temperature, light and motion. That gives us a moving average per device that recalculates every second. Figure 8 shows the code for this wrapped in a function that returns the result as a stream of StreamInsight events.
Figure 8 Getting the Moving Averages
public static CepStream<AverageSensorValues> GroupedAverages(
Application application,
DeviceDataObservable source)
{
var q1 = from e1 in source.ToPointStream(application,
e => PointEvent.CreateInsert(
new DateTimeOffset(
e.DateTime.ToUniversalTime()),e),
AdvanceTimeSettings.StrictlyIncreasingStartTime,
"Device Data Input Stream")
select e1;
var q2 = from e2 in q1
group e2 by e2.MAC into groups
from w in groups.HoppingWindow(
TimeSpan.FromSeconds(10),
TimeSpan.FromSeconds(1))
select new AverageSensorValues
{
DeviceId = groups.Key,
Timestamp = null,
AvgTemperature = w.Avg(t => t.Temperature),
AvgLight = w.Avg(t => t.Light),
AvgMotion = w.Avg(t => t.Motion)
};
return q2;
}
This is a great place to think about implementing the alarm query. Remember, the alarm is to be triggered when multiple motion sensors move above the motion threshold at the same time. We can handle this with just a couple of StreamInsight LINQ statements over the grouped averages just calculated. The first query, q3, applies a neat trick by representing changes of the alarm threshold as a stream of events called AlarmThresholdSignal. The query joins the thresholds with the averages stream from the previous query and then just filters for the events above the threshold:
var q3 = from sensor in GroupedAverages(application, source)
from refdata in AlarmThresholdSignal(application, alarmsthresholds)
where (sensor.AvgMotion !=
null && (double) sensor.AvgMotion > refdata.Threshold)
select new
{
AlarmDevice = sensor.DeviceId,
AlarmInfo = "This is a test alarm for a single device",
};
The next query uses the StreamInsight snapshot window to identify points in time when event statuses change. If a new event results from the previous filter query, this is a new snapshot and the snapshot operation produces a new window containing all events that coincide or overlap with the event that triggered the snapshot window. The following code counts events above the alarm threshold when the snapshot window is created:
var alarmcount = from win in q3.SnapshotWindow()
select new
{
AlarmCount = win.Count()
};
The final step checks whether the count shows that multiple devices are indicating alarms:
var filteralarms = from f in alarmcount
where f.AlarmCount >= 2
select new AlarmEvent
{
AlarmTime = null,
AlarmInfo = "Now we have an alarm across multiple devices",
AlarmKind = 0,
AlarmSeverity = 10,
AlarmStatus = 1
};
Now we just need to get the output streams with the average sensor values and alarms from StreamInsight to the UI.
Getting the Stream to the UI
With StreamInsight producing the result streams on the server side, we need a way to communicate these streams to the consumers. Consumers probably won’t run in the server process and might use a lightweight Web application to visualize the results. If you’re using Silverlight, the duplex protocol is convenient because it supports continuous push-based delivery from the server to the client. HTML5 Web sockets are a compelling alternative, too. In any case, you want to make it easy to add new analytics on the server side and be able to easily wire them up with the UI—without tearing apart the client-server interfaces between the UI and the process hosting StreamInsight. If your load between UI and server is moderate, you can serialize the results on the server side into XML and deserialize them on the client side. That way, you only need to worry about XML across the wire and in your client-server interfaces, plus an additional cookie to indicate what types to expect for deserialization. Here are a couple of the key pieces of code.
The first code snippet is the Windows Communication Foundation contract for flowing the event data as an XML-serialized string, along with a GUID to indicate the type:
[ServiceContract]
public interface IDuplexClient
{
[OperationContract(IsOneWay = true)]
void Receive(string eventData, Guid guid);
}
Now we can annotate our result event structures with a data contract to make them serializable, as shown in Figure 9.
Figure 9 Annotating the Event Structures
[DataContract]
public class AverageSensorValues : BaseEvent
{
[DataMember]
public new static Guid TypeGuid =
Guid.Parse("{F67ECF8B-489F-418F-A01A-43B606C623AC}");
public override Guid GetTypeGuid() { return TypeGuid; }
[DataMember]
public string DeviceId { get; set; }
[DataMember]
public DateTime? Timestamp { get; set; }
[DataMember]
public decimal? AvgLight { get; set; }
[DataMember]
public decimal? AvgTemperature { get; set; }
[DataMember]
public decimal? AvgMotion { get; set; }
}
We can now easily serialize result events on the server side and communicate them to the client, as Figure 10 shows.
Figure 10 Sending Result Events from the Server
static public void CallClient<T>(T eventData) where T : BaseEvent
{
if (null != client)
{
var xmlSerializer = new XmlSerializer(typeof(T));
var stringBuilder = new StringBuilder();
var stringWriter = new StringWriter(stringBuilder);
xmlSerializer.Serialize(stringWriter, eventData);
client.Receive(stringBuilder.ToString(), eventData.GetTypeGuid());
}
}
On the client side, we deserialize the event in the callback method for the duplex service and then branch into different methods based on the type of event received, as shown in Figure 11.
Figure 11 Receiving and Deserializing the Event on the Client
void proxy_ReceiveReceived(object sender, ReceiveReceivedEventArgs e)
{
if (e.Error == null)
{
if (AverageSensorValues.TypeGuid == e.guid)
{
ProcessAverageSensorValues(Deserialize<AverageSensorValues>(e.eventData));
}
else if (AlarmEvent.TypeGuid == e.guid)
{
ProcessAlarms(Deserialize<AlarmEvent>(e.eventData));
}
else
{
ProcessUnknown();
}
}
}
With these queries and the communication to the Web application in place, you can now pick up several devices and shake them until some are above your alarm threshold. The UI will then produce one of those nice red alarms like the one shown in Figure 12.
Figure 12 Equipment Dashboard with Alarms
Because new data is constantly coming in with a near-real-time dashboard, ObservableCollections are extremely useful for updating your UI. If you base your data grids and trendlines on these Observable collections, you don’t need to worry about the updating part in your code. The collections are doing this for you automatically behind the scenes.
The Outlook
In this implementation, the devices communicate with a regular Web service that could be running on an ordinary PC connected to the Internet. But cloud computing is an attractive alternative; you don’t necessarily need to own the hardware and run the software for your own Web server. Instead, a service in the cloud can serve as the hub where all the device data is being collected for your application. This also makes it very easy for you to elastically scale your processing power as the number of devices grows or you deploy additional analytics over the device data. Microsoft is planning to provide StreamInsight capabilities as a service in Azure (StreamInsight Project Codename “Austin”). By providing predefined communication endpoints and protocols, Austin will make it easy to connect your devices to rich analytical processing capabilities in the Microsoft cloud. If you deploy your IoT applications in Azure, you’ll automatically get the cloud benefits of elastic scale and pay-as-you go to manage device connections and to perform rich analytics over device data.
Another important shift is happening with the recent standardization effort from the W3C. The most important initiatives for IoT applications are HTML5 and Web sockets. HTML5 provides the platform for rich Web applications such as the dashboard we implemented. WebSocket in turn simplifies full-duplex communication between the browser and the Web server over TCP, in particular for the push model of results delivery that the continuous processing of sensor data requires.
Connected devices are opening up an exciting new world of applications, and the tools for building these IoT applications are available from Microsoft today. Here we have shown how you can use your .NET Framework skills at the device level, using familiar interfaces, and feed data through Web services into the powerful analytics capabilities of StreamInsight. Get started building your IoT applications using connected devices now!
Torsten Grabs is a lead program manager in the Microsoft SQL Server division. He has more than 10 years’ experience working with Microsoft SQL Server products and holds a Ph.D. in computer science from the Swiss Federal Institute of Technology, Zurich, Switzerland.
Colin Miller has worked for 25 years (including 15 at Microsoft) on PC software, including databases, desktop publishing, consumer products, Word, Internet Explorer, Passport (LiveID) and online services. He’s the product unit manager of the .NET Micro Framework.
Thanks to the following technical experts for reviewing this article: Rafael Fernandez Moctezuma and Lorenzo Tessiore