The Azure Stream Analytics Query Language
This post is by Janet Yeilding, a Program Manager on the Azure Stream Analytics team, and is based on a recent session from the Microsoft Advanced Analytics webinar series.
Azure Stream Analytics (ASA) is a cloud service that enables real-time processing over streams of data flowing in from devices, sensors, websites and other live systems. The stream-processing logic in ASA is expressed in a SQL-like query language with some added extensions such as windowing for performing temporal calculations. At a recent webinar, we explained key language constructs and query patterns and did an Internet of Things (IoT) demonstration – you can view a recording of the webinar here – and, in this post, we share a few highlights from the webinar and about this technology.
Events and Time
ASA is a temporal system, so every event that flows through it has a timestamp. A timestamp is assigned automatically based on the event's arrival time to the input source but you can also access a timestamp in your event payload explicitly using TIMESTAMP BY:
SELECT * FROM SensorReadings TIMESTAMP BY time
As you perform computations and aggregations over your input and generate output events, Stream Analytics will assign timestamps to them, as well. You can access this timestamp with System.Timestamp:
SELECT System.Timestamp AS Time FROM SensorReadings
Windowing
Windowing is a core requirement for stream processing applications to perform set-based operations like counts or aggregations over events that arrive within a specified period of time. ASA supports three types of windows: Tumbling, Hopping, and Sliding.
Tumbling Windows
Tumbling Windows define a repeating, non-overlapping window of time.
Example: Calculate the count of sensor readings per device every 10 seconds
SELECT sensorId, COUNT(*) AS Count
FROM SensorReadings TIMESTAMP BY time
GROUP BY sensorId, TumblingWindow(second, 10)
Hopping Windows
Like Tumbling Windows, Hopping Windows move forward in time by a fixed period but they can overlap with one another.
Example: Every 5 seconds calculate the count of sensor readings and the average temperature over the last 10 seconds
SELECT sensorId, COUNT(*) AS Count, AVG(temp)
FROM SensorReadings TIMESTAMP BY time
GROUP BY sensorId, HoppingWindow(second, 10 , 5)
Sliding Windows
With a Sliding Window, the system is asked to logically consider all possible windows of a given length and output events for cases when the content of the window actually changes – that is, when an event entered or existed the window.
Example : Generate an output event if the temperature is above 75 for a total of 5 seconds
SELECT sensorId, MIN(temp) as temp
FROM SensorReadings
TIMESTAMP BY time
GROUP BY sensorId, SlidingWindow(second, 5)
HAVING MIN(temp) > 75
Joining Multiple Streams
Like standard T-SQL, JOIN in ASA is used to combine records from two or more input sources. JOINs in ASA are temporal in nature, meaning each JOIN must provide some limits on how far the matching rows can be separated in time.
Example: Find cases where the humidity reading changed by at least .1 within 5 seconds
SELECT s1.time, s1.dspl, s1.hmdt as previousHmdt, s2.hmdt as newHmdt, datediff(ss, s1.time, s2.time) as secondsApart
FROM SensorData s1 timestamp by time
JOIN SensorData s2 timestamp by time
ON s1.dspl = s2.dspl
AND DATEDIFF(s, s1, s2) BETWEEN 0 AND 5
WHERE (s2.hmdt - s1.hmdt >= .1) or (s1.hmdt - s2.hmdt >= .1)
Reference Data JOIN
Reference data is static or slow-changing data. It is often used for performing lookups or correlations and can be incorporated into a ASA query using a Reference Data JOIN.
Example: Use a lookup table to perform dynamic temperature threshold alerting for each device
SELECT SensorReadings.sensorID, SensorReadings.temp
FROM SensorReadings
JOIN thresholdRefData
ON SensorReadings.sensorID = thresholdRefData.sensorID
WHERE SensorReadings.temp > thresholdRefData.value
Multiple Outputs
ASA supports multiple computations and output targets in a single job. You can use the INTO keyword to specify an output target explicitly.
Example: Write all events to outputLog and write events above 75 degrees to outputTempAlert
SELECT *
INTO outputLog
FROM SensorReadings
SELECT *
INTO outputTempAlert
FROM SensorReadings
WHERE temp > 75
ML Functions
Azure ML and ASA are now integrated in private preview, enabling you to call ML models over streaming data in an ASA job. For details on supported scenarios and signup information, see Real Time scoring of streaming data using Machine Learning models.
Stay Connected!
You can learn more about ASA from the documentation page and blog or follow us at @AzureStreaming for the latest news.
Janet
Comments
- Anonymous
January 01, 2003
We are using it for log processing and reducing time to detect and mitigate on some internal services. For specific hardware scenarios, you might be interested to learn about Fujitsu and NEC's use cases which were recently showcased at Build and Ignite:http://blogs.msdn.com/b/streamanalytics/archive/2015/05/11/iot-solutions-built-on-asa-nec-and-fujitsu.aspx - Anonymous
June 02, 2015
Nice stuff, this looks super powerful for dealing with hardware.
What kinds of things is it used for internally? - Anonymous
June 10, 2015
What type of prediction services does Azure ML offer for streaming data. I am interested in survival analysis of electronic equipment where I am trying to predict which equipment and what location will fail. We typically collect temperature, power stats like voltage & current,