A Technical Case Study: High Speed IoT Data Ingestion Using In-Memory OLTP in Azure
In this post we look at a customer case study of an Internet of Things (IoT) scenario, where large amount of device data is ingested into an Azure SQL Database. Because the data lives in a SQL database, it can be conveniently accessed and analyzed through SQL queries. In-Memory OLTP was used to achieve significant performance gains in data ingestion without making application changes.
This customer creates and sells popular consumer devices that reside in people’s homes. Every day, a subset of these devices emits diagnostic data. The diagnostic data is used downstream to:
- Track quality of the devices.
- Compare with the tests that were done on the device before it left the factory.
- And analyze trends and individual issues, in order to improve customer care.
Azure SQL is a fast and convenient to correlate telemetry with customer data and business data. Every day, there are 50,000 to 100,000 devices submitting diagnostics to the mid-tier app running in Microsoft Azure. The diagnostics for each device is a single large file that translates into 5,000 to 10,000 rows inserted into over 70 tables in the database. This translates to 750 million rows inserted into the database every day.
The efficient way to ingest sets of data into a SQL database is to use stored procedures with table-valued parameters (TVPs). The app uses a stored procedure with 70 TVPs to ingest the diagnostics into the SQL database, where the data can then be analyzed.
Here is an example of a traditional table type definition, plus a stored procedure that uses the table type:
CREATE TABLE Location
( LocationName VARCHAR(50) PRIMARY KEY
, CostRate INT );
GO
CREATE TYPE LocationTableType AS TABLE
( LocationName VARCHAR(50)
, CostRate INT );
GO
CREATE PROCEDURE dbo.usp_InsertProductionLocation
@TVP LocationTableType READONLY
AS
BEGIN
INSERT INTO Location (LocationName, CostRate)
SELECT LocationName, CostRate FROM @TVP
END
GO
When the stored procedure is called from the client, the table-valued parameter (TVP) is created at the client-side and passed into the server. Now, at the server-side, TVPs are stored in a system database called tempdb. Tempdb usage does count toward the DTU utilization of the database to some extent, with CPU and log IO utilization being major factors.
When ingesting all this diagnostic data, the database reached the log IO maximum associated with the pricing tier of the database – P11 in this case. The log IO percentage reached 100%, limiting the data that the database could ingest. To overcome this limit, the customer could have increased the pricing tier of the database, or scaled out the database. However, those options would increase the cost.
The customer found that a better option is to use Azure SQL Database In-Memory technology. In-Memory OLTP is a technology that became available in Azure SQL Database recently. As of the time of this writing in April 2016, In-Memory OLTP is in public preview for standalone premium databases. At a high level, In-Memory OLTP uses memory-optimized data structures and algorithms that allow you to improve performance of transactional workloads on Azure SQL DB, without increasing the pricing tier.
The customer chose to use memory-optimized TVPs. The memory-optimization reduces CPU utilization. It also completely eliminates the log IO, because the data exists only in active memory. Note that if the stored procedure inserts the TVP data into a user table, log IO still occurs for the user table.
Here is an example of a memory-optimized table type, and a stored procedure that uses the type for a TVP:
CREATE TABLE Location
( LocationName VARCHAR(50) PRIMARY KEY
, CostRate INT );
GO
CREATE TYPE LocationTableType AS TABLE
( LocationName VARCHAR(50) INDEX ix_LocationName
, CostRate INT )
WITH (MEMORY_OPTIMIZED=ON);
GO
CREATE PROCEDURE dbo.usp_InsertProductionLocation
@TVP LocationTableType READONLY
AS
BEGIN
INSERT INTO Location (LocationName, CostRate)
SELECT LocationName, CostRate FROM @TVP
END
GO
Notice that the only changes required are in the table type definition. No changes are required to the stored procedure or the client application.
For this application, the customer simply changed the table type definitions to memory-optimized. The result is a 30-40% improvement in the performance of the stored procedures used for data ingestion. This translates into being able to ingest 30-40% more data every day, and thus the ability to process diagnostics for that many more devices without increasing cost and without making changes to the application.
Next steps
In-Memory OLTP in Azure SQL Database is a great tool for your IoT data analytics. For more details about how you can use memory-optimization to improve performance of TVPs, table variables, as well as temp tables, see the following blog post:
Get started with In-Memory OLTP in Azure SQL Database:
https://azure.microsoft.com/en-us/documentation/articles/sql-database-in-memory/