Implementing LOB Storage in Memory Optimized Tables
Introduction
Memory optimized tables do not have off-row or large object (LOB) storage, and the row size is limited to 8060 bytes. Thus, storing large binary or character string values can be done in one of two ways:
- Split the LOB values into multiple rows
- Store the LOB values in a regular non-memory optimized (file based) table
This document provides a mechanism for implementing the split of LOB values into multiple rows, and outlines several mechanisms for retrieving the data.
The alternative solution of using a file based table for overflow data is also discussed, along with options relating to error checking and durability.
Associated with this document is a set of scripts covering all the TSQL code presented here, along with some simple testing scripts.
Approach Overview
The approach taken allows for the processing of LOB data such that when the data length is greater than a predefined upper limit, the segment size, the data is broken down into smaller chunks.
If the data fits into a single segment then the process defaults to operating on just a primary table, with a single row for the key identifier. In this instance reading the data means simply returning it from the primary table.
If the data is too large for a single segment then it is broken down into multiple segments, each part saved as a row with an associated part number and linked to the primary element via the key identifier. In this instance, reading the data becomes a matter of determining how many segments the data is stored in, and either concatenating them into a single field or returning the individual parts through a SELECT statement.
The procedures to support this approach would be as follows:
- Insert a LOB column when the data length is less than a predefined segment size
- Insert a LOB column when the data length is greater than the predefined segment size and a (max) column definition is needed
- Provide upsert operations that update an entry if an insert has previously been made, or insert the data if the current state is not known
- Provide a SELECT process that returns the collection of LOB segments, taking into consideration that the data may not be segmented
- Read a LOB column when the size is less than a predefined segment size
- Read a LOB column when the size is greater than the predefined segment size, returning the data in a (max) datatype definition
When implementing your own solution, consideration should be given to the predefined segment size. The goal is a size as small as possible, yet big enough that a large percentage of the application's data fits into a single segment.
If most of your LOB data is less than 2K in size then this should be the predefined segment size. Always using a maximum segment size of 8K could result in a lot of wasted memory, and consequently affect performance. For small transient data, such as session data and small encrypted data elements, adjusting this value can impact the overall performance gains one will get with memory optimized tables.
Of course, if you are saving large LOB data elements, such that most of your LOB data is larger than the maximum segment size of 8K, then this optimization will not be feasible. In that case it is better to use a segment size of 8K to reduce the number of segments the data needs to be split into.
Another advantage of having the data fit into a single segment is that the data handling Stored Procedures can all be natively compiled. In this case the application needs to know the size of the data it is processing, which may or may not be possible.
The primary approach outlined may not be suitable for storing very large quantities of LOB data that are accessed infrequently; say, large documents or images. In this instance a standard file based table approach may be more suitable. However, for highly transient data such as session data, encrypted authentication codes, etc. using this approach can dramatically speed up access.
Implementation
For this implementation I have created a new database, called [MemBinaryStore]. In reality the object definitions below would be placed into your own application database:
CREATE DATABASE [MemBinaryStore]
CONTAINMENT = NONE
ON PRIMARY (NAME = [MemBinaryStore_Primary], FILENAME = 'C:\MSSQL\DATA\MemBinaryStore_Prim.mdf', SIZE = 10 MB, FILEGROWTH = 10 MB),
FILEGROUP [MemBinaryStore_Mod] CONTAINS MEMORY_OPTIMIZED_DATA (NAME = [MemBinaryStore_Mod], FILENAME = 'C:\MSSQL\DATA\MemBinaryStore.dir'),
FILEGROUP [MemBinaryStore_Data] (NAME = [MemBinaryStore_Data1], FILENAME = 'C:\MSSQL\DATA\MemBinaryStore_Data1.ndf', SIZE = 10 MB, FILEGROWTH = 10 MB),
(NAME = [MemBinaryStore_Data2], FILENAME = 'C:\MSSQL\DATA\MemBinaryStore_Data2.ndf', SIZE = 10 MB, FILEGROWTH = 10 MB)
LOG ON (NAME = [MemBinaryStore_Log1], FILENAME = 'C:\MSSQL\DATA\MemBinaryStore_Log1.ldf', SIZE = 5 MB, FILEGROWTH = 5 MB)
COLLATE Latin1_General_100_CI_AS;
GO
ALTER DATABASE [MemBinaryStore]
MODIFY FILEGROUP [MemBinaryStore_Data] DEFAULT;
GO
One must remember that Memory Optimized tables have a restriction on the code page for (var)char columns. These columns must use code page 1252, hence the collation specified above.
Table Objects
This implementation uses two base tables: a Primary table that holds the base binary information, plus all the necessary data when the LOB data fits into a single segment; and an Extended table that holds any necessary LOB data segments, with multiple records existing for each primary entry when the data is split into segments.
CREATE TABLE [dbo].[BinaryItem]
(
[BinaryItemId] binary(16) NOT NULL
CONSTRAINT [PK_BinaryItem] PRIMARY KEY NONCLUSTERED HASH
WITH (BUCKET_COUNT = 1000000),
[ItemCreated] datetime NOT NULL,
[BinaryData] varbinary(6000) NULL,
[BinaryExParts] int NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
CREATE TABLE [dbo].[BinaryItemEx]
(
[BinaryItemId] binary(16) NOT NULL
INDEX IDX_BINARYITEMEX_ID HASH
WITH (BUCKET_COUNT = 1000000),
[BinaryPart] int NOT NULL,
[BinaryDataEx] varbinary(6000) NOT NULL,
CONSTRAINT [PK_BinaryItemEx] PRIMARY KEY NONCLUSTERED HASH
([BinaryItemId], [BinaryPart])
WITH (BUCKET_COUNT = 20000000)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
When defining your tables care should be taken in defining the correct bucket count. As a very rough guideline, the bucket count should be set to 1-2X the maximum expected cardinality of the table.
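Once the tables are populated, the hash index statistics can be inspected to validate the chosen bucket counts. A minimal sketch, run in the context of the [MemBinaryStore] database defined above: long average chains suggest the bucket count is too low, while a very high percentage of empty buckets suggests it is needlessly large.

```sql
-- Inspect hash index bucket usage for the memory optimized tables
SELECT
    OBJECT_NAME(s.object_id) AS [table_name],
    i.name AS [index_name],
    s.total_bucket_count,
    s.empty_bucket_count,
    s.avg_chain_length,
    s.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats s
INNER JOIN sys.indexes i
    ON i.object_id = s.object_id AND i.index_id = s.index_id;
```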
One of the differences in defining the [BinaryItemEx] table is the use of the IDX_BINARYITEMEX_ID index. If this were a file based table the Index would not be needed as it would be covered by the Primary Key index. However, with hash based indexes this is not the case, as outlined in this blog post "Troubleshooting Common Performance Problems with Memory-Optimized Hash Indexes" (https://blogs.technet.com/b/dataplatforminsider/archive/2013/10/09/troubleshooting-common-performance-problems-with-memory-optimized-hash-indexes.aspx).
In this instance I have chosen a key type of binary(16), rather than a varchar() or uniqueidentifier datatype. This is mostly to avoid specifying a binary collation that is needed for the primary key and index specifications; such as "nvarchar(34) collate Latin1_General_100_BIN2".
This collation requirement is necessary because indexes on (n)(var)char columns can only be specified with BIN2 collations.
Write Procedures
The process for inserting or updating LOB items is dependent on the length of the LOB data. The calling application should determine which procedure to call based on the size of the data being passed in.
If the data is less than the selected segment size then it is just a simple INSERT operation, which can be performed using a natively compiled Stored Procedure:
CREATE PROCEDURE [dbo].[InsertBinaryItem]
(
@BinaryItemId uniqueidentifier,
@BinaryData varbinary(6000)
)
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH
(
TRANSACTION ISOLATION LEVEL = SNAPSHOT,
LANGUAGE = N'English'
)
DECLARE @now datetime = GETUTCDATE();
DECLARE @binaryId binary(16) = CAST(@BinaryItemId AS binary(16));
INSERT INTO [dbo].[BinaryItem]
([BinaryItemId], [ItemCreated], [BinaryData], [BinaryExParts])
VALUES(@binaryId, @now, @BinaryData, 0);
END
You can of course define the primary key to be whatever datatype is applicable for your application.
In the instance that the data length is larger than the defined segment size you have to pass in the data using a (max) datatype definition. In this case the procedure will have to chunk the data down and INSERT it into the extensions table:
CREATE PROCEDURE [dbo].[InsertBinaryItemEx]
(
@BinaryItemId uniqueidentifier,
@BinaryData varbinary(max)
)
AS
BEGIN
DECLARE @now datetime = GETUTCDATE();
DECLARE @binaryId binary(16) = CAST(@BinaryItemId AS binary(16));
DECLARE @maxSegmentLength int = 6000;
DECLARE @binaryDataLength int = DATALENGTH(@BinaryData);
DECLARE @maxSegment int = CEILING(@binaryDataLength / 1.0 / @maxSegmentLength);
DECLARE @segmentValue varbinary(6000);
-- Insert the header primary
INSERT INTO [dbo].[BinaryItem] WITH (SNAPSHOT)
([BinaryItemId], [ItemCreated], [BinaryData], [BinaryExParts])
VALUES(@binaryId, @now, NULL, @maxSegment);
-- Now insert the chunked binary data
DECLARE @part int = 1;
WHILE (@part <= @maxSegment)
BEGIN
SET @segmentValue = SUBSTRING(@BinaryData, ((@part -1) * @maxSegmentLength + 1), @maxSegmentLength);
INSERT [dbo].[BinaryItemEx] WITH (SNAPSHOT)
([BinaryItemId], [BinaryPart], [BinaryDataEx])
VALUES (@binaryId, @part, @segmentValue);
SET @part += 1;
END
END
In this case the binary header element is still written but the binary block is NULL. Also the part count, [BinaryExParts], is defined as the number of segments the data has been split into.
Breaking down the LOB column is simply a case of using the SUBSTRING function, which also operates on binary data, to extract each segment.
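As an illustration of the chunking arithmetic, a 15,000 byte value with a 6,000 byte segment size yields CEILING(15000 / 6000.0) = 3 parts. The sketch below, using hypothetical sample data, shows the segment lengths produced by the SUBSTRING expression:

```sql
DECLARE @maxSegmentLength int = 6000;
-- 15,000 bytes of sample data (cast to (max) before REPLICATE to avoid truncation)
DECLARE @data varbinary(max) = REPLICATE(CAST(0xAB AS varbinary(max)), 15000);
DECLARE @maxSegment int = CEILING(DATALENGTH(@data) / 1.0 / @maxSegmentLength);
-- @maxSegment = 3: parts of 6000, 6000 and 3000 bytes
DECLARE @part int = 1;
WHILE (@part <= @maxSegment)
BEGIN
    SELECT @part AS [BinaryPart],
        DATALENGTH(SUBSTRING(@data, (@part - 1) * @maxSegmentLength + 1, @maxSegmentLength)) AS [SegmentLength];
    SET @part += 1;
END
```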
For the INSERT/UPDATE (upsert) operations, the process follows the same pattern. In the case of the data being less than the selected segment size an UPDATE operation is first performed. If no entry is found then an INSERT operation is performed:
CREATE PROCEDURE [dbo].[UpsertBinaryItem]
(
@BinaryItemId uniqueidentifier,
@BinaryData varbinary(6000)
)
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH
(
TRANSACTION ISOLATION LEVEL = SNAPSHOT,
LANGUAGE = N'English'
)
DECLARE @now datetime = GETUTCDATE();
DECLARE @binaryId binary(16) = CAST(@BinaryItemId AS binary(16));
DECLARE @itemIdFound binary(16) = NULL;
-- Delete any extended data
DELETE FROM [dbo].[BinaryItemEx]
WHERE [BinaryItemId] = @binaryId;
-- First try an update as the primary may already exist
UPDATE [dbo].[BinaryItem]
SET @itemIdFound = [BinaryItemId],
[BinaryData] = @BinaryData,
[BinaryExParts] = 0
WHERE [BinaryItemId] = @binaryId;
-- If no primary found then perform the insert
IF (@itemIdFound IS NULL)
BEGIN
INSERT INTO [dbo].[BinaryItem]
([BinaryItemId], [ItemCreated], [BinaryData], [BinaryExParts])
VALUES(@binaryId, @now, @BinaryData, 0);
END
END
In this case one has to ensure that any spurious extended binary segments are first deleted before the new values are upserted.
For the case of the data being larger than the selected segment size the process is very similar, except that the data is chunked as in the case of an insert operation:
CREATE PROCEDURE [dbo].[UpsertBinaryItemEx]
(
@BinaryItemId uniqueidentifier,
@BinaryData varbinary(max)
)
AS
BEGIN
DECLARE @now datetime = GETUTCDATE();
DECLARE @binaryId binary(16) = CAST(@BinaryItemId AS binary(16));
DECLARE @itemIdFound binary(16) = NULL;
DECLARE @maxSegmentLength int = 6000;
DECLARE @binaryDataLength int = DATALENGTH(@BinaryData);
DECLARE @maxSegment int = CEILING(@binaryDataLength / 1.0 / @maxSegmentLength);
DECLARE @segmentValue varbinary(6000);
IF (@binaryDataLength <= @maxSegmentLength)
BEGIN
SET @segmentValue = SUBSTRING(@BinaryData, 1, @binaryDataLength);
EXEC [dbo].[UpsertBinaryItem] @BinaryItemId, @segmentValue;
END
ELSE
BEGIN
-- Delete any existing binary segments
DELETE FROM [dbo].[BinaryItemEx] WITH (SNAPSHOT)
WHERE [BinaryItemId] = @binaryId;
-- Update the header
UPDATE [dbo].[BinaryItem] WITH (SNAPSHOT)
SET @itemIdFound = [BinaryItemId],
[BinaryData] = NULL,
[BinaryExParts] = @maxSegment
WHERE [BinaryItemId] = @binaryId;
-- If no header found then create one
IF (@itemIdFound IS NULL)
BEGIN
INSERT INTO [dbo].[BinaryItem] WITH (SNAPSHOT)
([BinaryItemId], [ItemCreated], [BinaryData], [BinaryExParts])
VALUES(@binaryId, @now, NULL, @maxSegment);
END
-- Insert the binary segments
DECLARE @part int = 1;
WHILE (@part <= @maxSegment)
BEGIN
SET @segmentValue = SUBSTRING(@BinaryData, ((@part -1) * @maxSegmentLength + 1), @maxSegmentLength);
INSERT [dbo].[BinaryItemEx] WITH (SNAPSHOT)
([BinaryItemId], [BinaryPart], [BinaryDataEx])
VALUES (@binaryId, @part, @segmentValue);
SET @part += 1;
END
END
END
As in the case of the insert operations, it is only when the data is less than the selected segment size that the Stored Procedures can be natively compiled. As such, ensuring that a large portion of the data fits into this selected segment size will give the best performance gains.
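Putting the two write paths together, the calling pattern might look like the following sketch, where a TSQL batch stands in for the calling application and picks the procedure based on DATALENGTH; the 6,000 byte threshold is the segment size assumed throughout, and the payload is hypothetical:

```sql
DECLARE @id uniqueidentifier = NEWID();
DECLARE @payload varbinary(max) = 0x0102030405; -- sample payload
IF (DATALENGTH(@payload) <= 6000)
    -- Small data: natively compiled single-row insert
    EXEC [dbo].[InsertBinaryItem] @id, @payload;
ELSE
    -- Large data: interpreted procedure that chunks into segments
    EXEC [dbo].[InsertBinaryItemEx] @id, @payload;
```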
Access Procedures
The most efficient way to access the binary data is just to return the binary segments through a SELECT statement and have the application layer combine them back into a single element. The process to do this has to look at the primary element to determine the number of segments and return the corresponding data:
CREATE PROCEDURE [dbo].[GetBinaryDataParts]
(
@BinaryItemId uniqueidentifier
)
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH
(
TRANSACTION ISOLATION LEVEL = SNAPSHOT,
LANGUAGE = N'English'
)
DECLARE @now datetime = GETUTCDATE();
DECLARE @binaryId binary(16) = CAST(@BinaryItemId AS binary(16));
DECLARE @binaryData varbinary(6000);
DECLARE @binaryParts int = 0;
-- Get the binary data and see if parts exists
SELECT
@binaryData = [BinaryData],
@binaryParts = [BinaryExParts]
FROM [dbo].[BinaryItem]
WHERE [BinaryItemId] = @binaryId;
-- If parts exists then return these otherwise just return the single element
IF (@binaryData IS NULL AND @binaryParts > 0)
BEGIN
SELECT CAST([BinaryItemId] AS uniqueidentifier) AS [BinaryItemId], [BinaryPart], [BinaryDataEx] AS [BinaryData]
FROM [dbo].[BinaryItemEx]
WHERE [BinaryItemId] = @binaryId
ORDER BY [BinaryItemId], [BinaryPart];
END
ELSE
BEGIN
SELECT @BinaryItemId AS [BinaryItemId], 0 AS [BinaryPart], @binaryData AS [BinaryData];
END
END
The beauty of this approach is that the Stored Procedure can be natively compiled.
Also, the application calling this Stored Procedure then has the option of better handling the necessary memory allocations for concatenating the LOB segments into a single element.
If a SELECT process is not optimal, as in the case that the application requires the data to be concatenated, one can use a Stored Procedure with an OUTPUT parameter.
In the event that the application knows the data size is less than the specified segment size, there is the option of directly calling a natively compiled Stored Procedure:
CREATE PROCEDURE [dbo].[GetBinaryData]
(
@BinaryItemId uniqueidentifier,
@BinaryData varbinary(6000) OUTPUT
)
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH
(
TRANSACTION ISOLATION LEVEL = SNAPSHOT,
LANGUAGE = N'English'
)
DECLARE @now datetime = GETUTCDATE();
DECLARE @binaryId binary(16) = CAST(@BinaryItemId AS binary(16));
-- Get the binary data for output
SELECT
@BinaryData = [BinaryData]
FROM [dbo].[BinaryItem]
WHERE [BinaryItemId] = @binaryId;
END
In the case that the data size is unknown, or it is larger than the specified segment size, the Stored Procedure has to perform the concatenation into a max datatype, and thus cannot be natively compiled:
CREATE PROCEDURE [dbo].[GetBinaryDataEx]
(
@BinaryItemId uniqueidentifier,
@BinaryData varbinary(max) OUTPUT
)
AS
BEGIN
DECLARE @now datetime = GETUTCDATE();
DECLARE @binaryId binary(16) = CAST(@BinaryItemId AS binary(16));
DECLARE @maxSegmentLength int = 6000;
DECLARE @blobSegment varbinary(6000);
DECLARE @binaryParts int;
DECLARE @count int = 1;
SET @BinaryData = NULL;
-- Get the binary primary and see if parts exists
SELECT
@BinaryData = [BinaryData],
@binaryParts = [BinaryExParts]
FROM [dbo].[BinaryItem]
WHERE [BinaryItemId] = @binaryId;
-- If parts exist then build the binary element
IF (@BinaryData IS NULL AND @binaryParts > 0)
BEGIN
WHILE (@count <= @binaryParts)
BEGIN
SELECT @blobSegment = [BinaryDataEx]
FROM [dbo].[BinaryItemEx] WITH (SNAPSHOT)
WHERE [BinaryItemId] = @binaryId AND [BinaryPart] = @count;
IF (@BinaryData IS NULL)
SET @BinaryData = CAST(@blobSegment AS varbinary(max));
ELSE
SET @BinaryData = @BinaryData + CAST(@blobSegment AS varbinary(max));
SET @count += 1;
END
END
END
In this case the LOB concatenation is performed simply using the plus operator. Of course this concatenation is only necessary if extended binary segments are found.
In this scenario one does have the option of providing internal natively compiled procedures to perform the UPDATE and SELECT operations, which will provide some further performance improvements.
From the performance perspective it is probably better to take the original SELECT approach and have the application layer handle the necessary data concatenation; as it can better handle the memory allocations necessary to perform this operation.
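A simple way to sanity check the write and read paths together is a round trip that compares data lengths; a sketch, assuming the procedures above have been created and using hypothetical sample data:

```sql
DECLARE @id uniqueidentifier = NEWID();
-- 20,000 bytes will be chunked into 4 segments (6000 + 6000 + 6000 + 2000)
DECLARE @original varbinary(max) = REPLICATE(CAST(0xFF AS varbinary(max)), 20000);
DECLARE @returned varbinary(max);
EXEC [dbo].[InsertBinaryItemEx] @id, @original;
-- Read back: the segments are concatenated into a single value
EXEC [dbo].[GetBinaryDataEx] @id, @returned OUTPUT;
SELECT DATALENGTH(@original) AS [OriginalLength],
       DATALENGTH(@returned) AS [ReturnedLength];
```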
Alternative Considerations
Archiving Considerations
Whenever persisting data it is always important to consider how the data is going to be archived. In this case one could take the approach of deleting data elements that are older than a fixed period of time; say 24 hours.
This approach would be rather blunt. An alternative would be to update the primary element with a last accessed time. This would allow for elements to be removed based on a defined life expectancy. In this scenario the [GetBinaryDataParts] procedure would be as follows:
CREATE PROCEDURE [dbo].[GetBinaryDataParts]
(
@BinaryItemId uniqueidentifier
)
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH
(
TRANSACTION ISOLATION LEVEL = SNAPSHOT,
LANGUAGE = N'English'
)
DECLARE @now datetime = GETUTCDATE();
DECLARE @binaryId binary(16) = CAST(@BinaryItemId AS binary(16));
DECLARE @binaryData varbinary(6000);
DECLARE @binaryParts int = 0;
-- Get the binary data and see if parts exists
UPDATE [dbo].[BinaryItem]
SET [ItemAccessed] = @now,
@binaryData = [BinaryData],
@binaryParts = [BinaryExParts]
WHERE [BinaryItemId] = @binaryId;
-- If parts exists then return these otherwise just return the single element
IF (@binaryData IS NULL AND @binaryParts > 0)
BEGIN
SELECT CAST([BinaryItemId] AS uniqueidentifier) AS [BinaryItemId], [BinaryPart], [BinaryDataEx] AS [BinaryData]
FROM [dbo].[BinaryItemEx]
WHERE [BinaryItemId] = @binaryId
ORDER BY [BinaryItemId], [BinaryPart];
END
ELSE
BEGIN
SELECT @BinaryItemId AS [BinaryItemId], 0 AS [BinaryPart], @binaryData AS [BinaryData];
END
END
The SELECT operation has been replaced with an UPDATE that modifies an [ItemAccessed] time; this requires adding an [ItemAccessed] datetime column to the [BinaryItem] table definition.
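This assumes a last accessed column on the primary table (named [ItemAccessed] in the procedure above). Since memory optimized tables cannot be modified with ALTER TABLE in SQL Server 2014, the table would need to be recreated; a sketch of the revised definition:

```sql
CREATE TABLE [dbo].[BinaryItem]
(
    [BinaryItemId] binary(16) NOT NULL
        CONSTRAINT [PK_BinaryItem] PRIMARY KEY NONCLUSTERED HASH
        WITH (BUCKET_COUNT = 1000000),
    [ItemCreated] datetime NOT NULL,
    [ItemAccessed] datetime NOT NULL, -- updated on every read
    [BinaryData] varbinary(6000) NULL,
    [BinaryExParts] int NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```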
Similarly the non-natively compiled procedures that perform the segment concatenation would be as follows:
CREATE PROCEDURE [dbo].[GetBinaryDataEx]
(
@BinaryItemId uniqueidentifier,
@BinaryData varbinary(max) OUTPUT
)
AS
BEGIN
DECLARE @now datetime = GETUTCDATE();
DECLARE @binaryId binary(16) = CAST(@BinaryItemId AS binary(16));
DECLARE @maxSegmentLength int = 6000;
DECLARE @blobSegment varbinary(6000);
DECLARE @binaryParts int;
DECLARE @count int = 1;
SET @BinaryData = NULL;
-- Get the binary primary and see if parts exists
UPDATE [dbo].[BinaryItem] WITH (SNAPSHOT)
SET [ItemAccessed] = @now,
@BinaryData = [BinaryData],
@binaryParts = [BinaryExParts]
WHERE [BinaryItemId] = @binaryId;
-- If parts exist then build the binary element
IF (@BinaryData IS NULL AND @binaryParts > 0)
BEGIN
WHILE (@count <= @binaryParts)
BEGIN
SELECT @blobSegment = [BinaryDataEx]
FROM [dbo].[BinaryItemEx] WITH (SNAPSHOT)
WHERE [BinaryItemId] = @binaryId AND [BinaryPart] = @count;
IF (@BinaryData IS NULL)
SET @BinaryData = CAST(@blobSegment AS varbinary(max));
ELSE
SET @BinaryData = @BinaryData + CAST(@blobSegment AS varbinary(max));
SET @count += 1;
END
END
END
Of course the Insert and Upsert operations would also need to appropriately set the last accessed time.
To archive the data the usual approach with file based tables is to perform the deletion in chunks. Namely, the first set of X records are identified and deleted within a transaction. This process is then repeated within a loop until no more records to delete are found:
CREATE TYPE [dbo].[BinaryKeys]
AS TABLE
(
[BinaryItemId] binary(16) NOT NULL PRIMARY KEY NONCLUSTERED
HASH WITH (BUCKET_COUNT=100000)
) WITH (MEMORY_OPTIMIZED=ON);
GO
CREATE PROCEDURE [dbo].[PurgeArchivedEntriesChunk]
(
@ArchiveHours int = 24
)
AS
BEGIN
BEGIN TRY
SET NOCOUNT ON;
-- Define the Archive window
DECLARE @archiveTime datetime = DATEADD(HOUR, -@ArchiveHours, GETUTCDATE());
-- Define a table to hold the items to purge within each loop
DECLARE @entryPurge [dbo].[BinaryKeys];
-- Define the binary reference from which to start the purging
DECLARE @minReference binary(16);
SELECT @minReference = (
SELECT TOP(1) [BinaryItemId]
FROM [dbo].[BinaryItem]
WHERE [ItemCreated] < @archiveTime
ORDER BY [BinaryItemId] ASC
);
DECLARE @continue int;
DECLARE @purgeCount int;
SET @continue = 1;
SET @purgeCount = 1000;
WHILE (@continue = 1)
BEGIN
-- define the entries to be purged in this cycle
DELETE FROM @entryPurge;
INSERT INTO @entryPurge
SELECT TOP(@purgeCount) [BinaryItemId]
FROM [dbo].[BinaryItem]
WHERE [ItemCreated] < @archiveTime
AND [BinaryItemId] >= @minReference
ORDER BY [BinaryItemId] ASC;
-- If subset selected make the last loop
IF (@@ROWCOUNT < @purgeCount)
BEGIN
SET @continue = 0;
END
-- Run in a transaction to ensure consistency
BEGIN TRANSACTION;
-- Delete the BinaryItem and Ex tables
DELETE FROM [dbo].[BinaryItemEx]
FROM @entryPurge purge
INNER JOIN [dbo].[BinaryItemEx] items WITH (SNAPSHOT)
ON items.[BinaryItemId] = purge.[BinaryItemId];
DELETE FROM [dbo].[BinaryItem]
FROM @entryPurge purge
INNER JOIN [dbo].[BinaryItem] items WITH (SNAPSHOT)
ON items.[BinaryItemId] = purge.[BinaryItemId];
COMMIT TRANSACTION;
-- Define the next lowest reference to be processed
SELECT @minReference = (
SELECT TOP(1) [BinaryItemId] FROM @entryPurge
ORDER BY [BinaryItemId] DESC
);
END
END TRY
BEGIN CATCH
IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
THROW;
END CATCH;
END
In this instance one does have the option of using an In Memory Table Type for saving the necessary deletion keys.
When operating on file based tables one would usually ensure row locks are issued and the deadlock priority is set to low. This is designed so that only a small section of the records within the table are locked, such that the table can still handle a high transaction rate without the purging inadvertently blocking critical active transactions.
However, in the case of In Memory tables, as one does not have to worry about locking, one again has the option of taking the simple deletion approach:
CREATE PROCEDURE [dbo].[PurgeArchivedEntries]
(
@ArchiveHours int = 24
)
AS
BEGIN
BEGIN TRY
SET NOCOUNT ON;
BEGIN TRANSACTION;
DECLARE @archiveTime datetime = DATEADD(HOUR, -@ArchiveHours, GETUTCDATE());
DELETE [dbo].[BinaryItemEx]
FROM [dbo].[BinaryItemEx] itemex WITH (SNAPSHOT)
INNER JOIN [dbo].[BinaryItem] item WITH (SNAPSHOT)
ON itemex.[BinaryItemId] = item.[BinaryItemId]
WHERE [ItemCreated] < @archiveTime;
DELETE [dbo].[BinaryItem] WITH (SNAPSHOT)
WHERE [ItemCreated] < @archiveTime;
COMMIT TRANSACTION;
END TRY
BEGIN CATCH
IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;
THROW;
END CATCH;
END
Unfortunately one cannot yet natively compile this procedure as using the FROM clause in a DELETE statement is not supported.
Error Checking and Retries
The Stored Procedures outlined above do not take into consideration error checking and retry logic. The approach to either can be that it is the responsibility of the calling application or of the actual Stored Procedures.
The associated scripts included with this document include a version of the Stored Procedures that perform both error checking and retry logic. As an example here is the [UpsertBinaryItem] Stored Procedure:
CREATE PROCEDURE [dbo].[UpsertBinaryItem]
(
@BinaryItemId uniqueidentifier,
@BinaryData varbinary(6000)
)
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH
(
TRANSACTION ISOLATION LEVEL = SNAPSHOT,
LANGUAGE = N'English'
)
DECLARE @retry int = 3;
DECLARE @now datetime = GETUTCDATE();
DECLARE @binaryId binary(16) = CAST(@BinaryItemId AS binary(16));
DECLARE @itemIdFound binary(16) = NULL;
WHILE (@retry > 0)
BEGIN
BEGIN TRY
-- Delete any extended data
DELETE FROM [dbo].[BinaryItemEx]
WHERE [BinaryItemId] = @binaryId;
-- First try an update as the primary may already exist
UPDATE [dbo].[BinaryItem]
SET @itemIdFound = [BinaryItemId],
[ItemAccessed] = @now,
[BinaryData] = @BinaryData,
[BinaryExParts] = 0
WHERE [BinaryItemId] = @binaryId;
-- If no primary found then perform the insert
IF (@itemIdFound IS NULL)
BEGIN
INSERT INTO [dbo].[BinaryItem] ([BinaryItemId], [ItemCreated], [ItemAccessed], [BinaryData], [BinaryExParts])
VALUES(@binaryId, @now, @now, @BinaryData, 0);
END
SET @retry = 0;
END TRY
BEGIN CATCH
SET @retry -= 1;
IF (@retry <= 0 OR (error_number() <> 41302 AND error_number() <> 41305 AND error_number() <> 41325 AND error_number() <> 41301))
BEGIN
-- insert custom error handling for other error conditions here
;THROW
END
END CATCH
END
END
This code covers the common errors associated with using an optimistic concurrency control mechanism, as used by the In Memory OLTP engine.
In this case I have set the retry count to be 3, but this can easily be configured to suit your application and environment requirements.
Durability
In the samples above I have defined the tables to have a durability option of SCHEMA_AND_DATA. In the case of transient session data one could further improve performance by using the SCHEMA_ONLY option.
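For purely transient data, the durability option is simply changed when creating the tables; a sketch for the extended table (the same change applies to [BinaryItem]):

```sql
CREATE TABLE [dbo].[BinaryItemEx]
(
    [BinaryItemId] binary(16) NOT NULL
        INDEX IDX_BINARYITEMEX_ID HASH
        WITH (BUCKET_COUNT = 1000000),
    [BinaryPart] int NOT NULL,
    [BinaryDataEx] varbinary(6000) NOT NULL,
    CONSTRAINT [PK_BinaryItemEx] PRIMARY KEY NONCLUSTERED HASH
        ([BinaryItemId], [BinaryPart])
    WITH (BUCKET_COUNT = 20000000)
)
-- SCHEMA_ONLY: contents are lost on restart, but writes incur
-- no log or checkpoint I/O
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);
```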
File Based Extension
As an alternative to the approach outlined above, if the data size is larger than the specified segment size a normal file based table can be used for saving the LOB data. In this case the table definitions would be more like the following:
CREATE TABLE [dbo].[BinaryItem]
(
[BinaryItemId] binary(16) NOT NULL
CONSTRAINT [PK_BinaryItem] PRIMARY KEY NONCLUSTERED HASH
WITH (BUCKET_COUNT = 1000000),
[ItemCreated] datetime NOT NULL,
[BinaryData] varbinary(6000) NULL,
[IsExtended] int NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
GO
CREATE TABLE [dbo].[BinaryItemEx]
(
[BinaryItemId] binary(16) NOT NULL,
[TotalLength] int NOT NULL,
[BinaryDataEx] varbinary(max) NOT NULL,
CONSTRAINT [PK_BinaryItemEx] PRIMARY KEY CLUSTERED
([BinaryItemId])
)
ON [DEFAULT] TEXTIMAGE_ON [MemBinaryStore_Binary];
In this case the number of segments is not required, but rather just an indicator of whether extended data exists.
One could even take the approach of defining the database such that the LOB data is saved into a dedicated performant filegroup:
CREATE DATABASE [MemBinaryStore]
CONTAINMENT = NONE
ON PRIMARY (NAME = [MemBinaryStore_Primary], FILENAME = 'C:\MSSQL\DATA\MemBinaryStore_Prim.mdf', SIZE = 10 MB, FILEGROWTH = 10 MB),
FILEGROUP [MemBinaryStore_Mod] CONTAINS MEMORY_OPTIMIZED_DATA (NAME = [MemBinaryStore_Mod], FILENAME = 'C:\MSSQL\DATA\MemBinaryStore.dir'),
FILEGROUP [MemBinaryStore_Data] (NAME = [MemBinaryStore_Data1], FILENAME = 'C:\MSSQL\DATA\MemBinaryStore_Data1.ndf', SIZE = 10 MB, FILEGROWTH = 10 MB),
(NAME = [MemBinaryStore_Data2], FILENAME = 'C:\MSSQL\DATA\MemBinaryStore_Data2.ndf', SIZE = 10 MB, FILEGROWTH = 10 MB),
FILEGROUP [MemBinaryStore_Binary] (NAME = [MemBinaryStore_Binary1], FILENAME = 'C:\MSSQL\DATA\MemBinaryStore_Binary1.ndf', SIZE = 10 MB, FILEGROWTH = 10 MB),
(NAME = [MemBinaryStore_Binary2], FILENAME = 'C:\MSSQL\DATA\MemBinaryStore_Binary2.ndf', SIZE = 10 MB, FILEGROWTH = 10 MB)
LOG ON (NAME = [MemBinaryStore_Log1], FILENAME = 'C:\MSSQL\DATA\MemBinaryStore_Log1.ldf', SIZE = 5 MB, FILEGROWTH = 5 MB)
COLLATE Latin1_General_100_CI_AS;
In this case the extended processing Stored Procedures would just have to be modified to insert a single row of LOB data and return this single row.
Conclusion
Although memory optimized tables do not have off-row or large object (LOB) storage, and the row size is limited to 8060 bytes, this does not mean they cannot be used for LOB storage. This can be achieved easily in one of two ways:
- Split the LOB values into multiple rows
- Store the LOB values in a regular non-memory optimized (file based) table
Splitting the LOB data into multiple rows affords one the option of obtaining the possible performance gains of switching to memory optimized tables.
If you are in the scenario where a high percentage of the LOB data fits into one or a few segments, the approach outlined above will provide all the possible advantages of memory optimized tables whilst providing a safety valve for possible large data.
If in this scenario the application can also be modified to stitch back together the LOB segments then the process can also rely mostly on natively compiled procedures; again providing a performance boost.
The outlined approach may not be suitable for storing very large quantities of LOB data that are accessed infrequently; say, large documents or images. In this instance a standard file based table approach may be more suitable.
Written by Carl Nolan