Events
Mar 31, 11 PM - Apr 2, 11 PM
The biggest SQL, Fabric and Power BI learning event. March 31 – April 2. Use code FABINSIDER to save $400.
Register todayThis browser is no longer supported.
Upgrade to Microsoft Edge to take advantage of the latest features, security updates, and technical support.
Applies to:
SQL Server
Azure SQL Database
Azure SQL Managed Instance
Important context information for understanding this article is available at:
When creating a hash index for a memory-optimized table, the number of buckets needs to be specified at create time. In most cases the bucket count would ideally be between 1 and 2 times the number of distinct values in the index key.
However, even if the BUCKET_COUNT is moderately below or above the preferred range, the performance of your hash index is likely to be tolerable or acceptable.
At minimum, consider giving your hash index a BUCKET_COUNT roughly equal to the number of rows you predict your memory-optimized table will grow to have.
Suppose your growing table has 2,000,000 rows, but the prediction is it'll grow 10 times to 20,000,000 rows. Start with a bucket count that is 10 times the number of rows in the table. This gives you room for an increased quantity of rows.
Suppose a hash index has 10,000,000 distinct key values.
If the hash indexed values have a high rate of duplicates, the hash buckets suffer longer chains.
Assume you have the same SupportEvent table from the earlier T-SQL syntax code block. The following T-SQL code demonstrates how you can find and display the ratio of all values to unique values:
-- Calculate ratio of: Rows / Unique_Values.
DECLARE @allValues float(8) = 0.0, @uniqueVals float(8) = 0.0;
SELECT @allValues = Count(*) FROM SupportEvent;
SELECT @uniqueVals = Count(*) FROM
(SELECT DISTINCT SupportEngineerName
FROM SupportEvent) as d;
-- If (All / Unique) >= 10.0, use a nonclustered index, not a hash.
SELECT Cast((@allValues / @uniqueVals) as float) as [All_divby_Unique];
go
This section discusses how to troubleshoot the bucket count for your hash index.
You can monitor the statistical health of your hash indexes by running the following T-SQL SELECT. The SELECT uses the data management view (DMV) named sys.dm_db_xtp_hash_index_stats.
SELECT
QUOTENAME(SCHEMA_NAME(t.schema_id)) + N'.' + QUOTENAME(OBJECT_NAME(h.object_id)) as [table],
i.name as [index],
h.total_bucket_count,
h.empty_bucket_count,
FLOOR((
CAST(h.empty_bucket_count as float) /
h.total_bucket_count) * 100)
as [empty_bucket_percent],
h.avg_chain_length,
h.max_chain_length
FROM
sys.dm_db_xtp_hash_index_stats as h
JOIN sys.indexes as i
ON h.object_id = i.object_id
AND h.index_id = i.index_id
JOIN sys.memory_optimized_tables_internal_attributes ia ON h.xtp_object_id=ia.xtp_object_id
JOIN sys.tables t on h.object_id=t.object_id
WHERE ia.type=1
ORDER BY [table], [index];
Compare the SELECT results to the following statistical guidelines:
The following T-SQL code block gives you an easy way to test a SELECT * FROM sys.dm_db_xtp_hash_index_stats;
. The code block completes in 1 minute. Here are the phases of the following code block:
DROP TABLE IF EXISTS SalesOrder_Mem;
go
CREATE TABLE SalesOrder_Mem
(
SalesOrderId uniqueidentifier NOT NULL DEFAULT newid(),
OrderSequence int NOT NULL,
OrderDate datetime2(3) NOT NULL,
StatusCode tinyint NOT NULL,
PRIMARY KEY NONCLUSTERED
HASH (SalesOrderId) WITH (BUCKET_COUNT = 262144),
INDEX ix_OrderSequence
HASH (OrderSequence) WITH (BUCKET_COUNT = 20000),
INDEX ix_StatusCode
HASH (StatusCode) WITH (BUCKET_COUNT = 8),
INDEX ix_OrderDate NONCLUSTERED (OrderDate DESC)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
go
--------------------
SET NOCOUNT ON;
-- Same as PK bucket_count. 68 seconds to complete.
DECLARE @i int = 262144;
BEGIN TRANSACTION;
WHILE @i > 0
BEGIN
INSERT SalesOrder_Mem
(OrderSequence, OrderDate, StatusCode)
Values
(@i, GetUtcDate(), @i % 8); -- Modulo technique.
SET @i -= 1;
END
COMMIT TRANSACTION;
PRINT 'Next, you should query: sys.dm_db_xtp_hash_index_stats .';
go
The preceding INSERT
loop does the following:
StatusCode
. Therefore there is a high rate of value duplication in index ix_StatusCode.For troubleshooting when the bucket count is not optimal, examine the following output of the SELECT from sys.dm_db_xtp_hash_index_stats. For these results we added WHERE Object_Name(h.object_id) = 'SalesOrder_Mem'
to the SELECT copied from section D.1.
Our SELECT
results are displayed after the code, artificially split into two narrower results tables for better display.
IndexName | total_bucket_count | empty_bucket_count | EmptyBucketPercent |
---|---|---|---|
ix_OrderSequence | 32768 | 13 | 0 |
ix_StatusCode | 8 | 4 | 50 |
PK_SalesOrd_B14003... | 262144 | 96525 | 36 |
IndexName | avg_chain_length | max_chain_length |
---|---|---|
ix_OrderSequence | 8 | 26 |
ix_StatusCode | 65536 | 65536 |
PK_SalesOrd_B14003... | 1 | 8 |
Let us interpret the preceding results tables for the three hash indexes:
ix_StatusCode:
ix_OrderSequence:
Primary key index (PK_SalesOrd_...):
OLTP workloads focus on individual rows. Full table scans are not usually in the performance critical path for OLTP workloads. Therefore, the trade-off you must balance is between quantity of memory utilization versus performance of equality tests and insert operations.
If memory utilization is the bigger concern:
If performance of equality tests is the bigger concern:
Hash Indexes for Memory-Optimized Tables
Nonclustered Indexes for Memory-Optimized Tables
Events
Mar 31, 11 PM - Apr 2, 11 PM
The biggest SQL, Fabric and Power BI learning event. March 31 – April 2. Use code FABINSIDER to save $400.
Register todayTraining