Peristiwa
31 Mac, 11 PTG - 2 Apr, 11 PTG
Acara pembelajaran SQL, Fabric dan Power BI terbesar. 31 Mac - 2 April. Gunakan kod FABINSIDER untuk menjimatkan $400.
Daftar hari iniPelayar ini tidak lagi disokong.
Naik taraf kepada Microsoft Edge untuk memanfaatkan ciri, kemas kini keselamatan dan sokongan teknikal yang terkini.
Applies to:
SQL Server
Azure SQL Database
Azure SQL Managed Instance
Azure Synapse Analytics
Analytics Platform System (PDW)
SQL database in Microsoft Fabric
High-level recommendations for designing columnstore indexes. A few good design decisions help you achieve the high data compression and query performance that columnstore indexes are designed to provide.
This article assumes you are familiar with columnstore architecture and terminology. For more information, see Columnstore indexes: Overview and Columnstore Index Architecture.
Before designing a columnstore index, understand as much as possible about your data requirements. For example, think through the answers to these questions:
You might not need a columnstore index. Rowstore (or B-tree) tables with heaps or clustered indexes perform best on queries that seek into the data, searching for a particular value, or for queries on a small range of values. Use rowstore indexes with transactional workloads since they tend to require mostly table seeks instead of large range table scans.
A columnstore index is either clustered or nonclustered. A clustered columnstore index can have one or more nonclustered B-tree indexes. Columnstore indexes are easy to try. If you create a table as a columnstore index, you can easily convert the table back to a rowstore table by dropping the columnstore index.
Here is a summary of the options and recommendations.
Columnstore option | Recommendations for when to use | Compression |
---|---|---|
Clustered columnstore index | Use for: 1) Traditional data warehouse workload with a star or snowflake schema 2) Internet of Things (IOT) workloads that insert large volumes of data with minimal updates and deletes. |
Average of 10x |
Ordered columnstore index | Use when a clustered columnstore index is queried via a single ordered predicate column or column set. This guidance is similar to choosing the key columns for a rowstore clustered index, though the compressed underlying rowgroups behave differently. For more information, see CREATE COLUMNSTORE INDEX and Performance tuning with ordered columnstore indexes. | Average of 10x |
Nonclustered B-tree indexes on a clustered columnstore index | Use to: 1. Enforce primary key and foreign key constraints on a clustered columnstore index. 2. Speed up queries that search for specific values or small ranges of values. 3. Speed up updates and deletes of specific rows. |
10x on average plus some additional storage for the NCIs. |
Nonclustered columnstore index on a disk-based heap or B-tree index | Use for: 1) An OLTP workload that has some analytics queries. You can drop B-tree indexes created for analytics and replace them with one nonclustered columnstore index. 2) Many traditional OLTP workloads that perform Extract Transform and Load (ETL) operations to move data to a separate data warehouse. You can eliminate ETL and a separate data warehouse by creating a nonclustered columnstore index on some of the OLTP tables. |
NCCI is an additional index that requires 10% more storage on average. |
Columnstore index on an in-memory table | Same recommendations as nonclustered columnstore index on a disk-based table, except the base table is an in-memory table. | Columnstore index is an additional index. |
The clustered columnstore index is more than an index, it is the primary table storage. It achieves high data compression and a significant improvement in query performance on large data warehousing fact and dimension tables. Clustered columnstore indexes are best suited for analytics queries rather than transactional queries, since analytics queries tend to perform operations on large ranges of values rather than looking up specific values.
Consider using a clustered columnstore index when:
Don't use a clustered columnstore index when:
For more information, see Columnstore indexes in data warehousing.
For ordered columnstore index availability, see Columnstore indexes: Overview.
Consider using an ordered columnstore index in the following scenarios:
An ordered columnstore index might not be as effective in these scenarios:
ALTER INDEX REORGANIZE
.Beginning with SQL Server 2016 (13.x), you can create nonclustered B-tree or rowstore indexes as secondary indexes on a clustered columnstore index. The nonclustered B-tree index is updated as changes occur to the columnstore index. This is a powerful feature that you can use to your advantage.
By using the secondary B-tree index, you can efficiently search for specific rows without scanning through all the rows. Other options become available too. For example, you can enforce a primary or foreign key constraint by using a UNIQUE constraint on the B-tree index. Since a non-unique value fails to insert into the B-tree index, SQL Server cannot insert the value into the columnstore.
Consider using a B-tree index on a columnstore index to:
Beginning with SQL Server 2016 (13.x), you can have a nonclustered columnstore index on a rowstore disk-based table or an in-memory OLTP table. This makes it possible to run the analytics in real-time on a transactional table. While transactions are occurring on the underlying table, you can run analytics on the columnstore index. Since one table manages both indexes, changes are available in real-time to both the rowstore and the columnstore indexes.
Since a columnstore index achieves 10x better data compression than a rowstore index, it only needs a small amount of extra storage. For example, if the compressed rowstore table takes 20 GB, the columnstore index might require an additional 2 GB. The additional space required also depends on the number of columns in the nonclustered columnstore index.
Consider using a nonclustered columnstore index to:
Run analytics in real-time on a transactional rowstore table. You can replace existing B-tree indexes that are designed for analytics with a nonclustered columnstore index.
Eliminate the need for a separate data warehouse. Traditionally, companies run transactions on a rowstore table and then load the data into a separate data warehouse to run analytics. For many workloads, you can eliminate the loading process and the separate data warehouse by creating a nonclustered columnstore index on transactional tables.
SQL Server 2016 (13.x) offers several strategies to make this scenario performant. It's easy to try it since you can enable a nonclustered columnstore index with no changes to your OLTP application.
To add additional processing resources, you can run the analytics on a readable secondary. Using a readable secondary separates the processing of the transactional workload and the analytics workload.
For more information, see Get started with Columnstore for real-time operational analytics
For more information on choosing the best columnstore index, see Sunil Agarwal's blog Which columnstore index is right for my workload?.
Columnstore indexes support partitioning, which is a good way to manage and archive data. Partitioning also improves query performance by limiting operations to one or more partitions.
For large tables, the only practical way to manage ranges of data is by using partitions. The advantages of partitions for rowstore tables also apply to columnstore indexes.
For example, both rowstore and columnstore tables use partitions to:
Additionally, with a columnstore index, you use partitioning to:
COLUMNSTORE_ARCHIVE
compression options. Query performance might be slower, which could be acceptable if the partition is queried infrequently.By using partitions, you can limit your queries to scan only specific partitions, which limits the number of rows to scan. For example, if the index is partitioned by year and the query is analyzing data from last year, it only needs to scan the data in one partition.
Unless you have a large enough data size, a columnstore index performs best with fewer partitions than what you might use for a rowstore index. If you don't have at least one million rows per partition, most of your rows might go to the deltastore where they don't receive the performance benefit of columnstore compression. For example, if you load one million rows into a table with 10 partitions and each partition receives 100,000 rows, all of the rows go to the delta rowgroups.
Example:
For more information about partitioning, see Sunil Agarwal's blog post, Should I partition my columnstore index?.
The columnstore index offers two choices for data compression: columnstore compression and archive compression. You can choose the compression option when you create the index, or change it later with ALTER INDEX ... REBUILD.
Columnstore compression typically achieves 10x better compression rates over rowstore indexes. It is the standard compression method for columnstore indexes and enables fast query performance.
Archive compression is designed for maximum compression when query performance is not as important. It achieves higher data compression rates than columnstore compression, but it comes with a price. It takes longer to compress and decompress the data, so it is not well-suited for fast query performance.
If your data is already in a rowstore table, you can use CREATE COLUMNSTORE INDEX to convert the table to a clustered columnstore index. There are a couple optimizations that will improve query performance after the table is converted, described next.
You can configure the maximum number of processors for converting a heap or clustered B-tree index to a columnstore index. To configure the processors, use the maximum degree of parallelism option (MAXDOP).
If you have large amounts of data, MAXDOP 1
might be too slow. Increasing MAXDOP to 4
works fine. If this results in a few rowgroups that do not have the optimal number of rows, you can run ALTER INDEX REORGANIZE to merge them together in the background.
Since the B-tree index already stores rows in a sorted order, preserving that order when the rows get compressed into the columnstore index can improve query performance.
The columnstore index does not sort the data, but it does use metadata to track the minimum and maximum values of each column segment in each rowgroup. When scanning for a range of values, it can quickly compute when to skip the rowgroup. When the data is ordered, more rowgroups can be skipped.
To preserve the sorted order during conversion:
Use CREATE COLUMNSTORE INDEX with the DROP_EXISTING clause. This also preserves the name of the index. If you have scripts that already use the name of the rowstore index you don't need to update them.
This example converts a clustered rowstore index on a table named MyFactTable
to a clustered columnstore index. The index name, ClusteredIndex_d473567f7ea04d7aafcac5364c241e09
, stays the same.
CREATE CLUSTERED COLUMNSTORE INDEX ClusteredIndex_d473567f7ea04d7aafcac5364c241e09
ON MyFactTable
WITH (DROP_EXISTING = ON);
Each rowgroup contains one column segment for every column in the table. Each column segment is compressed together and stored on physical media.
There is metadata with each segment to allow for fast elimination of segments without reading them. Data type choices might have a significant impact on query performance based common filter predicates for queries on the columnstore index. For more information, see segment elimination.
These are tasks for creating and maintaining columnstore indexes.
Task | Reference articles | Notes |
---|---|---|
Create a table as a columnstore. | CREATE TABLE (Transact-SQL) | Beginning with SQL Server 2016 (13.x), you can create the table as a clustered columnstore index. You do not have to first create a rowstore table and then convert it to columnstore. |
Create a memory table with a columnstore index. | CREATE TABLE (Transact-SQL) | Beginning with SQL Server 2016 (13.x), you can create a memory-optimized table with a columnstore index. The columnstore index can also be added after the table is created, using the ALTER TABLE ADD INDEX syntax. |
Convert a rowstore table to a columnstore. | CREATE COLUMNSTORE INDEX (Transact-SQL) | Convert an existing heap or B-tree to a columnstore. Examples show how to handle existing indexes and also the name of the index when performing this conversion. |
Convert a columnstore table to a rowstore. | CREATE CLUSTERED INDEX (Transact-SQL) or Convert a columnstore table back to a rowstore heap | Usually this conversion isn't necessary, but there can be times when you need to convert. Examples show how to convert a columnstore to a heap or clustered index. |
Create a columnstore index on a rowstore table. | CREATE COLUMNSTORE INDEX (Transact-SQL) | A rowstore table can have one columnstore index. Beginning with SQL Server 2016 (13.x), the columnstore index can have a filtered condition. Examples show the basic syntax. |
Create performant indexes for operational analytics. | Get started with Columnstore for real-time operational analytics | Describes how to create complementary columnstore and B-tree indexes so that OLTP queries use B-tree indexes and analytics queries use columnstore indexes. |
Create performant columnstore indexes for data warehousing. | Columnstore indexes in data warehousing | Describes how to use B-tree indexes on columnstore tables to create performant data warehousing queries. |
Use a B-tree index to enforce a primary key constraint on a columnstore index. | Columnstore indexes in data warehousing | Shows how to combine B-tree and columnstore indexes to enforce primary key constraints on the columnstore index. |
Drop a columnstore index | DROP INDEX (Transact-SQL) | Dropping a columnstore index uses the standard DROP INDEX syntax that B-tree indexes use. Dropping a clustered columnstore index converts the columnstore table to a heap. |
Delete a row from a columnstore index | DELETE (Transact-SQL) | Use DELETE (Transact-SQL) to delete a row. columnstore row: SQL Server marks the row as logically deleted but does not reclaim the physical storage for the row until the index is rebuilt. deltastore row: SQL Server logically and physically deletes the row. |
Update a row in the columnstore index | UPDATE (Transact-SQL) | Use UPDATE (Transact-SQL) to update a row. columnstore row: SQL Server marks the row as logically deleted, and then inserts the updated row into the deltastore. deltastore row: SQL Server updates the row in the deltastore. |
Force all rows in the deltastore to go into the columnstore. | ALTER INDEX (Transact-SQL) ... REBUILD Optimize index maintenance to improve query performance and reduce resource consumption |
ALTER INDEX with the REBUILD option forces all rows to go into the columnstore. |
Defragment a columnstore index | ALTER INDEX (Transact-SQL) | ALTER INDEX ... REORGANIZE defragments columnstore indexes online. |
Merge tables with columnstore indexes. | MERGE (Transact-SQL) |
To create an empty columnstore index for:
For more information on how to convert an existing rowstore heap or B-tree index into a clustered columnstore index, or to create a nonclustered columnstore index, refer to CREATE COLUMNSTORE INDEX (Transact-SQL).
Peristiwa
31 Mac, 11 PTG - 2 Apr, 11 PTG
Acara pembelajaran SQL, Fabric dan Power BI terbesar. 31 Mac - 2 April. Gunakan kod FABINSIDER untuk menjimatkan $400.
Daftar hari iniLatihan
Laluan pembelajaran
Use advance techniques in canvas apps to perform custom updates and optimization - Training
Use advance techniques in canvas apps to perform custom updates and optimization
Dokumentasi
Columnstore indexes - Query performance - SQL Server
Columnstore index query performance recommendations for achieving the fast query performance.
Columnstore indexes: Overview - SQL Server
An overview on columnstore indexes. Columnstore indexes are the standard for storing and querying large data warehousing fact tables.
Performance Tuning With Ordered Columnstore Indexes - SQL Server
Learn more about how ordered columnstore indexes can benefit your query performance.