Achieving Fast Parallel Columnstore Index Builds
To build columnstore indexes as fast a possible, make sure to provide plenty of memory. See the Columnstore FAQs related to index build here for a discussion of memory-related issues, including what to do if index build runs out of memory or can't get enough memory. To build a single columnstore segment, the system must load a million rows into memory in a row-based format, and then create the columnstore representation of the segment. Each segment must be built by a single thread. The index build code reduces the degree of parallelism (DOP) if there is not enough memory.
If index build is slow because DOP was reduced automatically, consider the following options:
- Set min server memory to a large initial value. E.g. if you know your server may consume up to 60GB of RAM, set both min server memory ** and max server memory to 60GB using sp_configure. If you don't set min server memory, then SQL Server gradually requests memory from the operating system over time, on demand. If a burst of activity causes SQL Server to need more memory than is currently allocated from the operating system, the memory requests may fail or may block for extended periods. To get the best, most predictable performance, use sp_configure settings to pre-allocate memory.
- use
ALTER WORKLOAD GROUP [DEFAULT] WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT=X)
ALTER RESOURCE GOVERNOR RECONFIGURE
to set the maximum grant size to something larger than the default, say 50 percent (going much higher than 50 percent is not recommended because you can hang the server). This can let you get a larger memory grant and thus a higher DOP for your index build. - Add more memory to your server. This helps because building an index at DOP N requires N times more memory than building an index at DOP 1. In addition, ample memory reduces the cost for SQL Server to re-organize memory to make room for large memory allocations, which are needed during columnstore index build.
- Omit large, infrequently used columns from the columnstore index (e.g. large string columns that your queries probably won't touch).
- Normalize your schema, e.g. to factor out large string columns into separate dimension tables.
- Vertically partition your table into two or more narrower tables, which will not have to be joined together.
To get a feel for how much memory may be needed, consider the following example. A 1000-byte row size requires 1GB just to load a row-based copy of one segment into memory. 2 to 4 times this much memory, and in some cases more, may be required to build a segment. So, if you have a 20 core machine and want to use DOP 20 to do an index build, it may take 20*4GB = 80GB or more, dedicated to this operation alone, to do it fully in parallel.
To determine the runtime DOP of a columnstore index build operation, use sys.dm_exec_query_memory_grants and look at the 'dop' column. If the DOP is lower than expected, make sure nothing else is limiting the DOP (like the max degree of parallelism sp_configure setting), and then, if possible, provide more memory to SQL Server and try again.