Parallel Index Creation performance issue
Both SQL Server 2005 and 2008 allow you to do parallel index creation. In fact you can specify MAXDOP in create index statement. When you use MAXDOP (let's say 8), you would assume that 8 threads will do equal amount of work and finish the work at the same time.
But this is not always the case. We have recently investigated an issue from a customer who complained about index creation being slow. They have a need to create non-clustered indexes every night on a big table that they import data daily. What we observed is that there were 8 threads initially doing work. After a while, we just saw one thread (in addition to the main thread) consuming 100% CPU for a long time. This is definitely unusual.
As it turns out that the table has really skewed data. Out of 250 million rows, over 150 millions rows contain empty strings for the column they are trying to create index. So one thread ended up doing most of the work.
Here is discovery process,
We got xml plan (set statistics xml on). In the XML plan, you can see actual rows processed by each thread for each operator. For example, in this output, we see that thread 1 ended up sorting over 152 million rows. Rest of the threads didn't do too much work.
<RelOp NodeId="2" PhysicalOp="Sort" LogicalOp="Sort" EstimateRows="2.57398e+008" EstimateIO="2744.91" EstimateCPU="1958.46" AvgRowSize="35" EstimatedTotalSubtreeCost="7260.16" Parallel="1" EstimateRebinds="0" EstimateRewinds="0">
- <OutputList>
<ColumnReference Column="RowRefSrc1011" />
</OutputList>
<MemoryFractions Input="1" Output="1" />
- <RunTimeInformation>
<RunTimeCountersPerThread Thread="7" ActualRows="16059608" ActualRebinds="1" ActualRewinds="0" ActualEndOfScans="1" ActualExecutions="1" />
<RunTimeCountersPerThread Thread="6" ActualRows="15888692" ActualRebinds="1" ActualRewinds="0" ActualEndOfScans="1" ActualExecutions="1" />
<RunTimeCountersPerThread Thread="8" ActualRows="8269991" ActualRebinds="1" ActualRewinds="0" ActualEndOfScans="1" ActualExecutions="1" />
<RunTimeCountersPerThread Thread="5" ActualRows="16257841" ActualRebinds="1" ActualRewinds="0" ActualEndOfScans="1" ActualExecutions="1" />
<RunTimeCountersPerThread Thread="4" ActualRows="16733855" ActualRebinds="1" ActualRewinds="0" ActualEndOfScans="1" ActualExecutions="1" />
<RunTimeCountersPerThread Thread="3" ActualRows="15253016" ActualRebinds="1" ActualRewinds="0" ActualEndOfScans="1" ActualExecutions="1" />
<RunTimeCountersPerThread Thread="1" ActualRows="152841862" ActualRebinds="1" ActualRewinds="0" ActualEndOfScans="1" ActualExecutions="1" />
<RunTimeCountersPerThread Thread="2" ActualRows="16092988" ActualRebinds="1" ActualRewinds="0" ActualEndOfScans="1" ActualExecutions="1" />
<RunTimeCountersPerThread Thread="0" ActualRows="0" ActualRebinds="0" ActualRewinds="0" ActualEndOfScans="0" ActualExecutions="0" />
</RunTimeInformation>
You wouldn't see anything in regular statistics profile. All you see is over 250million rows processed.
Rows Executes StmtText
-------------------- -------------------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------
0 1 insert [schema1].[table1] select *, %%bmk%% from [schema1].[table1]
0 1 |--Parallelism(Gather Streams)
0 8 |--Index Insert(OBJECT:([db2].[schema1].[table1].[IX_table1_column1]))
257397853 8 |--Sort(ORDER BY:([db1].[schema1].[table1].[column1] ASC, [Bmk1000] ASC) PARTITION ID:([db1].[schema1].[table1].[column1]))
257397853 8 |--Table Scan(OBJECT:([db1].[schema1].[table1]))
At this point, we obtained show statistics output from customer after index creation to further confirm data distribution. From histogram, we confirmed that the contaim has over 140 million rows of empty strings out of 250 million rows.
RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
------------------------ ------------- ------------- -------------------- --------------
0 1,426618E+08 0 1
01.... 114197 19829 39314 2,904741
0313... 493116 12894 97814 5,041364
0316... 451782 16061 82636 5,467133
.....
Because of the skewed data distribution, sort can't be truely parallelized. Currently SQL can't divide a single value for multiple threads to handle. So one thread ended up doing more work.
So if you have skewed data on a column you are trying to index, it will not achieve true parallelism at sort phase.
For this particular case, you can potentially do filter index on NULL, or blanks to avoid the sort inbalance. But if it's a meaningful value to you that you won't want to filter out, you can end up with one thread doing most of the work during the sort phase.
For this particular customer, there are other tuning opportunities specific to their requirement which are not relavant to this post. So I will not add confusion.
Jack Li | Senior Escalation Engineer |Microsoft SQL Server Escalation Service
Comments
Anonymous
March 19, 2009
Introducing Web Friendly Feeds aka Friendly Feeds Parallel Index Creation performance issue IntegrationAnonymous
March 19, 2009
IntroducingWebFriendlyFeedsakaFriendlyFeedsParallelIndexCreationperformanceissueIntegr...Anonymous
March 19, 2009
This has to be a prime case for a filtered index ( SQL 2008 ) when presumably this problem would not then arise?Anonymous
April 02, 2012
Jack, Hi, Great article. I was wondering if you could shed some light on the parallel index operation process. Where do the stats that are used to calculate parallelism for the index operation come from, is it the existing stats, or as stated here: "msdn.microsoft.com/.../ms191292.aspx", a separate set of stats calculated specifically for index operation ? Many thanks, IainAnonymous
October 08, 2017
For this particular customer, there are other tuning opportunities specific to their requirement which are not relavant to this post.What other options we have pls