Dynamic Partition Elimination Performance
In this post on partitioned tables, I mentioned that SQL Server 2008 has a much more efficient implementation of dynamic partition elimination as compared to SQL Server 2005. In response, a reader posted this comment asking how much dynamic partition elimination really costs in SQL Server 2005. While I was sure that the answer is not much, I nonetheless decided to measure it and find out.
I wrote the following stored procedure which creates a table with the specified number of partitions:
CREATE PROCEDURE CreateTable @N INT
AS
BEGIN
DECLARE @Cmd VARCHAR(8000)
DECLARE @I INTSET @Cmd = 'CREATE PARTITION FUNCTION PF(INT) AS RANGE FOR VALUES (1'
SET @I = 2
WHILE @I < @N
BEGIN
SET @Cmd = @Cmd + ',' + CONVERT(VARCHAR(4),@I)
SET @I = @I + 1
ENDSET @Cmd = @Cmd + ')'
EXECUTE (@Cmd)
CREATE PARTITION SCHEME PS AS PARTITION PF ALL TO ([PRIMARY])
CREATE TABLE T1 (A INT, B INT)
CREATE CLUSTERED INDEX T1A ON T1(A) ON PS(A)
END
I then created a table with either 2, 10, 100, or 1000 partitions. For each table, I ran the following batch:
SET NOCOUNT ON
DECLARE @T DATETIME
DECLARE @I INT, @J INT, @K INT
SET @I = 0
SET @J = 0
SET @T = GETDATE()
WHILE @J < 10000
BEGIN
SELECT @K = B FROM T1 WHERE A < @I
SET @J = @J + 1
END
SELECT DATEDIFF (MS, @T, GETDATE())
The main SELECT statement in this batch scans a single partition but, because the WHERE clause references a variable, SQL Server does not know this information at compile time. This query is too fast to measure a single execution accurately. Thus, I magnify the cost by repeating the query 10,000 times in a WHILE loop. The SET NOCOUNT ON statement and the assignment in the SELECT statement merely serve to minimize client communication costs which would otherwise dominate the batch cost.
The table is empty which makes this test a worst case scenario. The cost of the query is dominated by the cost of locating the correct partition.
Looking at the SQL Server 2005 STATISTICS PROFILE output for the 1000 partition table, we can see that SQL Server tests each partition before scanning only the one that is not eliminated:
0 1 |--Nested Loops(Inner Join, OUTER REFERENCES:([PtnIds1004]) PARTITION ID:([PtnIds1004]))
1 1 |--Filter(WHERE:([PtnIds1004]<=RangePartitionNew([@I],(0),(1),(2),(3),...
1000 1 | |--Constant Scan(VALUES:(((1)),((2)),((3)),...
0 1 |--Clustered Index Seek(OBJECT:([T1].[T1A]), SEEK:([T1].[A] < [@I]) ORDERED FORWARD PARTITION ID:([PtnIds1004]))
By comparison, SQL Server 2008 seeks directly to the correct partition:
0 1 |--Clustered Index Seek(OBJECT:([T1].[T1A]), SEEK:([PtnId1000] >= (1) AND [PtnId1000] <= RangePartitionNew([@I],(0),(1),(2),(3),...
Here are the results. In each case, before collecting any data, I ran the batch once to eliminate compilation overhead and warm up costs. I then took the median of 5 runs. Times are in milliseconds for the full 10,000 executions of the query. Note that we cannot compare the SQL Server 2005 results directly to the SQL Server 2008 results as I ran these tests on two different machines. However, the relative cost of adding partitions in SQL Server 2005 vs. SQL Server 2008 is clear.
# Partitions |
SQL Server 2005 |
SQL Server 2008 |
2 |
233 |
216 |
10 |
313 |
203 |
100 |
1216 |
216 |
1000 |
18060 |
216 |
It is important to remember that these are times for 10,000 executions of the query. Thus, even on SQL Server 2005 with 1000 partitions, the cost of a single execution of the query is only 1.8 milliseconds. Moreover, if the table contains any real data and if the query returns even a modest number of rows, the cost of the query will be dominated by the actual data processing.
Comments
Anonymous
September 18, 2008
Thanks Craig for the concrete information...Anonymous
February 21, 2011
This is good information, thanksAnonymous
April 01, 2013
Nice way to show the difference. Thanks,