Partitioned Tables in SQL Server 2008

In this post, I introduced how SQL Server 2005 implements query plans on partitioned tables.  If you've read that post or used partitioned tables, you may recall that SQL Server 2005 uses a constant scan operator to enumerate the list of partition ids that need to be scanned.  As a refresher, here is the example from that post showing the plan for scanning a table with four partitions:

CREATE PARTITION FUNCTION PF(INT) AS RANGE FOR VALUES (0, 10, 100)
CREATE PARTITION SCHEME PS AS PARTITION PF ALL TO ([PRIMARY])
CREATE TABLE T (A INT, B INT) ON PS(A)

SELECT * FROM T

  |--Nested Loops(Inner Join, OUTER REFERENCES:([PtnIds1004]) PARTITION ID:([PtnIds1004]))
       |--Constant Scan(VALUES:(((1)),((2)),((3)),((4))))
       |--Table Scan(OBJECT:([t]))

SQL Server 2005 treats partitioned tables specially and creates special plans, such as the above one, for partitioned tables.  SQL Server 2008, on the other hand, for the most part treats partitioned tables as regular tables that just happen to be logically indexed on the partition id column.  For example, for the purposes or query optimization and query execution, SQL Server 2008 treats the above table not as a heap but as an index on [PtnId].  If we create a partitioned index (clustered or non-clustered), SQL Server 2008 logically adds the partition id column as the first column of the index.  For instance, if we create the following index:

CREATE INDEX TA ON T(A) ON PS(A)

SQL Server 2008 treats it not as a single column index on T(A), but rather as a multi-column or composite index on T([PtnId],A).  Note that the table and index are still stored physically as partitioned tables.  In this example, the table and non-clustered index are decomposed into four heaps and four non-clustered indexes.

By treating partitioned tables as indexes, SQL Server 2008 can frequently use much simpler query plans.  Let's look at the SQL Server 2008 plan for the above example:

  |--Table Scan(OBJECT:([T]))

This query plan is no different than what you might see if you scanned any other ordinary table!  So, how can we tell that the table is really partitioned and how can we tell what partitions the plan is going to scan?  Luckily, the graphical and XML plans identify partitioned tables by adding a "Partitioned" attribute to any scan, seek, or update operator that processes a partitioned table.  Moreover, the actual graphical plan and the STATISTICS XML output identify the exact partitions that the plan touches during execution.  For example, here is an excerpt from the STATISTICS XML output generated by running the above query:

<RunTimePartitionSummary>
  <PartitionsAccessed PartitionCount="4">
    <PartitionRange Start="1" End="4"/>
  </PartitionsAccessed>
</RunTimePartitionSummary>

This output shows that, as expected, the query plan scanned all four partitions of the table.  Now, suppose we insert a row into the second partition (up until now, the table has been empty) and run the following slightly modified query:

INSERT T VALUES (1, 1)

SELECT TOP 1 * FROM T

This query can stop executing as soon as it finds a single row.  It scans the first partition and finding no rows continues on to scan the second partition.  At that point, it finds a row and stops.  Thus, the query terminates after scanning only two partitions.  This outcome is clearly reflected in the STATISTICS XML output:

<RunTimePartitionSummary>
  <PartitionsAccessed PartitionCount="2" >
    <PartitionRange Start="1" End="2" />
  </PartitionsAccessed>
</RunTimePartitionSummary>

Now let's see how partition elimination works.  Let's start with static partition elimination:

SELECT * FROM T WHERE A < 100

  |--Table Scan(OBJECT:([T]), SEEK:([PtnId1001] >= (1) AND [PtnId1001] <= (3)) ,  WHERE:([T].[A]<(100)) ORDERED FORWARD)

Because SQL Server 2008 treats the partitioned table like an index on [PtnId], it can simply calculate the range of partitions that need to be scanned and perform a "seek" on those partitions.  It may seem a little strange to see a SEEK predicate on a table scan, but it just means that the plan is going to scan partitions 1 through 3.

And now, let's look at dynamic partition elimination:

DECLARE @I INT
SELECT @I = 0
SELECT * FROM T WHERE A < @I

  |--Table Scan(OBJECT:([T]), SEEK:([PtnId1001] >= (1) AND [PtnId1001] <= RangePartitionNew([@I],(0),(0),(10),(100)) ),  WHERE:([T].[A]<[@I]) ORDERED FORWARD)

This plan is identical to the prior plan except that the constant partition id 3 has been replaced by the RangePartitionNew function which computes the correct partition id from the variable @I.  To find out exactly which partitions were scanned, we can once again check the STATISTICS XML output:

<RunTimePartitionSummary>
  <PartitionsAccessed PartitionCount="1">
    <PartitionRange Start="1" End="1"/>
  </PartitionsAccessed>
</RunTimePartitionSummary>

Note that this plan computes exactly which partitions to scan.  Compare that to the SQL Server 2005 plan which evaluates a filter for each and every partition id:

  |--Nested Loops(Inner Join, OUTER REFERENCES:([PtnIds1004]) PARTITION ID:([PtnIds1004]))
       |--Filter(WHERE:([PtnIds1004]<=RangePartitionNew([@i],(0),(0),(10),(100))))
| |--Constant Scan(VALUES:(((1)),((2)),((3)),((4))))

       |--Table Scan(OBJECT:([t]), WHERE:([t].[a]<[@i]) PARTITION ID:([PtnIds1004]))

While evaluating the filter repeatedly may not cost too much if we have only four partitions, it is certainly going to waste some cycles if we have many partitions!

In my next post, I'll take a look at how SQL Server 2008 handles scans and seeks on partitioned indexes.

Comments

  • Anonymous
    July 20, 2008
    How accurate is the $PARTIION function in this case, compared to examine the XML output?

  • Anonymous
    July 21, 2008
    The $PARTITION function operates on a row by row basis.  For instance, in the example in this post, if we write:SELECT *, $PARTITION.PF(A) FROM Tfor each row returned by the query we will also get the partition id to which the row belongs.  The XML output will only indicate which partitions were scanned but will not tell us how the rows map to partitions.Note that we can use the $PARTITION function to get the same results as the XML output (much less efficiently) by writing:SELECT DISTINCT $PARTITION.PF(A) FROM TOf course, this query does not return any data other than the partition ids.

  • Anonymous
    August 05, 2008
    In my last post , I looked at how SQL Server 2008 handles scans on partitioned tables. I explained that

  • Anonymous
    August 18, 2008
    //While evaluating the filter repeatedly may not cost too much if we have only four partitions, it is certainly going to waste some cycles if we have many //For arguments sake, let's say there is a table with 200 partitions.  (Not a recommendation, just a number for posing a question).So you're saying that if the same partitioning scheme(and functions) are used on a Sql Server 2005 db and then (same db setup) on a Sql Server 2008 database, the Sql Server 2008 database will perform better because it will "avoid evaluating the filter" so many times.My question is primarily about dynamic partition elimination, where I am passing a @MyPartitionKey (int) variable into a stored procedure.And if I'm discerning all that correctly, any idea how costly that really ends up being?Maybe in the 5 million total rows, 1 million per partition range?  ( I know each case is varying, but any hints would be appreciated).Thanks.

  • Anonymous
    August 22, 2008
    In this post on partitioned tables, I mentioned that SQL Server 2008 has a much more efficient implementation

  • Anonymous
    August 22, 2008
    I decided to measure the overhead of dynamic partition elimination with a large number of partitions.  Please see this post for the results:http://blogs.msdn.com/craigfr/archive/2008/08/22/dynamic-partition-elimination-performance.aspx

  • Anonymous
    January 04, 2009
    one question: I have index partitioning on a table, will the non-clustered index partitioning (one or more using the same partition scheme) bring more tradeoff for index seek or scan or sql optimizer take them the same?  I noticed if using non-clustered index partitioning will create more partitions for each NC index.

  • Anonymous
    January 05, 2009
    With regards to seek or scan performance, the question to ask is whether there is a predicate on the partition column that can be used to enable partition elimination.  If the answer is yes, then partitioning should not impact performance.  If the answer is no, the query plan may need to seek or scan every partition in which case partitioning may affect performance.  The actual impact depends on the number and size of the partitions and total number of rows processed by the query.A typical scenario for partitioned tables is to use ALTER TABLE to switch an old partition out and a new partition in.  This scenario requires that all indexes (clustered and non-clustered) be partitioned using the same scheme.  For some workloads, there may be a tradeoff between enabling this scenario and achieving optimal partition elimination performance.

  • Anonymous
    August 24, 2010
    Consider I created a partitioned as follows. Here, the Partition column is a non-indexed column 'SelectId' which will be used most of the predicate of select statement. Partition scheme is defined for filegroup FG1, but the Clustered Index in defined on PRIMARY. How does the data and index storage manage to give optimum performance? USE TestDB GO -- Step 1: Create the partition function(s) CREATE PARTITION FUNCTION fnPartitionId (INT) AS RANGE LEFT FOR VALUES (1400, 2800, 4200, 5600) GO -- Step 2: Create Partition Scheme: CREATE  PARTITION SCHEME schmPartitionId AS PARTITION fnPartitionId ALL TO ('FG1') -- place partitions in the filegroup GO --Step 3: Create the table on Partition CREATE TABLE dbo.[tblInvoice]( [InvoiceNo] varchar NOT NULL, [InvoiceType] varchar NULL, [InvoiceCompany] varchar NULL, [InvoiceDate] [datetime] NULL, [InvoiceCreationDateAndTimeStamp] [datetime] NULL, [SoldTo] varchar NULL, [ShipTo] varchar NULL, [SelectId] [int] NULL, CONSTRAINT [PK_tblInvoice] PRIMARY KEY CLUSTERED ( [InvoiceNo] ASC )WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY] ) ON schmPartitionId GO

  • Anonymous
    August 31, 2010
    I'm not certain if this is your question, but SQL Server will ignore the table level partition and will create a non-partitioned clustered index in the primary filegroup.  See the discussion of the ON clause in the Books Online CREATE TABLE entry: msdn.microsoft.com/.../ms174979.aspx.

  • Anonymous
    August 19, 2011
    Hi, I am trying to optimize some code hitting partitioned tables and the showpla_xml does not have a RunTimePartitionSummary node, why would this be?  Also the first SeekKeys node in the SeekPredicateNew node shows all of our partitions.  Does this mean that every partition is being checked even though only 3 are needed to return the data?

  • Anonymous
    August 19, 2011
    I was using showplan_xml instead of set statistics xml on.  The RunTimePartitionSummary node now exists.  Still though the first node of the seekKeys shows every partition.  Is this correct? The following article says that is a list of all of the partitions needed to produce the result set, yet the Summary only states that 3 partitions are used. msdn.microsoft.com/.../ms345599.aspx

  • Anonymous
    August 19, 2011
    The SeekKeys node shows the list of partition ids that may be scanned based on static partition elimination (that is, based on information known at query optimization time).  The query may access fewer partitions at runtime if the result can be computed without scanning all partitions.  For example, SELECT TOP 1 * FROM T, will stop after returning a single row.  The query plan will report that it needs to scan all partitions since it does not know whether any partitions have a row.  If the first partition of table T has a row, only one partition will be scanned (and reported in the runtime summary) regardless of the actual number of partitions.  If the first first partition is empty but the second partition has a row, two partitions will be scanned.  HTH.