Partitioned Tables, Parallelism & Performance considerations
Q: When querying SS2005 Partitioned Tables, what kind of parallelism should I expect to see and how will that affect performance?
A: First, a brief background on partitioning. SQL Server 2005 table partitioning provides many improvements in terms of manageability and availability. The manageability improvements allow metadata only switch-in and switch-out of a partition of data (supporting sliding window requirements). Availability improvements include online index rebuilds, parallel operations, and piecemeal restores of filegroups (see blog SQL Server 2005 OnLine Piecemeal Restore at https://blogs.msdn.com/sqlcat/archive/2005/10/27/485580.aspx).
Now let’s look at how performance is impacted by partitioned tables. Regardless of whether you are using partitioning, the selection of a parallel plan is determined by the number of CPUs, query cost, available memory and current workload. The remaining considerations outlined here are valid only if all of these allow using parallel plan.
When a query uses a single partition e.g. all but a single partition is eliminated, SQL Server 2005 Intra-Partition parallelism can parallelize row retrieval using multiple threads up to the sp_configure’d value of Maximum Degree of Parallelism (MAXDOP). Frequently, MAXDOP is set to the default of 0 which means MAXDOP is equal to the total number of CPUs. When a query spans two or more partitions, only a single thread per partition will be used to retrieve rows.
If your number of partitions is equal to MAXDOP or less, data skewing may cause some parallel threads to complete before others resulting in CXPACKET waits. In cases where the number of partitions exceeds MAXDOP, SQL Server 2005 on-demand parallelism means that when a thread completes its work on a partition, it will automatically start processing the next partition in line. Say you have 16 partitions and a MAXDOP of 8. The first 8 threads work on partitions 1-8. The first thread completed will start processing partition 9, the next, partition 10 and so on.
If you have big multi-proc box of 8 or more CPUs, the worst case query performance would be a single SELECT statement that spans two partitions (see Table 1). One thread per partition would be used to retrieve the rows (See Table 2 yellow highlight for partitions 80 and 81 & number of threads in Executes column) although MAXDOP can still be applied after row retrieval in subsequent steps (See Table 2 green highlights).
Table 1: Retrieve 2 weeks of data
SELECT |
SUM(Sales_Qty) as Sales_Qty, |
SUM(Sale_Amt) as Sales_Amount |
FROM SalesDB.dbo.Tbl_Fact_ Sales – Partitioned by week |
WHERE date_id between '20050703' and '20050716' |
Table 2: Set Statistics Profile: MAXDOP = 12
Rows |
Executes |
StmtText |
1 |
1 |
SELECT SUM([Sales_Qty]) [Sales_Qty],SUM([Sale_Amt]) [Sales_Amount] FROM [SalesDB].[dbo].[Tbl_Fact_Sales] WHERE [date_id]>=@1 AND [date_id]<=@2 |
0 |
0 |
|--Compute Scalar(DEFINE:([Expr1002]=CASE WHEN [globalagg1008]=(0) THEN NULL ELSE [globalagg1010] END, [Expr1003]=CASE WHEN [globalagg1012]=(0) THEN NULL ELSE [globalagg1014] END)) |
1 |
1 |
|--Stream Aggregate(DEFINE:([globalagg1008]=SUM([partialagg1007]), [globalagg1010]=SUM([partialagg1009]), [globalagg1012]=SUM([partialagg1011]), [globalagg1014]=SUM([partialagg1013]))) |
2 |
1 |
|--Parallelism(Gather Streams) |
2 |
12 |
|--Stream Aggregate(DEFINE:([partialagg1007]=COUNT_BIG([SalesDB].[dbo].[Tbl_Fact_Sales].[Sales_Qty] as [ss].[Sales_Qty]), [partialagg1009]=SUM([SalesDB].[dbo].[Tbl_Fact_Sales].[Sales_Qty] as [ss].[Sales_Qty]), [partialagg1011]=COUNT_BIG([SalesDB].[dbo].[Tbl_Fact_Sales].[Sale_Amt] as [ss].[Sale_Amt]), [partialagg1013]=SUM([SalesDB].[dbo].[Tbl_Fact_Sales].[Sale_Amt] as [ss].[Sale_Amt]))) |
20577235 |
12 |
|--Nested Loops(Inner Join, OUTER REFERENCES:([PtnIds1006]) PARTITION ID:([PtnIds1006])) |
2 |
12 |
|--Parallelism(Distribute Streams, Demand Partitioning) |
2 |
1 |
| |--Constant Scan(VALUES:(((80)),((81)))) |
20577235 |
2 |
|--Index Seek(OBJECT:([SalesDB].[dbo].[Tbl_Fact_Sales].[IX_Tbl_Fact_Sales_SKDteItmStrIDSalQtySalAmtDiscMkd] AS [ss]), SEEK:([ss].[SK_Date_ID] >= (20050703) AND [ss].[SK_Date_ID] <= (20050716)) ORDERED FORWARD PARTITION ID:([PtnIds1006])) |
For instance, suppose you have a TB-sized Sales table that is partitioned by month. A common query pattern may be to compare this month to last month, or perhaps this month to a year ago.
Table 3: MONTHLY Partitions
|
WHERE clause |
Partitions |
Retrieval Parallelism |
Single SELECT statement |
SELECT …. WHERE DateCol BETWEEN ’10/1/2005’ and ‘11/30/2005’ |
2 |
1 thread per partition * |
SELECT UNION SELECT [UNION SELECT] |
Select …. Where DateCol between ’10/1/2005’ and ‘10/31/2005 UNION Select …. Where DateCol between ’11/1/2005’ and ‘11/30/2005 |
1 per select |
MAXDOP per partition |
Table 4: WEEKLY Partitions: Sales for November 1-15
Comments
Anonymous
May 01, 2006
What is the relationship between MAXDOP and the number of threads used to access the actual filesystem.
In your example above lets say I had a large query that spanned 2 partitions. But lets say these partition where spread across 12 different files on 12 different luns on my SAN.
Would you still have only 2 threads accessing these files? Or is there not a direct correlation between parallel threads and storage engine threads
BertAnonymous
November 08, 2006
SQL Server parallelizes a nested loops join by distributing the outer rows (i.e., the rows from the firstAnonymous
February 07, 2007
Hi, Here in Brazil I´m working on MS Gold Certified Partner, and I act in two distinct teams - Data...Anonymous
June 14, 2007
The comment has been removedAnonymous
January 21, 2009
PingBack from http://www.hilpers.it/2537369-partizionamento-di-una-tabellaAnonymous
January 21, 2009
PingBack from http://www.keyongtech.com/2160525-parallel-i-o-for-partitionedAnonymous
June 02, 2009
PingBack from http://patiochairsite.info/story.php?id=26502Anonymous
June 15, 2009
PingBack from http://unemploymentofficeresource.info/story.php?id=15587Anonymous
June 17, 2009
PingBack from http://patiosetsite.info/story.php?id=957Anonymous
June 18, 2009
PingBack from http://adirondackchairshub.info/story.php?id=3744