Partitioned Tables, Parallelism & Performance considerations

Bài viết
11/30/2005

Q: When querying SS2005 Partitioned Tables, what kind of parallelism should I expect to see and how will that affect performance?

A: First, a brief background on partitioning. SQL Server 2005 table partitioning provides many improvements in terms of manageability and availability. The manageability improvements allow metadata only switch-in and switch-out of a partition of data (supporting sliding window requirements). Availability improvements include online index rebuilds, parallel operations, and piecemeal restores of filegroups (see blog SQL Server 2005 OnLine Piecemeal Restore at https://blogs.msdn.com/sqlcat/archive/2005/10/27/485580.aspx).

Now let’s look at how performance is impacted by partitioned tables. Regardless of whether you are using partitioning, the selection of a parallel plan is determined by the number of CPUs, query cost, available memory and current workload. The remaining considerations outlined here are valid only if all of these allow using parallel plan.

When a query uses a single partition e.g. all but a single partition is eliminated, SQL Server 2005 Intra-Partition parallelism can parallelize row retrieval using multiple threads up to the sp_configure’d value of Maximum Degree of Parallelism (MAXDOP). Frequently, MAXDOP is set to the default of 0 which means MAXDOP is equal to the total number of CPUs. When a query spans two or more partitions, only a single thread per partition will be used to retrieve rows.

If your number of partitions is equal to MAXDOP or less, data skewing may cause some parallel threads to complete before others resulting in CXPACKET waits. In cases where the number of partitions exceeds MAXDOP, SQL Server 2005 on-demand parallelism means that when a thread completes its work on a partition, it will automatically start processing the next partition in line. Say you have 16 partitions and a MAXDOP of 8. The first 8 threads work on partitions 1-8. The first thread completed will start processing partition 9, the next, partition 10 and so on.

If you have big multi-proc box of 8 or more CPUs, the worst case query performance would be a single SELECT statement that spans two partitions (see Table 1). One thread per partition would be used to retrieve the rows (See Table 2 yellow highlight for partitions 80 and 81 & number of threads in Executes column) although MAXDOP can still be applied after row retrieval in subsequent steps (See Table 2 green highlights).

Table 1: Retrieve 2 weeks of data

SELECT

SUM(Sales_Qty) as Sales_Qty,

SUM(Sale_Amt) as Sales_Amount

FROM SalesDB.dbo.Tbl_Fact_ Sales – Partitioned by week

WHERE date_id between '20050703' and '20050716'

Table 2: Set Statistics Profile: MAXDOP = 12

Rows	Executes	StmtText
1	1	SELECT SUM([Sales_Qty]) [Sales_Qty],SUM([Sale_Amt]) [Sales_Amount] FROM [SalesDB].[dbo].[Tbl_Fact_Sales] WHERE [date_id]>=@1 AND [date_id]<=@2
0	0	\|--Compute Scalar(DEFINE:([Expr1002]=CASE WHEN [globalagg1008]=(0) THEN NULL ELSE [globalagg1010] END, [Expr1003]=CASE WHEN [globalagg1012]=(0) THEN NULL ELSE [globalagg1014] END))
1	1	\|--Stream Aggregate(DEFINE:([globalagg1008]=SUM([partialagg1007]), [globalagg1010]=SUM([partialagg1009]), [globalagg1012]=SUM([partialagg1011]), [globalagg1014]=SUM([partialagg1013])))
2	1	\|--Parallelism(Gather Streams)
2	12	\|--Stream Aggregate(DEFINE:([partialagg1007]=COUNT_BIG([SalesDB].[dbo].[Tbl_Fact_Sales].[Sales_Qty] as [ss].[Sales_Qty]), [partialagg1009]=SUM([SalesDB].[dbo].[Tbl_Fact_Sales].[Sales_Qty] as [ss].[Sales_Qty]), [partialagg1011]=COUNT_BIG([SalesDB].[dbo].[Tbl_Fact_Sales].[Sale_Amt] as [ss].[Sale_Amt]), [partialagg1013]=SUM([SalesDB].[dbo].[Tbl_Fact_Sales].[Sale_Amt] as [ss].[Sale_Amt])))
20577235	12	\|--Nested Loops(Inner Join, OUTER REFERENCES:([PtnIds1006]) PARTITION ID:([PtnIds1006]))
2	12	\|--Parallelism(Distribute Streams, Demand Partitioning)
2	1	\| \|--Constant Scan(VALUES:(((80)),((81))))
20577235	2	\|--Index Seek(OBJECT:([SalesDB].[dbo].[Tbl_Fact_Sales].[IX_Tbl_Fact_Sales_SKDteItmStrIDSalQtySalAmtDiscMkd] AS [ss]), SEEK:([ss].[SK_Date_ID] >= (20050703) AND [ss].[SK_Date_ID] <= (20050716)) ORDERED FORWARD PARTITION ID:([PtnIds1006]))

For instance, suppose you have a TB-sized Sales table that is partitioned by month. A common query pattern may be to compare this month to last month, or perhaps this month to a year ago.

Table 3: MONTHLY Partitions

WHERE clause

Partitions

Retrieval Parallelism

Single SELECT statement

SELECT …. WHERE DateCol BETWEEN ’10/1/2005’ and ‘11/30/2005’

1 thread per partition *

SELECT

UNION SELECT

[UNION SELECT]

Select …. Where DateCol between ’10/1/2005’ and ‘10/31/2005

UNION

Select …. Where DateCol between ’11/1/2005’ and ‘11/30/2005

1 per select

MAXDOP per partition

Table 4: WEEKLY Partitions: Sales for November 1-15

Comments

Anonymous
May 01, 2006
What is the relationship between MAXDOP and the number of threads used to access the actual filesystem.

In your example above lets say I had a large query that spanned 2 partitions. But lets say these partition where spread across 12 different files on 12 different luns on my SAN.

Would you still have only 2 threads accessing these files? Or is there not a direct correlation between parallel threads and storage engine threads

Bert
Anonymous
November 08, 2006
SQL Server parallelizes a nested loops join by distributing the outer rows (i.e., the rows from the first
Anonymous
February 07, 2007
Hi, Here in Brazil I´m working on MS Gold Certified Partner, and I act in two distinct teams - Data...
Anonymous
June 14, 2007
The comment has been removed
Anonymous
January 21, 2009
PingBack from http://www.hilpers.it/2537369-partizionamento-di-una-tabella
Anonymous
January 21, 2009
PingBack from http://www.keyongtech.com/2160525-parallel-i-o-for-partitioned
Anonymous
June 02, 2009
PingBack from http://patiochairsite.info/story.php?id=26502
Anonymous
June 15, 2009
PingBack from http://unemploymentofficeresource.info/story.php?id=15587
Anonymous
June 17, 2009
PingBack from http://patiosetsite.info/story.php?id=957
Anonymous
June 18, 2009
PingBack from http://adirondackchairshub.info/story.php?id=3744

Chia sẻ qua

Partitioned Tables, Parallelism & Performance considerations

Comments

Tài nguyên bổ sung