parallel plan slower than serial plan

Question

Can anyone give an example that parallel plan (parallelism) slower than serial plan

Answer

I don't have a demo to share, but I have seen this more than once. A parallel plan that runs for ages, add OPTION (MAXDOP 1) and boom! Query is instant.

In cases where I have seen this, the optimizer has gone wrong on partitioning the data between the threads, and one thread has ended up with all the data. And that with a plan which was designed for heavy crunching and take benefit parallelism. MAXDOP 1 has forced a less heavy plan which has been faster in real life.

I can't recall that I have seen a case where the serial version of the same parallel plan has been faster, but it is perfectly conceivable, since the parallelism operators add some overhead.

Answer

Sakuraime,

Please access this links:

https://www.red-gate.com/simple-talk/sql/learn-sql-server/understanding-and-using-parallelism-in-sql-server/

https://www.mssqltips.com/sqlservertip/5404/parallelism-in-sql-server-execution-plan/

Answer

expanding on Erland, the asymmetric work per thread is likely to occur in the Nested Loops join operation. The more efficient method of partitioning work between threads is to have each thread work on a separate value from the Join Outer Source (upper right). Unfortunately, if the rows in the Inner Source of a loop join (lower center) is highly uneven, then this strategy fails.
A possibility that could happen but is unlikely, is when there is a high degree of communication between threads, and there are threads on cores in separate sockets, greatly elevating the intercommunication lag.
SQL Server has soft-NUMA, in-which a each soft node may be subset of a full NUMA node, and keeps parallel threads within a soft-node. But if you explicitly set MAXDOP higher than a single physical NUMA node, you could have threads spread over more than one socket. Only do this for high parallel operation with little inter-thread communication.
also per Erland, the more likely cause is a structurally different execution plan , excluding the parallelism operators. I have argued for years that the cost model employed by the SQL Server query optimizer is seriously obsolete, being based on a reference system with little memory and a weak disk system. The true structure of operation costs when data is largely in memory or with a powerful IO system is very different.

Share via

parallel plan slower than serial plan

3 answers