parallel plan slower than serial plan

sakuraime 2,321 Reputation points
2020-09-02T18:43:50.507+00:00

Can anyone give an example that parallel plan (parallelism) slower than serial plan

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
13,366 questions
{count} votes

3 answers

Sort by: Most helpful
  1. Erland Sommarskog 107.2K Reputation points
    2020-09-02T21:34:46.527+00:00

    I don't have a demo to share, but I have seen this more than once. A parallel plan that runs for ages, add OPTION (MAXDOP 1) and boom! Query is instant.

    In cases where I have seen this, the optimizer has gone wrong on partitioning the data between the threads, and one thread has ended up with all the data. And that with a plan which was designed for heavy crunching and take benefit parallelism. MAXDOP 1 has forced a less heavy plan which has been faster in real life.

    I can't recall that I have seen a case where the serial version of the same parallel plan has been faster, but it is perfectly conceivable, since the parallelism operators add some overhead.

    1 person found this answer helpful.

  2. Junior Galvão MVP 76 Reputation points MVP
    2020-09-02T18:50:53.71+00:00

  3. Chang, Joe 111 Reputation points
    2020-09-04T13:43:48.793+00:00

    expanding on Erland, the asymmetric work per thread is likely to occur in the Nested Loops join operation. The more efficient method of partitioning work between threads is to have each thread work on a separate value from the Join Outer Source (upper right). Unfortunately, if the rows in the Inner Source of a loop join (lower center) is highly uneven, then this strategy fails.
    A possibility that could happen but is unlikely, is when there is a high degree of communication between threads, and there are threads on cores in separate sockets, greatly elevating the intercommunication lag.
    SQL Server has soft-NUMA, in-which a each soft node may be subset of a full NUMA node, and keeps parallel threads within a soft-node. But if you explicitly set MAXDOP higher than a single physical NUMA node, you could have threads spread over more than one socket. Only do this for high parallel operation with little inter-thread communication.
    also per Erland, the more likely cause is a structurally different execution plan , excluding the parallelism operators. I have argued for years that the cost model employed by the SQL Server query optimizer is seriously obsolete, being based on a reference system with little memory and a weak disk system. The true structure of operation costs when data is largely in memory or with a powerful IO system is very different.