Why did such a change occur?

博雄 胡 585 Reputation points
2023-12-22T12:57:14.9566667+00:00

First of all, I want to say sorry. I am not good at English, so I am using a translator to communicate with you. Please forgive me.

There are 3 plans in the figure, and their SQL is as follows. Plan 1 is the original SQL, plan 2 adds a [union ''] on the basis of plan 1, and plan 3 is rewritten to optimize plan 2.I have 2 main questions for you:

  1. Why did adding [union ''] in plan 1 to plan 2 cause the plan to change from serial to parallel and the join to change from hash to merge?Is it due to CBO?Why did plan 1 not choose a parallel plan when the current [cost threshold for parallelism] is set to 5?
  2. Why does the merge join in plan 2 read up to 40,532,364 rows?This seems like a loop join could achieve.In my understanding, the merge join will not read more than the top or bottom result set rows.

Thank you all.

1

--plan 1
select * from ( select distinct NewTireNumber from MaterialOut a where NewTireNumber not in ( select distinct OldTireNumber from MaterialOut b where a . NewTireNumber = b . OldTireNumber ) and UserAuxiliary = @0 ) as T order by NewTireNumber asc

--plan 2
select * from (
select distinct NewTireNumber from MaterialOut a
where a.NewTireNumber not in (select distinct OldTireNumber  from MaterialOut b where a.NewTireNumber=b.OldTireNumber)
and UserAuxiliary='粤BHN690' 
union  
select ''
) as T order by NewTireNumber  asc

--plan 3
select * from (
select distinct NewTireNumber from MaterialOut a
where  not exists (select distinct OldTireNumber  from MaterialOut b where a.NewTireNumber=b.OldTireNumber)
and UserAuxiliary='粤BHN690' 
union  
select ''
) as T order by NewTireNumber  asc
SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
14,494 questions
0 comments No comments
{count} votes

Accepted answer
  1. Erland Sommarskog 120.2K Reputation points MVP
    2023-12-25T13:15:17.5566667+00:00

    I found the reason. In query plan for the first query, there is this:

    User's image

    That is, there is a plan guide that is forcing something. Maybe a full plan. Maybe a hint like OPTION(MAXDOP 1).

    For the other queries there are plan guides, so they get the plans that SQL Server thinks is the best. I note that the first query has an estimate cost of 11 (hover over the SELECT operator and view Estimated Subtree Cost), so I guess the plan guide is explicitly turning off parallelism. The second query, on the other hand, is just barely above the limit of 5.

    Beware that a plan guide is only for the exact query. Add some spacing and it will no longer match. And even less if you logically change the query.

    Of the other two queries, I prefer NOT EXISTS over NOT IN. NOT IN is one to be careful with, because it can lead into surprises if there are NULL values involved.

    I made one observation in the query plan XML that I like to call out:

    User's image

    You are running the release version of SQL 2016 which is grossly outdated. You should download and install the last Service Pack for SQL 2016 which is SP3. There is also one Cumulative Update that you should apply, and some security fixes on top of that. This will give you access to many bug fixes and enhancements.

    1 person found this answer helpful.

1 additional answer

Sort by: Most helpful
  1. Erland Sommarskog 120.2K Reputation points MVP
    2023-12-22T22:17:26.15+00:00

    First, no can answer questions like "why did..." based only images of the query plans. With the actual query plans in XML, the table and index definitions, information about the data etc, it may be possible give answers with confidence.

    What I can say about the first question is that you should keep in mind that UNION implies DISTINCT:

    SELECT a
    FROM (VALUES(1), (1), (3)) AS V(a)
    UNION
    SELECT a
    FROM (VALUES(3), (4), (5)) AS V(a)
    
    
    

    returns

    1
    3
    4
    5
    

    And getting rid of duplicates requires more power, and this can explain why you got parallel execution. If you know that there cannot be duplicates or you want to retain them, use UNION ALL instead.

    Then again, since you did not show the query after UNION, maybe the answer is that this part is more complex and requires parallelism on its own.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.