Massive Performance Difference between two near identical SQL

Avik Mukherjee 1 Reputation point
2021-05-16T18:34:03.85+00:00

Hi

Could someone guess/suggest what could be the difference here between the two queries? Results are the same but 1st one taking 10mins+ to return and 2nd one returning in milliseconds. One thing I noticed in the Execution plans... On 1st Query - it is doing PK (STUDENT_ID) scan twice for STUDENT table (one for alias S1 and another for S) but on 2nd one, only once Index Scan and that is on S but then for S1 it is doing index Seek.

Query 1:
SELECT S1."STUDENT_ID"
FROM "dbo"."STUDENT" S1
INNER JOIN (
SELECT DISTINCT D.STUDENT_ID
FROM "dbo"."STUDENT" (NOLOCK) S
CROSS APPLY "dbo".GET_CLASSREPRESENTATIVE(S.STUDENT_ID) D
) TBL ON
S1."STUDENT_ID"=TBL."STUDENT_ID"

Query 2:
SELECT TBL."STUDENT_ID"
FROM "dbo"."STUDENT" S1
INNER JOIN (
SELECT DISTINCT D.STUDENT_ID
FROM "dbo"."STUDENT" (NOLOCK) S
CROSS APPLY "dbo".GET_CLASSREPRESENTATIVE(S.STUDENT_ID) D
) TBL ON
S1."STUDENT_ID"=TBL."STUDENT_ID"

Developer technologies | Transact-SQL
SQL Server | Other
{count} votes

2 answers

Sort by: Most helpful
  1. Erland Sommarskog 121.8K Reputation points MVP Volunteer Moderator
    2021-05-16T20:05:37.73+00:00

    It would certainly have helped to see the actual execution plans. And the code of the table-valued function, which I assume is an inline function?

    I also like to remark that the queries looks unnecessarily complicated. I can't see any point with the JOIN. But I take it that this is a simplified version of your actual case.

    Certainly, one would like to think that the queries are equivalent, and they would lead to the same internal representation, but apparently, this doesn't happen. It could be that, although the actual performance of the two plans are radically different, the estimated cost for the plans are closer, maybe so close that it is more or less a toss-up for the optimizer which plan to pick.

    Keep in mind that the optimizer works with statistics about the data, which has been sampled, and from this information the optimizer makes an estimate of what is the best plan. This often works out well, so often that we spoiled and take it for granted. But there is more than one place when things can go wrong, and when they go wrong, they can go wrong with a bang and we see such drastic differences.

    0 comments No comments

  2. MelissaMa-MSFT 24,221 Reputation points
    2021-05-17T06:36:47.97+00:00

    Hi @Avik Mukherjee ,

    Welcome to Microsoft Q&A!

    As mentioned by other experts, it is recommended to post the execution plan for this kind of perfomance issue.

    It could also be better for you to post the DDL of "dbo"."STUDENT" table together with the code of your function "dbo".GET_CLASSREPRESENTATIVE.

    Normally the index seek is faster than index scan since an index scan reads all the rows in an index – B-tree in the index order whereas index seek traverses a B-tree and walks through leaf nodes seeking only the matching or qualifying rows based on the filter criteria.

    Is STUDENT_ID column the primary key or clustered index of STUDENT table? If not, you could consider to create one clustered/non-clustered index on this column to improve the performance.

    In addition, you could check whether the statistics is accouate or not. If not, you may need to update statistics manually.

    What is the size of S1 part and TBL part? It would take more time to query the results with bigger size.

    Best regards
    Melissa


    If the answer is helpful, please click "Accept Answer" and upvote it.
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.