question

IgorGelin-0063 asked

Unexplainably slow execution of a query

Hi all,

I have two queries which return the same output, except that query 1 runs in less than 1 second and query 2 runs in 26 minutes.

  1. select OIdA into #test from A where A.User = 123456
    SELECT AD.* FROM AD where AD.OIdA in (select OIdA from #test)

  2. SELECT AD.* FROM AD, A WHERE A.OIdA = AD.OIdA AND A.User = 123456

Additional info
1. SQL Server 2016.
2. A and AD are large tables with millions of rows. Some records from tables A and AD are deleted on a daily basis.
3. OIdA is the primary key on A.
4. OIdAD is the primary key on AD.
5. AD has an index on columns OIdA, OIdAD.

Thank you for your help.

sql-server-transact-sql

To be able to answer this question, we would need to see the actual query plans in XML format. You can attach them here (if you change the file extension) or upload them to http://www.brentozar.com/pastetheplan.

ErlandSommarskog answered

It would have helped to see the actual execution plans, and not only the estimated ones. Often in cases like this there are misestimates, and they are revealed by the actual plans.
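If you want to capture the actual plan from T-SQL rather than through SSMS's "Include Actual Execution Plan" button, something like this should work (using query 2 from the question; "User" is quoted since USER is a reserved keyword):

```sql
-- Emit the actual execution plan, including runtime row counts,
-- as an extra XML result set alongside the query's own results.
SET STATISTICS XML ON;

SELECT AD.* FROM AD, A WHERE A.OIdA = AD.OIdA AND A."User" = 123456;

SET STATISTICS XML OFF;
```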

It seems, though, that the optimizer expects the condition on table A to return 2525 rows. The temp table, on the other hand, has an estimate of 255 rows, and I guess that estimate comes from a previous execution with the same value for UserID.

For that few rows, the optimizer thinks that a nested loop join + key lookup is the cheapest solution, but for the tenfold number of rows it thinks it is cheaper to scan the table AD once for a hash join. The optimizer is predisposed against loop joins and key lookups, and I would not have been so surprised if AD had had some 10 million rows, but here we are talking 867 million.

One answer may be found in the total number of rows the optimizer estimates the operation to produce: for the join with the temp table it estimates 868000 rows, and for the direct join a whopping 12 million. 12 million key lookups is not fun.

What is the actual number returned?

As for why the estimates are out of whack, it could be that statistics are out of date, and you could try running UPDATE STATISTICS on the tables. And to get the most accurate statistics you could try

   UPDATE STATISTICS A IX_A_User WITH FULLSCAN
   UPDATE STATISTICS AD IX_A WITH FULLSCAN

But I am not sure how much this is going to help. Also, while the nested loop join may be the best choice for this user id, it may be different for other user ids. Then again, I don't know how you actually use this query. Maybe this particular user is the important one?
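If you want to check first whether the statistics are actually stale, a query along these lines shows when each statistics object on the two tables was last updated, how much was sampled, and how many modifications have happened since (assuming the table names A and AD from the question):

```sql
-- Inspect statistics freshness for tables A and AD.
SELECT s.name, sp.last_updated, sp.rows, sp.rows_sampled, sp.modification_counter
FROM   sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE  s.object_id IN (OBJECT_ID('A'), OBJECT_ID('AD'));
```

A low rows_sampled relative to rows, or a high modification_counter, would support updating with FULLSCAN.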


Erland,

This user is no more important than other users.
The output of the query is about 3K rows.
As a matter of fact, for these particular tables nested loop join + key lookup is the best solution.
Scanning the table AD is an awful choice.
IX_A_User and IX_A are indexes. Are they supposed to be always up to date? Why would I need to update statistics on them?

Thank you for your help.

IgorGelin-0063 answered

I attached the query plans in XML format.




query1.txt (30.6 KiB)
query2.txt (16.9 KiB)

ErlandSommarskog answered

IX_A_User and IX_A are indexes. Are they suppose to be always up to date? Why would I need to update statistics on them?

The indexes are of course always up to date. However, the statistics are not necessarily. SQL Server updates them when enough changes have occurred, and by default it uses sampled statistics, which often is good enough, but not always.

As a matter of fact, for these particular tables nested loop join + key lookup is the best solution.

I am not so sure, given the estimates. For this user there is a gross over-estimate, but that makes me think that there are users with tons of rows. You can try this query (although it will take as long as query 2):

   SELECT A."User", COUNT(*)
   FROM   A
   JOIN   AD ON AD.OIdA = A.OIdA
   GROUP  BY A."User"
   ORDER BY 2 DESC

I suspect you will find some users with very many rows. Then again, these may be system users that you will never query about.

Rather than using the temp table, you can force the loop join by adding the hint

OPTION (LOOP JOIN)

to the end of the query.
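That is, query 2 would become something like the following (a sketch using the table and column names from the question; note that OPTION (LOOP JOIN) constrains every join in the query, not just this one):

```sql
-- Query 2 with a nested loop join forced via a query hint.
SELECT AD.*
FROM   AD
JOIN   A ON A.OIdA = AD.OIdA
WHERE  A."User" = 123456
OPTION (LOOP JOIN);
```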


All users have around 2K-3K rows each. We just have a ton of "Users".
I used your suggestion to update statistics on an index. It helped.

Thank you.
