Conor vs. “Does Join Order Matter?”

Article
12/10/2009

I got this question/claim again this week, so I’ll write a blog post on it while I wait for my build to complete:

It usually goes something like “When I rewrite my query, SQL Server gives me a different plan. Therefore, I should rewrite all my queries until I find the best plan for each of them”. Essentially, the question is whether writing the query differently should impact the plan choice.

I’ll give two answers – a short one and a longer one containing some of the reasons.

Short answer:

No, it should not matter for SQL Server (any version since 7.0). We do a lot of work so that you don’t have to care about this. Sometimes you might get a different plan, but that may or may not be related to the fact that you wrote the query in two equivalent ways.

Longer answer:

Some other database engines used to include rule-based optimizers. So, these would run a set of heuristic rules over the internal query tree and then generate a plan. I believe there even were (are? I haven’t checked recently) 3rd party products you could buy which would try lots of rewrites to see which one performed the best. Additionally, procedural programmers who use SQL are accustomed to finding tweaks in procedural code to make it run faster, and this is a natural behavior of this species of SQL user.

Most modern optimizers are cost-based. Now, this doesn’t mean that the input tree does not matter at all. Philosophically, however, the goal of these optimizers is to find a good plan for the user instead of allow you treat SQL as a procedural language. So, SQL Server (I can’t speak for the other vendors) performs normalization steps for query trees that should usually map equivalent trees into a single starting point before the optimization process really begins. So, A JOIN B and B JOIN A would both be mapped to the same starting point (let’s say we did A JOIN B as the normalized form). This normalization step usually works well and gives good results in terms of plan quality and the customer experience we desire – we want you to have time to think of all of the other parts of the application instead of worrying about this.

So, I can hear you now – the next statement I usually hear is “But this one time, I rewrote this query and I got a different plan and it was faster!” I’m not doubting you, actually. It can happen. However, you are mistaking correlation for causality. It does not follow that rewriting the plan was the specific reason for this different plan choice, nor does it imply that it is a stable plan choice for the lifetime of your application as your data changes, nor does it imply that a future build of SQL Server will do things in such a way so that continues to hold true when you install the next service pack of the product.

I call this condition “luck”. You got lucky... for now. Don’t assume that it means you will be lucky tomorrow.

So, let’s go through the reasons why this is luck and not design:

1. Normalization – As mentioned, we have a series of early-phase rewrites in the Optimizer that will tend to collapse equivalent forms of a query into a single starting form for optimization to use. This logic can have some limitations – SQL Server does not perform every possible rewrite, and it also doesn’t perform every possible normalization step either – you would not like the compilation time. We do the ones most likely to be hit by our customers. So, we will take a series of inner joins and normalize them. We also will try to make it so that the order of predicates in a WHERE clause is handled automatically by the system. We have logic that can decorrelate APPLY (rewrite an APPLY to be a JOIN), combine UNIONs, and otherwise fix all sorts of funny SQL usages that we see in customer- or computer-generated queries. We don’t do every pattern or combination of these in any order, and there are other patterns that we don’t rewrite where perhaps we could. So, in these cases, the starting point of the optimization process may differ from your other written form. The problem with all of this is that you don’t really know – we don’t publish the internal trees. Most of the normal cases DO NOT matter as a result of this.

2. Optimization goals. One interesting part of optimizers is that they trade space and time for optimality on a query plan selection. So, SQL Server’s QP is actually not set up to give you the best plan. Instead, it is set up to give you a good enough plan quickly (where good enough is very close to “best”). In practical terms, if I can spend 1 second optimizing a query that takes 2 minutes to execute, that’s a good deal. If I spend 4 hours optimizing that same query to find a plan that executes in 1 minute 50 seconds, our average customer would be very angry. Now, for most OLTP queries the optimal plan choice is known, obvious, and we generate it all of the time. This characteristic tends to impact things more as you get into queries with longer runtimes.

To complete this thought, even if SQL Server started with two different (but equivalent) starting point trees, the searching process may actually investigate the same set of plans if it thinks it is worthwhile to do so. So, if you get two plans out of your experimentation that the optimizer considers to be roughly the same in cost, it doesn’t matter to the optimizer that they are different. Furthermore, if the tables get bigger, the optimizer may decide to search longer and eventually find the same plan.

The search process is dynamic and pruned based on estimated cost improvement.

3. Non-equivalent rewrites – I know this can’t be you, since all of you are above average. Let’s just say that some people occasionally rewrite their queries and they are actually changing the semantics of their query in doing so. That new query may perform better than the old one, but sometimes that is because the new one is asking a far easier question of the system. Even I screw this up occasionally and then have to go work backwards to figure out what I did wrong. Not all customers have the expertise to recite the nuances of the ANSI SQL specification or the details of SQL Server’s tree semantics. I can only tell you that this is harder than it looks.

4. Optimizer “missing feature”. There are some cases where I see that the Optimizer could have performed a rewrite but didn’t, causing two "semantically equivalent” trees to generate different plans that perform differently. I saw one last week where we didn’t convert WHERE 2009 = datepart(year, datecol) into an index seek, while writing the query as datecol BETWEEN ‘2009-01-01 00:00:00’ and ‘2009-12-31 11:59:59.997’. These are cases where a rewrite could potentially yield a better plan. In this case, the two forms are conceptually equivalent (at least for the datetime type). However, not all rewrites are provably equivalent in the optimizer. So, when you start doing rewrites that involve changing the question you ask to the optimizer, you have to know that what you are doing is actually valid (or else you get into case 3 – asking a different question of the data).

I usually see the most advanced SQL Server users (SQL Server MVPs, etc) able to perform this kind of rewrite. In order to do stage 4 reliably, you have to have a good understanding of what the set of supported transformations is so that you can understand why a query plan is generated (or not generated).

I hope this gives you a bit of insight into how SQL Server views equivalent SQL statements.

Happy Querying!

Conor Cunningham

Conor vs. “Does Join Order Matter?”

Additional resources