Possible joins in a query, how many does the query optimizers choose from?

Computer Mike 86 Reputation points
2022-02-03T20:43:56.603+00:00

Hey Guys,

I remember when I was learning about joins, I learned that if you have two tables sql server can join that in 2 ways. If you have three tables sql server can join in 9 ways and the more joins.

The more tables, the more possible ways to join then and it increase exponentially.

For example, how many possible join does the query optimizer look at when joining 6 tables? Looking for an article that describes this.

Anyone know of an article. I think i used to see something on Wikipedia?

Any help appreciated.

Mike

Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,601 questions
0 comments No comments
{count} votes

5 answers

Sort by: Most helpful
  1. Erland Sommarskog 107.2K Reputation points
    2022-02-03T22:51:14.747+00:00

    There is no straightforward to your question. To start with, it is not correct that there are only two ways to join two tables, because SQL Server have three join types: Loop join, Hash join, and Merge Join. Or even four, if you count Adaptive join, which is a join type where SQL Server dynamically chooses between Loop or Hash join.

    For a query with many tables, it is not possible for SQL Server to investigate all possible plans. That could take days. There is some heuristics and rules that SQL Server follows to reduce the plan space and it will stop when it thinks it has found a good enough plan.

    Just as a piece of trivia: the other day I looked at a query which joined a temp table with a large table table to update the same temp table. There might have been one more table in the query, but it was not a very complicated query. I found in Query Store that SQL Server had in the last 30 days used 19 different execution plans for this query.

    1 person found this answer helpful.

  2. Tom Phillips 17,721 Reputation points
    2022-02-03T20:54:49.24+00:00

    There are entire books written on the subject.

    I would start here:
    https://www.red-gate.com/library/inside-the-sql-server-query-optimizer


  3. Computer Mike 86 Reputation points
    2022-02-03T22:16:27.917+00:00

    i found this...
    execution-plans

    Under, The sequence in which the source tables are accessed.
    3 tables, 4 possible plans...

    how many possible plans for 4 tables? 5 tables? 6 tables, etc.

    0 comments No comments

  4. Computer Mike 86 Reputation points
    2022-02-04T03:01:12.18+00:00

    Three tables... 4 options. The other sequences in which the database server could access the tables are:
    TableC, TableB, TableA, or
    TableB, TableA, TableC, or
    TableB, TableC, TableA, or
    TableC, TableA, TableB

    I got this from link above, please see.

    I can figure out with a formula manually for 4 tables, 5 tables, etc. I know I have seen article and table to describe, but at a loss on terminology.


  5. Erland Sommarskog 107.2K Reputation points
    2022-02-04T22:15:00.467+00:00

    If I was requested to write a query, that required 14 tables, in the query I would break the query into probably 3 ctes. I learned from a Microsoft sql book ( There used to be a Wikipedia page that described this as well and even had a table) that ... the number of possible query execution plans exponentially increases with the number of tables in the join (other other things being equal. indexes etc). And if there are too many possible execution plans, the query engine will stop looking at possible execution plans, take the best and just execute. So if you have too many tables you may not get the best plan.

    It's correct that if you have many tables in a query, there are chances that the optimizer will not get it 100% perfect, because the estimation errors can multiply rapidly. That is, it is not only a matter of trying different order, it also a matter of getting the estimates correct.

    However, the CTEs you are planning will not have any impact of this. A CTE is just a local macro, and the algebrizer will expand the CTE into the query before the optimizer sees it. And the CTE is a logical construct. The optimizer may recast the computation order, and never compute the CTE as such.

    What sometimes is a good idea is to materialise intermediate results in a temp table. Since a temp table has statistics, this can help the optimizer to estimate the next step more accurately. But obviously, materialising the data into a temp table will also add overhead. And if you make an incorrect choice, you can degrade performance quite a bit.

    This is a demo query from one of my presentations:

    WITH OrderAmounts AS (
       SELECT OD.OrderID, SUM(OD.Quantity * OD.UnitPrice) AS Amount
       FROM   dbo.[Order Details] OD
       GROUP  BY OD.OrderID
    )
    SELECT O.OrderID, C.CustomerID, C.CustomerName, C.City, OA.Amount
    FROM   dbo.Orders O
    JOIN   dbo.Customers C ON O.CustomerID = C.CustomerID
    JOIN   OrderAmounts OA ON O.OrderID = OA.OrderID
    WHERE  O.OrderDate = '19970421'
    ORDER BY O.OrderID;
    

    The database has one million orders and the CTE says "Compute the sum of all orders". Yet this query returns instantly, because the optimizer realises that it only has to compute the sum for the orders on the given date.

    Next part of my demo goes like this:

    DROP TABLE IF EXISTS #OrderAmounts;
    CREATE TABLE #OrderAmounts (OrderID int           NOT NULL PRIMARY KEY,
                                Amount  decimal(20,2) NOT NULL);
    
    INSERT #OrderAmounts (OrderID, Amount)
       SELECT OD.OrderID, SUM(OD.Quantity * OD.UnitPrice) AS Amount
       FROM   dbo.[Order Details] OD
       GROUP  BY OD.OrderID;
    
    SELECT O.OrderID, C.CustomerID, C.CustomerName, C.City, OA.Amount
    FROM   dbo.Orders O
    JOIN   dbo.Customers C ON O.CustomerID = C.CustomerID
    JOIN   #OrderAmounts OA ON O.OrderID = OA.OrderID
    WHERE  O.OrderDate = '19970421'
    ORDER BY O.OrderID;
    

    This runs in five seconds on my demo machine, because this time SQL Server are very clearly instructed to compute the sum of all orders.

    There is a second aspect when you have a complex query: it may also help you to get the query right by saving data into intermediate tables that permits you inspect the data to verify that your logic is correct.

    0 comments No comments