Understanding LEFT JOINS

Carlton Patterson 741 Reputation points
2021-01-05T13:00:42.073+00:00

Hello Forum,

I would like to get a practical understanding of when to use Left Joins. I understand Left Joins from the point where we use Left Joins to find missing data. However, the following example uses Left Joins to provide the results needed for the the query.

The logic is as follows:

Find the number of orders, the number of customers and the total cost of orders per each city that has made at least 5 orders.
Output each calculation along with the corresponding city name.

The tables used for the query are as follows:

CREATE TABLE customers (
    id int,
    first_name varchar(50),
    last_name varchar(50),
    city varchar(50),
    address varchar(50),
    phone_number varchar(50))

INSERT customers VALUES
(8,'John','Joseph','Arizona','','928-386-8164'),
(7,'Jill','Michael','Florida','','813-297-0692'),
(4,'William','Daniel','Colorado','','813-368-1200'),
(5,'Henry','Jackson','Hawaii','','808-601-7513'),
(13,'Emma','Isaac','Hawaii','','808-690-5201'),
(14,'Liam','Samuel','Hawaii','','808-555-5201'),
(15,'Mia','Owen','Hawaii','','808-640-5201'),
(1,'Mark','Thomas','Arizona','4476 Parkway Drive','602-993-5916'),
(12,'Eva','Lucas','Arizona','4379 Skips Lane','301-509-8805'),
(6,'Jack','Aiden','Arizona','4833 Coplin Avenue','480-303-1527'),
(2,'Mona','Adrian','California','1958 Peck Court','714-409-9432'),
(10,'Lili','Oliver','California','3832 Euclid Avenue','530-695-1180'),
(3,'Farida','Joseph','Florida','3153 Rhapsody Street','813-368-1200'),
(9,'Justin','Alexander','Colorado','4470 McKinley Avenue','970-433-7589'),
(11,'Frank','Jacob','Hawaii','1299 Randall Drive','808-590-5201')

SELECT * FROM customers

CREATE TABLE orders (
id int,
cust_id int,
order_date date,
order_quantity int,
order_details varchar(50),
order_cost int)

INSERT orders VALUES
(1,3,CONVERT(DATETIME, '2019-03-04', 120),4,'Coat',100),
(2,3,CONVERT(DATETIME, '2019-03-01', 120),1,'Shoes',80),
(3,3,CONVERT(DATETIME, '2019-03-04', 120),1,'Skirt',30),
(4,7,CONVERT(DATETIME, '2019-02-01', 120),1,'Coat',100),
(5,7,CONVERT(DATETIME, '2019-03-10', 120),1,'Shoes',80),
(6,15,CONVERT(DATETIME, '2019-02-01', 120),2,'Boats',100),
(7,15,CONVERT(DATETIME, '2019-01-11', 120),3,'Shirts',60),
(8,15,CONVERT(DATETIME, '2019-03-11', 120),1,'Slipper',20),
(9,15,CONVERT(DATETIME, '2019-03-01', 120),2,'Jeans',80),
(10,15,CONVERT(DATETIME, '2019-03-09', 120),3,'Shirts',50),
(11,5,CONVERT(DATETIME, '2019-02-01', 120),1,'Shoes',80),
(12,12,CONVERT(DATETIME, '2019-01-11', 120),3,'Shirts',60),
(13,12,CONVERT(DATETIME, '2019-03-11', 120),1,'Slipper',20),
(14,4,CONVERT(DATETIME, '2019-02-01', 120),1,'Shoes',80),
(15,4,CONVERT(DATETIME, '2019-01-11', 120),3,'Shirts',60),
(16,3,CONVERT(DATETIME, '2019-04-19', 120),3,'Shirts',50),
(17,7,CONVERT(DATETIME, '2019-04-19', 120),1,'Suit',150),
(18,15,CONVERT(DATETIME, '2019-04-19', 120),1,'Skirt',30),
(19,15,CONVERT(DATETIME, '2019-04-19', 120),2,'Dresses',200),
(20,12,CONVERT(DATETIME, '2019-01-11', 120),5,'Coat',20),
(21,7,CONVERT(DATETIME, '2019-04-01', 120),1,'Suit',50),
(22,7,CONVERT(DATETIME, '2019-04-02', 120),1,'Skirt',30),
(23,7,CONVERT(DATETIME, '2019-04-03', 120),1,'Dresses',50),
(24,7,CONVERT(DATETIME, '2019-04-04', 120),1,'Coat',25)

SELECT * FROM orders

I know the correct query results involve using
LEFT JOIN orders ON customers.id = orders.cust_id

However, I need to understand why the best approach is to use Left Join for this solution.

Please note, I have posted similar questions in the past asking for help and get very negative or not very constructive feedback. Therefore, if you can't help without providing nasty comments, then don't comment at all!

Developer technologies Transact-SQL
{count} votes

Accepted answer
  1. Tom Cooper 8,481 Reputation points
    2021-01-05T16:11:11.53+00:00

    For your query (cities with at least 5 orders), you don't need a left join. An inner join will work fine. But if you want to find all cities even if they don't have any orders, you need a left outer join. See below for the right way to find cities with at least 5 orders, the wrong way to find all cities, and the right way to find all cities.

    -- Get cities with at least 5 orders
    Select c.city, 
       Count(Distinct c.id) As NbrOfCustomers, 
       Count(o.id) As NbrOfOrders,
       Sum(o.order_cost) As TotalCost
    From customers c
    Inner Join orders o On c.id = o.cust_id
    Group By city
    Having Count(o.id) >= 5;
    
    -- If you want all cities even they don't have any orders
    -- the following is wrong.  It won't find cities with no orders
    -- California is missing - that's because there are no orders for
    -- California, so the Inner Join removes California
    Select c.city, 
       Count(Distinct c.id) As NbrOfCustomers, 
       Count(o.id) As NbrOfOrders,
       Sum(o.order_cost) As TotalCost
    From customers c
    Inner Join orders o On c.id = o.cust_id
    Group By city;
    
    -- To include cities with no orders, use a left outer join
    -- Then California is included in the result
    Select c.city, 
       Count(Distinct c.id) As NbrOfCustomers, 
       Count(o.id) As NbrOfOrders,
       IsNull(Sum(o.order_cost), 0) As TotalCost
    From customers c
    Left Join orders o On c.id = o.cust_id
    Group By city;
    

    Tom

    0 comments No comments

9 additional answers

Sort by: Most helpful
  1. Erland Sommarskog 121.4K Reputation points MVP Volunteer Moderator
    2021-01-06T10:32:41.32+00:00

    I just couldn’t see how HAVING orders.order_quantity >= 5 selects the orders related to my OrdersPerCity.

    I'm not sure that you got it, because Tom appears to have simplified things a bit too much in my taste. Maybe you got it, but just in case.

    You had:

    FROM (SELECT
    customers.city
    ,COUNT(orders.order_quantity) AS OrdersPerCity
    ,COUNT(customers.id) AS CustomersPerCity
    ,SUM(orders.order_cost) AS TotalCostOfOrders
    FROM dbo.customers
    INNER JOIN dbo.orders
    ON customers.id = orders.cust_id
    GROUP BY customers.city) SubQuery
    WHERE SubQuery.OrdersPerCity >= 5
    

    This could be changed to

    SELECT
    customers.city
    ,COUNT(orders.order_quantity) AS OrdersPerCity
    ,COUNT(customers.id) AS CustomersPerCity
    ,SUM(orders.order_cost) AS TotalCostOfOrders
    FROM dbo.customers
    INNER JOIN dbo.orders
    ON customers.id = orders.cust_id
    GROUP BY customers.city
    HAVING COUNT(orders.order_quantity)  >= 5
    

    That is, you need the COUNT. Only

    HAVING orders.order_quantity >= 5
    

    In the HAVING clause you can only refer to aggregates and columns in the GROUP BY list.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.