Understanding LEFT JOINS

Carlton Patterson 741 Reputation points
2021-01-05T13:00:42.073+00:00

Hello Forum,

I would like to get a practical understanding of when to use Left Joins. I understand Left Joins from the point where we use Left Joins to find missing data. However, the following example uses Left Joins to provide the results needed for the the query.

The logic is as follows:

Find the number of orders, the number of customers and the total cost of orders per each city that has made at least 5 orders.
Output each calculation along with the corresponding city name.

The tables used for the query are as follows:

CREATE TABLE customers (
    id int,
    first_name varchar(50),
    last_name varchar(50),
    city varchar(50),
    address varchar(50),
    phone_number varchar(50))

INSERT customers VALUES
(8,'John','Joseph','Arizona','','928-386-8164'),
(7,'Jill','Michael','Florida','','813-297-0692'),
(4,'William','Daniel','Colorado','','813-368-1200'),
(5,'Henry','Jackson','Hawaii','','808-601-7513'),
(13,'Emma','Isaac','Hawaii','','808-690-5201'),
(14,'Liam','Samuel','Hawaii','','808-555-5201'),
(15,'Mia','Owen','Hawaii','','808-640-5201'),
(1,'Mark','Thomas','Arizona','4476 Parkway Drive','602-993-5916'),
(12,'Eva','Lucas','Arizona','4379 Skips Lane','301-509-8805'),
(6,'Jack','Aiden','Arizona','4833 Coplin Avenue','480-303-1527'),
(2,'Mona','Adrian','California','1958 Peck Court','714-409-9432'),
(10,'Lili','Oliver','California','3832 Euclid Avenue','530-695-1180'),
(3,'Farida','Joseph','Florida','3153 Rhapsody Street','813-368-1200'),
(9,'Justin','Alexander','Colorado','4470 McKinley Avenue','970-433-7589'),
(11,'Frank','Jacob','Hawaii','1299 Randall Drive','808-590-5201')

SELECT * FROM customers

CREATE TABLE orders (
id int,
cust_id int,
order_date date,
order_quantity int,
order_details varchar(50),
order_cost int)

INSERT orders VALUES
(1,3,CONVERT(DATETIME, '2019-03-04', 120),4,'Coat',100),
(2,3,CONVERT(DATETIME, '2019-03-01', 120),1,'Shoes',80),
(3,3,CONVERT(DATETIME, '2019-03-04', 120),1,'Skirt',30),
(4,7,CONVERT(DATETIME, '2019-02-01', 120),1,'Coat',100),
(5,7,CONVERT(DATETIME, '2019-03-10', 120),1,'Shoes',80),
(6,15,CONVERT(DATETIME, '2019-02-01', 120),2,'Boats',100),
(7,15,CONVERT(DATETIME, '2019-01-11', 120),3,'Shirts',60),
(8,15,CONVERT(DATETIME, '2019-03-11', 120),1,'Slipper',20),
(9,15,CONVERT(DATETIME, '2019-03-01', 120),2,'Jeans',80),
(10,15,CONVERT(DATETIME, '2019-03-09', 120),3,'Shirts',50),
(11,5,CONVERT(DATETIME, '2019-02-01', 120),1,'Shoes',80),
(12,12,CONVERT(DATETIME, '2019-01-11', 120),3,'Shirts',60),
(13,12,CONVERT(DATETIME, '2019-03-11', 120),1,'Slipper',20),
(14,4,CONVERT(DATETIME, '2019-02-01', 120),1,'Shoes',80),
(15,4,CONVERT(DATETIME, '2019-01-11', 120),3,'Shirts',60),
(16,3,CONVERT(DATETIME, '2019-04-19', 120),3,'Shirts',50),
(17,7,CONVERT(DATETIME, '2019-04-19', 120),1,'Suit',150),
(18,15,CONVERT(DATETIME, '2019-04-19', 120),1,'Skirt',30),
(19,15,CONVERT(DATETIME, '2019-04-19', 120),2,'Dresses',200),
(20,12,CONVERT(DATETIME, '2019-01-11', 120),5,'Coat',20),
(21,7,CONVERT(DATETIME, '2019-04-01', 120),1,'Suit',50),
(22,7,CONVERT(DATETIME, '2019-04-02', 120),1,'Skirt',30),
(23,7,CONVERT(DATETIME, '2019-04-03', 120),1,'Dresses',50),
(24,7,CONVERT(DATETIME, '2019-04-04', 120),1,'Coat',25)

SELECT * FROM orders

I know the correct query results involve using
LEFT JOIN orders ON customers.id = orders.cust_id

However, I need to understand why the best approach is to use Left Join for this solution.

Please note, I have posted similar questions in the past asking for help and get very negative or not very constructive feedback. Therefore, if you can't help without providing nasty comments, then don't comment at all!

Developer technologies Transact-SQL
{count} votes

Accepted answer
  1. Tom Cooper 8,481 Reputation points
    2021-01-05T16:11:11.53+00:00

    For your query (cities with at least 5 orders), you don't need a left join. An inner join will work fine. But if you want to find all cities even if they don't have any orders, you need a left outer join. See below for the right way to find cities with at least 5 orders, the wrong way to find all cities, and the right way to find all cities.

    -- Get cities with at least 5 orders
    Select c.city, 
       Count(Distinct c.id) As NbrOfCustomers, 
       Count(o.id) As NbrOfOrders,
       Sum(o.order_cost) As TotalCost
    From customers c
    Inner Join orders o On c.id = o.cust_id
    Group By city
    Having Count(o.id) >= 5;
    
    -- If you want all cities even they don't have any orders
    -- the following is wrong.  It won't find cities with no orders
    -- California is missing - that's because there are no orders for
    -- California, so the Inner Join removes California
    Select c.city, 
       Count(Distinct c.id) As NbrOfCustomers, 
       Count(o.id) As NbrOfOrders,
       Sum(o.order_cost) As TotalCost
    From customers c
    Inner Join orders o On c.id = o.cust_id
    Group By city;
    
    -- To include cities with no orders, use a left outer join
    -- Then California is included in the result
    Select c.city, 
       Count(Distinct c.id) As NbrOfCustomers, 
       Count(o.id) As NbrOfOrders,
       IsNull(Sum(o.order_cost), 0) As TotalCost
    From customers c
    Left Join orders o On c.id = o.cust_id
    Group By city;
    

    Tom

    0 comments No comments

9 additional answers

Sort by: Most helpful
  1. Guoxiong 8,206 Reputation points
    2021-01-05T14:49:07.467+00:00

    Based on w3schools.com, the LEFT JOIN keyword returns all records from the left table (table1), and the matched records from the right table (table2). The result is NULL from the right side, if there is no match.

    In your case, if you want to know the number of the orders for each customer, you need to use LEFT JOIN since some of the customers (id = 1, 2, 6, 8, 9, 10, 11, 13 and 14) do not have any order. So those results are NULL. If you use INNER JOIN, those results will not return.

    0 comments No comments

  2. Carlton Patterson 741 Reputation points
    2021-01-05T16:22:21.087+00:00

    HI Tom,

    Really appreciate your feedback/suggestions.

    I will take a look at your suggestion. The following is another suggestion (it will be interesting to get your opinion)

    SELECT customers.city,
    count(orders.cust_id) AS orders_per_city,
    count(customers.id) AS customers_per_city,
    sum(orders.order_cost) AS orders_cost_per_city
    FROM customers
    LEFT JOIN orders ON customers.id = orders.cust_id
    GROUP BY customers.city
    HAVING count(orders.id) >=5

    0 comments No comments

  3. Carlton Patterson 741 Reputation points
    2021-01-05T16:25:40.92+00:00

    I don't understand how HAVING count(orders.id) >=5 works for the solution???

    0 comments No comments

  4. Carlton Patterson 741 Reputation points
    2021-01-05T16:32:33.677+00:00

    Tom,

    I would have completed the answer with the following query

    SELECT
    SubQuery.city
    ,SubQuery.OrdersPerCity
    ,SubQuery.CustomersPerCity
    ,SubQuery.TotalCostOfOrders
    FROM (SELECT
    customers.city
    ,COUNT(orders.order_quantity) AS OrdersPerCity
    ,COUNT(customers.id) AS CustomersPerCity
    ,SUM(orders.order_cost) AS TotalCostOfOrders
    FROM dbo.customers
    INNER JOIN dbo.orders
    ON customers.id = orders.cust_id
    GROUP BY customers.city) SubQuery
    WHERE SubQuery.OrdersPerCity >= 5
    GROUP BY SubQuery.city
    ,SubQuery.OrdersPerCity
    ,SubQuery.CustomersPerCity
    ,SubQuery.TotalCostOfOrders

    I'm guessing this isn't the most efficient way to write this code for this query? What do you think?


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.