Share via

Query Code returning incorrect results

Carlton Patterson 761 Reputation points
2021-01-03T14:52:21.93+00:00

Hello Community,

I have the following logic:

Find customers who didn't place orders from 2019-02-01 to 2019-03-01. Output customer's first name.

I am working with the following dataset:

CREATE TABLE #tmpTable (
    id int,
    first_name varchar(50),
    last_name varchar(50),
    city varchar(50),
    address varchar(50),
    phone_number varchar(50),
    id int,
    cust_id int,
    order_date date,
    order_quantity int,
    order_details varchar(50),
    order_cost int)

INSERT #tmpTable VALUES
(3,'Farida','Joseph','Florida','3153 Rhapsody Street','813-368-1200',1,3,CONVERT(DATETIME, '2019-03-04', 120),1,'Coat',100),
(3,'Farida','Joseph','Florida','3153 Rhapsody Street','813-368-1200',2,3,CONVERT(DATETIME, '2019-03-01', 120),1,'Shoes',80),
(3,'Farida','Joseph','Florida','3153 Rhapsody Street','813-368-1200',3,3,CONVERT(DATETIME, '2019-03-10', 120),1,'Skirt',30),
(7,'Jill','Michael','Florida','','813-297-0692',4,7,CONVERT(DATETIME, '2019-02-01', 120),1,'Coat',100),
(7,'Jill','Michael','Florida','','813-297-0692',5,7,CONVERT(DATETIME, '2019-03-10', 120),1,'Shoes',80),
(15,'Mia','Owen','Hawaii','','808-640-5201',6,15,CONVERT(DATETIME, '2019-02-01', 120),2,'Boats',100),
(15,'Mia','Owen','Hawaii','','808-640-5201',7,15,CONVERT(DATETIME, '2019-01-11', 120),3,'Shirts',60),
(15,'Mia','Owen','Hawaii','','808-640-5201',8,15,CONVERT(DATETIME, '2019-03-11', 120),1,'Slipper',20),
(15,'Mia','Owen','Hawaii','','808-640-5201',9,15,CONVERT(DATETIME, '2019-03-01', 120),2,'Jeans',80),
(15,'Mia','Owen','Hawaii','','808-640-5201',10,15,CONVERT(DATETIME, '2019-03-09', 120),3,'Shirts',50),
(5,'Henry','Jackson','Hawaii','','808-601-7513',11,5,CONVERT(DATETIME, '2019-02-01', 120),1,'Shoes',80),
(12,'Eva','Lucas','Arizona','4379 Skips Lane','301-509-8805',12,12,CONVERT(DATETIME, '2019-01-11', 120),3,'Shirts',60),
(12,'Eva','Lucas','Arizona','4379 Skips Lane','301-509-8805',13,12,CONVERT(DATETIME, '2019-03-11', 120),1,'Slipper',20),
(4,'William','Daniel','Colorado','','813-368-1200',14,4,CONVERT(DATETIME, '2019-02-01', 120),1,'Shoes',80),
(4,'William','Daniel','Colorado','','813-368-1200',15,4,CONVERT(DATETIME, '2019-01-11', 120),3,'Shirts',60),
(3,'Farida','Joseph','Florida','3153 Rhapsody Street','813-368-1200',16,3,CONVERT(DATETIME, '2019-04-19', 120),3,'Shirts',50),
(7,'Jill','Michael','Florida','','813-297-0692',17,7,CONVERT(DATETIME, '2019-04-19', 120),1,'Suit',150),
(15,'Mia','Owen','Hawaii','','808-640-5201',18,15,CONVERT(DATETIME, '2019-04-19', 120),1,'Skirt',30),
(15,'Mia','Owen','Hawaii','','808-640-5201',19,15,CONVERT(DATETIME, '2019-04-19', 120),2,'Dressess',200),
(12,'Eva','Lucas','Arizona','4379 Skips Lane','301-509-8805',20,12,CONVERT(DATETIME, '2019-01-11', 120),1,'Coat',100)

SELECT * FROM #tmpTable

DROP TABLE #tmpTable

My solution attempt is as follows:

SELECT
  customers.first_name
FROM (SELECT
         orders.cust_id
       FROM dbo.orders
       WHERE orders.order_date BETWEEN '2019-02-01' AND '2019-03-01') SubQuery
    ,dbo.customers
     INNER JOIN dbo.orders
       ON customers.id = orders.cust_id
WHERE customers.id NOT IN (SubQuery.cust_id)
GROUP BY customers.first_name

However, the query is incorrect.

The correct result should like the following:

FirstName
John
Emma
Liam
Mark
Eva
Jack
Mona
Lili
Justin
Frank

Can someone take a look at my code and let me know where I'm going wrong?

Developer technologies | Transact-SQL
Developer technologies | Transact-SQL

A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.


Answer accepted by question author

  1. Erland Sommarskog 134.1K Reputation points MVP Volunteer Moderator
    2021-01-03T19:27:40.867+00:00

    why is joining the tables NOT are requirement in this example?

    (I suppose "are" should be "a" here.)

    Because you can write the query without a join!

    Think of it this way: you have a set of customers. You are looking for a subset of that set. Thus, you should take that set and filter it. When you join two tables you are creating a new set. Or, well, in the case when you are joining orders and customers, you still have the set of orders.

    Now, you can write this with a join as well, the one you wanted to avoid:

    SELECT c.first_name
    FROM   dbo.customers c
    LEFT   JOIN dbo.orders o ON  c.id = o.cust_id
                           AND  o.order_date BETWEEN '20190201' AND '20190301'
    WHERE  o.cust_id IS NULL
    

    But as I remarked, I find this style awkard. We take the set of customers and extend it with the orders placed in February, but retaining the original customers. The we filter the extended set by those that do not have a customer id on the order side.

    Was this answer helpful?


6 additional answers

Sort by: Most helpful
  1. David Browne - msft 3,851 Reputation points
    2021-01-03T16:14:21.933+00:00

    Simply

    SELECT customers.first_name
    FROM dbo.customers
     WHERE customers.id NOT IN 
        (SELECT
         orders.cust_id
         FROM dbo.orders
         WHERE orders.order_date BETWEEN '2019-02-01' AND '2019-03-01')
     
    

    outputs

    first_name
    --------------------------------------------------
    John
    Emma
    Liam
    Mark
    Eva
    Jack
    Mona
    Lili
    Justin
    Frank
    

    Was this answer helpful?


  2. Carlton Patterson 761 Reputation points
    2021-01-03T16:06:35.12+00:00

    Hi TomCooper,

    You suggestion is incorrect

    The tables are as follows:

    CREATE TABLE orders (
        id int,
        cust_id int,
        order_date date,
        order_quantity int,
        order_details varchar(50),
        order_cost int)
    
    INSERT orders VALUES
    (1,3,CONVERT(DATETIME, '2019-03-04', 120),1,'Coat',100),
    (2,3,CONVERT(DATETIME, '2019-03-01', 120),1,'Shoes',80),
    (3,3,CONVERT(DATETIME, '2019-03-10', 120),1,'Skirt',30),
    (4,7,CONVERT(DATETIME, '2019-02-01', 120),1,'Coat',100),
    (5,7,CONVERT(DATETIME, '2019-03-10', 120),1,'Shoes',80),
    (6,15,CONVERT(DATETIME, '2019-02-01', 120),2,'Boats',100),
    (7,15,CONVERT(DATETIME, '2019-01-11', 120),3,'Shirts',60),
    (8,15,CONVERT(DATETIME, '2019-03-11', 120),1,'Slipper',20),
    (9,15,CONVERT(DATETIME, '2019-03-01', 120),2,'Jeans',80),
    (10,15,CONVERT(DATETIME, '2019-03-09', 120),3,'Shirts',50),
    (11,5,CONVERT(DATETIME, '2019-02-01', 120),1,'Shoes',80),
    (12,12,CONVERT(DATETIME, '2019-01-11', 120),3,'Shirts',60),
    (13,12,CONVERT(DATETIME, '2019-03-11', 120),1,'Slipper',20),
    (14,4,CONVERT(DATETIME, '2019-02-01', 120),1,'Shoes',80),
    (15,4,CONVERT(DATETIME, '2019-01-11', 120),3,'Shirts',60),
    (16,3,CONVERT(DATETIME, '2019-04-19', 120),3,'Shirts',50),
    (17,7,CONVERT(DATETIME, '2019-04-19', 120),1,'Suit',150),
    (18,15,CONVERT(DATETIME, '2019-04-19', 120),1,'Skirt',30),
    (19,15,CONVERT(DATETIME, '2019-04-19', 120),2,'Dressess',200),
    (20,12,CONVERT(DATETIME, '2019-01-11', 120),1,'Coat',100)
    
    SELECT * FROM orders
    
    DROP TABLE orders
    
    
    CREATE TABLE customers (
        id int,
        first_name varchar(50),
        last_name varchar(50),
        city varchar(50),
        address varchar(50),
        phone_number varchar(50))
    
    INSERT customers VALUES
    (8,'John','Joseph','Arizona','','928-386-8164'),
    (7,'Jill','Michael','Florida','','813-297-0692'),
    (4,'William','Daniel','Colorado','','813-368-1200'),
    (5,'Henry','Jackson','Hawaii','','808-601-7513'),
    (13,'Emma','Isaac','Hawaii','','808-690-5201'),
    (14,'Liam','Samuel','Hawaii','','808-555-5201'),
    (15,'Mia','Owen','Hawaii','','808-640-5201'),
    (1,'Mark','Thomas','Arizona','4476 Parkway Drive','602-993-5916'),
    (12,'Eva','Lucas','Arizona','4379 Skips Lane','301-509-8805'),
    (6,'Jack','Aiden','Arizona','4833 Coplin Avenue','480-303-1527'),
    (2,'Mona','Adrian','California ','1958 Peck Court','714-409-9432'),
    (10,'Lili','Oliver','California ','3832 Euclid Avenue','530-695-1180'),
    (3,'Farida','Joseph','Florida','3153 Rhapsody Street','813-368-1200'),
    (9,'Justin','Alexander','Colorado','4470 McKinley Avenue','970-433-7589'),
    (11,'Frank','Jacob','Hawaii','1299 Randall Drive','808-590-5201')
    
    SELECT * FROM customers
    
    DROP TABLE customers
    

    Was this answer helpful?


  3. Erland Sommarskog 134.1K Reputation points MVP Volunteer Moderator
    2021-01-03T16:05:33.713+00:00

    There is more than one thing that is going wrong in that query. If we take this part:

    FROM (SELECT
              orders.cust_id
            FROM dbo.orders
            WHERE orders.order_date BETWEEN '2019-02-01' AND '2019-03-01') SubQuery
         ,dbo.customers
    

    This part produces a cartesian product between the customers that place orders in February and all customers. So if 10 customers placed orders in February and there are 120 customers in total, you now have 1200 rows.

    INNER JOIN dbo.orders
            ON customers.id = orders.cust_id
    

    And then you join all customers to the orders table, so after the comma, you now have as many rows as there are orders in the database. So if there are 1500 orders in total, you now have 15000 rows.

    WHERE customers.id NOT IN (SubQuery.cust_id)
    

    This is the same as:

    WHERE customers.id <> SubQuery.cust_id
    

    Keep in mind that x IN (val1, val2, ...) is the same as x = val1 OR x = val2)

    So what you are doing here is to filter out the rows when the customer id on the left side of the is the same as customer id on the right side. Exactly how many rows you filter out depends on how many orders the customers who had orders in February have in total, but it is a minor part of the rows.

    There is little reason why you would just a LEFT JOIN here. Yes, it is possible, but it is an awkward way of doing it, which is more difficult to read. You can use EXCEPT that Tom showed, but that is also kind of odd. NOT IN can be used in this case, but there is a trap will NULL values which is unlikely to strike here (one will have to assume that Orders.cust_id is not nullable.)

    The normal way to write the query like this is to use NOT EXISTS:

    SELECT c.first_name
    FROM   dbo.Customers c
    WHERE NOT EXISTS (SELECT *
                      FROM Orders o
                      WHERE c.id = o.cust_id
                          AND  o.order_date BETWEEN '20190201' AND '20190301')
    

    Was this answer helpful?

    0 comments No comments

  4. Tom Cooper 8,501 Reputation points
    2021-01-03T15:30:58.977+00:00

    You didn't give us your customer table and orders table. And, of course, saying don't use left join is effectively saying "Please don't do this in the most efficient manner, but do it some other, slower way." But try

    Select Distinct first_name
    From customers
    Except
    Select Distinct first_name
    From customers
    Inner Join Orders On customers.id = orders.cust_id And orders.order_date Between '20190201' And '20190301'
    

    Tom

    Was this answer helpful?


Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.