how works "not in" with nulls value?

PawelSz 21 Reputation points
2021-12-09T11:42:38.797+00:00

Hi everyone,

I don't udenrstand one thing.

SQL code example:

insert into tab1 (col1) values (1), (2), (3), (4), (5)

insert into tab2 (col2) values (1), (null), (6), (4), (7)

select * from tab1
select * from tab2

select * from tab1 where col1 in (select col2 from tab2)
select * from tab1 where col1 not in (select col2 from tab2)

Why statement with "in" show correct values : 1, 4 but statement with "not in" don't show anything?
Why it not showing 2, 3, 5?

It's SQL specification or bug?

Developer technologies | Transact-SQL
0 comments No comments
{count} votes

Accepted answer
  1. Erland Sommarskog 121.4K Reputation points MVP Volunteer Moderator
    2021-12-09T22:50:11.05+00:00

    Here is a more long-winded explanation which takes it step by step. We have:

    select * from tab1 where col1 not in (select col2 from tab2)
    

    We can replace the subquery with the values in tab2. This gives us:

    select * from tab1 where col1 not in (1, null, 6 ,4, 7)
    

    Let's move the NOT:

    select * from tab1 where not (col1 in (1, null, 6 ,4, 7))
    

    IN is just a shortcut for OR, so what we really have is:

    select * from tab1 where NOT (col1 = 1 OR col1 = null OR col1 = 6 OR col = 4 OR col1 = 7)
    

    Let's nu study this for some of the values for tab1.col1. We can start with 1. This gives us

    NOT (1 = 1 OR ...)
    

    and we can stop there. The list of OR will return TRUE and with NOT in front we get FALSE.

    What if we try 5?

    NOT (5= 1 OR 5 = null OR 5 = 6 OR 5 = 4 OR 5 = 7)
    

    so that's

    NOT (FALSE OR 5 = NULL OR FALSE OR FALSE OR FALSE)
    

    What about 5 = NULL? Well NULL is an unknown value. It could be different from 5, but by chance it could be 5. So the outcome is not TRUE, not FALSE, but UNKNOWN. So we have:

    NOT (FALSE OR UNKNOWN OR FALSE OR FALSE OR FALSE)
    

    The result of all the OR is UNKNOWN and with NOT in front, the value is still UNKNOWN by the three-valued logic of SQL. But for WHERE to include a row, the condition for the row must be TRUE, and thus the row with 5 is filtered out as well.

    Using NOT EXISTS as Dan showed is a very good solution. Not the least since [NOT] EXISTS is more powerful than IN. However, there is a simpler fix to the original query:

    select * from tab1 where col1 not in (select col2 from tab2 WHERE col2 IS NOT NULL)
    
    1 person found this answer helpful.

1 additional answer

Sort by: Most helpful
  1. Dan Guzman 9,401 Reputation points
    2021-12-09T11:59:06.753+00:00

    It's SQL specification or bug?

    The unintuitive behavior of NOT IN is part of the SQL standard specification. The SQL Server documentation includes this caution:

    Any null values returned by subquery or expression that are compared to test_expression using IN or NOT IN return UNKNOWN. Using null values in together with IN or NOT IN can produce unexpected results.

    The gotcha is that UNKNOWN is neither TRUE nor FALSE when NULL values are returned so the NOT IN predicate will never evaluate to either true or false . You may find NOT EXISTS is more intuitive because it will return only TRUE or FALSE.

    SELECT *  
    FROM tab1   
    WHERE NOT EXISTS (  
        SELECT 1   
        FROM tab2  
        WHERE tab2.col2 = tab1.col1  
    );  
    
    1 person found this answer helpful.
    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.