An Apache Spark-based analytics platform optimized for Azure.
Hi ,
Thanks for reaching out to Microsoft Q&A.
The difference is coming from how IS NOT DISTINCT FROM is evaluated in the join condition combined with null semantics tightening in Runtime 18.
In your query, T0.b is NULL and T1.d is also NULL, so the join condition (T0.b IS NOT DISTINCT FROM T1.d) should evaluate to true (because this operator treats NULL = NULL as true). Older runtimes and SQL Warehouse honour this and produce the match, hence you see 2 in o_2. In Runtime 18, the optimizer is likely rewriting or pushing down the join in a way that breaks this null-safe equality handling (known area of change with Spark 3.5 optimizer rules), effectively behaving closer to a standard equality join during execution, so the row does not match and you get NULL instead of 2.
I feel this is very likely a regression/optimizer bug rather than intended behaviour. As a workaround, avoid relying on IS NOT DISTINCT FROM in join predicates for now and rewrite explicitly as (T0.b = T1.d OR (T0.b IS NULL AND T1.d IS NULL)), which tends to bypass these optimizer issues and gives consistent results across runtimes.
Please 'Upvote'(Thumbs-up) and 'Accept' as answer if the reply was helpful. This will be benefitting other community members who face the same issue.