SQL Server NEWID combined with CHOOSE sometimes fails

DJ Olsen 1 Reputation point
2021-08-29T23:34:05.887+00:00

I'm working on SQL Server 2019 and encountered something I do not understand.

These two statements below are intended to randomly select between 'A' and 'B'. This is achieved by using NEWID(), taking a checksum, then the absolute value, modulus 2 and adding 1. The result should be a random number between 1 and 2. This has to be used instead of something based on RAND() because that function will return identical values for each row.

This is a very reduced example of two variants of this query

SELECT CHOOSE(abs(checksum(NEWID()))%2+1,'A','B') FROM (VALUES (1),(2),(3),(4),(5))FiveRows(RowNum);

SELECT CHOOSE(abs(checksum(ID))%2+1,'A','B') FROM (VALUES (1),(2),(3),(4),(5))FiveRows(RowNum),(SELECT ID=NewID())ID;

The thing is, the first statement returns NULL sometimes. This would be expected with a CHOOSE statement if the index value is out of range, in this case zero or greater than 2. This however is not possible, as any number modulus 2 plus 1 must be 1 or 2.

The second statement is functionally identical. Instead of using NEWID() in-line, cross-joins a dynamic table which is simply a NEWID(). This works.

What gives??

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
13,069 questions
Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,577 questions
{count} votes

1 answer

Sort by: Most helpful
  1. MelissaMa-MSFT 24,176 Reputation points
    2021-08-30T02:45:49.963+00:00

    Hi @DJ Olsen

    Welcome to Microsoft Q&A!

    Actually SQL Server converts the CHOOSE to a bunch of CASEs.

    The definition of choose(x, a, b) is equivalent to:

    case when x = 1 then a  
          when x = 2 then b  
     end  
    

    Simply enough when x is a constant or when x refers to a column or when x is a non-volatile expression. But 99.9% counter-intuitive when x is a non-deterministic expression.

    Your first query is executing like below:

    SELECT  case when abs(checksum(NEWID()))%2+1 = 1 then 'A'  
                 when abs(checksum(NEWID()))%2+1 = 2 then 'B' end  
    FROM (VALUES (1),(2),(3),(4),(5))FiveRows(RowNum);  
    

    Then above expression has two random numbers. In fact, one random number for each possible target.

    Only about 25% percent of the time, both comparisons will fail. And, the result will be NULL.

    Best regards,
    Melissa


    If the answer is helpful, please click "Accept Answer" and upvote it.
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.