How does order by work when all column values are identical?

Lei Xiao 31 Reputation points
2023-01-31T03:16:29.5833333+00:00

Hi,

I use SQL Server 2016. Below is the rows in table: test_account. You can see the values of updDtm, fileCreateTime are identical.

id	accno	updDtm	fileCreatedTime
1	123456789	2022-07-27 09:41:10.0000000	2022-07-27 11:33:33.8300000
2	123456789	2022-07-27 09:41:10.0000000	2022-07-27 11:33:33.8300000
3	123456789	2022-07-27 09:41:10.0000000	2022-07-27 11:33:33.8300000

I want to query the latest account id which accno is 123456789 order by updDtm, fileCreatedTime

So I run the following SQL, the output result is id = 1

select t.id from (
  SELECT ROW_NUMBER() OVER(PARTITION BY a.accno ORDER BY a.updDtm desc, a.fileCreatedTime desc) AS seq, a.id, a.accno, a.updDtm, a.fileCreatedTime
  FROM test_account a
) as t
where t.seq = 1

My question is does the query result is repeatable and reliable (always output id=1 either run 1 time or multiple times) when the values of columns: updDtm, fileCreatedTime are identical or just output the random id?

I read some articles and learn that for MySql and Oracle the query result is not reliable and reproducible. How about SQL Server?

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
12,714 questions
{count} votes

Accepted answer
  1. Viorel 112.1K Reputation points
    2023-02-01T08:11:13.4033333+00:00

    A test in Management Studio with large amount of rows does not seem to return id=1:

    set nocount on
    drop table if exists test1
    create table test1(id int not null identity(1,1) primary key, accno int, updDtm datetime2, fileCreatedTime datetime2)
    go
    
    insert test1(accno, updDtm, fileCreatedTime) values (123456789, '2022-07-27 09:41:10.0000000', '2022-07-27 11:33:33.8300000')
    go 100000
    
    select t.id from (
      SELECT ROW_NUMBER() OVER(PARTITION BY a.accno ORDER BY a.updDtm desc, a.fileCreatedTime desc) AS seq, a.id, a.fileCreatedTime
      FROM test1 a
    ) as t
    where t.seq = 1
    -- or:
    select top(1) id
    from test1
    order by updDtm desc, fileCreatedTime desc
    

    Try it on your server.

    1 person found this answer helpful.

4 additional answers

Sort by: Most helpful
  1. Jingyang Li 5,891 Reputation points
    2023-01-31T03:21:58.1066667+00:00

    You are responsible for choosing a tie breaker column in order by clause: for example, add id column.

    select t.id from (
      SELECT ROW_NUMBER() OVER(PARTITION BY a.accno ORDER BY a.updDtm desc, a.fileCreatedTime desc, id ) AS seq, a.id, a.accno, a.updDtm, a.fileCreatedTime
      FROM test_account a
    ) as t
    where t.seq = 1
    

  2. PercyTang-MSFT 12,426 Reputation points Microsoft Vendor
    2023-01-31T07:10:33.24+00:00

    Hi @Lei Xiao

    I tested the case mentioned by Viorel and when the amount of data is large, there will be some situations where the id is not 1.

    Because the sample data you provide accno, updDtm, and fileCreatedTime columns are the same, the system has many possibilities when sorting. If you want the outcome you expect, maybe you need to add a tiebreaker column, as Jingyang Li said.

    So the answer to your question is that SQL Server query results don't seem to be that reliable when the values of the columns are the same.

    Best regards,

    Percy Tang


    If the answer is the right solution, please click "Accept Answer". If you have extra questions about this answer, please click "Comment".

    Note: Please follow the steps in our Documentation to enable e-mail notifications if you want to receive the related email notification for this thread.


  3. Erland Sommarskog 101K Reputation points MVP
    2023-01-31T23:03:51.7733333+00:00

    I read some articles and learn that for MySql and Oracle the query result is not reliable and reproducible. How about SQL Server?

    It is the same. The query is non-deterministic and it can return any of the three rows. You can run thousand tests and always get the same result, but that proves nothing. The vendor may make a change to the optimizer, and the plan could be different and so could the result. And that would not be a bug or a breaking change.


  4. Bruce (SqlWork.com) 56,021 Reputation points
    2023-02-01T01:16:27.8033333+00:00

    The order of duplicate rows is not defined. The type of sort will determine how repeatable. Those that use a random pivot will not be repeatable. The type of sort is typically depend on key distributions.

    if the order by uses an index rather than sort it may be repeatable.

    note: by sql definition order by with duplicates is not repeatable, that is the order of the duplicates is random.

    0 comments No comments