How does order by work when all column values are identical?

Question

Hi,

I use SQL Server 2016. Below is the rows in table: test_account. You can see the values of updDtm, fileCreateTime are identical.

id	accno	updDtm	fileCreatedTime
1	123456789	2022-07-27 09:41:10.0000000	2022-07-27 11:33:33.8300000
2	123456789	2022-07-27 09:41:10.0000000	2022-07-27 11:33:33.8300000
3	123456789	2022-07-27 09:41:10.0000000	2022-07-27 11:33:33.8300000

I want to query the latest account id which accno is 123456789 order by updDtm, fileCreatedTime

So I run the following SQL, the output result is id = 1

select t.id from (
  SELECT ROW_NUMBER() OVER(PARTITION BY a.accno ORDER BY a.updDtm desc, a.fileCreatedTime desc) AS seq, a.id, a.accno, a.updDtm, a.fileCreatedTime
  FROM test_account a
) as t
where t.seq = 1

My question is does the query result is repeatable and reliable (always output id=1 either run 1 time or multiple times) when the values of columns: updDtm, fileCreatedTime are identical or just output the random id?

I read some articles and learn that for MySql and Oracle the query result is not reliable and reproducible. How about SQL Server?

Accepted Answer

A test in Management Studio with large amount of rows does not seem to return id=1:

set nocount on
drop table if exists test1
create table test1(id int not null identity(1,1) primary key, accno int, updDtm datetime2, fileCreatedTime datetime2)
go

insert test1(accno, updDtm, fileCreatedTime) values (123456789, '2022-07-27 09:41:10.0000000', '2022-07-27 11:33:33.8300000')
go 100000

select t.id from (
  SELECT ROW_NUMBER() OVER(PARTITION BY a.accno ORDER BY a.updDtm desc, a.fileCreatedTime desc) AS seq, a.id, a.fileCreatedTime
  FROM test1 a
) as t
where t.seq = 1
-- or:
select top(1) id
from test1
order by updDtm desc, fileCreatedTime desc

Try it on your server.

Answer

You are responsible for choosing a tie breaker column in order by clause: for example, add id column.

select t.id from (
  SELECT ROW_NUMBER() OVER(PARTITION BY a.accno ORDER BY a.updDtm desc, a.fileCreatedTime desc, id ) AS seq, a.id, a.accno, a.updDtm, a.fileCreatedTime
  FROM test_account a
) as t
where t.seq = 1

Answer

Hi @Lei Xiao

I tested the case mentioned by Viorel and when the amount of data is large, there will be some situations where the id is not 1.

Because the sample data you provide accno, updDtm, and fileCreatedTime columns are the same, the system has many possibilities when sorting. If you want the outcome you expect, maybe you need to add a tiebreaker column, as Jingyang Li said.

So the answer to your question is that SQL Server query results don't seem to be that reliable when the values of the columns are the same.

Best regards,

Percy Tang

If the answer is the right solution, please click "Accept Answer". If you have extra questions about this answer, please click "Comment".

Note: Please follow the steps in our Documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

Answer

I read some articles and learn that for MySql and Oracle the query result is not reliable and reproducible. How about SQL Server?

It is the same. The query is non-deterministic and it can return any of the three rows. You can run thousand tests and always get the same result, but that proves nothing. The vendor may make a change to the optimizer, and the plan could be different and so could the result. And that would not be a bug or a breaking change.

Answer

The order of duplicate rows is not defined. The type of sort will determine how repeatable. Those that use a random pivot will not be repeatable. The type of sort is typically depend on key distributions.

if the order by uses an index rather than sort it may be repeatable.

note: by sql definition order by with duplicates is not repeatable, that is the order of the duplicates is random.

How does order by work when all column values are identical?

4 additional answers