Ordering guarantees in SQL Server...

Artikel
07/20/2005

Ordering guarantees of queries in various context is a common source of confusion. For example, a common workaround to make the results from querying a view ordered is to introduce TOP 100 PERCENT and ORDER BY in the view definition. But this however does not guarantee order in the actual results sent to the client since the query optimizer will re-order operations to find more efficient query plans. Note that even though this topic applies to SQL Server 2005 most of the rules are valid for SQL Server 2000 too.

Here are the scenarios that guarantee ordering:

If you have an ORDER BY in the top-most SELECT block in a query, the presentation order of the results honor that ORDER BY request
If you have a TOP in the same SELECT block as an ORDER BY, any TOP computation is performed with respect to that ORDER BY. For example, if there is a TOP 5 and ORDER BY clause then SQL Server picks the TOP 5 rows within a given sort. Note that this does not guarantee that subsequent operations will somehow retain the sort order of a previous operation. The query optimizer re-orders operations to find more efficient query plans
Cursors over queries containing ORDER BY in the top-most scope will navigate in that order
INSERT queries that use SELECT with ORDER BY to populate rows guarantees how identity values are computed but not the order in which the rows are inserted
SQL Server 2005 supports a number of new "sequence functions" like RANK(), ROW_NUMBER() that can be performed in a given order using a OVER clause with ORDER BY
For backwards compatibility reasons, SQL Server provides support for assignments of type SELECT @p = @p + 1 ... ORDER BY at the top-most scope.

Conor Cunningham

Query Optimizer Development Lead

Comments

Anonymous
August 05, 2005
Could you elaborate on statement #4. Are the identity values created in the sequence established by the ORDER BY? If a clustered index exists on the identity column, does this also then determine the order of insertion?
Anonymous
August 16, 2005
Yes, the identity values will be generated in the sequence established by the ORDER BY. If a clustered index exists on the identity column, then the values will be in the logical order of the index keys. This still doesn't guarantee physical order of insertion. Index maintenance is a different step and that could also be done in parallel for example. So you could end up generating the identity values based on ORDER BY clause and then feeding those rows to the clustered index insert operator which will perform the maintenance task. You can see this in the query plan. You should really NOT think about physical operations or order but instead think of a table as a unordered set of rows. The index can be used to sort rows in logical manner (using ORDER BY clause) efficiently.

--
Umachandar
Anonymous
October 11, 2005
Hi,

First of all, I know for a guaranteed order you have to apply order by to your select but out of curiosity I have a question.

I have a table with a combined nonunique clustered index on an int field and a smalldatetime field (reporting purposes). When I select from the table (without order by) with locking the records are returned in the order of the clustered index key. However when running the query in a READUNCOMMITTED isolation level the records are returned in a different order. Although the order stays the same throughout different runs it is not ordered by the clustered index key. Issuing a DBCC DBREINDEX gives me the correct order when the database exists of 1 file, when I use multiple files for the database the order seems to be random again.

The question is: how does READUNCOMMITTED affect the way that records are being returned?

I guess it has something to do with multiple threads and the order of the data on disk but I'd like to know for sure.

TIA

PS. My blog has a test script
Anonymous
October 12, 2005
The comment has been removed
Anonymous
October 12, 2005
Umachandar,

Thanks for a explanation so far.
But I'm still having difficulty understanding why exactly uncommitted isolation uses another method to reach the rows.

Herewith the queryplans for both queries:

Query plan with locking:
|--Clustered Index Scan(OBJECT:([RSStaging183].[dbo].[tblClustered].[ixID]))

Query plan without locking:
|--Clustered Index Scan(OBJECT:([RSStaging183].[dbo].[tblClustered].[ixID]))

As you can see the query plan produces the same output.

If you wish there is a simple testscript on my blog to simulate the behaviour.
Anonymous
October 14, 2005
I looked at your repro script and the reason is as I said the access methods for scanning a clustered index could be different. You can see this if you do the following:

1. Increase the number of rows in the table to say 100K or 200K
2. Run SET STATISTICS IO ON
3. Run queries below:

select * from tblClustered with(readuncommitted);
select * from tblClustered with(readuncommitted) order by Id, MyDate;
select * from tblClustered;
select * from tblClustered order by Id, MyDate;

You will notice that the io counters are different for the first SELECT which is an indirect indication of how the table scan is being performed. It is not conclusive but that is best we can do now using documented commands. Unfortunately some of this information is not exposed in the showplan output. I will post more details once I talk to the developer who owns this piece of code. :-)

--
Umachandar
Anonymous
October 16, 2005
Hehe, great Umachandar :-)
I'll check out your findings too.

Thanks for all the effort already.
Anonymous
January 07, 2006
Conor:-
How about providing script example for scenariowise for 1-5 using any common dB like northwind or adventureworks That would for sure make this entry more worthful.While i had gone thru i had to minutely looking for the meaning as i was looking for the oredr of rows retuend in nested queries.
Anonymous
June 14, 2006

ORDER BY has never been technically 'allowed' in a view definition. For
example, the following code:CREATE...
Anonymous
August 28, 2006
Considering the example code below, are you saying in scenario 4 that correspondence between the two columns is not guarenteed? If so, it seems to contradict this KB article...

http://support.microsoft.com/default.aspx?scid=kb;en-us;273586

CREATE TABLE #X (X int)
CREATE TABLE #Y (Y int, N int IDENTITY(1,1))

INSERT #X (X) VALUES (1)
INSERT #X (X) VALUES (2)
INSERT #X (X) VALUES (3)

INSERT #Y (Y) SELECT X FROM #X ORDER BY X

SELECT * FROM #Y ORDER BY Y

DROP TABLE #X
DROP TABLE #Y
Anonymous
August 28, 2006
Can you expand your question little bit? The KB article does not contradict scenario #4. It says that identity values will be generated based on the ORDER BY clause in INSERT..SELECT. And to retrieve rows in a specific order from any table or query you have to always use SELECT with ORDER BY clause.

Thanks
Umachandar
Anonymous
August 29, 2006
Again in the context of the example code...

Scenario #4 does not mention retrieving rows from table #Y. It seems to say that identity values will be generated in a 1,2,3 order (so what other order would make any sense), but the rows retrieved from table #X could be inserted in 2,3,1 order. Are you saying that the resulting rows in table #Y do not necessarily have the same value in the two columns, or are you simply saying that the rows of table #Y are not retrieved in any particular order without an ORDER BY clause? If it's the latter, why massively confuse the issue by involving the identity functionality?
Anonymous
August 29, 2006
The other possible plan for identity generation is to perform it before the SORT operator(s). This happens when you do SELECT...INTO for example and the SELECT statement contains an ORDER BY clause. And another option is that the identity value can be generated at the time of insertion into the table at which point the order of the rows could be different. It is important to understand that these are set operations. Also, consider example below:

CREATE TABLE #X (X int)
CREATE TABLE #Y (Y int, N int IDENTITY(1,1), G uniqueidentifier DEFAULT(NEWSEQUENTIALID()))

INSERT #X (X) VALUES (1)
INSERT #X (X) VALUES (2)
INSERT #X (X) VALUES (3)

INSERT #Y (Y) SELECT X FROM #X ORDER BY X

SELECT * FROM #Y ORDER BY Y
SELECT * FROM #Y ORDER BY G

DROP TABLE #X
DROP TABLE #Y

There is a default in the table that generates sequential GUIDs. But the values depend on which order the rows were inserted into the table. And this may or may not match that of the ORDER BY clause in the SELECT statement. So the final two SELECT statements need not necessarily produce the same results. As you can see, in this example the ORDER BY clause in the INSERT...SELECT only controls when the identity values on the table are generated not for the defaults. If you consider more complex schema elements that have say defaults with CLR UDFs that persist generates some sequential value the only way to control order is to insert the rows one by one.

Hope this makes sense now why the emphasis is on identity value generation with INSERT...SELECT ORDER BY. It is because there are other cases where the ORDER BY doesn't affect anything like computation of defaults for example.

--
Umachandar
Anonymous
August 29, 2006
Thanks for the clarification. Now that I understand the point of scenario #4 I see that it confirms the KB article. I think scenario #4 would have been much clearer if it had mentioned column defaults as being different from identity values.
Anonymous
September 19, 2006
Hi Conor,

Here's a question that's triggered by a post in Allen White's blog (http://www.sqljunkies.com/WebLog/marathonsqlguy/archive/2006/09/19/23557.aspx).

CREATE TABLE #X (X int)
CREATE TABLE #Y (Y int, N int IDENTITY(1,1))
CREATE TABLE #Y2 (Y int, N int, N2 int IDENTITY(1,1))

INSERT #X (X) VALUES (1)
INSERT #X (X) VALUES (2)
INSERT #X (X) VALUES (3)

INSERT #Y (Y)
OUTPUT INSERTED.Y, INSERTED.N INTO #Y2(Y,N)
SELECT X FROM #X ORDER BY X

SELECT * FROM #Y ORDER BY Y
SELECT * FROM #Y2 ORDER BY Y

DROP TABLE #X
DROP TABLE #Y
DROP TABLE #Y2

The IDENTITY values for #Y will obey the specified ORDER BY, as stated in scenario #4. The order in which rows actually get inserted in #Y is not guaranteed, though. So what will happen to #Y2? Are the IDENTITY values for this table calculated as rows are inserted, in the order in which they are inserted (i.e. not guaranteed)? Or is the IDENTITY column in the target table of an OUTPUT clause also included in the guarantee as outline above?

Best regards,

Hugo Kornelis
Anonymous
September 21, 2006
Hugo, here is the reply from Conor. Hope this clarifies your question. The jist of it is that the identity value generation order is guaranteed only on the INSERT INTO table. So any other assumption made on the order is risky and can lead to unpredictable behavior.

>>
There are no physical ordering guarantees on any insert operations in SQL 2005.

For example, consider the case when either/both of these tables are partitioned. Even when they are not, there is nothing in the code guaranteeing any physical insertion order of the rows to the heap or any secondary index.

In your example, the order by is guaranteed to influence the order in which identity values are assigned to the identity function for the first table. However, it does not guarantee any insertion order once those rows are generated . It does not provide guarantees on the second table at all. If such guarantees are needed, that's a feature request for us to add syntax.
>>

--
Umachandar
Anonymous
September 21, 2006
Umachandar, Conor,

Thanks for the clarification. The answer is as I expected.

Best regards,

Hugo Kornelis
Anonymous
March 12, 2009
PingBack from http://soci.hu/blog/index.php/2009/03/12/mikor-epithetunk-a-sorok-sorrendjere-sql-serverben/

Freigeben über

Ordering guarantees in SQL Server...

Comments

Zusätzliche Ressourcen