Effect of READ_COMMITTED_SNAPSHOT for Delete Statement w/ Subquery

Question

Ed 26

Good day,

We recently encountered a race condition issue in SQL Server and we would like to ask if the behaviour is expected when using READ_COMMITTED_SNAPSHOT. Please see the details below.

SQL Server Options:

Transaction isolation level = READ COMMITTED
READ_COMMITTED_SNAPSHOT = ON
ALLOW_SNAPSHOT_ISOLATION = OFF
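
For reference, these settings can be checked and applied roughly as follows. This is only a sketch: the database name TestDb is a placeholder and not the name of our actual database.

```sql
-- Check the current row-versioning settings (TestDb is a hypothetical name).
SELECT name,
       is_read_committed_snapshot_on,
       snapshot_isolation_state_desc
  FROM sys.databases
 WHERE name = N'TestDb';

-- Enable RCSI. WITH ROLLBACK IMMEDIATE rolls back other open transactions
-- in the database so that the option can be changed.
ALTER DATABASE TestDb SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;
ALTER DATABASE TestDb SET ALLOW_SNAPSHOT_ISOLATION OFF;
```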

Transaction A:

insert into TABLE_A
insert into TABLE_B
commit

Transaction B:

delete from TABLE_B where COLUMN not in (select COLUMN from TABLE_A);
commit

Findings:

Even when using transactions, records newly inserted into TABLE_B by Transaction A were deleted by Transaction B. It seems that the "select" and "delete" parts of the statement use different reference snapshots.
When READ_COMMITTED_SNAPSHOT was disabled, we could not reproduce this issue.

Questions:

Is it expected that SQL Server does not maintain "statement-level read consistency" for SQL statements with a subquery?
If expected, what's your suggested approach in handling this?

Thank you!

Regards,
Ed

Accepted answer

Answer 1

Erland Sommarskog 121.4K MVP Volunteer Moderator

I was able to reproduce the issue after some minor modifications to the scripts: the first script lacks BEGIN TRANSACTION, and the other has one COMMIT too many.

This is my analysis of what is happening. In my test the plan for the DELETE was a MERGE JOIN of CI scans on the two tables. TABLE_A is accessed from the version store the normal way. For TABLE_B on the other hand, SQL Server wants an UPDATE lock. An UPDATE lock is a read lock which can only be held by one process. This sort of lock is taken when SQL Server is about to update the resource. If the update actually happens, the U lock is converted to an X lock.

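If the INSERT transaction is in progress, the scan over table A can still be carried out, since it runs over the snapshot. But the scan over table B is blocked by the transaction. When the INSERT transaction commits, the scan over table B continues and reads the newly committed row, which is not in the snapshot that was used for table A. The net result is that the results of the two scans are inconsistent.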
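
One way to observe this while the DELETE is blocked is to look at the lock requests from a third connection. This is a minimal sketch using the sys.dm_tran_locks view. The expectation that a U request shows up with status WAIT follows from the analysis above and may differ with another query plan.

```sql
-- Run in the same database while the INSERT transaction is still open.
SELECT request_session_id,
       resource_type,
       resource_description,
       request_mode,      -- e.g. U, X, IX
       request_status     -- GRANT, CONVERT or WAIT
  FROM sys.dm_tran_locks
 WHERE resource_database_id = DB_ID()
 ORDER BY request_session_id, resource_type;
```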

I know of other anomalies that can occur with RCSI. Say, for instance, that the procedure to deregister a product checks that there are no open orders for the product, and the procedure to add an order checks that all products are active. If they execute simultaneously, the business rules can be violated, because both procedures are reading data that is in fact stale. The anomaly in your case is one I have not seen before or thought about.
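
As a sketch of that scenario (the Products and Orders tables and their columns are made up for illustration, and OrderId is assumed to be an IDENTITY column):

```sql
-- Hypothetical schema: Products(ProductId, IsActive), Orders(OrderId, ProductId, IsOpen).
-- Run the two batches at the same time from separate connections under RCSI.

-- Connection 1: deregister product 1 if it has no open orders.
BEGIN TRANSACTION;
IF NOT EXISTS (SELECT * FROM Orders WHERE ProductId = 1 AND IsOpen = 1)
    UPDATE Products SET IsActive = 0 WHERE ProductId = 1;
COMMIT TRANSACTION;

-- Connection 2: add an order for product 1 if the product is active.
BEGIN TRANSACTION;
IF EXISTS (SELECT * FROM Products WHERE ProductId = 1 AND IsActive = 1)
    INSERT INTO Orders (ProductId, IsOpen) VALUES (1, 1);
COMMIT TRANSACTION;
```

Each check reads the last committed version without waiting for the other connection, so both checks can pass and you can end up with an open order for an inactive product.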

There are at least two ways to skin the cat in your case. One is to use true snapshot isolation for the delete operation. When I tested this, no rows were deleted. The other solution is to use the hint READCOMMITTEDLOCK:

    delete
     from TABLE_B
    where SOME_PK not in (select SOME_PK
                            from TABLE_A WITH (READCOMMITTEDLOCK));

Here you are telling SQL Server that you don't want to use the snapshot in this case.
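
For the first alternative, a minimal sketch could look like this. It assumes that ALLOW_SNAPSHOT_ISOLATION has been turned ON for the database (it is OFF in your setup), and TestDb is a placeholder name.

```sql
-- One-time setup; TestDb is a hypothetical database name.
ALTER DATABASE TestDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- In the session that runs the cleanup:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;

delete
  from TABLE_B
 where SOME_PK not in (select SOME_PK
                         from TABLE_A);

COMMIT TRANSACTION;

-- Go back to the default level for the rest of the session.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
```

Under snapshot isolation a statement can fail with an update conflict (error 3960) if another transaction has modified the same rows, so the caller should be prepared to retry.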

Ed 26 Reputation points

2021-03-08T00:35:27.453+00:00

Sorry for the late update, @Erland Sommarskog . Yes, it proves that reads don't block the writes and vice versa.

I also found a reference here (https://sqlperformance.com/2014/05/t-sql-queries/read-committed-snapshot-isolation) and it recommends the use of READCOMMITTEDLOCK hint as well.

Using the hint will result in better performance, as we can apply it only to the queries that experience the anomaly. But, if we want to mimic the behaviour of Oracle RDBMS, isn't it better to just use READ COMMITTED (w/ READ_COMMITTED_SNAPSHOT OFF) instead of SNAPSHOT (ALLOW_SNAPSHOT_ISOLATION ON)?

Thank you, Erland!
Erland Sommarskog 121.4K Reputation points MVP Volunteer Moderator

2021-03-08T22:41:04.457+00:00

I don't know the exact semantics of Oracle (as I have never used Oracle), so I can't say what the best way to mimic Oracle is.

But as I understand it, READ COMMITTED with RCSI OFF is the antithesis of the Oracle motto "readers don't block writers and writers don't block readers".

RCSI is certainly very useful to improve concurrency, but as Paul points out in his article, there are situations where it can bite you.
Ed 26 Reputation points

2021-03-09T00:27:03.973+00:00

I see. If that's the case, it's better to retain RCSI and just be careful with these kinds of queries. Thank you very much for the help, @Erland Sommarskog .
Fred 0 Reputation points

2023-03-07T04:00:25.1433333+00:00

deleted...
Fred 0 Reputation points

2023-03-07T14:32:40.4766667+00:00

Sorry @Erland Sommarskog

I am an Oracle developer, so I am not familiar with SQL Server.

After using the hint READCOMMITTEDLOCK, will the subquery "select ... from TABLE_A ..." be blocked if the insert transaction is in progress, and then continue to read the newly inserted rows after the insert transaction is committed?

Is it possible some TABLE_B records are still deleted by mistake even with the hint READCOMMITTEDLOCK? E.g. in Ed's original question, Transaction A inserts a pair of records into TABLE_A and TABLE_B, commits. In the delete statement of Transaction B, the scan over TABLE_A doesn't see the newly inserted record, but the scan over TABLE_B does.

reference:

https://learn.microsoft.com/en-us/answers/questions/264991/sql-server-isolation-behavior-during-count(*)-in-r

During a scan, a row will be counted only if it was inserted and committed before the scan point is reached...

https://techcommunity.microsoft.com/t5/sql-server-blog/read-committed-isolation-level/ba-p/383258#:~:text=When%20SQL%20Server%20executes%20a%20statement%20at%20the,each%20lock%20before%20proceeding%20to%20the%20next%20row.

Thank you.
Erland Sommarskog 121.4K Reputation points MVP Volunteer Moderator

2023-03-07T22:21:56.4533333+00:00

After using the hint READCOMMITTEDLOCK, will the subquery "select ... from TABLE_A ..." be blocked if the insert transaction is in progress, and then continue to read the newly inserted rows after the insert transaction is committed?

It may be blocked, yes. I say "may" because it depends on the query plan, timing, etc. But if the query follows an index, it may encounter a newly inserted, not-yet-committed row, and in that case the SELECT will be blocked.

Is it possible some TABLE_B records are still deleted by mistake even with the hint READCOMMITTEDLOCK?

I can't answer this question, because I don't know what the exact intention is, and what is defined as "by mistake". But concurrency in databases often requires a careful understanding of what you are doing.

What if the insert transaction starts after the select in subquery is completed and committed before the scan for delete is completed?

Keep in mind that subqueries are, like anything else in a query, logical ways of expressing the query. The physical implementation may be different. In any case, there are too many ifs and buts here. I don't even know if you, "Fred", are the same person as "Ed" who asked the original question.
Fred 0 Reputation points

2023-03-08T02:55:21.1333333+00:00

Hi @Erland Sommarskog
I have rephrased my question above. Ed is my ex-colleague. Sorry for the trouble, and I really appreciate your help.
Erland Sommarskog 121.4K Reputation points MVP Volunteer Moderator

2023-03-08T21:56:52.7133333+00:00

I think it is better if you post a new Question, describing the problem from start to end. Particularly, if you have an actual scenario where strange things are happening, please share the code. When reviewing the original post, I find that there are a lot of things I need to read between the lines. And the original post was two years ago. You may have changed your code since then, and the problem may be different from the original.

Answer 2

Erland Sommarskog 121.4K MVP Volunteer Moderator

There is something hiding here that does not show in your outline. Would it be possible for you to create a repro that we can play with?

Answer 3

Thank you for the prompt responses, @Sean Gallardy - MSFT , @Erland Sommarskog !

From Sean:
[...] I'm not sure of the initial reason for using RCSI, so it's hard to say whether anything else would work within whatever other goals you have that aren't listed.

We wanted to mimic the default behaviour of Oracle RDBMS for Microsoft SQL Server.

From Erland
[...] Would it be possible for you to create a repro that we can play with?

We used the scripts below for replicating the issue.

/*  
 * Transaction isolation level = READ COMMITTED  
 * READ_COMMITTED_SNAPSHOT = ON  
 * ALLOW_SNAPSHOT_ISOLATION = OFF  
 */  
  
/*  
 * Connection 1  
 */  
declare  
  @cnt numeric(10),  
  @somePk nvarchar(36),  
  @i numeric(10);  
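begin
  set nocount off
  set @i = 0;
  while (@i < 100)
    begin
      set @i = @i + 1;
      set @somePk = cast (@i as varchar(10));

      -- No explicit BEGIN TRANSACTION here: the commit below relies on a
      -- transaction already being open (e.g. with implicit transactions enabled).
      insert into TABLE_A(SOME_PK) values(@somePk);

      -- Widen the window between the two inserts.
      waitfor delay '00:00:00.300';

      insert into TABLE_B(SOME_PK) values (@somePk);

      commit;
    end;
end;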
  
/*  
 * Connection 2  
 */  
declare  
  @i numeric(10),  
  @cnt numeric(10) = 0,  
  @cntN numeric(10);  
begin
  set nocount off

  set @i = 0;
  while (@i < 500)
    begin
      set @i = @i + 1;

      waitfor delay '00:00:00.100';

      delete
        from TABLE_B
       where SOME_PK not in (select SOME_PK
                               from TABLE_A);

      -- Capture the number of deleted rows right after the delete.
      select @cnt = @@ROWCOUNT

      select @cntN = count(1)
        from TABLE_B;

      print concat('Count deleted:', @cnt, ' tbc:', @cntN);

      -- Problem: Unexpected TABLE_B records got deleted.
      -- Suspicion: Connection 1 committed in between the "delete" and "select".
      --            Then, SQL Server used a newer snapshot for the delete, which resulted in deleting unexpected TABLE_B records.
      -- Test: We used this script and it printed 12 deleted records out of 100.

      commit
    end;
end;
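
The scripts assume that both tables already exist. A minimal setup sketch follows; the column type matches @somePk above, and the primary keys (clustered by default) are an assumption for illustration.

```sql
-- Hypothetical table definitions for the repro; adjust to your own schema.
CREATE TABLE TABLE_A (SOME_PK nvarchar(36) NOT NULL PRIMARY KEY);
CREATE TABLE TABLE_B (SOME_PK nvarchar(36) NOT NULL PRIMARY KEY);
```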

Answer 4

Sean Gallardy - MSFT 1,901 Microsoft Employee

Is it expected that SQL Server does not maintain "statement-level read consistency" for SQL statements with subquery?

Statement-level read consistency is maintained. Once you insert into Table B from your other transaction, that statement is complete and thus can be seen by other sessions in RCSI. Statement-level consistency is working as designed; what you want is transaction-level consistency, which would mean not using RCSI. If you check out the Docs, you'll see this is stated:

That is, the SQL Server Database Engine uses row versioning to present each statement with a transactionally consistent snapshot of the data as it existed at the start of the statement.

Thus, so long as the data was there at the start of the delete statement, it will be seen. The delete... select.. is a single query, it does not make two different row version start times.

If expected, what's your suggested approach in handling this?

It seems you don't want anything to see or touch the changes made in Transaction A. You could either use the default read committed behaviour, where blocking is the natural course of action for the delete (depending on where and when), or snapshot isolation, which would give you transaction-level row versioning but brings in potential write conflicts. I'm not sure of the initial reason for using RCSI, so it's hard to say whether anything else would work within whatever other goals you have that aren't listed.

Fred 1 Reputation point

2021-03-08T18:14:52.257+00:00

@Sean Gallardy - MSFT
Since "delete from TABLE_B where COLUMN not in (select COLUMN from TABLE_A);" is a single statement, will statement-level read consistency ensure that both the delete and the subquery select read the data snapshot as of the start of the statement? Or are the delete and the subquery select two statements? Or is the delete a write, so that it never reads snapshot data? Thanks
Erland Sommarskog 121.4K Reputation points MVP Volunteer Moderator

2021-03-08T22:36:19.3+00:00

Fred, if you read my accepted answer above, you will find the answers to your questions.
Fred 1 Reputation point

2021-03-09T00:00:47.167+00:00

Sorry @Erland Sommarskog
Is "the scan over table B is blocked by the transaction." because the insert statement imposes a table lock but not a row/page lock on table B? With the hint READCOMMITTEDLOCK, is the shared lock released after the whole delete statement or after the subquery? If after the subquery, is it possible that another transaction commits a modification to table A/B between the scan of table A and the scan of table B? Thank you.
Erland Sommarskog 121.4K Reputation points MVP Volunteer Moderator

2021-03-09T22:05:25.39+00:00

Is "the scan over table B is blocked by the transaction." because the insert statement imposes a table lock but not a row/page lock on table B?

No, the INSERT operation takes a row lock. But the DELETE operation more or less has to read all rows in the table due to the semantics, although this is plan-dependent.

When I tested, the plan was a merge join + CI scan of the two tables. And the CI scan gets blocked when it encounters a locked row. Or when it tries to take a shared table lock and is blocked by the Intent Lock held by the INSERT process. (An intent lock is a lock that signals that you are holding a real lock at a lower level.)
Fred 1 Reputation point

2021-03-10T14:21:20.917+00:00

Thank you @Erland Sommarskog
So the subquery with hint READCOMMITTEDLOCK can block or be blocked by the insert operation because the shared lock imposed by the subquery is escalated to table level? Or is it because the select operation has the semantics to read all rows?

If semantics only, insert can block select, but can select block insert?
Erland Sommarskog 121.4K Reputation points MVP Volunteer Moderator

2021-03-10T22:13:58.693+00:00

so the subquery with hint READCOMMITTEDLOCK can block or be blocked by the insert operation because the shared lock imposed by the subquery is escalated to table level?

It may or may not be escalated to a table lock, but from a semantic view it does not matter. The SELECT wants to read that row that is there, but which is not committed. So it will have to wait.

If semantics only, insert can block select, but can select block insert?

Yes. A long-running SELECT that has taken a table lock will block INSERT operations for a while.
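
A minimal sketch of how a SELECT can block an INSERT. The TABLOCK and HOLDLOCK hints are used here only to force a shared table lock that is held until the transaction ends; an ordinary query would normally hold its locks for a much shorter time. The inserted value is arbitrary.

```sql
-- Connection 1: take a shared table lock and keep it until COMMIT.
BEGIN TRANSACTION;
SELECT COUNT(*) FROM TABLE_A WITH (TABLOCK, HOLDLOCK);
-- ... transaction left open ...

-- Connection 2: this INSERT waits until connection 1 commits or rolls back.
insert into TABLE_A(SOME_PK) values ('blocked');

-- Connection 1: release the lock.
COMMIT TRANSACTION;
```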
