Deadlock when modifying different rows of small tables using read committed snapshot isolation

Question

Deadlock when modifying different rows of small tables using read committed snapshot isolation

Forján Csaba 1

Suppose we have a simple table with an index:

create table test (
        id bigint not null,
        col1 nvarchar(100) not null,
        col2 nvarchar(100),
        col3 nvarchar(100),
        col4 nvarchar(100),
        index idx (id, col1))

Total number of records in the table is low (<1000).
There are mutiple transactions running modifying (insert, update, delete) a disjunct set of of records (= the same record is not affected by 2 transactions).
Isolation level is read committed snapshot.
Lock escalation does not occur. Or at least no lock_escalation event is captured by the extended events session subscribed to those events.

In the above scenario we experience frequent deadlocks.
The problem can be demonstrated running the follwing C# program using the "mcr.microsoft.com/mssql/server:2019-latest" docker image:

using Microsoft.Data.SqlClient;
using System;
using System.Linq;
using System.Threading.Tasks;

namespace deadlock2
{
    internal class Program
    {
        const string connectionString = "Data Source=(local);Initial Catalog=Dummy;User ID=sa;Password=test";
        private static Random rng = new Random();

        static void Main(string[] args)
        {
            CreateTestData();

            for (int i = 0; i < 10; i++)
            {
                DoWork();
            }
        }

        private static void DoWork()
        {
            Parallel.ForEach(Enumerable.Range(850, 50).OrderBy(x => rng.Next()), number =>
            {
                var queryString = @"
                    update t 
                    set col1 = @Id
                    from test t with (FORCESCAN)
                    where id = @Id
                ";

                using (SqlConnection connection = new SqlConnection(connectionString))
                {
                    connection.Open();
                    var t = connection.BeginTransaction(System.Data.IsolationLevel.ReadCommitted);

                    SqlCommand command = new SqlCommand(queryString, connection, t);
                    command.Parameters.AddWithValue("@Id", number);
                    command.ExecuteNonQuery();

                    t.Commit();
                }
            });
        }

        private static void CreateTestData()
        {

            var queryString = "insert into test values (@Id, @Col1, @Col2, @Col3, @Col4)";

            using (SqlConnection connection = new SqlConnection(connectionString))
            {
                connection.Open();
                var t = connection.BeginTransaction();

                for (var i = 0; i < 1000; i++)
                {
                    // Create the Command and Parameter objects.
                    SqlCommand command = new SqlCommand(queryString, connection, t);
                    command.Parameters.AddWithValue("@Id", i);
                    command.Parameters.AddWithValue("@Col1", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");
                    command.Parameters.AddWithValue("@Col2", "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb");
                    command.Parameters.AddWithValue("@Col3", "ccccccccccccccccccccccccccccccccccc");
                    command.Parameters.AddWithValue("@Col4", "ddddddddddddddddddddddddddddddddddd");
                    command.ExecuteNonQuery();
                }

                t.Commit();
            }
        }
    }
}

The FORCESCAN hint is only placed here as a demonstration. Our live code does not use hints but behaves the same. The workarounds that fix the problem above, does fix our live code as well e.g.:

Using FORCESEEK hints. This can be used for updates only, deletes and (bulk)inserts does not accept this hint.
When the table gets through the initial loading phase the deadlocks disappear. This is a solution for the prodcution server but the test database will always run on an almost empty table with a few hundred records.
There is no deadlock using Snapshot isolation, but our client is strictly against enabling this...

According to our analysis the problem here is that for small tables the query optimizer might ignore indexes and use table/index scans to find records affected. These scans somehow end up blocking and/or getting blocked by the exclusive locks the modifications place on the rows. We believe that using read committed snapshot isolation level this blocking shall never happen (same way as it does not happen with snapshot isolation).

While we try to persuade our client to enable snapshot isolation for us, is there a fix or a good workaround we can use to mitigate this issue that works with updates, deletes and (bulk)inserts as well at the same time? (We obviously want to keep running concurrent transactions, limiting conurrency is not an option)

AmeliaGu-MSFT 14,006 Reputation points Microsoft External Staff

2021-12-15T08:10:26.503+00:00

Hi ForjnCsaba-0506,

Welcome to Microsoft Q&A.
Could you please check the deadlock xml and deadlock graph?
You can go to SSMS Management -> Extended Events ->Sessions -> system_health-> right click package0.event_file-> View Target Data, and filter the information about deadlock details.
Please check this article which might be helpful.

Best Regards,
Amelia
Viorel 122.6K Reputation points

2021-12-15T09:06:44.367+00:00

Maybe it is not an extraordinary situation, and the resolution is to retry the transaction until it succeeds, maybe with reasonable delays, on C# or SQL side, especially when the SQL error message says “Rerun” or “Retry”.
Forján Csaba 1 Reputation point

2021-12-15T09:23:39.29+00:00

Hi Amelia,

Thanks for your reply!

Here is the deadlock graph:

And the corresponding deadlock report xml:
157824-xml-deadlock-report.xml

Best Regards,
Csaba

3 answers

Your answer

AmeliaGu-MSFT 14,006 Reputation points Microsoft External Staff

2021-12-15T08:10:26.503+00:00

Hi ForjnCsaba-0506,

Welcome to Microsoft Q&A.
Could you please check the deadlock xml and deadlock graph?
You can go to SSMS Management -> Extended Events ->Sessions -> system_health-> right click package0.event_file-> View Target Data, and filter the information about deadlock details.
Please check this article which might be helpful.

Best Regards,
Amelia
Viorel 122.6K Reputation points

2021-12-15T09:06:44.367+00:00

Maybe it is not an extraordinary situation, and the resolution is to retry the transaction until it succeeds, maybe with reasonable delays, on C# or SQL side, especially when the SQL error message says “Rerun” or “Retry”.
Forján Csaba 1 Reputation point

2021-12-15T09:23:39.29+00:00

Hi Amelia,

Thanks for your reply!

Here is the deadlock graph:

And the corresponding deadlock report xml:
157824-xml-deadlock-report.xml

Best Regards,
Csaba

Answer 1

Unfortunately, I don't have the time to try the repro right now, but I can see how it happens. I would guess it is most likely to be related to page locks, and with page locks you can get deadlocks even when unrelated rows are updated, if they are on the same page.

Since this is about updating operations, READ COMMITTED SNAPSHOT is not likely to help. The processes needs to first take U locks on the rows/pages which are then converted to X locks. I'm a little surprised when you say that snapshot isolation helps.

I'll try to see if I can find some time for this tomorrow night.

Answer 2

Using FORCESCAN means that the server must scan all the index and cannot use index seek. This directly leads in many cases to the deadlocks of the key as you have here. With that said, I did not reproduce the scenario since it based on C# app. Therefore, this is a general comment of optional reason (which fits your description).

Using FORCESEEK hints. This can be used for updates only, deletes and (bulk)inserts does not accept this hint.

Actually, this is not correct and we can use FORCESEEK with DELETE. Note that, FORCESEEK cannot be used for a table that is the target of an INSERT, UPDATE, or DELETE statement only when it is used with index parameters.

It can be used when we use the format DELEETE FROM

Here is a demo of using DELETE with FORCESEEK

create table #t(id int not null, f int)
GO
create unique clustered index ixuc__id on #t(id)
GO
insert #t (id) 
select top 100 ROW_NUMBER() over(order by (select null))
from sys.objects
GO
select * from #t
GO

delete #t  WITH (FORCESEEK) where id = 16
GO
-- The FORCESEEK hint is not allowed for target tables of INSERT, UPDATE, or DELETE statements.

-- Workaround: Using the format DELETE FROM
delete t 
FROM #t t with(forceseek) -- you can confirm that this use seek and you can use forcescan and compare the EP
where t.id = 16
GO -- OK

-- Same with UPDATE

update #t with(forceseek) set f = 2
where id = 10
GO
-- The FORCESEEK hint is not allowed for target tables of INSERT, UPDATE, or DELETE statements.

update t set t.f = 2
from #t t with(forceseek)
where t.id = 10
GO -- OK

Ronen Ariely 15,206 Reputation points

2021-12-16T06:00:03.183+00:00

https://learn.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-table?view=sql-server-ver15#examples

Answer 3

I've been looking at the deadlocks occurring, and I will have to admit that I cannot exactly pinpoint why they are happening. Then again, the analysis may be different for your actual case.

At first, just like Ronen, I was not able to repro the deadlocks at all. However, once I fired up Profiler and had the RPC:Completed event, I could get deadlocks. Profiler here serves as a go-slower button.

The deadlock often involve more than two processes, but here is a two process deadlock:

<resource-list>
   <keylock hobtid="576460752593485824" dbid="2" objectname="tempdb.dbo.test" indexname="idx" id="lock18dc6fb0980" mode="U" associatedObjectId="576460752593485824">
    <owner-list>
     <owner id="process18dbf830ca8" mode="U"/>
    </owner-list>
    <waiter-list>
     <waiter id="process18dc25664e8" mode="U" requestType="wait"/>
    </waiter-list>
   </keylock>
   <keylock hobtid="576460752593485824" dbid="2" objectname="tempdb.dbo.test" indexname="idx" id="lock18dc6fac480" mode="U" associatedObjectId="576460752593485824">
    <owner-list>
     <owner id="process18dc25664e8" mode="U"/>
    </owner-list>
    <waiter-list>
     <waiter id="process18dbf830ca8" mode="U" requestType="wait"/>
    </waiter-list>
   </keylock>

So my theory about page locks was wrong. It is all about U locks on key level. All processes are scanning the table, so each process will take a U lock on a row, release it, take the next lock etc. However, once it gets a match on Id, it will retain the U lock on that row. But it will continue scanning and attempt to acquire/release locks on the remaining rows as well.

If all processes would scan the rows in the same order, starting on the same point, there cannot be any deadlocks. The first process that finds an Id, will block all other processes that point.

And naïvely, I though that this how it works. But if that assumption are wrong, and the access order or the starting point is different, there can be deadlocks, because two processes each find a row and then they meet each other, not being able to move on.

Why would order or starting point be different? Starting points could be different due to merry-go-around scans, that is, a process piggybacks on an existing scan. But maybe there are other possible reasons.

I only tried this under READ COMMITTED LOCK. But I can't see that this would be any different under SNAPSHOT isolation. If you don't see this under SNAPSHOT, that could due to a timing issue. So if the deadlocks you are actually having problem with are like this one, snapshot is not the solution.

On the other hand, the FORCESEEK hint seems like a valid solution to me. You say:

This can be used for updates only, deletes and (bulk)inserts does not accept this hint.

No problem with DELETE:

DELETE test FROM test WITH(FORCESEEK) WHERE id = 99

But you need to use the FROM clause.

On an INSERT it is of course more difficult to apply this hint, but I don't see why it would be needed.

To reiterate, I would not be surprised that the deadlocks you see in your actual environment are different, so I would encourage you to share the deadlock XML for this. That is, the graph you posted above is useless; it hides a lot of information.

Share via

Deadlock when modifying different rows of small tables using read committed snapshot isolation

3 answers

Your answer