Deadlock when modifying different rows of small tables using read committed snapshot isolation

Forján Csaba 1 Reputation point
2021-12-15T02:32:26.26+00:00

Suppose we have a simple table with an index:

create table test (
        id bigint not null,
        col1 nvarchar(100) not null,
        col2 nvarchar(100),
        col3 nvarchar(100),
        col4 nvarchar(100),
        index idx (id, col1))

Total number of records in the table is low (<1000).
There are mutiple transactions running modifying (insert, update, delete) a disjunct set of of records (= the same record is not affected by 2 transactions).
Isolation level is read committed snapshot.
Lock escalation does not occur. Or at least no lock_escalation event is captured by the extended events session subscribed to those events.

In the above scenario we experience frequent deadlocks.
The problem can be demonstrated running the follwing C# program using the "mcr.microsoft.com/mssql/server:2019-latest" docker image:

using Microsoft.Data.SqlClient;
using System;
using System.Linq;
using System.Threading.Tasks;

namespace deadlock2
{
    internal class Program
    {
        const string connectionString = "Data Source=(local);Initial Catalog=Dummy;User ID=sa;Password=test";
        private static Random rng = new Random();

        static void Main(string[] args)
        {
            CreateTestData();

            for (int i = 0; i < 10; i++)
            {
                DoWork();
            }
        }

        private static void DoWork()
        {
            Parallel.ForEach(Enumerable.Range(850, 50).OrderBy(x => rng.Next()), number =>
            {
                var queryString = @"
                    update t 
                    set col1 = @Id
                    from test t with (FORCESCAN)
                    where id = @Id
                ";

                using (SqlConnection connection = new SqlConnection(connectionString))
                {
                    connection.Open();
                    var t = connection.BeginTransaction(System.Data.IsolationLevel.ReadCommitted);

                    SqlCommand command = new SqlCommand(queryString, connection, t);
                    command.Parameters.AddWithValue("@Id", number);
                    command.ExecuteNonQuery();

                    t.Commit();
                }
            });
        }

        private static void CreateTestData()
        {

            var queryString = "insert into test values (@Id, @Col1, @Col2, @Col3, @Col4)";

            using (SqlConnection connection = new SqlConnection(connectionString))
            {
                connection.Open();
                var t = connection.BeginTransaction();

                for (var i = 0; i < 1000; i++)
                {
                    // Create the Command and Parameter objects.
                    SqlCommand command = new SqlCommand(queryString, connection, t);
                    command.Parameters.AddWithValue("@Id", i);
                    command.Parameters.AddWithValue("@Col1", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");
                    command.Parameters.AddWithValue("@Col2", "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb");
                    command.Parameters.AddWithValue("@Col3", "ccccccccccccccccccccccccccccccccccc");
                    command.Parameters.AddWithValue("@Col4", "ddddddddddddddddddddddddddddddddddd");
                    command.ExecuteNonQuery();
                }

                t.Commit();
            }
        }
    }
}

The FORCESCAN hint is only placed here as a demonstration. Our live code does not use hints but behaves the same. The workarounds that fix the problem above, does fix our live code as well e.g.:

  • Using FORCESEEK hints. This can be used for updates only, deletes and (bulk)inserts does not accept this hint.
  • When the table gets through the initial loading phase the deadlocks disappear. This is a solution for the prodcution server but the test database will always run on an almost empty table with a few hundred records.
  • There is no deadlock using Snapshot isolation, but our client is strictly against enabling this...

According to our analysis the problem here is that for small tables the query optimizer might ignore indexes and use table/index scans to find records affected. These scans somehow end up blocking and/or getting blocked by the exclusive locks the modifications place on the rows. We believe that using read committed snapshot isolation level this blocking shall never happen (same way as it does not happen with snapshot isolation).

While we try to persuade our client to enable snapshot isolation for us, is there a fix or a good workaround we can use to mitigate this issue that works with updates, deletes and (bulk)inserts as well at the same time? (We obviously want to keep running concurrent transactions, limiting conurrency is not an option)

SQL Server Other
{count} votes

3 answers

Sort by: Most helpful
  1. Erland Sommarskog 121.4K Reputation points MVP Volunteer Moderator
    2021-12-15T23:06:54.15+00:00

    Unfortunately, I don't have the time to try the repro right now, but I can see how it happens. I would guess it is most likely to be related to page locks, and with page locks you can get deadlocks even when unrelated rows are updated, if they are on the same page.

    Since this is about updating operations, READ COMMITTED SNAPSHOT is not likely to help. The processes needs to first take U locks on the rows/pages which are then converted to X locks. I'm a little surprised when you say that snapshot isolation helps.

    I'll try to see if I can find some time for this tomorrow night.

    0 comments No comments

  2. Ronen Ariely 15,206 Reputation points
    2021-12-16T05:44:32.02+00:00

    Using FORCESCAN means that the server must scan all the index and cannot use index seek. This directly leads in many cases to the deadlocks of the key as you have here. With that said, I did not reproduce the scenario since it based on C# app. Therefore, this is a general comment of optional reason (which fits your description).

    Using FORCESEEK hints. This can be used for updates only, deletes and (bulk)inserts does not accept this hint.

    Actually, this is not correct and we can use FORCESEEK with DELETE. Note that, FORCESEEK cannot be used for a table that is the target of an INSERT, UPDATE, or DELETE statement only when it is used with index parameters.

    It can be used when we use the format DELEETE FROM

    Here is a demo of using DELETE with FORCESEEK

    create table #t(id int not null, f int)
    GO
    create unique clustered index ixuc__id on #t(id)
    GO
    insert #t (id) 
    select top 100 ROW_NUMBER() over(order by (select null))
    from sys.objects
    GO
    select * from #t
    GO
    
    delete #t  WITH (FORCESEEK) where id = 16
    GO
    -- The FORCESEEK hint is not allowed for target tables of INSERT, UPDATE, or DELETE statements.
    
    -- Workaround: Using the format DELETE FROM
    delete t 
    FROM #t t with(forceseek) -- you can confirm that this use seek and you can use forcescan and compare the EP
    where t.id = 16
    GO -- OK
    
    -- Same with UPDATE
    
    update #t with(forceseek) set f = 2
    where id = 10
    GO
    -- The FORCESEEK hint is not allowed for target tables of INSERT, UPDATE, or DELETE statements.
    
    update t set t.f = 2
    from #t t with(forceseek)
    where t.id = 10
    GO -- OK
    

  3. Erland Sommarskog 121.4K Reputation points MVP Volunteer Moderator
    2021-12-16T21:17:12.767+00:00

    I've been looking at the deadlocks occurring, and I will have to admit that I cannot exactly pinpoint why they are happening. Then again, the analysis may be different for your actual case.

    At first, just like Ronen, I was not able to repro the deadlocks at all. However, once I fired up Profiler and had the RPC:Completed event, I could get deadlocks. Profiler here serves as a go-slower button.

    The deadlock often involve more than two processes, but here is a two process deadlock:

    <resource-list>
       <keylock hobtid="576460752593485824" dbid="2" objectname="tempdb.dbo.test" indexname="idx" id="lock18dc6fb0980" mode="U" associatedObjectId="576460752593485824">
        <owner-list>
         <owner id="process18dbf830ca8" mode="U"/>
        </owner-list>
        <waiter-list>
         <waiter id="process18dc25664e8" mode="U" requestType="wait"/>
        </waiter-list>
       </keylock>
       <keylock hobtid="576460752593485824" dbid="2" objectname="tempdb.dbo.test" indexname="idx" id="lock18dc6fac480" mode="U" associatedObjectId="576460752593485824">
        <owner-list>
         <owner id="process18dc25664e8" mode="U"/>
        </owner-list>
        <waiter-list>
         <waiter id="process18dbf830ca8" mode="U" requestType="wait"/>
        </waiter-list>
       </keylock>
    

    So my theory about page locks was wrong. It is all about U locks on key level. All processes are scanning the table, so each process will take a U lock on a row, release it, take the next lock etc. However, once it gets a match on Id, it will retain the U lock on that row. But it will continue scanning and attempt to acquire/release locks on the remaining rows as well.

    If all processes would scan the rows in the same order, starting on the same point, there cannot be any deadlocks. The first process that finds an Id, will block all other processes that point.

    And naïvely, I though that this how it works. But if that assumption are wrong, and the access order or the starting point is different, there can be deadlocks, because two processes each find a row and then they meet each other, not being able to move on.

    Why would order or starting point be different? Starting points could be different due to merry-go-around scans, that is, a process piggybacks on an existing scan. But maybe there are other possible reasons.

    I only tried this under READ COMMITTED LOCK. But I can't see that this would be any different under SNAPSHOT isolation. If you don't see this under SNAPSHOT, that could due to a timing issue. So if the deadlocks you are actually having problem with are like this one, snapshot is not the solution.

    On the other hand, the FORCESEEK hint seems like a valid solution to me. You say:

    This can be used for updates only, deletes and (bulk)inserts does not accept this hint.

    No problem with DELETE:

    DELETE test FROM test WITH(FORCESEEK) WHERE id = 99
    

    But you need to use the FROM clause.

    On an INSERT it is of course more difficult to apply this hint, but I don't see why it would be needed.

    To reiterate, I would not be surprised that the deadlocks you see in your actual environment are different, so I would encourage you to share the deadlock XML for this. That is, the graph you posted above is useless; it hides a lot of information.

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.