Unhandled Deadlocks with SQL Server 2017

Question

We’re running into issues with unresolved deadlocks on the most recent version of SQL-Server 2017, has anybody seen a problem like this before?

Detailed explanation: We migrated our database from Sybase-ASE to SQL Server 2017 (RTM-CU23) two weeks ago. In that time there have been a number of deadlocks reported in the System Health Events, three of which SQL Server was unable to kill, resulting in our business processes being blocked. We had to kill the SPID by hand, so that processing was able to continue. The queries themselves are rather complex and return a large resultset, because of this SQL-Server runs several threads in parallel (we are not limiting “maxdop”). The queries causing the deadlock were not the same every time, but they always included parallelism and tables with triggers.

During an unresolved deadlock, the “xml_deadlock_report” event, which is generated every 100ms, shows the “waittime” of most stuck threads increasing, otherwise the status and content does not change. One of the stuck threads however has a waittime which does not increase but varies between 96 and 105ms. In the “xml_deadlock_report” the process owning this thread is reported as the “victim” (shown in the picture below).The problem is that this process does not actually get killed.

This problem looks related to the recent KB Entries KB4338715 and KB4089473, but as noted above we're running the latest version. We’re looking for any ideas or hints that could help us.

Accepted Answer

If we overlook the parallelism aspect of it, the pattern of the deadlock does not seem to unfamiliar. You have a reader that reads an entire table or a view. You have a writer that performs a multi-statement transaction. I can tell, because the current batch started later than the current transaction did.

The reader get stuck on a row, that the writer has updated but not yet committed. Eventually the writer gets blocked by the reader, because the reader has not yet released its lock. Which in this case may be due to that there are multiple reader threads, so that one thread is one waiting for the blocked thread to complete.

There are are two things to look into. First, why is the writer running a transaction that spans multiple calls to SQL Server? That does not have to be wrong, but I note that the statement updates a single row. Is the writer running a loop to update rows one-by-one when it should update all at once in a set-based operation? This would not remove the risk for deadlock entirely, but it would reduce the window where the deadlock can occur considerably.

The other thing you should consider is to set the database into READ_COMMITTED_SNAPSHOT. With this setting, the READ COMMITTED isolation level is implemented by reading from a version store in tempdb. In this mode, readers and writers do not block each other. The reader will get the result that was committed when then they query started. This is often good enough, but there are also situations where this is not acceptable, so you need to understand the consequences. The fix as such is simple:

ALTER DATABASE db SET SINGLE_USER WITH ROLLBACK IMMEDIATE
ALTER DATABASE db SET READ_COMMIITED_SNAPSHOT ON
ALTER DATABASE db SET MULTI_USER

Obviously, you need to find a maintenance to run this.

Finally, I should say that I have seen unresolved deadlocks with parallelism. But the case I investigated was even worse, because the deadlock was not even detected.

Answer

Deadlocks? About everyone encounters them, so welcome to the club.

However, deadlock which SQL Server cannot resolve are (thankfully) less common. I've seen deadlocks that have taken a very long time to resolve, because may processes have been involved, and SQL Server rolled back one at a time.

One possibility for a deadlock that does not resolves is that there is a CATCH handler in the process which is doing something stupid and does not roll back the transaction. But if you have just moved over from Sybase, I would not really expect that.

The deadlock graph may be pretty, but it is only including of a subset in the deadlock report event. Any chance that you can get hold of the XML and attach it?

Share via

Unhandled Deadlocks with SQL Server 2017

1 additional answer