How does Transaction rollback takes place when the transaction has "HADR_SYNC_COMMIT" wait type in an AlwaysonAvailability groups with synchronous replica ?

rahul G 1 Reputation point
2021-05-20T12:35:44.837+00:00

Question: How does Transaction rollback takes place when the transaction has "HADR_SYNC_COMMIT" wait type in an AlwaysonAvailability groups with synchronous replica ?

Environment & Description: There are 2 physical servers(Windows Server 2016 Std. Ed.) and SQL Server 2017 Ent. Ed. (CU22-GDR ) installed on each of these nodes.These 2 instances are configured through AlwaysOn Availability Group. One replica is primary and the other is a sync non-readable secondary.

Please refer to below example to better understand the issue.

Application command timeout: 15 secs

--> 10:00:00 AM: Insert command executed at the primary replica. --> 10:00:05 AM: Log hardening takes place on the secondary replica. --> 10:00:10 AM: Transaction shows "HARD_SYNC_COMMIT" wait type & log hardening is still in progress. --> 10:00:15 AM: Application command timeout occurs --> 10:00:20 AM: Transaction shows "HARD_SYNC_COMMIT" wait type & log hardening is still in progress. --> 10:00:25 AM: Log is hardened on the Secondary replica and the same data is commited on the primary replica.

Note:Though the application command timeout occurred at 10:00:15 AM we still see the data is inserted in the table on the primary replica.

Developer technologies | Transact-SQL
SQL Server | Other
{count} votes

4 answers

Sort by: Most helpful
  1. Erland Sommarskog 121.8K Reputation points MVP Volunteer Moderator
    2021-05-20T22:12:19.683+00:00

    The application stops "accepting" response from SQL Server, but the connection is still open and functional. SQL Server continues to run the command to completion.

    This is not correct. The timeout error occurs in the client API, so far so good. But SQL Server does not continue executing, because the client sends SQL Server an Attention signal which cancels the current execution. The current statement will be rolled back. However, any transaction will be left open, unless the setting XACT_ABORT is on.

    What actually happens in Rahul's case is hard to tell. Maybe client committed the transaction despite the error. Maybe there never was any transaction in the first place.

    In any case, the secondary replica does not change how the rollback works. It may add some waits, but the rollback will still happen. If there is any, that is.

    1 person found this answer helpful.
    0 comments No comments

  2. Dan Guzman 9,406 Reputation points
    2021-05-21T11:16:18.173+00:00

    My take on the sequence of events you observed is:

    1) the transaction was successfully committed on the primary node
    2) the hardening of the transaction on the synchronous replica took an unusually long 20 seconds
    3) the client API timed out and sent an attention request to SQL Server during the secondary hardening
    4) the attention request was honored but the transaction was not rolled back because it was already committed on the primary
    5) the log is hardened on the secondary

    The below documentation excerpt explains in more detail How Synchronization Works on a Secondary Replica. The text I bolded explains why the transaction was not rolled back in your scenario.

    Synchronous operation is maintained in the following manner:

    On receiving a transaction from a client, the primary replica writes the log for the transaction to the transaction log and concurrently sends the log record to the secondary replicas.

    Once a log record is written to the transaction log of the primary database, the transaction can be undone only if there is a failover at this point to a secondary that did not receive the log. The primary replica waits for confirmation from the synchronous-commit secondary replica.

    The secondary replica hardens the log and returns an acknowledgement to the primary replica.

    On receiving the confirmation from the secondary replica, the primary replica finishes the commit processing and sends a confirmation message to the client.

    Had the secondary HADR session actually timed-out, the secondary node would become DISCONNECTED. But the transaction would be hardened on the primary regardless.

    1 person found this answer helpful.
    0 comments No comments

  3. Tom Phillips 17,771 Reputation points
    2021-05-20T13:05:56.18+00:00

    The timeout is a "client side" timeout. It does not affect the running command, especially when using connection pooling. The application stops "accepting" response from SQL Server, but the connection is still open and functional. SQL Server continues to run the command to completion.

    This can cause problems in the application. In some cases, the app gets an error "timeout error - try again". So the user clicks "save" (or whatever) again and gets a "duplicate key error" or worse duplicate data inserted.

    0 comments No comments

  4. CarrinWu-MSFT 6,891 Reputation points
    2021-05-21T10:10:54.147+00:00

    Hi @rahul G ,

    Welcome to Microsoft Q&A!

    Please refer to this blog to get more information, see below:

    The client application completes an update on the primary replica successfully, but querying the secondary replica shows that the change is not reflected. This case assumes that your availability has a healthy synchronization state. In most cases, this behavior resolves itself after a few minutes.

    Best regards,
    Carrin


    If the answer is helpful, please click "Accept Answer" and upvote it.
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.