question

KapilBhasin1686 avatar image
0 Votes"
KapilBhasin1686 asked CarrinWu-MSFT commented

Avoiding blocking of REDO thread in readable secondary replica

Hello,

We are currently running into this issue for one of our availability group design:-

On Primary :- Nightly Table switch/Purges are happening (DDL) for approx. 2-3 hours

On Secondary which is configured as readable replica there will be SELECT queries running every now and then , as we do not have a window defined for secondary, REDO thread on secondary tends to get blocked by those SELECT queries which keep coming.

Currently to mitigate this issue, manually kill all select queries which keep blocking REDO thread ( this is in turn block other readers) until we see last commit time or redone time from SSMS AG dashboard sync up with the time DDL operation is completed.

Database in question is RCSI

Questions:-

Is this expected way to keep killing SELECT if you don't have a defined maint'ce window for secondary server

How are you guys dealing with this situation? I mean is there a way or script available online that it is automated to KILL such transactions on secondary in FLY when DDL is running on primary and wait until committed on secondary "

sql-server-general
· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hi @KapilBhasin1686, we have not get a reply from you. Did any answers could help you? If there has an answer helped, please do "Accept Answer". If not, please let us know. By doing so, it will benefit for community members who have this similar issue. Your contribution is highly appreciated. Thank you!

0 Votes 0 ·
ErlandSommarskog avatar image
1 Vote"
ErlandSommarskog answered KapilBhasin1686 commented

I don't have a complete answer for you, but I can explain why the SELECT queries blocks the REDO threads.

Since you are doing DDL, these operations requires Sch-M (schema modification) locks, and Sch-M locks are incompatible with everything else. Even if the SELECT queries run with RSCI, they still take Sch-S (schema stability) locks on the tables. This is required, because there is no versioning of the schema.

Exactly what DDL operations are you running?

· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

thanks for your inputs. The DB in question has lot of partitioned tables. Nightly we run purge of those partitions. Like Alter table switch partitions. Problem is currently we do not have control on secondary to stop people from querying. So was looking what approach should we take rather than just manually killing sessions in secondary till those DDL commits happen or complete on secondary

0 Votes 0 ·
ErlandSommarskog avatar image
0 Votes"
ErlandSommarskog answered ErlandSommarskog commented

A better option may be to run

ALTER DATABASE db SET RESTRICTED_USER WITH ROLLBACK IMMEDIATE

on the secondary. This will still kill the running queries, but in one single go, and not by manually killing. I am assuming here that your ALTER TABLE SWITCH operations can be run in one single batch in your purging scheme and they are not interleaved by operations that take longer time. This minimises the downtime for the readable secondary.

I would not be surprised if there are better alternatives. I don't work much with AGs myself, so my own experience is limited. I'm thinking that if you have more than one secondary, you could more the secondary role to another node in the AG after the purge operation. The node will not block the REDO threads, and once the readers are out of the other node, the node will catch up. But I am not sure if this works out in practice.

· 2
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Can we run this command on "ALTER DATABASE db SET RESTRICTED_USER WITH ROLLBACK IMMEDIATE" on readable , i thought you cannot

0 Votes 0 ·

It thought we could. :-)

But to be honest, that may be a naïve assumption on my part, and I have been wrong before. I don't have any good AG at home to test on.

Anyway, you could try it - if you get an error message- just forget that I ever opened my mouth.


0 Votes 0 ·
CarrinWu-MSFT avatar image
0 Votes"
CarrinWu-MSFT answered

Hi @KapilBhasin1686,

Welcome to Microsoft Q&A!

The steps in troubleshooting blocking as below, please get more details from Understand and resolve SQL Server blocking problems:
1.Identify the main blocking session (head blocker)
2.Find the query and transaction that is causing the blocking (what is holding locks for a prolonged period)
3.Analyze/understand why the prolonged blocking occurs
4.Resolve blocking issue by redesigning query and transaction

About SELECT queries blocks the REDO threads, please refer to Troubleshoot: Availability group exceeded RTO, see below:
86665-redo.png


Best regards,
Carrin


If the answer is helpful, please click "Accept Answer" and upvote it.
Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.



redo.png (104.4 KiB)
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.