SQL Managed Instance Synching in Failover Group Stopped working

Question

TWeissenburg 25 Reputation points

We have a primary SQL Managed Instance in a failover group with a secondary read-only Managed Instance. It has worked great for 3 years, but transactions stopped synching 4 days ago. sys.dm_geo_replication_link_status says it's in CATCH_UP mode, but we can see that nothing is being synched. Our logs on the primary are growing huge and data is not being delivered to the secondary. The portal indicates it is synching, but we know it is not. The used storage on the primary is now about 400 GB larger than on the secondary, and the two should be roughly the same. We made no changes to either server. Microsoft performed regular maintenance in the region where the secondary is located, but that was yesterday. All prior maintenance by Microsoft was done a few weeks ago, and they cannot seem to help us. Where do we look to figure out what the cause is or how to fix it? What is the impact on the primary if we 'simply' remove the failover group entirely and rebuild it later? Any help is appreciated! Thank you
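For readers hitting the same symptoms, the checks described above could be gathered with queries along these lines. This is a minimal sketch, not a definitive procedure: it assumes the queries run on the primary managed instance under a login allowed to view server state. Only sys.dm_geo_replication_link_status comes from the question; the use of sys.databases and sys.dm_db_log_space_usage to look at log growth is an added assumption.

```sql
-- Sketch only: run on the PRIMARY managed instance.

-- 1) Geo-replication link state and lag as reported by the primary.
--    A state that stays in CATCH_UP while replication_lag_sec keeps growing
--    suggests the secondary is not actually receiving changes.
SELECT
    partner_server,
    partner_database,
    replication_state_desc,
    replication_lag_sec,
    last_replication
FROM sys.dm_geo_replication_link_status;

-- 2) Reason each database's transaction log is waiting to be truncated.
SELECT
    name,
    log_reuse_wait_desc
FROM sys.databases
ORDER BY name;

-- 3) Log size and usage for the CURRENT database
--    (run in the context of the database whose log is growing).
SELECT
    total_log_size_in_bytes / 1048576.0 AS total_log_mb,
    used_log_space_in_bytes / 1048576.0 AS used_log_mb,
    used_log_space_in_percent
FROM sys.dm_db_log_space_usage;
```

A log_reuse_wait_desc value such as AVAILABILITY_REPLICA would be consistent with the log being held for a secondary that is not receiving changes, which matches the log growth described in the question.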

Anonymous

2024-08-16T06:52:57.5766667+00:00

Hi,

I'm glad that you were able to resolve your issue and thank you for posting your solution so that others experiencing the same thing can easily reference this!

The Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others".

To benefit other community members who run into a similar issue, could you please accept the answer we discussed below? Your contribution is highly appreciated.

Best regards,

Lucy Chen

Answer accepted by question author

Answer 1

Hi @TWeissenburg,

Thank you for reaching out and welcome to Microsoft Q&A!

The portal indicates it is synching but we know it is not

The cause of this issue may be that the master database is too busy for a single replication worker to handle all of the changes. You may need to cluster your tables, or you can try applying the latest transaction logs on the secondary server to see whether that kick-starts the recovery process:

-- Remove the database from the availability group (SQL Server Always On syntax):
ALTER DATABASE [StackExchange.Bicycles.Meta] SET HADR OFF;
-- Apply t-logs to catch up. This can be done manually in SSMS or via:
RESTORE LOG [StackExchange.Bicycles.Meta] FROM DISK = '\\ny-back01\backups\SQL\_Trans\SENetwork_AG\StackExchange.Bicycles.Meta\StackExchange.Bicycles.Meta_LOG_20160217_033201.trn' WITH NORECOVERY;
-- Re-join database to availability group
ALTER DATABASE [StackExchange.Bicycles.Meta] SET HADR AVAILABILITY GROUP = [SENetwork_AG];
ALTER DATABASE [StackExchange.Bicycles.Meta] SET HADR RESUME;

If this method doesn't work, could you please post the detailed logs here to help us narrow down the issue? Thanks for your understanding. We look forward to hearing from you so that we can assist further.

What is the impact to the primary if we 'simply' remove the failover group entirely and rebuild it later?

A database that is included in a failover group is not dropped when the failover group is dropped. You can follow the steps in this article to remove a database from the failover group.

If a secondary failover group is dropped, any database previously included in the group loses read-only protection and becomes writable.
If the secondary failover group is re-created from the same primary failover group as before, the databases in the group are overwritten by the databases in the primary failover group during the first refresh. These databases are read-only.

For more information, please refer to this official document.

Feel free to share your issues here if you have any concerns.

Best regards,

Lucy Chen

TWeissenburg 25 Reputation points

2024-08-09T12:46:55.75+00:00

Lucy, thank you for your response. Both members of the failover group are SQL Managed Instances. I do not have access to any logs that would typically be retrieved from the server where SQL is installed. I also do not think we can simply remove a database from a failover group. The failover group is for the entire Managed Instance. We will likely try to remove the failover group entirely and recreate it at a later date. The logs are growing too large since they cannot be truncated.

Anonymous

2024-08-12T01:36:12.7333333+00:00

The logs are growing too large since they cannot be truncated.

TWeissenburg, could you please show us your backup plan?

TWeissenburg 25 Reputation points

2024-08-12T02:03:59.6633333+00:00

Thanks for keeping up with our problem. Thankfully, we determined that the cause was a private endpoint that had been added more than a week before the incident. We removed the endpoint and its DNS entry, and after an hour or so the synchronization started running again.

Anonymous

2024-08-12T02:11:09.67+00:00

Great to hear that you were able to solve the issue!

Feel free to share your issues here if you have any concerns!
