AG Listener Resolving to offline IP with Service Broker

Andrew Bainbridge 1 Reputation point
2023-02-22T08:59:47.66+00:00

SQL Server 2017 EE CU23, hosted in AWS. There are 2 VPCS, akin to virtual DCs. In each VPC there is a 4 node Windows cluster hosting 5 instances of SQL Server.

In one of the databases, we use Service Broker to replicate data changes from one VPC to the other. The Service Broker route on the initiator resolves to the AG Listener of the target. We have RegisterAllIPProviders registry setting set so the listener has only one DNS A record, resolving to the online IP of the AG.

This configuration has been working for 3 years in AWS.

Last night we performed monthly patching and followed our usual run-book. The AGs in the secondary VPC were failed over to free up one of the hosts for reboot. Post reboot, I moved the AGs back to the host that had been rebooted in order to continue with the patching. However, Service Broker replication would not resume between the 2 DBs across VPCs.

I checked the AG Listener and it correctly resolved to the online IP for the AG (resolve-dnsname agl). I checked this from the initiator, and the SQL Server bastion hosts - everything resolved it correctly. The infrastructure team were also able to confirm a single A record for the AG listener in DNS. I flushed DNS cache on the initiator, but although the host resolved to the correct IP, Service Broker was still not replicating.

I checked the transmission queue on the initiator, and the transmission_status was empty. I attached a profiler trace (with SB events) to the target AG primary replica, and there was nothing hitting that instance at all. I ran profiler against the target AG secondary replica (r/o) and could see all the messages were still being routed there. I was seeing this error:

The message has been dropped because the service broker in the target database is unavailable: 'The database is in read-only mode.'.

I have also moved the AGs back round, rebooted again, restarted SQL Server, but no joy. We don't use read-only routing, secondary replicas are non-readable.

Any ideas? Thanks

SQL Server | Other
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. Andrew Bainbridge 1 Reputation point
    2023-02-27T14:30:31.8666667+00:00

    Following the AG failover, running the following command showed broker connections still established with the Service Broker endpoint on the old primary replica.

    select is_active from sys.dm broker connections

    Restarting the Service Broker endpoint force closed the connection, so the messages were correctly received on the new primary.

    alter endpoint xxx state = stopped;

    go

    alter endpoint xxx state = started;

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.