Azure MySQL Replication Lag Issues

Andrew Bolton 5 Reputation points
2023-11-15T10:14:08.1133333+00:00

I have two Azure Database for MySQL single servers running in our Azure cloud. They are accessed from a number of Ubuntu virtual machines (the code base is mostly PHP) in the same cloud. The virtual machines use ProxySQL to access the two MySQL databases, with all write queries directed to server 1 and all read queries directed to server 2. Server 1 replicates to server 2.
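
To illustrate the read/write split described above, here is a minimal Python sketch of first-match regex routing. It is an illustration only, not the actual ProxySQL configuration: the hostgroup IDs (10 and 20), the two rules and the `route_query` name are all assumptions made for the example.

```python
import re

# Hypothetical hostgroup IDs; the real values depend on the ProxySQL setup.
WRITER_HOSTGROUP = 10  # server 1 (replication source)
READER_HOSTGROUP = 20  # server 2 (replica)

# Rules are checked in order and the first match wins.
# SELECT ... FOR UPDATE takes locks, so it is treated as a write.
RULES = [
    (re.compile(r"^\s*SELECT\b.*\bFOR\s+UPDATE\b", re.IGNORECASE | re.DOTALL),
     WRITER_HOSTGROUP),
    (re.compile(r"^\s*SELECT\b", re.IGNORECASE), READER_HOSTGROUP),
]


def route_query(sql: str) -> int:
    """Return the hostgroup a statement would be sent to."""
    for pattern, hostgroup in RULES:
        if pattern.search(sql):
            return hostgroup
    # Anything that is not a plain SELECT goes to the writer.
    return WRITER_HOSTGROUP
```

With a split like this, any read sent to server 2 can return stale data for as long as the replica is behind, which is why the lag matters to the application.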

Until the evening of 25/10/2023 this worked fine, with the replication lag generally measured in milliseconds. However, something changed around 21:00 that means the replication lag can now be well over 2 or 3 minutes. The chart below shows the lag as reported by server 2 in Azure Metrics: red is the maximum, blue is the minimum.

User's image
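
The lag figure can also be cross-checked on server 2 itself. The sketch below assumes the replica returns the standard `SHOW SLAVE STATUS` columns (`Seconds_Behind_Master`, `Slave_IO_Running`, `Slave_SQL_Running`) and that the row has already been fetched into a dict by whatever client library is in use. The function name and the 60-second threshold are hypothetical.

```python
from typing import Mapping


def summarise_replica_status(row: Mapping[str, object], warn_after_s: int = 60) -> str:
    """Summarise one row of SHOW SLAVE STATUS output.

    `row` is assumed to be a dict keyed by column name; fetching it from
    the server is left to the caller.
    """
    io_ok = row.get("Slave_IO_Running") == "Yes"
    sql_ok = row.get("Slave_SQL_Running") == "Yes"
    lag = row.get("Seconds_Behind_Master")

    # Seconds_Behind_Master is NULL when replication is not running.
    if not io_ok or not sql_ok or lag is None:
        return "replication stopped or lag unknown"

    lag_s = int(lag)
    if lag_s > warn_after_s:
        return f"lagging by {lag_s}s"
    return f"healthy ({lag_s}s behind)"
```

Sampling this at intervals gives a second view of the lag, alongside the Azure Metrics chart.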

As far as I am aware, nothing changed on these servers around this time. There were no changes to the VMs that connect to the database, so the number of queries running and the volume of data being processed have not changed. As an example, the chart below shows one of the queries that was running at the time: the blue line shows the number of rows affected, which remains relatively constant (although cyclic), and the orange line shows the run time, which jumps significantly on 25/10 around 21:00. As noted above, no changes were made to this query at this time.

User's image

I've tried the obvious things like restarting the MySQL servers and the VMs, but this hasn't changed anything. There appears to be sufficient storage (usage is around 33% on both servers). CPU load does peak a little at times, but this has always been the case. There has been an increase in CPU load on server 1 from 25/10 21:00 (marked by the grey line), which matches the time the issues began to occur. There is no corresponding change on server 2.

User's image

The only change I'm aware of that happened around this time was in one of our data centres where the proxy servers were changed. This took place around 12:00 on 25/10 but I've discounted this as a factor because of the following:

  • Data centre change took place at 12:00, server problems occurred at 21:00
  • The connection from the Azure Resource Group to the data centre is used for only one destination, which is routed via a specific IP address and two dedicated reverse proxy servers. The final one of these had not been updated with the changes to the data centre proxy, which meant the link had stopped working. This was fixed around 23:00 on 25/10, which is after the database issue started, and there are no significant changes around 23:00.
  • My understanding is that the connections to the MySQL server from the Azure VMs will remain in the Azure infrastructure and would not be routed via our data centre.
Azure Database for MySQL