High availability in Azure Database for MySQL

By using Azure Database for MySQL Flexible Server, you can configure high availability with automatic failover. This solution ensures that failures never cause loss of committed data and that the database isn't a single point of failure in your software architecture. When you configure high availability, Flexible Server automatically provisions and manages a standby replica. You pay for the provisioned compute and storage for both the primary and standby replicas. Two high-availability architectural models are available:

Zone-redundant high availability. This option provides complete isolation and redundancy of infrastructure across multiple availability zones. It offers the highest level of availability, but it requires you to configure application redundancy across zones. Choose zone-redundant high availability when you want to protect against any infrastructure failure in the availability zone and when latency across the availability zone is acceptable. You can enable zone-redundant high availability only when you create the server. Zone-redundant high availability is available in a subset of Azure regions where the region supports multiple availability zones and zone-redundant Premium file shares are available.
Local-redundant high availability. This option provides infrastructure redundancy with lower network latency because the primary and standby servers are in the same availability zone. It offers high availability without the need to configure application redundancy across zones. Choose local-redundant high availability when you want to achieve the highest level of availability within a single availability zone with the lowest network latency. Local-redundant high availability is available in all Azure regions where you can use Azure Database for MySQL Flexible Server.
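The selection guidance above can be condensed into a small decision helper. This is a minimal sketch, not an Azure API: the function choose_ha_model, the HaRequirements fields, and the returned labels are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class HaRequirements:
    """Hypothetical inputs that mirror the selection guidance above."""
    survive_zone_outage: bool              # must the database survive a full availability-zone failure?
    cross_zone_latency_ok: bool            # can the application tolerate cross-zone write latency?
    region_supports_zone_redundancy: bool  # multiple zones and zone-redundant Premium file shares


def choose_ha_model(req: HaRequirements) -> str:
    """Return "zone-redundant" or "local-redundant"; raise if the requirements conflict."""
    if not req.survive_zone_outage:
        # Local-redundant HA keeps primary and standby in one zone for the lowest latency.
        return "local-redundant"
    if not req.region_supports_zone_redundancy:
        raise ValueError("zone-redundant HA isn't available in this region")
    if not req.cross_zone_latency_ok:
        raise ValueError("zone-outage protection requires accepting cross-zone latency")
    return "zone-redundant"


print(choose_ha_model(HaRequirements(True, True, True)))    # zone-redundant
print(choose_ha_model(HaRequirements(False, False, True)))  # local-redundant
```

The helper raises instead of silently downgrading, so a conflict between "survive a zone outage" and "no cross-zone latency" is surfaced as a design decision rather than hidden.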

Zone-redundant high-availability (HA) architecture

When you deploy a server with zone-redundant high availability, Azure creates two servers:

A primary server in one availability zone.
A standby replica server in another availability zone of the same Azure region. The standby replica server has the same configuration as the primary server, including the compute tier, compute size, storage size, and network configuration.

You can choose the availability zone for both the primary server and the standby replica. Placing the standby database server and standby applications in the same zone reduces latency and helps you prepare for disaster recovery situations and zone-down scenarios.

The data and log files are hosted in zone-redundant storage (ZRS). The standby server continuously reads and replays the log files from the primary server's storage account, which storage-level replication protects.

If a failover occurs:

The standby replica activates.
The binary log files of the primary server continue to apply to the standby server to bring it online to the last committed transaction on the primary server.

Logs in ZRS are accessible even when the primary server is unavailable, which helps ensure that no data is lost. After the standby replica activates and the binary logs are applied, the standby replica takes the role of the primary server. DNS is updated so that client connections are directed to the new primary when the client reconnects. The failover is transparent to the client application and doesn't require any action from you. The HA solution then brings back the old primary server when possible and places it as a standby.

You use the database server name to connect applications to the primary server. The solution doesn't expose standby replica information for direct access. Commits and writes are acknowledged after the log files are flushed to the primary server's ZRS. Because ZRS replicates data synchronously across availability zones, you can expect a 5 to 10 percent increase in latency for application writes and commits.
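As a worked example of the 5 to 10 percent overhead, the sketch below computes the expected range for a given baseline commit latency. The 4 ms baseline is a hypothetical value; measure your own workload's commit latency before and after enabling zone-redundant HA.

```python
def zrs_commit_latency_range(baseline_ms: float) -> tuple[float, float]:
    """Apply the documented 5-10% zone-redundant write overhead to a baseline latency."""
    return baseline_ms * 1.05, baseline_ms * 1.10


low, high = zrs_commit_latency_range(4.0)  # hypothetical 4 ms baseline commit
print(f"{low:.2f} ms to {high:.2f} ms")    # 4.20 ms to 4.40 ms
```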

The primary database server automatically takes snapshot backups and log backups and stores them on zone-redundant storage.

Local-redundant high-availability (HA) architecture

When you deploy a server with local-redundant HA, you create two servers in the same zone:

A primary server
A standby replica server that has the same configuration as the primary server (compute tier, compute size, storage size, and network configuration)

The standby server provides infrastructure redundancy by using a separate virtual machine (compute). This redundancy reduces failover time and network latency between the application and the database server because of colocation.

The data and log files are hosted in locally redundant storage (LRS). The standby server continuously reads and replays the log files from the primary server's storage account, which is protected by storage-level replication.

If a failover occurs:

The standby replica activates.
The binary log files of the primary server continue to apply to the standby server to bring it online to the last committed transaction on the primary server.

Logs in LRS are accessible even when the primary server is unavailable, which helps ensure that no data is lost. After the standby replica activates and the binary logs are applied, the standby replica takes the role of the primary server. DNS is updated to redirect connections to the new primary when the client reconnects. The failover is transparent to the client application and doesn't require any action from you. The HA solution then brings back the old primary server when possible and places it as a standby.

Applications connect to the primary server by using the database server name. Standby replica information isn't exposed for direct access. Commits and writes are acknowledged after the log files are flushed to the primary server's LRS. Because the primary and the standby replica are in the same zone, replication lag is lower, and so is the latency between the application server and the database server. However, the local-redundant setup doesn't provide high availability when dependent infrastructure in that availability zone is down. The server remains unavailable until all dependent services in that availability zone are back online.

The primary database server automatically takes snapshot backups and log backups and stores them on locally redundant storage.

Note

For both zone-redundant and local-redundant HA:

If a failure occurs, the time needed for the standby replica to take over the role of primary depends on how long it takes to replay the binary log from the primary storage account to the standby. To reduce failover time, use primary keys on all tables. Failover typically takes between 60 and 120 seconds.
The standby server isn't available for read or write operations. It's a passive standby to enable fast failover.
Always use a fully qualified domain name (FQDN) to connect to your primary server. Avoid using an IP address to connect. If a failover occurs, after the primary and standby server roles are switched, a DNS A record might change. That change prevents the application from connecting to the new primary server if an IP address is used in the connection string.
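To act on the primary-key advice, you can list tables that lack a primary key by querying information_schema. This sketch assumes any PEP 249 (DB-API) cursor that returns tuple rows, for example the default cursor from PyMySQL or mysql-connector-python; the function name find_tables_without_primary_key is hypothetical.

```python
# Lists base tables in user schemas that have no PRIMARY KEY constraint.
MISSING_PK_QUERY = """
SELECT t.TABLE_SCHEMA, t.TABLE_NAME
FROM information_schema.TABLES AS t
LEFT JOIN information_schema.TABLE_CONSTRAINTS AS c
  ON c.TABLE_SCHEMA = t.TABLE_SCHEMA
 AND c.TABLE_NAME = t.TABLE_NAME
 AND c.CONSTRAINT_TYPE = 'PRIMARY KEY'
WHERE t.TABLE_TYPE = 'BASE TABLE'
  AND t.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
  AND c.CONSTRAINT_NAME IS NULL
ORDER BY t.TABLE_SCHEMA, t.TABLE_NAME
"""


def find_tables_without_primary_key(cursor) -> list[str]:
    """Run the anti-join on a DB-API cursor and return "schema.table" names."""
    cursor.execute(MISSING_PK_QUERY)
    return [f"{schema}.{table}" for schema, table in cursor.fetchall()]
```

Usage: pass a cursor from an open connection, for example find_tables_without_primary_key(conn.cursor()), and add a primary key to each table it reports.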

Migrate from an existing server to a zone-redundant server

If you originally provisioned your Azure Database for MySQL server as a non-HA server, you can enable it for locally redundant HA architecture. However, if you want to enable it for zone-redundant HA architecture, you need to create a new server with your desired configuration and migrate to it by following these steps:

Create a new server with zone-redundant high availability enabled by following the instructions for your preferred deployment tool:
- Azure portal: Manage zone redundant high availability in Azure Database for MySQL with the Azure portal
- Azure CLI: Manage zone redundant high-availability in Azure Database for MySQL with Azure CLI
Migrate your workload to the new server by following one of these approaches. Depending on the migration approach, downtime might be required.
- Offline migration approaches: If your application can afford some downtime, offline migration is the preferred choice because it's simple and easy to execute. With an offline migration, the source server is taken offline, and the databases are dumped and restored on the target server. This option requires the most downtime, and the length of that downtime is determined by how long the restore takes on the target server.
  - Azure Database Migration Service (DMS): To learn how to use DMS, see Migrate from MySQL to Azure Database for MySQL offline using DMS via the Azure portal.
    
    Although the tutorial outlines steps for migrating from an on-premises MySQL server to Azure Database for MySQL, you can use the same procedure for migrating data from one Azure Database for MySQL server that doesn't support availability zones to another that supports availability zones.
  - Open-source tools: You can migrate offline by using open-source tools such as MySQL Workbench, mydumper/myloader, or mysqldump to back up and restore the database. For information on how to use these tools, see Options for migrating Azure Database for MySQL - Single Server to Flexible Server. Although the tutorial outlines steps for migrating from Azure MySQL Single Server to Flexible Server, you can use the same procedure for migrating data from one Azure Database for MySQL Flexible Server that doesn't support availability zones to another that supports availability zones.
- Online migration approaches: Online migrations minimize application downtime. The source server allows updates, and the migration solution replicates the ongoing changes between the source and target server along with the initial dump and restore on the target. However, these approaches are more complex to implement than an offline migration.
  - Azure Database Migration Service (DMS): To learn how to use DMS, see Migrate from MySQL to Azure Database for MySQL online using DMS via the Azure portal.
    
    Although the tutorial outlines steps for migrating from an on-premises MySQL server to Azure Database for MySQL, you can use the same procedure for migrating data from one Azure Database for MySQL server that doesn't support availability zones to another that supports availability zones.
  - Open-source tools: You can use a combination of open-source tools such as mydumper/myloader together with Data-in replication.
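For the offline path with open-source tools, the sketch below assembles a mysqldump-to-mysql shell pipeline from the old server to the new zone-redundant server. It's a minimal sketch under stated assumptions: the host names and option-file paths are hypothetical, credentials are assumed to live in per-server option files passed with --defaults-extra-file (which must be the first option), and --set-gtid-purged=OFF is included on the assumption that the HA target already runs with GTID mode on. Confirm every flag against your client version before running the pipeline.

```python
import shlex


def build_offline_migration_pipeline(source_host: str, target_host: str,
                                     databases: list[str],
                                     source_cnf: str, target_cnf: str) -> str:
    """Return a shell pipeline that dumps databases from source and restores them on target."""
    dump = [
        "mysqldump",
        f"--defaults-extra-file={source_cnf}",  # must come first; holds source user/password
        f"--host={source_host}",
        "--single-transaction",   # consistent InnoDB snapshot without locking tables
        "--routines", "--triggers",
        "--set-gtid-purged=OFF",  # assumption: don't carry source GTID state to the HA target
        "--databases", *databases,
    ]
    restore = [
        "mysql",
        f"--defaults-extra-file={target_cnf}",  # must come first; holds target user/password
        f"--host={target_host}",
    ]
    return f"{shlex.join(dump)} | {shlex.join(restore)}"


print(build_offline_migration_pipeline(
    "old-server.mysql.database.azure.com",  # hypothetical non-HA source
    "new-server.mysql.database.azure.com",  # hypothetical zone-redundant target
    ["shop"], "/secure/source.cnf", "/secure/target.cnf"))
```

Because this is the offline approach, stop application writes to the source before running the printed pipeline, and repoint the application to the new server name after the restore finishes.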

Failover process

During the failover process in Azure Database for MySQL, the system automatically switches from the primary server to the standby replica. This switch ensures continuity and minimizes downtime. When the system detects a failure, it promotes the standby replica to become the new primary server. The system applies the binary log files from the original primary server to the standby replica. This process synchronizes the standby replica with the last committed transaction and ensures no data loss. This seamless transition helps maintain high availability and reliability of the database service.

Note

To reduce the dependency of failover time on DNS caching, starting in October 2025, all new high availability servers created with public access or private link use a new architecture that features a dedicated Azure Standard Load Balancer (SLB) for each high availability server. Because the SLB manages the MySQL data traffic path, no DNS changes are needed during failover, which significantly improves failover performance. During failover, the SLB uses load-balancing rules to redirect traffic to the current primary instance. Existing servers with public access or private link are being migrated gradually to minimize impact. If you prefer to migrate early, disable and re-enable high availability. This feature isn't supported for servers that use private access with VNet integration.

Planned: Forced failover

Azure Database for MySQL Flexible Server forced failover enables you to manually force a failover. This capability allows you to test the functionality with your application scenarios and helps you prepare for outages.

Forced failover activates the standby replica as the primary server, keeping the same database server name and updating the DNS record. The original primary server restarts and becomes the standby replica. Client connections are disconnected and must reconnect to resume their operations.

The overall failover time depends on the current workload and the last checkpoint. In general, it takes between 60 and 120 seconds.
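When you use forced failover to test your application, it helps to measure how long the server is actually unreachable from the client's point of view. The sketch below polls a caller-supplied probe (for example, a hypothetical function that opens a short-timeout connection to the server FQDN and runs SELECT 1) and reports the outage length. The function name is hypothetical, and the clock and sleep parameters exist only so the logic can be tested without waiting.

```python
import time
from typing import Callable


def measure_unavailability(probe: Callable[[], bool],
                           interval_s: float = 1.0,
                           timeout_s: float = 600.0,
                           clock: Callable[[], float] = time.monotonic,
                           sleep: Callable[[float], None] = time.sleep) -> float:
    """Wait for probe() to start failing, then return seconds until it succeeds again."""
    start = clock()
    while probe():  # phase 1: server still up; wait for the failover to begin
        if clock() - start > timeout_s:
            raise TimeoutError("no outage observed; was the failover triggered?")
        sleep(interval_s)
    down_at = clock()
    while not probe():  # phase 2: server unreachable; wait for the new primary
        if clock() - down_at > timeout_s:
            raise TimeoutError("server did not come back within the timeout")
        sleep(interval_s)
    return clock() - down_at
```

Start the measurement, then trigger the forced failover. The result is accurate only to about one polling interval, which is enough to compare against the typical 60 to 120 second range.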

Note

An Azure Resource Health event is generated during a planned failover. The event represents the failover time during which the server is unavailable. To see the triggered events, select Resource Health in the left pane. A user-initiated (manual) failover shows a status of Unavailable and is tagged as Planned. For example: A failover operation was triggered by an authorized user (Planned). If your resource remains in this state for an extended period, open a support ticket and we'll assist you.

Unplanned: Automatic failover

Unplanned service downtime can occur due to software bugs or infrastructure faults, such as compute, network, or storage failures. Power outages can also affect the availability of the database. If the database becomes unavailable, replication to the standby replica stops, and the standby replica becomes the primary database. DNS is updated, and clients reconnect to the database server and resume their operations.

The overall failover time is usually between 60 and 120 seconds. However, depending on the activity in the primary database server at the time of the failover (such as large transactions and recovery time), the failover might take longer.
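Because clients must reconnect after a failover, application code typically retries the connection with backoff for at least the 60 to 120 second window. This is a sketch with hypothetical names: connect stands in for your driver's connect call, the 180 second budget and delay caps are assumptions, and the exception types to retry depend on your driver (for example, PyMySQL raises pymysql.err.OperationalError rather than OSError, so you would pass it in retry_on).

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(connect: Callable[[str], T], fqdn: str,
                       retry_on: tuple[type[BaseException], ...] = (OSError,),
                       budget_s: float = 180.0,
                       base_delay_s: float = 1.0,
                       max_delay_s: float = 15.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call connect(fqdn) until it succeeds or the total wait would exceed budget_s."""
    waited = 0.0
    attempt = 0
    while True:
        try:
            # Always pass the FQDN, never a cached IP address, so each attempt
            # goes through name resolution and can reach the new primary.
            return connect(fqdn)
        except retry_on:
            # Exponential backoff capped at max_delay_s, with full jitter so that
            # many clients don't reconnect in lockstep.
            delay = random.uniform(0, min(max_delay_s, base_delay_s * 2 ** attempt))
            if waited + delay > budget_s:
                raise  # give up and surface the last connection error
            sleep(delay)
            waited += delay
            attempt += 1
```

Full jitter trades a slightly longer average reconnect for a much smaller burst of simultaneous reconnections against the new primary.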

Note

An unplanned failover generates a Resource Health event. The event represents the failover time when the server is unavailable. To see the triggered events, select Resource Health in the left pane. Automatic failover shows a status of Unavailable and is tagged as Unplanned.

For example: Unavailable: A failover operation was triggered automatically (Unplanned). If your resource stays in this state for a long time, open a support ticket and we'll help you.

How automatic failover detection works in HA enabled servers

The primary server and the secondary server each have two network endpoints:

Customer Endpoint: Customers connect and run queries on the instance by using this endpoint.
Management Endpoint: Used internally for service communications to management components and to connect to backend storage.

The health monitor component continuously does the following checks:

The monitor pings the node's management network endpoint. If this check fails two times in a row, it triggers an automatic failover operation. This health check addresses scenarios such as node unavailability or nonresponsiveness due to OS problems, networking problems between management components and nodes, and similar problems.
The monitor runs a simple query on the instance. If the query fails to run, automatic failover triggers. This health check addresses scenarios such as the MySQL daemon crashing, stopping, or hanging, as well as backend storage problems.
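The two checks can be modeled as a small state machine to reason about when a failover would fire. This is an illustrative model only, not the service's implementation; the class and method names are hypothetical, and the threshold of two consecutive ping failures comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class HealthMonitorModel:
    """Illustrative model of the documented checks, not Azure's actual health monitor."""
    ping_failure_threshold: int = 2  # management-endpoint ping must fail twice in a row
    consecutive_ping_failures: int = 0

    def observe(self, management_ping_ok: bool, query_ok: bool) -> bool:
        """Record one round of checks; return True if it would trigger automatic failover."""
        if management_ping_ok:
            self.consecutive_ping_failures = 0  # a successful ping resets the streak
        else:
            self.consecutive_ping_failures += 1
        if self.consecutive_ping_failures >= self.ping_failure_threshold:
            return True  # node unavailable or unresponsive
        return not query_ok  # MySQL daemon crash, hang, or backend storage problem
```

In this model, a single dropped ping with a working query doesn't trigger a failover, while a second consecutive ping failure does.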

Note

The health check doesn't monitor networking problems between the application and the customer networking endpoint (private or public access). These problems can occur in the network path, on the endpoint, or in DNS resolution on the client side. If you use private access, make sure that the NSG rules for the virtual network don't block communication to the instance's customer networking endpoint on port 3306. For public access, make sure that the firewall rules are set and that network traffic is allowed on port 3306 through any other firewalls in the network path. You're also responsible for DNS resolution on the client application side.
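Because this path isn't covered by the service health check, it can help to test it from the client host. The sketch below separates a DNS resolution failure from a blocked or closed TCP path to port 3306 by using only the Python standard library. The function name and return labels are hypothetical, and a reachable port doesn't prove that authentication or TLS will succeed.

```python
import socket


def diagnose_mysql_path(host: str, port: int = 3306, timeout_s: float = 3.0) -> str:
    """Return "dns-failure", "tcp-unreachable", or "reachable" for host:port."""
    try:
        # Client-side name resolution: the server FQDN must resolve from this host.
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"
    try:
        # TCP handshake only; NSG rules or firewalls blocking the port show up here.
        with socket.create_connection((host, port), timeout=timeout_s):
            return "reachable"
    except OSError:
        return "tcp-unreachable"
```

Usage: run diagnose_mysql_path("<server-name>.mysql.database.azure.com") from the application host, substituting your own server FQDN.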

Monitor high availability

To check the server's high-availability configuration status, use the high-availability Status in the server's high-availability pane in the portal.

Status	Description
NotEnabled	High availability isn't enabled.
ReplicatingData	The standby server is synchronizing with the primary server during high availability server provisioning or after you enable the high availability option.
FailingOver	The database server is failing over from the primary to the standby.
Healthy	High availability is enabled.
RemovingStandby	The standby server is being deleted after you disable the high availability option.
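If you check this status programmatically, mapping the values to an enum keeps the checks explicit. A minimal sketch: the enum, the helper name, and the rule that only Healthy counts as ready for failover are assumptions drawn from the table above, not an Azure SDK type.

```python
from enum import Enum


class HaStatus(Enum):
    NOT_ENABLED = "NotEnabled"
    REPLICATING_DATA = "ReplicatingData"
    FAILING_OVER = "FailingOver"
    HEALTHY = "Healthy"
    REMOVING_STANDBY = "RemovingStandby"


def is_failover_ready(status: str) -> bool:
    """True only for Healthy; other states mean HA is off, still syncing, mid-failover, or being removed.

    Raises ValueError for a status string that isn't listed in the table.
    """
    return HaStatus(status) is HaStatus.HEALTHY
```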

To monitor the health of the high availability server, use the following metrics.

Metric display name	Metric	Unit	Description
HA IO Status	ha_io_running	State	HA IO Status shows the state of HA replication. The metric value is 1 if the I/O thread is running and 0 if not.
HA SQL Status	ha_sql_running	State	HA SQL Status shows the state of HA replication. The metric value is 1 if the SQL thread is running and 0 if not.
HA Replication Lag	replication_lag	Seconds	Replication lag is the number of seconds the standby is behind in replaying the transactions received at the primary server.
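The three metrics combine naturally into a single alert check. The sketch below is illustrative: the HaMetrics type, the function name, and the 30 second lag threshold are hypothetical choices, not service defaults. Sustained lag is worth alerting on because failover time depends on how much binary log the standby must replay.

```python
from dataclasses import dataclass


@dataclass
class HaMetrics:
    ha_io_running: int      # 1 if the HA I/O thread is running, 0 otherwise
    ha_sql_running: int     # 1 if the HA SQL thread is running, 0 otherwise
    replication_lag: float  # seconds the standby is behind the primary


def ha_alerts(m: HaMetrics, max_lag_s: float = 30.0) -> list[str]:
    """Return human-readable alerts for one sample of the HA metrics."""
    alerts = []
    if m.ha_io_running != 1:
        alerts.append("HA IO thread stopped")
    if m.ha_sql_running != 1:
        alerts.append("HA SQL thread stopped")
    if m.replication_lag > max_lag_s:
        alerts.append(f"replication lag {m.replication_lag:.0f}s exceeds {max_lag_s:.0f}s")
    return alerts
```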

Limitations

Keep the following considerations in mind when you use high availability:

You can configure zone-redundant high availability only during server creation.
The burstable compute tier doesn't support high availability.
Restarting the primary database server to apply static parameter changes also restarts the standby replica.
The HA solution uses GTIDs, so GTID mode is turned on. Check whether your workload has restrictions or limitations on replication with GTIDs.
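Before you request HA, you can check a planned configuration against the first two limitations. This is a hypothetical pre-flight helper, not part of any Azure tool; the tier and mode strings are illustrative labels.

```python
def validate_ha_request(compute_tier: str, mode: str, server_exists: bool) -> list[str]:
    """Return the limitations that a requested HA configuration would violate."""
    problems = []
    if compute_tier.strip().lower() == "burstable":
        problems.append("the Burstable compute tier doesn't support high availability")
    if mode == "zone-redundant" and server_exists:
        problems.append("zone-redundant HA can be configured only at server creation; "
                        "create a new server and migrate to it")
    return problems
```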

Note

Storage autogrow is enabled by default for a high-availability configured server and can't be disabled.

Known problems

Azure Database for MySQL Flexible Server uses native MySQL replication at the backend. A known problem exists in the MySQL Community Edition 8.0 and greater that can break replication when performing a multitable DELETE operation that relies on foreign key constraints with ON DELETE CASCADE. This problem is tracked as MySQL Bug 102586. As a result, when you enable high availability on Azure Database for MySQL Flexible Server, avoid using cascaded deletes with foreign keys, as this pattern can lead to replication failures and might affect the availability of the server.
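To find foreign keys that use ON DELETE CASCADE before you enable HA, you can query information_schema.REFERENTIAL_CONSTRAINTS. This sketch assumes a PEP 249 (DB-API) cursor that returns tuple rows and uses a hypothetical function name; reviewing the results and replacing cascaded deletes is left to you.

```python
# Foreign keys whose DELETE_RULE is CASCADE, excluding system schemas.
CASCADE_FK_QUERY = """
SELECT CONSTRAINT_SCHEMA, TABLE_NAME, CONSTRAINT_NAME, REFERENCED_TABLE_NAME
FROM information_schema.REFERENTIAL_CONSTRAINTS
WHERE DELETE_RULE = 'CASCADE'
  AND CONSTRAINT_SCHEMA NOT IN ('mysql', 'sys')
ORDER BY CONSTRAINT_SCHEMA, TABLE_NAME, CONSTRAINT_NAME
"""


def find_cascading_foreign_keys(cursor) -> list[str]:
    """Return "schema.table.constraint -> referenced_table" for each cascading FK."""
    cursor.execute(CASCADE_FK_QUERY)
    return [f"{schema}.{table}.{name} -> {ref}"
            for schema, table, name, ref in cursor.fetchall()]
```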

Health Check

When you configure high availability (HA) for Azure Database for MySQL, health checks play a crucial role in maintaining the reliability and performance of your database. These checks continuously monitor the status and health of both the primary and standby replicas so that problems are detected promptly. By tracking metrics such as server responsiveness, replication lag, and resource utilization, health checks help ensure that failover runs smoothly, minimizing downtime and preventing data loss. Properly configured health checks are essential for achieving the desired level of availability and resilience in your database setup.

Monitor health

You can monitor the health of your HA setup through the Azure portal. Key metrics to observe include:

Server responsiveness: Indicates whether the primary server is reachable.
Replication lag: Measures the delay between the primary and standby replicas, ensuring data consistency.
Resource utilization: Monitors CPU, memory, and storage usage to prevent bottlenecks.

Reliability and resilience

For a comprehensive overview of reliability in Azure Database for MySQL, including transient fault handling, availability zone resilience, cross-region disaster recovery with read replicas, backup and restore, and service maintenance, see Reliability in Azure Database for MySQL.


Last updated on 2026-04-22
