Configure a Pacemaker cluster for SQL Server availability groups
Applies to: SQL Server - Linux
This article describes how to create a three-node cluster on Linux using Pacemaker, and add a previously created availability group as a resource in the cluster. For high availability, an availability group on Linux requires three nodes - see High availability and data protection for availability group configurations.
This article contains references to the term slave, a term Microsoft considers offensive when used in this context. The term appears in this article because it currently appears in the software. When the term is removed from the software, we will remove it from the article.
SQL Server isn't as tightly integrated with Pacemaker on Linux as it is with Windows Server failover clustering (WSFC). A SQL Server instance isn't aware of the cluster, and all orchestration is from the outside in. Pacemaker provides cluster resource orchestration. Also, the virtual network name is specific to Windows Server failover clustering; there's no equivalent in Pacemaker. Availability group dynamic management views (DMVs) that query cluster information return empty rows on Pacemaker clusters. To create a listener for transparent reconnection after failover, manually register the listener name in DNS with the IP used to create the virtual IP resource.
You can still create a listener for transparent reconnection after failover, but you have to manually register the listener name in the DNS server with the IP used to create the virtual IP resource (as explained in the following sections).
The following sections walk through the steps to set up a Pacemaker cluster and add an availability group as resource in the cluster for high availability, for each supported Linux distribution.
Access to Red Hat full documentation requires a valid subscription.
For more information on cluster configuration, resource agents options, and management, visit RHEL reference documentation.
The steps to create an availability group on Linux servers for high availability are different from the steps on a Windows Server failover cluster. The following list describes the high-level steps:
Configure a cluster resource manager, like Pacemaker. These instructions are in this article.
The way to configure a cluster resource manager depends on the specific Linux distribution.
Production environments require a fencing agent for high availability. The demonstrations in this documentation don't use fencing agents. The demonstrations are for testing and validation only. A Linux cluster uses fencing to return the cluster to a known state. The way to configure fencing depends on the distribution and the environment. Currently, fencing isn't available in some cloud environments. For more information, see Support Policies for RHEL High Availability Clusters - Virtualization Platforms.
Configure high availability for RHEL
To configure high availability for RHEL, enable the high availability subscription and then configure Pacemaker.
Enable the high availability subscription for RHEL
Each node in the cluster must have an appropriate subscription for RHEL and the High Availability Add on. Review the requirements at How to install High Availability cluster packages in Red Hat Enterprise Linux. Follow these steps to configure the subscription and repos:
Register the system.
sudo subscription-manager register
Provide your user name and password.
List the available pools for registration.
sudo subscription-manager list --available
From the list of available pools, note the pool ID for the high availability subscription.
Update the following script. Replace
<pool id>with the pool ID for high availability from the preceding step. Run the script to attach the subscription.
sudo subscription-manager attach --pool=<pool id>
Enable the repository.
sudo subscription-manager repos --enable=rhel-ha-for-rhel-7-server-rpms
sudo subscription-manager repos --enable=rhel-8-for-x86_64-highavailability-rpms
For more information, see Pacemaker - The Open Source, High Availability Cluster.
After you have configured the subscription, complete the following steps to configure Pacemaker:
After you register the subscription, complete the following steps to configure Pacemaker:
On all cluster nodes, open the Pacemaker firewall ports. To open these ports with
firewalld, run the following command:
sudo firewall-cmd --permanent --add-service=high-availability sudo firewall-cmd --reload
If the firewall doesn't have a built-in high-availability configuration, open the following ports for Pacemaker.
- TCP: Ports 2224, 3121, 21064
- UDP: Port 5405
Install Pacemaker packages on all nodes.
sudo yum install pacemaker pcs fence-agents-all resource-agents
Set the password for the default user that is created when installing Pacemaker and Corosync packages. Use the same password on all nodes.
sudo passwd hacluster
To allow nodes to rejoin the cluster after the restart, enable and start
pcsdservice and Pacemaker. Run the following command on all nodes.
sudo systemctl enable pcsd sudo systemctl start pcsd sudo systemctl enable pacemaker
Create the Cluster. To create the cluster, run the following command:
sudo pcs cluster auth <node1> <node2> <node3> -u hacluster -p <password for hacluster> sudo pcs cluster setup --name <clusterName> <node1> <node2> <node3> sudo pcs cluster start --all sudo pcs cluster enable --all
For RHEL 8, you will need to authenticate the nodes separately. Manually enter in the Username and Password for hacluster when prompted.
sudo pcs host auth <node1> <node2> <node3> sudo pcs cluster setup <clusterName> <node1> <node2> <node3> sudo pcs cluster start --all sudo pcs cluster enable --all
If you previously configured a cluster on the same nodes, you need to use
--forceoption when running
pcs cluster setup. This option is equivalent to running
pcs cluster destroy. To re-enable Pacemaker, run
sudo systemctl enable pacemaker.
Install SQL Server resource agent for SQL Server. Run the following commands on all nodes.
sudo yum install mssql-server-ha
After Pacemaker is configured, use
pcs to interact with the cluster. Execute all commands on one node from the cluster.
Considerations for multiple network interfaces (NICs)
When setting up high availability with servers that have multiple NICs, follow these suggestions:
Make sure the
hostsfile is set up so that the server IP addresses for the multiple NICs resolve to the hostname of the Linux server on each node.
When setting up the cluster using Pacemaker, using the hostname of the servers should configure Corosync to set the configuration for all of the NICs. We only want the Pacemaker/Corosync communication over a single NIC. Once the Pacemaker cluster is configured, modify the configuration in the
corosync.conffile, and update the IP address for the dedicated NIC you want to use for the Pacemaker/Corosync communication.
<hostname>given in the
corosync.conffile should be the same as the output given when doing a reverse lookup (
ping -a <ip_address>), and should be the short name configured on the host. Make sure the
hostsfile also represents the proper IP address to name resolution.
The changes to the
corosync.conf file example are highlighted below:
Configure a fencing device
Pacemaker cluster vendors require fencing a failed node, using a fencing device configured for a supported cluster setup. When the cluster resource manager can't determine the state of a node or of a resource on a node, fencing brings the cluster to a known state again.
A fencing device provides a fencing agent. Setting up Pacemaker on Red Hat Enterprise Linux in Azure provides an example of how to create a fencing device for this cluster in Azure. Modify the instructions for your environment.
Resource level fencing ensures that there's no data corruption in an outage by configuring a resource. For example, you can use resource level fencing to mark the disk on a node as outdated when the communication link goes down.
Node level fencing ensures that a node doesn't run any resources. This is done by resetting the node. Pacemaker supports a great variety of fencing devices. Examples include an uninterruptible power supply or management interface cards for servers.
For information about fencing a failed node, see the following articles:
- Pacemaker Clusters from Scratch
- Fencing and STONITH
- Red Hat High Availability Add-On with Pacemaker: Fencing
Because the node level fencing configuration depends heavily on your environment, disable it for this tutorial (it can be configured later). The following script disables node level fencing:
sudo pcs property set stonith-enabled=false
Disabling fencing is just for testing purposes. If you plan to use Pacemaker in a production environment, you should plan a fencing implementation depending on your environment and keep it enabled.
Set cluster property cluster-recheck-interval
cluster-recheck-interval indicates the polling interval at which the cluster checks for changes in the resource parameters, constraints, or other cluster options. If a replica goes down, the cluster tries to restart the replica at an interval that is bound by the
failure-timeout value and the
cluster-recheck-interval value. For example, if
failure-timeout is set to 60 seconds and
cluster-recheck-interval is set to 120 seconds, the restart is tried at an interval that is greater than 60 seconds but less than 120 seconds. We recommend that you set failure-timeout to 60 seconds and
cluster-recheck-interval to a value that is greater than 60 seconds. Setting
cluster-recheck-interval to a small value isn't recommended.
To update the property value to
2 minutes run:
sudo pcs property set cluster-recheck-interval=2min
If you already have an availability group resource managed by a Pacemaker cluster, Pacemaker package 1.1.18-11.el7 introduced a behavior change for the
start-failure-is-fatal cluster setting when its value is
false. This change affects the failover workflow. If a primary replica experiences an outage, the cluster is expected to fail over to one of the available secondary replicas. Instead, users notice that the cluster keeps trying to start the failed primary replica. If that primary never comes online (because of a permanent outage), the cluster never fails over to another available secondary replica. Because of this change, a previously recommended configuration to set
start-failure-is-fatal is no longer valid, and the setting needs to be reverted back to its default value of
Additionally, the AG resource needs to be updated to include the
To update the property value to
sudo pcs property set start-failure-is-fatal=true
To update the
ag_cluster resource property
pcs resource update ag_cluster meta failure-timeout=60s
For information on Pacemaker cluster properties, see Pacemaker Clusters Properties.
Create a SQL Server login for Pacemaker
On all SQL Server instances, create a Server login for Pacemaker. The following Transact-SQL creates a login:
USE [master] GO CREATE LOGIN [pacemakerLogin] with PASSWORD= N'ComplexP@$$w0rd!'; ALTER SERVER ROLE [sysadmin] ADD MEMBER [pacemakerLogin];
At the time of availability group creation, the pacemaker user will require ALTER, CONTROL and VIEW DEFINITION permissions on the availability group, after it's created but before any nodes are added to it.
On all SQL Server instances, save the credentials for the SQL Server login.
echo 'pacemakerLogin' >> ~/pacemaker-passwd echo 'ComplexP@$$w0rd!' >> ~/pacemaker-passwd sudo mv ~/pacemaker-passwd /var/opt/mssql/secrets/passwd sudo chown root:root /var/opt/mssql/secrets/passwd sudo chmod 400 /var/opt/mssql/secrets/passwd # Only readable by root
Create availability group resource
To create the availability group resource, use
pcs resource create command and set the resource properties. The following command creates a
ocf:mssql:ag master/subordinate type resource for availability group with name
sudo pcs resource create ag_cluster ocf:mssql:ag ag_name=ag1 meta failure-timeout=60s master notify=true
With the availability of RHEL 8, the create syntax has changed. If you use RHEL 8, the terminology
master has changed to
promotable. Use the following create command instead of the above command:
sudo pcs resource create ag_cluster ocf:mssql:ag ag_name=ag1 meta failure-timeout=60s promotable notify=true
When you create the resource, and periodically afterwards, the Pacemaker resource agent automatically sets the value of
REQUIRED_SYNCHRONIZED_SECONDARIES_TO_COMMIT on the availability group based on the availability group's configuration. For example, if the availability group has three synchronous replicas, the agent will set
1. For details and additional configuration options, see High availability and data protection for availability group configurations.
Create virtual IP resource
To create the virtual IP address resource, run the following command on one node. Use an available static IP address from the network. Replace the IP address between
<10.128.16.240> with a valid IP address.
sudo pcs resource create virtualip ocf:heartbeat:IPaddr2 ip=<10.128.16.240>
There's no virtual server name equivalent in Pacemaker. To use a connection string that points to a string server name instead of an IP address, register the virtual IP resource address and desired virtual server name in DNS. For DR configurations, register the desired virtual server name and IP address with the DNS servers on both primary and DR site.
Add colocation constraint
Almost every decision in a Pacemaker cluster, like choosing where a resource should run, is done by comparing scores. Scores are calculated per resource. The cluster resource manager chooses the node with the highest score for a particular resource. If a node has a negative score for a resource, the resource can't run on that node.
On a pacemaker cluster, you can manipulate the decisions of the cluster with constraints. Constraints have a score. If a constraint has a score lower than
INFINITY, Pacemaker regards it as recommendation. A score of
INFINITY is mandatory.
To ensure that primary replica and the virtual ip resources run on the same host, define a colocation constraint with a score of INFINITY. To add the colocation constraint, run the following command on one node.
When you create the
ag_cluster resource in RHEL 7, it creates the resource as
ag_cluster-master. Use the following command for RHEL 7:
sudo pcs constraint colocation add virtualip ag_cluster-master INFINITY with-rsc-role=Master
When you create the
ag_cluster resource in RHEL 8, it creates the resource as
ag_cluster-clone. Use the following command for RHEL 8:
sudo pcs constraint colocation add virtualip with master ag_cluster-clone INFINITY with-rsc-role=Master
Add ordering constraint
The colocation constraint has an implicit ordering constraint. It moves the virtual IP resource before it moves the availability group resource. By default the sequence of events is:
pcs resource moveto the availability group primary from node1 to node2.
The virtual IP resource stops on node 1.
The virtual IP resource starts on node 2.
At this point, the IP address temporarily points to node 2 while node 2 is still a pre-failover secondary.
The availability group primary on node 1 is demoted to secondary.
The availability group secondary on node 2 is promoted to primary.
To prevent the IP address from temporarily pointing to the node with the pre-failover secondary, add an ordering constraint.
To add an ordering constraint, run the following command on one node:
sudo pcs constraint order promote ag_cluster-master then start virtualip
sudo pcs constraint order promote ag_cluster-clone then start virtualip
After you configure the cluster and add the availability group as a cluster resource, you can't use Transact-SQL to fail over the availability group resources. SQL Server cluster resources on Linux aren't coupled as tightly with the operating system as they are on a Windows Server Failover Cluster (WSFC). SQL Server service isn't aware of the presence of the cluster. All orchestration is done through the cluster management tools. In RHEL or Ubuntu use
pcs and in SLES use
Manually fail over the availability group with
pcs. Don't initiate failover with Transact-SQL. For instructions, see Failover.