Configure failover cluster instance - SQL Server on Linux (RHEL)
Applies to: SQL Server - Linux
A SQL Server two-node shared disk failover cluster instance provides server-level redundancy for high availability. In this tutorial, you learn how to create a two-node failover cluster instance of SQL Server on Linux. The specific steps that you'll complete include:
- Set up and configure Linux
- Install and configure SQL Server
- Configure the hosts file
- Configure shared storage and move the database files
- Install and configure Pacemaker on each cluster node
- Configure the failover cluster instance
This article explains how to create a two-node shared disk failover cluster instance (FCI) for SQL Server. The article includes instructions and script examples for Red Hat Enterprise Linux (RHEL). Ubuntu distributions are similar to RHEL so the script examples will normally also work on Ubuntu.
For conceptual information, see Failover Cluster Instances - SQL Server on Linux.
To complete the following end-to-end scenario, you need two machines to deploy the two nodes cluster and another server for storage. Below steps outline how these servers will be configured.
The first step is to configure the operating system on the cluster nodes. On each node in the cluster, configure a linux distribution. Use the same distribution and version on both nodes. Use either one or the other of the following distributions:
- RHEL with a valid subscription for the HA add-on
Install and set up SQL Server on both nodes. For detailed instructions, see Installation guidance for SQL Server on Linux.
Designate one node as primary and the other as secondary, for purposes of configuration. Use these terms for the following this guide.
On the secondary node, stop and disable SQL Server. The following example stops and disables SQL Server:
sudo systemctl stop mssql-server sudo systemctl disable mssql-server
Note
At set up time, a Server Master Key is generated for the SQL Server instance and placed at
var/opt/mssql/secrets/machine-key
. On Linux, SQL Server, always runs as a local account calledmssql
. Because it's a local account, its identity isn't shared across nodes. Therefore, you need to copy the encryption key from primary node to each secondary node so each localmssql
account can access it to decrypt the Server Master Key.On the primary node, create a SQL Server login for Pacemaker and grant the login permission to run
sp_server_diagnostics
. Pacemaker uses this account to verify which node is running SQL Server.sudo systemctl start mssql-server
Connect to the SQL Server
master
database with thesa
account and run the following:USE [master]; GO CREATE LOGIN [<loginName>] WITH PASSWORD = N'<password>'; ALTER SERVER ROLE [sysadmin] ADD MEMBER [<loginName>];
Caution
Your password should follow the SQL Server default password policy. By default, the password must be at least eight characters long and contain characters from three of the following four sets: uppercase letters, lowercase letters, base-10 digits, and symbols. Passwords can be up to 128 characters long. Use passwords that are as long and complex as possible.
Alternatively, you can set the permissions at a more granular level. The Pacemaker login requires
VIEW SERVER STATE
to query health status withsp_server_diagnostics
, setupadmin, and ALTER ANY LINKED SERVER to update the FCI instance name with the resource name, by runningsp_dropserver
andsp_addserver
.On the primary node, stop and disable SQL Server.
On each cluster node, configure the hosts file. The hosts file must include the IP address and name of every cluster node.
Check the IP address for each node. The following script shows the IP address of your current node.
sudo ip addr show
Set the computer name on each node. Give each node a unique name that is 15 characters or less. Set the computer name by adding it to
/etc/hosts
. The following script lets you edit/etc/hosts
withvi
.sudo vi /etc/hosts
The following example shows
/etc/hosts
with additions for two nodes namedsqlfcivm1
andsqlfcivm2
.127.0.0.1 localhost localhost4 localhost4.localdomain4 ::1 localhost localhost6 localhost6.localdomain6 10.128.18.128 sqlfcivm1 10.128.16.77 sqlfcivm2
You need to provide storage that both nodes can access. You can use iSCSI, NFS, or SMB. Configure storage, present the storage to the cluster nodes, and then move the database files to the new storage. The following articles explain the steps for each storage type:
- Configure failover cluster instance - iSCSI - SQL Server on Linux
- Configure failover cluster instance - NFS - SQL Server on Linux
- Configure SMB storage failover cluster instance - SQL Server on Linux
On both cluster nodes, create a file to store the SQL Server username and password for the Pacemaker login.
The following command creates and populates this file:
sudo touch /var/opt/mssql/secrets/passwd sudo echo '<loginName>' >> /var/opt/mssql/secrets/passwd sudo echo '<loginPassword>' >> /var/opt/mssql/secrets/passwd sudo chown root:root /var/opt/mssql/secrets/passwd sudo chmod 600 /var/opt/mssql/secrets/passwd
On both cluster nodes, open the Pacemaker firewall ports. To open these ports with
firewalld
, run the following command:sudo firewall-cmd --permanent --add-service=high-availability sudo firewall-cmd --reload
If you're using another firewall that doesn't have a built-in high-availability configuration, the following ports need to be opened for Pacemaker to be able to communicate with other nodes in the cluster:
- TCP: Ports 2224, 3121, 21064
- UDP: Port 5405
Install Pacemaker packages on each node.
sudo yum install pacemaker pcs fence-agents-all resource-agents
Set the password for the default user that is created when installing Pacemaker and Corosync packages. Use the same password on both nodes.
sudo passwd hacluster
Enable and start
pcsd
service and Pacemaker. This will allow nodes to rejoin the cluster after the reboot. Run the following command on both nodes.sudo systemctl enable pcsd sudo systemctl start pcsd sudo systemctl enable pacemaker
Install the FCI resource agent for SQL Server. Run the following commands on both nodes.
sudo yum install mssql-server-ha
The FCI will be created in a resource group. This is slightly easier since the resource group alleviates the need for constraints. However, add the resources into the resource group in the order they should start. The order they should start is:
- Storage resource
- Network resource
- Application resource
This example creates an FCI in the group NewLinFCIGrp. The name of the resource group must be unique from any resource created in Pacemaker.
Create the disk resource. You get no response back if there isn't a problem. The way to create the disk resource depends on the storage type. The following section shows examples for each storage type (iSCSI, NFS, and SMB). Use the example that applies to the storage type for your clustered storage.
sudo pcs resource create <iSCSIDiskResourceName> Filesystem device="/dev/<VolumeGroupName>/<LogicalVolumeName>" directory="<FolderToMountiSCSIDisk>" fstype="<FileSystemType>" --group RGName
<iSCSIDIskResourceName>
is the name of the resource associated with the iSCSI disk<VolumeGroupName>
is the name of the volume group<LogicalVolumeName>
is the name of the logical volume that was created<FolderToMountiSCSIDIsk>
is the folder to mount the disk (for system databases and the default location, it would be/var/opt/mssql/data
)<FileSystemType>
would be EXT4 or XFS, depending on how things were formatted and what the distribution supports.
Create the IP address that will be used by the FCI. You get no response back if there isn't a problem.
sudo pcs resource create <IPResourceName> ocf:heartbeat:IPaddr2 ip=<IPAddress> nic=<NetworkCard> cidr_netmask=<NetMask> --group <RGName>
<IPResourceName>
is the name of the resource associated with the IP address<IPAddress>
is the IP address for the FCI<NetworkCard>
is the network card associated with the subnet (that is, eth0)<NetMask>
is the netmask of the subnet (that is, 24)<RGName>
is the name of the resource group
Create the FCI resource. You get no response back if there isn't a problem.
sudo pcs resource create FCIResourceName ocf:mssql:fci op defaults timeout=60s --group RGName
<FCIResourceName>
isn't only the name of the resource, but the friendly name that is associated with the FCI. This is what users and applications use to connect.<RGName>
is the name of the resource group.
Run the command
sudo pcs resource
. The FCI should be online.Connect to the FCI with SSMS or sqlcmd using the DNS/resource name of the FCI.
Issue the statement
SELECT @@SERVERNAME
. It should return the name of the FCI.Issue the statement
SELECT SERVERPROPERTY('ComputerNamePhysicalNetBIOS')
. It should return the name of the node that the FCI is running on.Manually fail the FCI to the other node(s). See the instructions under Operate failover cluster instance - SQL Server on Linux.
Finally, fail the FCI back to the original node and remove the colocation constraint.
In this tutorial, you completed the following tasks.
- Set up and configure Linux
- Install and configure SQL Server
- Configure the hosts file
- Configure shared storage and move the database files
- Install and configure Pacemaker on each cluster node
- Configure the failover cluster instance