High availability of IBM Db2 LUW on Azure VMs on Red Hat Enterprise Linux Server
IBM Db2 for Linux, UNIX, and Windows (LUW) in high availability and disaster recovery (HADR) configuration consists of one node that runs a primary database instance and at least one node that runs a secondary database instance. Changes to the primary database instance are replicated to a secondary database instance synchronously or asynchronously, depending on your configuration.
Note
This article contains references to terms that Microsoft no longer uses. When these terms are removed from the software, we'll remove them from this article.
This article describes how to deploy and configure the Azure virtual machines (VMs), install the cluster framework, and install the IBM Db2 LUW with HADR configuration.
The article doesn't cover how to install and configure IBM Db2 LUW with HADR or SAP software installation. To help you accomplish these tasks, we provide references to SAP and IBM installation manuals. This article focuses on parts that are specific to the Azure environment.
The supported IBM Db2 versions are 10.5 and later, as documented in SAP note 1928533.
Before you begin an installation, see the following SAP notes and documentation:
SAP note | Description |
---|---|
1928533 | SAP applications on Azure: Supported products and Azure VM types |
2015553 | SAP on Azure: Support prerequisites |
2178632 | Key monitoring metrics for SAP on Azure |
2191498 | SAP on Linux with Azure: Enhanced monitoring |
2243692 | Linux on Azure (IaaS) VM: SAP license issues |
2002167 | Red Hat Enterprise Linux 7.x: Installation and Upgrade |
2694118 | Red Hat Enterprise Linux HA Add-On on Azure |
1999351 | Troubleshooting enhanced Azure monitoring for SAP |
2233094 | DB6: SAP applications on Azure that use IBM Db2 for Linux, UNIX, and Windows - additional information |
1612105 | DB6: FAQ on Db2 with HADR |
Overview
To achieve high availability, IBM Db2 LUW with HADR is installed on at least two Azure virtual machines, which are deployed in an virtual machine scale set with flexible orchestration across availability zones or in an availability set.
The following graphics display a setup of two database server Azure VMs. Both database server Azure VMs have their own storage attached and are up and running. In HADR, one database instance in one of the Azure VMs has the role of the primary instance. All clients are connected to primary instance. All changes in database transactions are persisted locally in the Db2 transaction log. As the transaction log records are persisted locally, the records are transferred via TCP/IP to the database instance on the second database server, the standby server, or standby instance. The standby instance updates the local database by rolling forward the transferred transaction log records. In this way, the standby server is kept in sync with the primary server.
HADR is only a replication functionality. It has no failure detection and no automatic takeover or failover facilities. A takeover or transfer to the standby server must be initiated manually by a database administrator. To achieve an automatic takeover and failure detection, you can use the Linux Pacemaker clustering feature. Pacemaker monitors the two database server instances. When the primary database server instance crashes, Pacemaker initiates an automatic HADR takeover by the standby server. Pacemaker also ensures that the virtual IP address is assigned to the new primary server.
To have SAP application servers connect to primary database, you need a virtual host name and a virtual IP address. After a failover, the SAP application servers connect to new primary database instance. In an Azure environment, an Azure load balancer is required to use a virtual IP address in the way that's required for HADR of IBM Db2.
To help you fully understand how IBM Db2 LUW with HADR and Pacemaker fits into a highly available SAP system setup, the following image presents an overview of a highly available setup of an SAP system based on IBM Db2 database. This article covers only IBM Db2, but it provides references to other articles about how to set up other components of an SAP system.
High-level overview of the required steps
To deploy an IBM Db2 configuration, you need to follow these steps:
- Plan your environment.
- Deploy the VMs.
- Update RHEL Linux and configure file systems.
- Install and configure Pacemaker.
- Setup glusterfs cluster or Azure NetApp Files
- Install ASCS/ERS on a separate cluster.
- Install IBM Db2 database with Distributed/High Availability option (SWPM).
- Install and create a secondary database node and instance, and configure HADR.
- Confirm that HADR is working.
- Apply the Pacemaker configuration to control IBM Db2.
- Configure Azure Load Balancer.
- Install primary and dialog application servers.
- Check and adapt the configuration of SAP application servers.
- Perform failover and takeover tests.
Plan Azure infrastructure for hosting IBM Db2 LUW with HADR
Complete the planning process before you execute the deployment. Planning builds the foundation for deploying a configuration of Db2 with HADR in Azure. Key elements that need to be part of planning for IMB Db2 LUW (database part of SAP environment) are listed in the following table:
Topic | Short description |
---|---|
Define Azure resource groups | Resource groups where you deploy VM, virtual network, Azure Load Balancer, and other resources. Can be existing or new. |
Virtual network / Subnet definition | Where VMs for IBM Db2 and Azure Load Balancer are being deployed. Can be existing or newly created. |
Virtual machines hosting IBM Db2 LUW | VM size, storage, networking, IP address. |
Virtual host name and virtual IP for IBM Db2 database | The virtual IP or host name is used for connection of SAP application servers. db-virt-hostname, db-virt-ip. |
Azure fencing | Method to avoid split brain situations is prevented. |
Azure Load Balancer | Usage of Standard (recommended), probe port for Db2 database (our recommendation 62500) probe-port. |
Name resolution | How name resolution works in the environment. DNS service is highly recommended. Local hosts file can be used. |
For more information about Linux Pacemaker in Azure, see Setting up Pacemaker on Red Hat Enterprise Linux in Azure.
Important
For Db2 versions 11.5.6 and higher we highly recommend Integrated solution using Pacemaker from IBM.
Deployment on Red Hat Enterprise Linux
The resource agent for IBM Db2 LUW is included in Red Hat Enterprise Linux Server HA Addon. For the setup that's described in this document, you should use Red Hat Enterprise Linux for SAP. The Azure Marketplace contains an image for Red Hat Enterprise Linux 7.4 for SAP or higher that you can use to deploy new Azure virtual machines. Be aware of the various support or service models that are offered by Red Hat through the Azure Marketplace when you choose a VM image in the Azure VM Marketplace.
Hosts: DNS updates
Make a list of all host names, including virtual host names, and update your DNS servers to enable proper IP address to host-name resolution. If a DNS server doesn't exist or you can't update and create DNS entries, you need to use the local host files of the individual VMs that are participating in this scenario. If you're using host files entries, make sure that the entries are applied to all VMs in the SAP system environment. However, we recommend that you use your DNS that, ideally, extends into Azure
Manual deployment
Make sure that the selected OS is supported by IBM/SAP for IBM Db2 LUW. The list of supported OS versions for Azure VMs and Db2 releases is available in SAP note 1928533. The list of OS releases by individual Db2 release is available in the SAP Product Availability Matrix. We highly recommend a minimum of Red Hat Enterprise Linux 7.4 for SAP because of Azure-related performance improvements in this or later Red Hat Enterprise Linux versions.
- Create or select a resource group.
- Create or select a virtual network and subnet.
- Choose a suitable deployment type for SAP virtual machines. Typically a virtual machine scale set with flexible orchestration.
- Create Virtual Machine 1.
- Use Red Hat Enterprise Linux for SAP image in the Azure Marketplace.
- Select the scale set, availability zone or availability set created in step 3.
- Create Virtual Machine 2.
- Use Red Hat Enterprise Linux for SAP image in the Azure Marketplace.
- Select the scale set, availability zone or availability set created in step 3 (not the same zone as in step 4).
- Add data disks to the VMs, and then check the recommendation of a file system setup in the article IBM Db2 Azure Virtual Machines DBMS deployment for SAP workload.
Install the IBM Db2 LUW and SAP environment
Before you start the installation of an SAP environment based on IBM Db2 LUW, review the following documentation:
- Azure documentation.
- SAP documentation.
- IBM documentation.
Links to this documentation are provided in the introductory section of this article.
Check the SAP installation manuals about installing NetWeaver-based applications on IBM Db2 LUW. You can find the guides on the SAP Help portal by using the SAP Installation Guide Finder.
You can reduce the number of guides displayed in the portal by setting the following filters:
- I want to: Install a new system.
- My Database: IBM Db2 for Linux, Unix, and Windows.
- Additional filters for SAP NetWeaver versions, stack configuration, or operating system.
Red Hat firewall rules
Red Hat Enterprise Linux has firewall enabled by default.
#Allow access to SWPM tool. Rule is not permanent.
sudo firewall-cmd --add-port=4237/tcp
Installation hints for setting up IBM Db2 LUW with HADR
To set up the primary IBM Db2 LUW database instance:
- Use the high availability or distributed option.
- Install the SAP ASCS/ERS and Database instance.
- Take a backup of the newly installed database.
Important
Write down the "Database Communication port" that's set during installation. It must be the same port number for both database instances.
IBM Db2 HADR settings for Azure
When you use an Azure Pacemaker fencing agent, set the following parameters:
- HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 240
- HADR timeout value (HADR_TIMEOUT) = 45
We recommend the preceding parameters based on initial failover/takeover testing. It's mandatory that you test for proper functionality of failover and takeover with these parameter settings. Because individual configurations can vary, the parameters might require adjustment.
Note
Specific to IBM Db2 with HADR configuration with normal startup: The secondary or standby database instance must be up and running before you can start the primary database instance.
Note
For installation and configuration that's specific to Azure and Pacemaker: During the installation procedure through SAP Software Provisioning Manager, there is an explicit question about high availability for IBM Db2 LUW:
- Do not select IBM Db2 pureScale.
- Do not select Install IBM Tivoli System Automation for Multiplatforms.
- Do not select Generate cluster configuration files.
To set up the Standby database server by using the SAP homogeneous system copy procedure, execute these steps:
- Select the System copy option > Target systems > Distributed > Database instance.
- As a copy method, select Homogeneous System so that you can use backup to restore a backup on the standby server instance.
- When you reach the exit step to restore the database for homogeneous system copy, exit the installer. Restore the database from a backup of the primary host. All subsequent installation phases have already been executed on the primary database server.
Red Hat firewall rules for DB2 HADR
Add firewall rules to allow traffic to DB2 and between DB2 for HADR to work:
- Database communication port. If using partitions, add those ports too.
- HADR port (value of DB2 parameter HADR_LOCAL_SVC).
- Azure probe port.
sudo firewall-cmd --add-port=<port>/tcp --permanent
sudo firewall-cmd --reload
IBM Db2 HADR check
For demonstration purposes and the procedures described in this article, the database SID is ID2.
After you've configured HADR and the status is PEER and CONNECTED on the primary and standby nodes, perform the following check:
Execute command as db2<sid> db2pd -hadr -db <SID>
#Primary output:
Database Member 0 -- Database ID2 -- Active -- Up 1 days 15:45:23 -- Date 2019-06-25-10.55.25.349375
HADR_ROLE = PRIMARY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = NEARSYNC
STANDBY_ID = 1
LOG_STREAM_ID = 0
HADR_STATE = PEER
HADR_FLAGS =
PRIMARY_MEMBER_HOST = az-idb01
PRIMARY_INSTANCE = db2id2
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = az-idb02
STANDBY_INSTANCE = db2id2
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = CONNECTED
HADR_CONNECT_STATUS_TIME = 06/25/2019 10:55:05.076494 (1561460105)
HEARTBEAT_INTERVAL(seconds) = 7
HEARTBEAT_MISSED = 5
HEARTBEAT_EXPECTED = 52
HADR_TIMEOUT(seconds) = 30
TIME_SINCE_LAST_RECV(seconds) = 5
PEER_WAIT_LIMIT(seconds) = 0
LOG_HADR_WAIT_CUR(seconds) = 0.000
LOG_HADR_WAIT_RECENT_AVG(seconds) = 598.000027
LOG_HADR_WAIT_ACCUMULATED(seconds) = 598.000
LOG_HADR_WAIT_COUNT = 1
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 46080
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 369280
PRIMARY_LOG_FILE,PAGE,POS = S0000012.LOG, 14151, 3685322855
STANDBY_LOG_FILE,PAGE,POS = S0000012.LOG, 14151, 3685322855
HADR_LOG_GAP(bytes) = 132242668
STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000012.LOG, 14151, 3685322855
STANDBY_RECV_REPLAY_GAP(bytes) = 0
PRIMARY_LOG_TIME = 06/25/2019 10:45:42.000000 (1561459542)
STANDBY_LOG_TIME = 06/25/2019 10:45:42.000000 (1561459542)
STANDBY_REPLAY_LOG_TIME = 06/25/2019 10:45:42.000000 (1561459542)
STANDBY_RECV_BUF_SIZE(pages) = 2048
STANDBY_RECV_BUF_PERCENT = 0
STANDBY_SPOOL_LIMIT(pages) = 1000
STANDBY_SPOOL_PERCENT = 0
STANDBY_ERROR_TIME = NULL
PEER_WINDOW(seconds) = 300
PEER_WINDOW_END = 06/25/2019 11:12:03.000000 (1561461123)
READS_ON_STANDBY_ENABLED = N
#Secondary output:
Database Member 0 -- Database ID2 -- Standby -- Up 1 days 15:45:18 -- Date 2019-06-25-10.56.19.820474
HADR_ROLE = STANDBY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = NEARSYNC
STANDBY_ID = 0
LOG_STREAM_ID = 0
HADR_STATE = PEER
HADR_FLAGS =
PRIMARY_MEMBER_HOST = az-idb01
PRIMARY_INSTANCE = db2id2
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = az-idb02
STANDBY_INSTANCE = db2id2
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = CONNECTED
HADR_CONNECT_STATUS_TIME = 06/25/2019 10:55:05.078116 (1561460105)
HEARTBEAT_INTERVAL(seconds) = 7
HEARTBEAT_MISSED = 0
HEARTBEAT_EXPECTED = 10
HADR_TIMEOUT(seconds) = 30
TIME_SINCE_LAST_RECV(seconds) = 1
PEER_WAIT_LIMIT(seconds) = 0
LOG_HADR_WAIT_CUR(seconds) = 0.000
LOG_HADR_WAIT_RECENT_AVG(seconds) = 598.000027
LOG_HADR_WAIT_ACCUMULATED(seconds) = 598.000
LOG_HADR_WAIT_COUNT = 1
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 46080
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 367360
PRIMARY_LOG_FILE,PAGE,POS = S0000012.LOG, 14151, 3685322855
STANDBY_LOG_FILE,PAGE,POS = S0000012.LOG, 14151, 3685322855
HADR_LOG_GAP(bytes) = 0
STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000012.LOG, 14151, 3685322855
STANDBY_RECV_REPLAY_GAP(bytes) = 0
PRIMARY_LOG_TIME = 06/25/2019 10:45:42.000000 (1561459542)
STANDBY_LOG_TIME = 06/25/2019 10:45:42.000000 (1561459542)
STANDBY_REPLAY_LOG_TIME = 06/25/2019 10:45:42.000000 (1561459542)
STANDBY_RECV_BUF_SIZE(pages) = 2048
STANDBY_RECV_BUF_PERCENT = 0
STANDBY_SPOOL_LIMIT(pages) = 1000
STANDBY_SPOOL_PERCENT = 0
STANDBY_ERROR_TIME = NULL
PEER_WINDOW(seconds) = 1000
PEER_WINDOW_END = 06/25/2019 11:12:59.000000 (1561461179)
READS_ON_STANDBY_ENABLED = N
Configure Azure Load Balancer
During VM configuration, you have an option to create or select exiting load balancer in networking section. Follow below steps, to set up standard load balancer for high availability setup of DB2 database.
Follow the steps in Create load balancer to set up a standard load balancer for a high-availability SAP system by using the Azure portal. During the setup of the load balancer, consider the following points:
- Frontend IP Configuration: Create a front-end IP. Select the same virtual network and subnet name as your database virtual machines.
- Backend Pool: Create a back-end pool and add database VMs.
- Inbound rules: Create a load-balancing rule. Follow the same steps for both load-balancing rules.
- Frontend IP address: Select a front-end IP.
- Backend pool: Select a back-end pool.
- High-availability ports: Select this option.
- Protocol: Select TCP.
- Health Probe: Create a health probe with the following details:
- Protocol: Select TCP.
- Port: For example, 625<instance-no.>.
- Interval: Enter 5.
- Probe Threshold: Enter 2.
- Idle timeout (minutes): Enter 30.
- Enable Floating IP: Select this option.
Note
The health probe configuration property numberOfProbes
, otherwise known as Unhealthy threshold in the portal, isn't respected. To control the number of successful or failed consecutive probes, set the property probeThreshold
to 2
. It's currently not possible to set this property by using the Azure portal, so use either the Azure CLI or the PowerShell command.
Note
When VMs without public IP addresses are placed in the back-end pool of an internal (no public IP address) instance of Standard Azure Load Balancer, there's no outbound internet connectivity unless more configuration is performed to allow routing to public endpoints. For more information on how to achieve outbound connectivity, see Public endpoint connectivity for VMs using Azure Standard Load Balancer in SAP high-availability scenarios.
Important
Don't enable TCP timestamps on Azure VMs placed behind Azure Load Balancer. Enabling TCP timestamps could cause the health probes to fail. Set the parameter net.ipv4.tcp_timestamps
to 0
. For more information, see Load Balancer health probes.
[A] Add firewall rule for probe port:
sudo firewall-cmd --add-port=<probe-port>/tcp --permanent
sudo firewall-cmd --reload
Create the Pacemaker cluster
To create a basic Pacemaker cluster for this IBM Db2 server, see Setting up Pacemaker on Red Hat Enterprise Linux in Azure.
Db2 Pacemaker configuration
When you use Pacemaker for automatic failover in the event of a node failure, you need to configure your Db2 instances and Pacemaker accordingly. This section describes this type of configuration.
The following items are prefixed with either:
- [A]: Applicable to all nodes
- [1]: Applicable only to node 1
- [2]: Applicable only to node 2
[A] Prerequisite for Pacemaker configuration:
Shut down both database servers with user db2<sid> with db2stop.
Change the shell environment for db2<sid> user to /bin/ksh:
# Install korn shell: sudo yum install ksh # Change users shell: sudo usermod -s /bin/ksh db2<sid>
Pacemaker configuration
[1] IBM Db2 HADR-specific Pacemaker configuration:
# Put Pacemaker into maintenance mode sudo pcs property set maintenance-mode=true
[1] Create IBM Db2 resources:
If building a cluster on RHEL 7.x, make sure to update package resource-agents to version
resource-agents-4.1.1-61.el7_9.15
or higher. Use the following commands to create the cluster resources:# Replace bold strings with your instance name db2sid, database SID, and virtual IP address/Azure Load Balancer. sudo pcs resource create Db2_HADR_ID2 db2 instance='db2id2' dblist='ID2' master meta notify=true resource-stickiness=5000 #Configure resource stickiness and correct cluster notifications for master resoruce sudo pcs resource update Db2_HADR_ID2-master meta notify=true resource-stickiness=5000 # Configure virtual IP - same as Azure Load Balancer IP sudo pcs resource create vip_db2id2_ID2 IPaddr2 ip='10.100.0.40' # Configure probe port for Azure load Balancer sudo pcs resource create nc_db2id2_ID2 azure-lb port=62500 #Create a group for ip and Azure loadbalancer probe port sudo pcs resource group add g_ipnc_db2id2_ID2 vip_db2id2_ID2 nc_db2id2_ID2 #Create colocation constrain - keep Db2 HADR Master and Group on same node sudo pcs constraint colocation add g_ipnc_db2id2_ID2 with master Db2_HADR_ID2-master #Create start order constrain sudo pcs constraint order promote Db2_HADR_ID2-master then g_ipnc_db2id2_ID2
If building a cluster on RHEL 8.x, make sure to update package resource-agents to version
resource-agents-4.1.1-93.el8
or higher. For details see Red Hat KBAdb2
resource with HADR fails promote with statePRIMARY/REMOTE_CATCHUP_PENDING/CONNECTED
. Use the following commands to create the cluster resources:# Replace bold strings with your instance name db2sid, database SID, and virtual IP address/Azure Load Balancer. sudo pcs resource create Db2_HADR_ID2 db2 instance='db2id2' dblist='ID2' promotable meta notify=true resource-stickiness=5000 #Configure resource stickiness and correct cluster notifications for master resoruce sudo pcs resource update Db2_HADR_ID2-clone meta notify=true resource-stickiness=5000 # Configure virtual IP - same as Azure Load Balancer IP sudo pcs resource create vip_db2id2_ID2 IPaddr2 ip='10.100.0.40' # Configure probe port for Azure load Balancer sudo pcs resource create nc_db2id2_ID2 azure-lb port=62500 #Create a group for ip and Azure loadbalancer probe port sudo pcs resource group add g_ipnc_db2id2_ID2 vip_db2id2_ID2 nc_db2id2_ID2 #Create colocation constrain - keep Db2 HADR Master and Group on same node sudo pcs constraint colocation add g_ipnc_db2id2_ID2 with master Db2_HADR_ID2-clone #Create start order constrain sudo pcs constraint order promote Db2_HADR_ID2-clone then g_ipnc_db2id2_ID2
[1] Start IBM Db2 resources:
Put Pacemaker out of maintenance mode.
# Put Pacemaker out of maintenance-mode - that start IBM Db2 sudo pcs property set maintenance-mode=false
[1] Make sure that the cluster status is OK and that all of the resources are started. It's not important which node the resources are running on.
sudo pcs status 2 nodes configured 5 resources configured Online: [ az-idb01 az-idb02 ] Full list of resources: rsc_st_azure (stonith:fence_azure_arm): Started az-idb01 Master/Slave Set: Db2_HADR_ID2-master [Db2_HADR_ID2] Masters: [ az-idb01 ] Slaves: [ az-idb02 ] Resource Group: g_ipnc_db2id2_ID2 vip_db2id2_ID2 (ocf::heartbeat:IPaddr2): Started az-idb01 nc_db2id2_ID2 (ocf::heartbeat:azure-lb): Started az-idb01 Daemon Status: corosync: active/disabled pacemaker: active/disabled pcsd: active/enabled
Important
You must manage the Pacemaker clustered Db2 instance by using Pacemaker tools. If you use db2 commands such as db2stop, Pacemaker detects the action as a failure of resource. If you're performing maintenance, you can put the nodes or resources in maintenance mode. Pacemaker suspends monitoring resources, and you can then use normal db2 administration commands.
Make changes to SAP profiles to use virtual IP for connection
To connect to the primary instance of the HADR configuration, the SAP application layer needs to use the virtual IP address that you defined and configured for the Azure Load Balancer. The following changes are required:
/sapmnt/<SID>/profile/DEFAULT.PFL
SAPDBHOST = db-virt-hostname
j2ee/dbhost = db-virt-hostname
/sapmnt/<SID>/global/db6/db2cli.ini
Hostname=db-virt-hostname
Install primary and dialog application servers
When you install primary and dialog application servers against a Db2 HADR configuration, use the virtual host name that you picked for the configuration.
If you performed the installation before you created the Db2 HADR configuration, make the changes as described in the preceding section and as follows for SAP Java stacks.
ABAP+Java or Java stack systems JDBC URL check
Use the J2EE Config tool to check or update the JDBC URL. Because the J2EE Config tool is a graphical tool, you need to have X server installed:
Sign in to the primary application server of the J2EE instance and execute:
sudo /usr/sap/*SID*/*Instance*/j2ee/configtool/configtool.sh
In the left frame, choose security store.
In the right frame, choose the key
jdbc/pool/\<SAPSID>/url
.Change the host name in the JDBC URL to the virtual host name.
jdbc:db2://db-virt-hostname:5912/TSP:deferPrepares=0
Select Add.
To save your changes, select the disk icon at the upper left.
Close the configuration tool.
Restart the Java instance.
Configure log archiving for HADR setup
To configure the Db2 log archiving for HADR setup, we recommend that you configure both the primary and the standby database to have automatic log retrieval capability from all log archive locations. Both the primary and standby database must be able to retrieve log archive files from all the log archive locations to which either one of the database instances might archive log files.
The log archiving is performed only by the primary database. If you change the HADR roles of the database servers or if a failure occurs, the new primary database is responsible for log archiving. If you've set up multiple log archive locations, your logs might be archived twice. In the event of a local or remote catch-up, you might also have to manually copy the archived logs from the old primary server to the active log location of the new primary server.
We recommend configuring a common NFS share or GlusterFS, where logs are written from both nodes. The NFS share or GlusterFS has to be highly available.
You can use existing highly available NFS shares or GlusterFS for transports or a profile directory. For more information, see:
- GlusterFS on Azure VMs on Red Hat Enterprise Linux for SAP NetWeaver.
- High availability for SAP NetWeaver on Azure VMs on Red Hat Enterprise Linux with Azure NetApp Files for SAP Applications.
- Azure NetApp Files (to create NFS shares).
Test the cluster setup
This section describes how you can test your Db2 HADR setup. Every test assumes IBM Db2 primary is running on the az-idb01 virtual machine. User with sudo privileges or root (not recommended) must be used.
The initial status for all test cases is explained here: (crm_mon -r or pcs status)
- pcs status is a snapshot of Pacemaker status at execution time.
- crm_mon -r is continuous output of Pacemaker status.
2 nodes configured
5 resources configured
Online: [ az-idb01 az-idb02 ]
Full list of resources:
rsc_st_azure (stonith:fence_azure_arm): Started az-idb01
Master/Slave Set: Db2_HADR_ID2-master [Db2_HADR_ID2]
Masters: [ az-idb01 ]
Slaves: [ az-idb02 ]
Resource Group: g_ipnc_db2id2_ID2
vip_db2id2_ID2 (ocf::heartbeat:IPaddr2): Started az-idb01
nc_db2id2_ID2 (ocf::heartbeat:azure-lb): Started az-idb01
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
The original status in an SAP system is documented in Transaction DBACOCKPIT > Configuration > Overview, as shown in the following image:
Test takeover of IBM Db2
Important
Before you start the test, make sure that:
Pacemaker doesn't have any failed actions (pcs status).
There are no location constraints (leftovers of migration test).
The IBM Db2 HADR synchronization is working. Check with user db2<sid>.
db2pd -hadr -db <DBSID>
Migrate the node that's running the primary Db2 database by executing following command:
# On RHEL 7.x
sudo pcs resource move Db2_HADR_ID2-master
# On RHEL 8.x
sudo pcs resource move Db2_HADR_ID2-clone --master
After the migration is done, the crm status output looks like:
2 nodes configured
5 resources configured
Online: [ az-idb01 az-idb02 ]
Full list of resources:
rsc_st_azure (stonith:fence_azure_arm): Started az-idb01
Master/Slave Set: Db2_HADR_ID2-master [Db2_HADR_ID2]
Masters: [ az-idb02 ]
Stopped: [ az-idb01 ]
Resource Group: g_ipnc_db2id2_ID2
vip_db2id2_ID2 (ocf::heartbeat:IPaddr2): Started az-idb02
nc_db2id2_ID2 (ocf::heartbeat:azure-lb): Started az-idb02
The original status in an SAP system is documented in Transaction DBACOCKPIT > Configuration > Overview, as shown in the following image:
Resource migration with "pcs resource move" creates location constraints. Location constraints in this case are preventing running IBM Db2 instance on az-idb01. If location constraints aren't deleted, the resource can't fail back.
Remove the location constrain and standby node would be started on az-idb01.
# On RHEL 7.x
sudo pcs resource clear Db2_HADR_ID2-master
# On RHEL 8.x
sudo pcs resource clear Db2_HADR_ID2-clone
And cluster status changes to:
2 nodes configured
5 resources configured
Online: [ az-idb01 az-idb02 ]
Full list of resources:
rsc_st_azure (stonith:fence_azure_arm): Started az-idb01
Master/Slave Set: Db2_HADR_ID2-master [Db2_HADR_ID2]
Masters: [ az-idb02 ]
Slaves: [ az-idb01 ]
Resource Group: g_ipnc_db2id2_ID2
vip_db2id2_ID2 (ocf::heartbeat:IPaddr2): Started az-idb02
nc_db2id2_ID2 (ocf::heartbeat:azure-lb): Started az-idb02
Migrate the resource back to az-idb01 and clear the location constraints
# On RHEL 7.x
sudo pcs resource move Db2_HADR_ID2-master az-idb01
sudo pcs resource clear Db2_HADR_ID2-master
# On RHEL 8.x
sudo pcs resource move Db2_HADR_ID2-clone --master
sudo pcs resource clear Db2_HADR_ID2-clone
- On RHEL 7.x -
pcs resource move <resource_name> <host>
: Creates location constraints and can cause issues with takeover - On RHEL 8.x -
pcs resource move <resource_name> --master
: Creates location constraints and can cause issues with takeover pcs resource clear <resource_name>
: Clears location constraintspcs resource cleanup <resource_name>
: Clears all errors of the resource
Test a manual takeover
You can test a manual takeover by stopping the Pacemaker service on az-idb01 node:
systemctl stop pacemaker
status on az-ibdb02
2 nodes configured
5 resources configured
Node az-idb01: pending
Online: [ az-idb02 ]
Full list of resources:
rsc_st_azure (stonith:fence_azure_arm): Started az-idb02
Master/Slave Set: Db2_HADR_ID2-master [Db2_HADR_ID2]
Masters: [ az-idb02 ]
Stopped: [ az-idb01 ]
Resource Group: g_ipnc_db2id2_ID2
vip_db2id2_ID2 (ocf::heartbeat:IPaddr2): Started az-idb02
nc_db2id2_ID2 (ocf::heartbeat:azure-lb): Started az-idb02
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
After the failover, you can start the service again on az-idb01.
systemctl start pacemaker
Kill the Db2 process on the node that runs the HADR primary database
#Kill main db2 process - db2sysc
[sapadmin@az-idb02 ~]$ sudo ps -ef|grep db2sysc
db2ptr 34598 34596 8 14:21 ? 00:00:07 db2sysc 0
[sapadmin@az-idb02 ~]$ sudo kill -9 34598
The Db2 instance is going to fail, and Pacemaker will move master node and report following status:
2 nodes configured
5 resources configured
Online: [ az-idb01 az-idb02 ]
Full list of resources:
rsc_st_azure (stonith:fence_azure_arm): Started az-idb02
Master/Slave Set: Db2_HADR_ID2-master [Db2_HADR_ID2]
Masters: [ az-idb02 ]
Stopped: [ az-idb01 ]
Resource Group: g_ipnc_db2id2_ID2
vip_db2id2_ID2 (ocf::heartbeat:IPaddr2): Started az-idb02
nc_db2id2_ID2 (ocf::heartbeat:azure-lb): Started az-idb02
Failed Actions:
* Db2_HADR_ID2_demote_0 on az-idb01 'unknown error' (1): call=49, status=complete, exitreason='none',
last-rc-change='Wed Jun 26 09:57:35 2019', queued=0ms, exec=362ms
Pacemaker restarts the Db2 primary database instance on the same node, or it fails over to the node that's running the secondary database instance and an error is reported.
Kill the Db2 process on the node that runs the secondary database instance
[sapadmin@az-idb02 ~]$ sudo ps -ef|grep db2sysc
db2id2 23144 23142 2 09:53 ? 00:00:13 db2sysc 0
[sapadmin@az-idb02 ~]$ sudo kill -9 23144
The node gets into failed stated and error reported.
2 nodes configured
5 resources configured
Online: [ az-idb01 az-idb02 ]
Full list of resources:
rsc_st_azure (stonith:fence_azure_arm): Started az-idb02
Master/Slave Set: Db2_HADR_ID2-master [Db2_HADR_ID2]
Masters: [ az-idb01 ]
Slaves: [ az-idb02 ]
Resource Group: g_ipnc_db2id2_ID2
vip_db2id2_ID2 (ocf::heartbeat:IPaddr2): Started az-idb01
nc_db2id2_ID2 (ocf::heartbeat:azure-lb): Started az-idb01
Failed Actions:
* Db2_HADR_ID2_monitor_20000 on az-idb02 'not running' (7): call=144, status=complete, exitreason='none',
last-rc-change='Wed Jun 26 10:02:09 2019', queued=0ms, exec=0ms
The Db2 instance gets restarted in the secondary role it had assigned before.
Stop DB via db2stop force on the node that runs the HADR primary database instance
As user db2<sid> execute command db2stop force:
az-idb01:db2ptr> db2stop force
Failure detected:
2 nodes configured
5 resources configured
Online: [ az-idb01 az-idb02 ]
Full list of resources:
rsc_st_azure (stonith:fence_azure_arm): Started az-idb02
Master/Slave Set: Db2_HADR_ID2-master [Db2_HADR_ID2]
Slaves: [ az-idb02 ]
Stopped: [ az-idb01 ]
Resource Group: g_ipnc_db2id2_ID2
vip_db2id2_ID2 (ocf::heartbeat:IPaddr2): Stopped
nc_db2id2_ID2 (ocf::heartbeat:azure-lb): Stopped
Failed Actions:
* Db2_HADR_ID2_demote_0 on az-idb01 'unknown error' (1): call=110, status=complete, exitreason='none',
last-rc-change='Wed Jun 26 14:03:12 2019', queued=0ms, exec=355ms
The Db2 HADR secondary database instance got promoted into the primary role.
2 nodes configured
5 resources configured
Online: [ az-idb01 az-idb02 ]
Full list of resources:
rsc_st_azure (stonith:fence_azure_arm): Started az-idb02
Master/Slave Set: Db2_HADR_ID2-master [Db2_HADR_ID2]
Masters: [ az-idb02 ]
Slaves: [ az-idb01 ]
Resource Group: g_ipnc_db2id2_ID2
vip_db2id2_ID2 (ocf::heartbeat:IPaddr2): Started az-idb02
nc_db2id2_ID2 (ocf::heartbeat:azure-lb): Started az-idb02
Failed Actions:
* Db2_HADR_ID2_demote_0 on az-idb01 'unknown error' (1): call=110, status=complete, exitreason='none',
last-rc-change='Wed Jun 26 14:03:12 2019', queued=0ms, exec=355ms
Crash the VM that runs the HADR primary database instance with "halt"
#Linux kernel panic.
sudo echo b > /proc/sysrq-trigger
In such a case, Pacemaker detects that the node that's running the primary database instance isn't responding.
2 nodes configured
5 resources configured
Node az-idb01: UNCLEAN (online)
Online: [ az-idb02 ]
Full list of resources:
rsc_st_azure (stonith:fence_azure_arm): Started az-idb02
Master/Slave Set: Db2_HADR_ID2-master [Db2_HADR_ID2]
Masters: [ az-idb01 ]
Slaves: [ az-idb02 ]
Resource Group: g_ipnc_db2id2_ID2
vip_db2id2_ID2 (ocf::heartbeat:IPaddr2): Started az-idb01
nc_db2id2_ID2 (ocf::heartbeat:azure-lb): Started az-idb01
The next step is to check for a Split brain situation. After the surviving node has determined that the node that last ran the primary database instance is down, a failover of resources is executed.
2 nodes configured
5 resources configured
Online: [ az-idb02 ]
OFFLINE: [ az-idb01 ]
Full list of resources:
rsc_st_azure (stonith:fence_azure_arm): Started az-idb02
Master/Slave Set: Db2_HADR_ID2-master [Db2_HADR_ID2]
Masters: [ az-idb02 ]
Stopped: [ az-idb01 ]
Resource Group: g_ipnc_db2id2_ID2
vip_db2id2_ID2 (ocf::heartbeat:IPaddr2): Started az-idb02
nc_db2id2_ID2 (ocf::heartbeat:azure-lb): Started az-idb02
In the event of a kernel panic, the failed node will be restarted by fencing agent. After the failed node is back online, you must start pacemaker cluster by
sudo pcs cluster start
it starts the Db2 instance into the secondary role.
2 nodes configured
5 resources configured
Online: [ az-idb01 az-idb02 ]
Full list of resources:
rsc_st_azure (stonith:fence_azure_arm): Started az-idb02
Master/Slave Set: Db2_HADR_ID2-master [Db2_HADR_ID2]
Masters: [ az-idb02 ]
Slaves: [ az-idb01 ]
Resource Group: g_ipnc_db2id2_ID2
vip_db2id2_ID2 (ocf::heartbeat:IPaddr2): Started az-idb02
nc_db2id2_ID2 (ocf::heartbeat:azure-lb): Started az-idb02