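This article describes how to deploy the virtual machines, configure them, install the cluster framework, and install a highly available SAP NetWeaver or SAP ABAP platform-based system. The example configuration uses ASCS instance number 00, ERS instance number 02, and SAP system ID NW1. Steps prefixed with [A] apply to all nodes, [1] only to the first node (nw1-cl-0), and [2] only to the second node (nw1-cl-1).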
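The NFS server, SAP NetWeaver ASCS, SAP NetWeaver SCS, SAP NetWeaver ERS, and the SAP HANA database use virtual hostnames and virtual IP addresses. On Azure, a load balancer is required to use a virtual IP address. We recommend using Standard Load Balancer. The configuration shown here uses a load balancer with:
- Frontend IP address 10.0.0.7 and health probe port 62000 for ASCS
- Frontend IP address 10.0.0.8 and health probe port 62102 for ERS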
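The resource agent for the SAP instances is included in SUSE Linux Enterprise Server for SAP Applications. Images for SUSE Linux Enterprise Server for SAP Applications 12 and 15 are available in Azure Marketplace. You can use such an image to deploy new VMs.
Deploy the virtual machines from an SLES for SAP Applications image. Choose a suitable SLES image version that is supported for your SAP system. You can deploy the VMs with any of the availability options: virtual machine scale set, availability zone, or availability set.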
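[1] Create a virtual IP resource and health probe for the ASCS instance
Important
Recent testing revealed situations where netcat stops responding to requests because of its backlog and its limitation of handling only one connection. The netcat resource then stops listening to the Azure Load Balancer requests, and the floating IP becomes unavailable.
For existing Pacemaker clusters, we previously recommended replacing netcat with socat. We now recommend using the azure-lb resource agent, which is part of the resource-agents package, with the following package version requirements: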
- 对于 SLES 12 SP4/SP5,版本必须至少为 resource-agents-4.3.018.a7fb5035-3.30.1。
- 对于 SLES 15/15 SP1,版本必须至少为 resource-agents-4.3.0184.6ee15eb2-4.13.1。
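Note that the change requires a brief downtime.
For existing Pacemaker clusters whose configuration was already changed to use socat as described in Azure Load-Balancer Detection Hardening, there's no need to switch to the azure-lb resource agent immediately.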
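To check whether a node already meets the minimum resource-agents version listed above, a small helper like the following can be used. It is a minimal sketch that assumes GNU sort -V ordering is close enough to RPM version comparison for these strings. That is an approximation, not the RPM algorithm, so double-check borderline results (zypper's versioncmp command performs a native comparison). The function name version_at_least is hypothetical.

```shell
# Hypothetical helper: succeeds (exit 0) if version $1 is equal to or newer than $2.
# Assumption: GNU "sort -V" ordering approximates RPM version comparison;
# it is not the RPM algorithm, so verify borderline results separately.
version_at_least() {
  local current="$1" required="$2"
  # The older of the two sorts first; if that is the required version,
  # the current version is equal or newer.
  [ "$(printf '%s\n' "$required" "$current" | sort -V | head -n 1)" = "$required" ]
}
```

Example, run on each cluster node (the SLES 12 SP4/SP5 minimum is shown):
version_at_least "$(rpm -q --qf '%{VERSION}-%{RELEASE}' resource-agents)" 4.3.018.a7fb5035-3.30.1 && echo "azure-lb requirement met"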
sudo crm node standby nw1-cl-1
sudo crm configure primitive fs_NW1_ASCS Filesystem device='nw1-nfs:/NW1/ASCS' directory='/usr/sap/NW1/ASCS00' fstype='nfs4' \
op start timeout=60s interval=0 \
op stop timeout=60s interval=0 \
op monitor interval=20s timeout=40s
sudo crm configure primitive vip_NW1_ASCS IPaddr2 \
params ip=10.0.0.7 \
op monitor interval=10 timeout=20
sudo crm configure primitive nc_NW1_ASCS azure-lb port=62000 \
op monitor timeout=20s interval=10
sudo crm configure group g-NW1_ASCS fs_NW1_ASCS nc_NW1_ASCS vip_NW1_ASCS \
meta resource-stickiness=3000
Make sure that the cluster status is OK and that all resources are started. It doesn't matter which node the resources are running on.
sudo crm_mon -r
# Node nw1-cl-1: standby
# Online: [ nw1-cl-0 ]
#
# Full list of resources:
#
# stonith-sbd (stonith:external/sbd): Started nw1-cl-0
# Resource Group: g-NW1_ASCS
# fs_NW1_ASCS (ocf::heartbeat:Filesystem): Started nw1-cl-0
# nc_NW1_ASCS (ocf::heartbeat:azure-lb): Started nw1-cl-0
# vip_NW1_ASCS (ocf::heartbeat:IPaddr2): Started nw1-cl-0
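The "all resources are started" check can also be scripted instead of read by eye. The following is a sketch that assumes each resource line contains a "(class:provider:type):" or "(stonith:type):" token followed by the state, as in the output above. The helper name all_resources_started is hypothetical, and crm_mon's text output is not a stable parsing interface.

```shell
# Hypothetical helper: reads crm_mon text output on stdin and succeeds only if at
# least one resource line was found and every resource line reports "Started".
all_resources_started() {
  awk '
    # Resource lines contain e.g. "(ocf::heartbeat:IPaddr2):" or "(stonith:external/sbd):"
    /\([a-z]+::?[^)]*\):/ {
      total++
      if ($0 !~ /\): *Started/) {
        print "not started: " $0
        bad++
      }
    }
    END { if (total == 0 || bad > 0) exit 1 }
  '
}
```

Example: sudo crm_mon -r -1 | all_resources_started && echo "all resources started"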
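[1] Install SAP NetWeaver ASCS
Install SAP NetWeaver ASCS as root on the first node. Use a virtual hostname that maps to the IP address of the load balancer frontend configuration for the ASCS (for example, nw1-ascs, 10.0.0.7) and the instance number that you used for the load balancer probe (for example, 00).
You can use the sapinst parameter SAPINST_REMOTE_ACCESS_USER to allow a non-root user to connect to sapinst.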
sudo <swpm>/sapinst SAPINST_REMOTE_ACCESS_USER=sapadmin SAPINST_USE_HOSTNAME=virtual_hostname
If the installation fails to create a subfolder in /usr/sap/NW1/ASCS00, set the owner and group of the ASCS00 folder and try again.
chown nw1adm /usr/sap/NW1/ASCS00
chgrp sapsys /usr/sap/NW1/ASCS00
[1] Create a virtual IP resource and health probe for the ERS instance
sudo crm node online nw1-cl-1
sudo crm node standby nw1-cl-0
sudo crm configure primitive fs_NW1_ERS Filesystem device='nw1-nfs:/NW1/ASCSERS' directory='/usr/sap/NW1/ERS02' fstype='nfs4' \
op start timeout=60s interval=0 \
op stop timeout=60s interval=0 \
op monitor interval=20s timeout=40s
sudo crm configure primitive vip_NW1_ERS IPaddr2 \
params ip=10.0.0.8 \
op monitor interval=10 timeout=20
sudo crm configure primitive nc_NW1_ERS azure-lb port=62102 \
op monitor timeout=20s interval=10
sudo crm configure group g-NW1_ERS fs_NW1_ERS nc_NW1_ERS vip_NW1_ERS
Make sure that the cluster status is OK and that all resources are started. It doesn't matter which node the resources are running on.
sudo crm_mon -r
# Node nw1-cl-0: standby
# Online: [ nw1-cl-1 ]
#
# Full list of resources:
#
# stonith-sbd (stonith:external/sbd): Started nw1-cl-1
# Resource Group: g-NW1_ASCS
# fs_NW1_ASCS (ocf::heartbeat:Filesystem): Started nw1-cl-1
# nc_NW1_ASCS (ocf::heartbeat:azure-lb): Started nw1-cl-1
# vip_NW1_ASCS (ocf::heartbeat:IPaddr2): Started nw1-cl-1
# Resource Group: g-NW1_ERS
# fs_NW1_ERS (ocf::heartbeat:Filesystem): Started nw1-cl-1
# nc_NW1_ERS (ocf::heartbeat:azure-lb): Started nw1-cl-1
# vip_NW1_ERS (ocf::heartbeat:IPaddr2): Started nw1-cl-1
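[2] Install SAP NetWeaver ERS
Install SAP NetWeaver ERS as root on the second node. Use a virtual hostname that maps to the IP address of the load balancer frontend configuration for the ERS (for example, nw1-aers, 10.0.0.8) and the instance number that you used for the load balancer probe (for example, 02).
You can use the sapinst parameter SAPINST_REMOTE_ACCESS_USER to allow a non-root user to connect to sapinst.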
sudo <swpm>/sapinst SAPINST_REMOTE_ACCESS_USER=sapadmin SAPINST_USE_HOSTNAME=virtual_hostname
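Note
Use SWPM SP 20 PL 05 or higher. Lower versions don't set the permissions correctly, and the installation fails.
If the installation fails to create a subfolder in /usr/sap/NW1/ERS02, set the owner and group of the ERS02 folder and try again.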
chown nw1adm /usr/sap/NW1/ERS02
chgrp sapsys /usr/sap/NW1/ERS02
[1] Adapt the ASCS/SCS and ERS instance profiles
ASCS/SCS profile
sudo vi /sapmnt/NW1/profile/NW1_ASCS00_nw1-ascs
# Change the restart command to a start command
#Restart_Program_01 = local $(_EN) pf=$(_PF)
Start_Program_01 = local $(_EN) pf=$(_PF)
# Add the following lines
service/halib = $(DIR_CT_RUN)/saphascriptco.so
service/halib_cluster_connector = /usr/bin/sap_suse_cluster_connector
# Add the keep alive parameter, if using ENSA1
enque/encni/set_so_keepalive = TRUE
For both ENSA1 and ENSA2, make sure that the keepalive OS parameters are set as described in SAP Note 1410736.
ERS profile
sudo vi /sapmnt/NW1/profile/NW1_ERS02_nw1-aers
# Change the restart command to a start command
#Restart_Program_00 = local $(_ER) pf=$(_PFL) NR=$(SCSID)
Start_Program_00 = local $(_ER) pf=$(_PFL) NR=$(SCSID)
# Add the following lines
service/halib = $(DIR_CT_RUN)/saphascriptco.so
service/halib_cluster_connector = /usr/bin/sap_suse_cluster_connector
# remove Autostart from ERS profile
# Autostart = 1
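If you prefer to apply the profile edits above non-interactively, the following sketch shows one way to do it. It assumes GNU sed (for in-place editing and "\n" in the replacement) and that the Restart_Program_NN lines look exactly like the ones shown. It doesn't add the ENSA1-only enque/encni/set_so_keepalive parameter. The function name adapt_sap_profile is hypothetical. Back up the profiles first.

```shell
# Hypothetical helper: applies the profile changes shown above to one file.
#   $1 = profile path, $2 = program number (01 for ASCS, 00 for ERS),
#   $3 = "ers" to also comment out Autostart (ERS profile only).
# Assumes GNU sed. Safe to run twice: already-converted lines are not touched again.
adapt_sap_profile() {
  local profile="$1" nn="$2" kind="${3:-ascs}"
  # Keep the original Restart_Program_NN line as a comment and add Start_Program_NN.
  sed -i "s/^Restart_Program_${nn} = \(.*\)\$/#Restart_Program_${nn} = \1\nStart_Program_${nn} = \1/" "$profile"
  # Add the SUSE HA connector lines only if they are not present yet.
  if ! grep -q '^service/halib = ' "$profile"; then
    printf '%s\n' \
      'service/halib = $(DIR_CT_RUN)/saphascriptco.so' \
      'service/halib_cluster_connector = /usr/bin/sap_suse_cluster_connector' >> "$profile"
  fi
  # ERS only: comment out Autostart.
  if [ "$kind" = "ers" ]; then
    sed -i 's/^Autostart *= *1/# &/' "$profile"
  fi
}
```

Example:
adapt_sap_profile /sapmnt/NW1/profile/NW1_ASCS00_nw1-ascs 01
adapt_sap_profile /sapmnt/NW1/profile/NW1_ERS02_nw1-aers 00 ers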
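[A] Configure keepalive
Communication between the SAP NetWeaver application server and the ASCS/SCS is routed through a software load balancer. The load balancer disconnects inactive connections after a configurable timeout. To prevent this, set a parameter in the SAP NetWeaver ASCS/SCS profile (if you use ENSA1), and change the Linux system keepalive settings on all SAP servers for both ENSA1 and ENSA2. For more information, see SAP Note 1410736.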
# Change the Linux system configuration
sudo sysctl net.ipv4.tcp_keepalive_time=300
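The sysctl command above changes only the running kernel, so the value is lost at the next reboot. A common way to persist it is a drop-in file under /etc/sysctl.d. The following is a minimal sketch that assumes that location. The file name 98-sap-keepalive.conf and the helper name persist_keepalive are hypothetical.

```shell
# Hypothetical helper: persists net.ipv4.tcp_keepalive_time = 300 in a sysctl drop-in.
# $1 = target directory (defaults to /etc/sysctl.d). The file name is an assumption;
# any *.conf file in that directory is read at boot.
persist_keepalive() {
  local conf_dir="${1:-/etc/sysctl.d}"
  local conf_file="$conf_dir/98-sap-keepalive.conf"
  mkdir -p "$conf_dir" || return 1
  printf '%s\n' 'net.ipv4.tcp_keepalive_time = 300' > "$conf_file" || return 1
  # Print the path of the file that was written.
  printf '%s\n' "$conf_file"
}
```

Run it as root on every SAP server, reload all sysctl files with sysctl --system, and confirm the value with sysctl -n net.ipv4.tcp_keepalive_time.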
[A] Configure the SAP users after the installation
# Add sidadm to the haclient group
sudo usermod -aG haclient nw1adm
[1] Add the ASCS and ERS SAP services to the sapservices file
Add the ASCS service entry to the second node, and copy the ERS service entry to the first node.
cat /usr/sap/sapservices | grep ASCS00 | sudo ssh nw1-cl-1 "cat >>/usr/sap/sapservices"
sudo ssh nw1-cl-1 "cat /usr/sap/sapservices" | grep ERS02 | sudo tee -a /usr/sap/sapservices
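Running the two commands above a second time appends duplicate entries. If you need a rerunnable variant, one option is to append only the lines that are missing. The following is a sketch with a hypothetical helper name. The usage below relies on bash process substitution and assumes root can reach the other node with ssh.

```shell
# Hypothetical helper: appends the lines of file $2 that match pattern $1 to file $3,
# skipping lines that are already present in $3, so it can be rerun safely.
append_missing_entries() {
  local pattern="$1" src="$2" dst="$3" line
  grep -- "$pattern" "$src" | while IFS= read -r line; do
    # -x: whole line, -F: fixed string, so entries are compared literally.
    grep -qxF -- "$line" "$dst" || printf '%s\n' "$line" >> "$dst"
  done
}
```

Example, as root in bash on nw1-cl-0 (for the ASCS entry, run the same on nw1-cl-1 with ASCS00 and ssh nw1-cl-0):
append_missing_entries ERS02 <(ssh nw1-cl-1 cat /usr/sap/sapservices) /usr/sap/sapservices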
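[A] Disable the systemd services of the ASCS and ERS SAP instances
This step applies only if the SAP startup framework is managed by systemd, as described in SAP Note 3115048.
Note
When you use an SLES cluster configuration to manage SAP instances such as SAP ASCS and SAP ERS, additional modifications are needed to integrate the cluster with the native systemd-based SAP startup framework. They ensure that maintenance procedures don't compromise cluster stability. After you install the SAP startup framework or switch it to a systemd-enabled setup as described in SAP Note 3115048, disable the systemd services of the ASCS and ERS SAP instances.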
# Stop the ASCS and ERS instances as <sid>adm (nw1adm);
# run each pair on the node where the respective instance is running
sapcontrol -nr 00 -function Stop
sapcontrol -nr 00 -function StopService
sapcontrol -nr 02 -function Stop
sapcontrol -nr 02 -function StopService
# Run the following command on the VM where you installed the ASCS instance (for example, nw1-cl-0)
sudo systemctl disable SAPNW1_00
# Run the following command on the VM where you installed the ERS instance (for example, nw1-cl-1)
sudo systemctl disable SAPNW1_02
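[1] Create the SAP cluster resources
Define the resources with the set of commands that matches your system: the ENSA1 variant comes first, followed by the ENSA2 variant. SAP introduced support for ENSA2, including replication, in SAP NetWeaver 7.52. Starting with ABAP Platform 1809, ENSA2 is installed by default. For ENSA2 support, see SAP Note 2630416.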
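If you aren't sure which enqueue architecture an existing system uses, a rough heuristic is to check which enqueue server binary the ASCS profile references. The following sketch assumes that ENSA1 profiles reference "enserver" and ENSA2 profiles reference "enq_server". This is not an official check, so confirm the result against your profile and SAP Note 2630416. The function name detect_ensa is hypothetical.

```shell
# Heuristic, not an official check: prints ENSA1, ENSA2, or unknown for the given
# ASCS profile, based on which enqueue server binary the profile references.
detect_ensa() {
  local profile="$1"
  if grep -q 'enq_server' "$profile"; then
    echo ENSA2
  elif grep -q 'enserver' "$profile"; then
    echo ENSA1
  else
    echo unknown
    return 1
  fi
}
```

Example: detect_ensa /sapmnt/NW1/profile/NW1_ASCS00_nw1-ascs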
sudo crm configure property maintenance-mode="true"
sudo crm configure primitive rsc_sap_NW1_ASCS00 SAPInstance \
operations \$id=rsc_sap_NW1_ASCS00-operations \
op monitor interval=11 timeout=60 on-fail=restart \
params InstanceName=NW1_ASCS00_nw1-ascs START_PROFILE="/sapmnt/NW1/profile/NW1_ASCS00_nw1-ascs" \
AUTOMATIC_RECOVER=false \
meta resource-stickiness=5000 failure-timeout=60 migration-threshold=1 priority=10
sudo crm configure primitive rsc_sap_NW1_ERS02 SAPInstance \
operations \$id=rsc_sap_NW1_ERS02-operations \
op monitor interval=11 timeout=60 on-fail=restart \
params InstanceName=NW1_ERS02_nw1-aers START_PROFILE="/sapmnt/NW1/profile/NW1_ERS02_nw1-aers" AUTOMATIC_RECOVER=false IS_ERS=true \
meta priority=1000
sudo crm configure modgroup g-NW1_ASCS add rsc_sap_NW1_ASCS00
sudo crm configure modgroup g-NW1_ERS add rsc_sap_NW1_ERS02
sudo crm configure colocation col_sap_NW1_no_both -5000: g-NW1_ERS g-NW1_ASCS
sudo crm configure location loc_sap_NW1_failover_to_ers rsc_sap_NW1_ASCS00 rule 2000: runs_ers_NW1 eq 1
sudo crm configure order ord_sap_NW1_first_start_ascs Optional: rsc_sap_NW1_ASCS00:start rsc_sap_NW1_ERS02:stop symmetrical=false
sudo crm_attribute --delete --name priority-fencing-delay
sudo crm node online nw1-cl-0
sudo crm configure property maintenance-mode="false"
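For ENSA2, use the following configuration instead of the ENSA1 commands above.
Note
If you have a two-node cluster running ENSA2, you can optionally configure the priority-fencing-delay cluster property. In a split-brain scenario, this property introduces an additional delay before the node with the higher total resource priority is fenced. For more information, see the SUSE Linux Enterprise Server High Availability Extension Administration Guide.
The priority-fencing-delay property applies only to ENSA2 running on a two-node cluster.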
sudo crm configure property maintenance-mode="true"
sudo crm configure property priority-fencing-delay=30
sudo crm configure primitive rsc_sap_NW1_ASCS00 SAPInstance \
operations \$id=rsc_sap_NW1_ASCS00-operations \
op monitor interval=11 timeout=60 on-fail=restart \
params InstanceName=NW1_ASCS00_nw1-ascs START_PROFILE="/sapmnt/NW1/profile/NW1_ASCS00_nw1-ascs" \
AUTOMATIC_RECOVER=false \
meta resource-stickiness=5000 priority=100
sudo crm configure primitive rsc_sap_NW1_ERS02 SAPInstance \
operations \$id=rsc_sap_NW1_ERS02-operations \
op monitor interval=11 timeout=60 on-fail=restart \
params InstanceName=NW1_ERS02_nw1-aers START_PROFILE="/sapmnt/NW1/profile/NW1_ERS02_nw1-aers" AUTOMATIC_RECOVER=false IS_ERS=true
sudo crm configure modgroup g-NW1_ASCS add rsc_sap_NW1_ASCS00
sudo crm configure modgroup g-NW1_ERS add rsc_sap_NW1_ERS02
sudo crm configure colocation col_sap_NW1_no_both -5000: g-NW1_ERS g-NW1_ASCS
sudo crm configure order ord_sap_NW1_first_start_ascs Optional: rsc_sap_NW1_ASCS00:start rsc_sap_NW1_ERS02:stop symmetrical=false
sudo crm node online nw1-cl-0
sudo crm configure property maintenance-mode="false"
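Test HAGetFailoverConfig, HACheckConfig, and HACheckFailoverConfig
Run the following commands as <sapsid>adm on the node where the ASCS instance is currently running. If the commands fail with the message "FAIL: Insufficient memory", the cause might be dashes in your hostname. This is a known issue that SUSE will fix in the sap-suse-cluster-connector package.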
nw1-cl-0:nw1adm 54> sapcontrol -nr 00 -function HAGetFailoverConfig
# 15.08.2018 13:50:36
# HAGetFailoverConfig
# OK
# HAActive: TRUE
# HAProductVersion: Toolchain Module
# HASAPInterfaceVersion: Toolchain Module (sap_suse_cluster_connector 3.0.1)
# HADocumentation: https://www.suse.com/products/sles-for-sap/resource-library/sap-best-practices/
# HAActiveNode:
# HANodes: nw1-cl-0, nw1-cl-1
nw1-cl-0:nw1adm 55> sapcontrol -nr 00 -function HACheckConfig
# 15.08.2018 14:00:04
# HACheckConfig
# OK
# state, category, description, comment
# SUCCESS, SAP CONFIGURATION, Redundant ABAP instance configuration, 2 ABAP instances detected
# SUCCESS, SAP CONFIGURATION, Redundant Java instance configuration, 0 Java instances detected
# SUCCESS, SAP CONFIGURATION, Enqueue separation, All Enqueue server separated from application server
# SUCCESS, SAP CONFIGURATION, MessageServer separation, All MessageServer separated from application server
# SUCCESS, SAP CONFIGURATION, ABAP instances on multiple hosts, ABAP instances on multiple hosts detected
# SUCCESS, SAP CONFIGURATION, Redundant ABAP SPOOL service configuration, 2 ABAP instances with SPOOL service detected
# SUCCESS, SAP STATE, Redundant ABAP SPOOL service state, 2 ABAP instances with active SPOOL service detected
# SUCCESS, SAP STATE, ABAP instances with ABAP SPOOL service on multiple hosts, ABAP instances with active ABAP SPOOL service on multiple hosts detected
# SUCCESS, SAP CONFIGURATION, Redundant ABAP BATCH service configuration, 2 ABAP instances with BATCH service detected
# SUCCESS, SAP STATE, Redundant ABAP BATCH service state, 2 ABAP instances with active BATCH service detected
# SUCCESS, SAP STATE, ABAP instances with ABAP BATCH service on multiple hosts, ABAP instances with active ABAP BATCH service on multiple hosts detected
# SUCCESS, SAP CONFIGURATION, Redundant ABAP DIALOG service configuration, 2 ABAP instances with DIALOG service detected
# SUCCESS, SAP STATE, Redundant ABAP DIALOG service state, 2 ABAP instances with active DIALOG service detected
# SUCCESS, SAP STATE, ABAP instances with ABAP DIALOG service on multiple hosts, ABAP instances with active ABAP DIALOG service on multiple hosts detected
# SUCCESS, SAP CONFIGURATION, Redundant ABAP UPDATE service configuration, 2 ABAP instances with UPDATE service detected
# SUCCESS, SAP STATE, Redundant ABAP UPDATE service state, 2 ABAP instances with active UPDATE service detected
# SUCCESS, SAP STATE, ABAP instances with ABAP UPDATE service on multiple hosts, ABAP instances with active ABAP UPDATE service on multiple hosts detected
# SUCCESS, SAP STATE, SCS instance running, SCS instance status ok
# SUCCESS, SAP CONFIGURATION, SAPInstance RA sufficient version (nw1-ascs_NW1_00), SAPInstance includes is-ers patch
# SUCCESS, SAP CONFIGURATION, Enqueue replication (nw1-ascs_NW1_00), Enqueue replication enabled
# SUCCESS, SAP STATE, Enqueue replication state (nw1-ascs_NW1_00), Enqueue replication active
nw1-cl-0:nw1adm 56> sapcontrol -nr 00 -function HACheckFailoverConfig
# 15.08.2018 14:04:08
# HACheckFailoverConfig
# OK
# state, category, description, comment
# SUCCESS, SAP CONFIGURATION, SAPInstance RA sufficient version, SAPInstance includes is-ers patch
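To spot problems in a long HACheckConfig or HACheckFailoverConfig listing, you can filter for rows whose state isn't SUCCESS. The following sketch assumes the comma-separated "state, category, description, comment" row format shown above. It makes no assumption about the names of non-success states. The helper name ha_check_failures is hypothetical.

```shell
# Hypothetical helper: reads HACheckConfig/HACheckFailoverConfig output on stdin,
# prints every result row whose state is not SUCCESS, and fails if any was found.
ha_check_failures() {
  awk -F', ' '
    { sub(/^# /, "") }   # tolerate the "# " prefix used in the listings above
    NF >= 4 && $1 != "state" && $1 != "SUCCESS" { print; bad++ }
    END { if (bad > 0) exit 1 }
  '
}
```

Example, as nw1adm: sapcontrol -nr 00 -function HACheckConfig | ha_check_failures && echo "all checks passed"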
Manually migrate the ASCS instance
Resource state before starting the test:
stonith-sbd (stonith:external/sbd): Started nw1-cl-0
Resource Group: g-NW1_ASCS
fs_NW1_ASCS (ocf::heartbeat:Filesystem): Started nw1-cl-0
nc_NW1_ASCS (ocf::heartbeat:azure-lb): Started nw1-cl-0
vip_NW1_ASCS (ocf::heartbeat:IPaddr2): Started nw1-cl-0
rsc_sap_NW1_ASCS00 (ocf::heartbeat:SAPInstance): Started nw1-cl-0
Resource Group: g-NW1_ERS
fs_NW1_ERS (ocf::heartbeat:Filesystem): Started nw1-cl-1
nc_NW1_ERS (ocf::heartbeat:azure-lb): Started nw1-cl-1
vip_NW1_ERS (ocf::heartbeat:IPaddr2): Started nw1-cl-1
rsc_sap_NW1_ERS02 (ocf::heartbeat:SAPInstance): Started nw1-cl-1
Run the following commands as root to migrate the ASCS instance.
nw1-cl-0:~ # crm resource migrate rsc_sap_NW1_ASCS00 force
# INFO: Move constraint created for rsc_sap_NW1_ASCS00
nw1-cl-0:~ # crm resource unmigrate rsc_sap_NW1_ASCS00
# INFO: Removed migration constraints for rsc_sap_NW1_ASCS00
# Remove failed actions for the ERS that occurred as part of the migration
nw1-cl-0:~ # crm resource cleanup rsc_sap_NW1_ERS02
Resource state after the test:
stonith-sbd (stonith:external/sbd): Started nw1-cl-0
Resource Group: g-NW1_ASCS
fs_NW1_ASCS (ocf::heartbeat:Filesystem): Started nw1-cl-1
nc_NW1_ASCS (ocf::heartbeat:azure-lb): Started nw1-cl-1
vip_NW1_ASCS (ocf::heartbeat:IPaddr2): Started nw1-cl-1
rsc_sap_NW1_ASCS00 (ocf::heartbeat:SAPInstance): Started nw1-cl-1
Resource Group: g-NW1_ERS
fs_NW1_ERS (ocf::heartbeat:Filesystem): Started nw1-cl-0
nc_NW1_ERS (ocf::heartbeat:azure-lb): Started nw1-cl-0
vip_NW1_ERS (ocf::heartbeat:IPaddr2): Started nw1-cl-0
rsc_sap_NW1_ERS02 (ocf::heartbeat:SAPInstance): Started nw1-cl-0
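To compare the before and after states without scanning the whole listing, you can extract the node that runs a given resource. The following is a sketch that assumes the crm_mon line format shown above, with the node name as the last field after "Started". The helper name resource_node is hypothetical.

```shell
# Hypothetical helper: reads crm_mon text output on stdin and prints the node on which
# resource $1 is started. Fails if the resource is not found in a Started state.
resource_node() {
  awk -v rsc="$1" '
    { sub(/^[#* ]+/, "") }   # drop leading "#", "*" and spaces
    $1 == rsc && /\): *Started/ { print $NF; found = 1; exit }
    END { if (!found) exit 1 }
  '
}
```

Example: sudo crm_mon -r -1 | resource_node rsc_sap_NW1_ASCS00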
Test HAFailoverToNode
Resource state before starting the test:
stonith-sbd (stonith:external/sbd): Started nw1-cl-0
Resource Group: g-NW1_ASCS
fs_NW1_ASCS (ocf::heartbeat:Filesystem): Started nw1-cl-1
nc_NW1_ASCS (ocf::heartbeat:azure-lb): Started nw1-cl-1
vip_NW1_ASCS (ocf::heartbeat:IPaddr2): Started nw1-cl-1
rsc_sap_NW1_ASCS00 (ocf::heartbeat:SAPInstance): Started nw1-cl-1
Resource Group: g-NW1_ERS
fs_NW1_ERS (ocf::heartbeat:Filesystem): Started nw1-cl-0
nc_NW1_ERS (ocf::heartbeat:azure-lb): Started nw1-cl-0
vip_NW1_ERS (ocf::heartbeat:IPaddr2): Started nw1-cl-0
rsc_sap_NW1_ERS02 (ocf::heartbeat:SAPInstance): Started nw1-cl-0
Run the following commands as <sapsid>adm to migrate the ASCS instance.
nw1-cl-0:nw1adm 55> sapcontrol -nr 00 -host nw1-ascs -user nw1adm <password> -function HAFailoverToNode ""
# run as root
# Remove failed actions for the ERS that occurred as part of the migration
nw1-cl-0:~ # crm resource cleanup rsc_sap_NW1_ERS02
# Remove migration constraints
nw1-cl-0:~ # crm resource clear rsc_sap_NW1_ASCS00
#INFO: Removed migration constraints for rsc_sap_NW1_ASCS00
Resource state after the test:
stonith-sbd (stonith:external/sbd): Started nw1-cl-0
Resource Group: g-NW1_ASCS
fs_NW1_ASCS (ocf::heartbeat:Filesystem): Started nw1-cl-0
nc_NW1_ASCS (ocf::heartbeat:azure-lb): Started nw1-cl-0
vip_NW1_ASCS (ocf::heartbeat:IPaddr2): Started nw1-cl-0
rsc_sap_NW1_ASCS00 (ocf::heartbeat:SAPInstance): Started nw1-cl-0
Resource Group: g-NW1_ERS
fs_NW1_ERS (ocf::heartbeat:Filesystem): Started nw1-cl-1
nc_NW1_ERS (ocf::heartbeat:azure-lb): Started nw1-cl-1
vip_NW1_ERS (ocf::heartbeat:IPaddr2): Started nw1-cl-1
rsc_sap_NW1_ERS02 (ocf::heartbeat:SAPInstance): Started nw1-cl-1
Simulate a node crash
Resource state before starting the test:
stonith-sbd (stonith:external/sbd): Started nw1-cl-0
Resource Group: g-NW1_ASCS
fs_NW1_ASCS (ocf::heartbeat:Filesystem): Started nw1-cl-0
nc_NW1_ASCS (ocf::heartbeat:azure-lb): Started nw1-cl-0
vip_NW1_ASCS (ocf::heartbeat:IPaddr2): Started nw1-cl-0
rsc_sap_NW1_ASCS00 (ocf::heartbeat:SAPInstance): Started nw1-cl-0
Resource Group: g-NW1_ERS
fs_NW1_ERS (ocf::heartbeat:Filesystem): Started nw1-cl-1
nc_NW1_ERS (ocf::heartbeat:azure-lb): Started nw1-cl-1
vip_NW1_ERS (ocf::heartbeat:IPaddr2): Started nw1-cl-1
rsc_sap_NW1_ERS02 (ocf::heartbeat:SAPInstance): Started nw1-cl-1
Run the following command as root on the node where the ASCS instance is running.
nw1-cl-0:~ # echo b > /proc/sysrq-trigger
If you use SBD, Pacemaker shouldn't start automatically on the killed node. After the node starts again, the status should look similar to this:
Online: [ nw1-cl-1 ]
OFFLINE: [ nw1-cl-0 ]
Full list of resources:
stonith-sbd (stonith:external/sbd): Started nw1-cl-1
Resource Group: g-NW1_ASCS
fs_NW1_ASCS (ocf::heartbeat:Filesystem): Started nw1-cl-1
nc_NW1_ASCS (ocf::heartbeat:azure-lb): Started nw1-cl-1
vip_NW1_ASCS (ocf::heartbeat:IPaddr2): Started nw1-cl-1
rsc_sap_NW1_ASCS00 (ocf::heartbeat:SAPInstance): Started nw1-cl-1
Resource Group: g-NW1_ERS
fs_NW1_ERS (ocf::heartbeat:Filesystem): Started nw1-cl-1
nc_NW1_ERS (ocf::heartbeat:azure-lb): Started nw1-cl-1
vip_NW1_ERS (ocf::heartbeat:IPaddr2): Started nw1-cl-1
rsc_sap_NW1_ERS02 (ocf::heartbeat:SAPInstance): Started nw1-cl-1
Failed Actions:
* rsc_sap_NW1_ERS02_monitor_11000 on nw1-cl-1 'not running' (7): call=219, status=complete, exitreason='none',
last-rc-change='Wed Aug 15 14:38:38 2018', queued=0ms, exec=0ms
Run the following commands to start Pacemaker on the killed node, clear the SBD messages, and clean up the failed resources.
# run as root
# list the SBD device(s)
nw1-cl-0:~ # cat /etc/sysconfig/sbd | grep SBD_DEVICE=
# SBD_DEVICE="/dev/disk/by-id/scsi-36001405772fe8401e6240c985857e116;/dev/disk/by-id/scsi-36001405034a84428af24ddd8c3a3e9e1;/dev/disk/by-id/scsi-36001405cdd5ac8d40e548449318510c3"
nw1-cl-0:~ # sbd -d /dev/disk/by-id/scsi-36001405772fe8401e6240c985857e116 -d /dev/disk/by-id/scsi-36001405034a84428af24ddd8c3a3e9e1 -d /dev/disk/by-id/scsi-36001405cdd5ac8d40e548449318510c3 message nw1-cl-0 clear
nw1-cl-0:~ # systemctl start pacemaker
nw1-cl-0:~ # crm resource cleanup rsc_sap_NW1_ASCS00
nw1-cl-0:~ # crm resource cleanup rsc_sap_NW1_ERS02
Resource state after the test:
stonith-sbd (stonith:external/sbd): Started nw1-cl-1
Resource Group: g-NW1_ASCS
fs_NW1_ASCS (ocf::heartbeat:Filesystem): Started nw1-cl-1
nc_NW1_ASCS (ocf::heartbeat:azure-lb): Started nw1-cl-1
vip_NW1_ASCS (ocf::heartbeat:IPaddr2): Started nw1-cl-1
rsc_sap_NW1_ASCS00 (ocf::heartbeat:SAPInstance): Started nw1-cl-1
Resource Group: g-NW1_ERS
fs_NW1_ERS (ocf::heartbeat:Filesystem): Started nw1-cl-0
nc_NW1_ERS (ocf::heartbeat:azure-lb): Started nw1-cl-0
vip_NW1_ERS (ocf::heartbeat:IPaddr2): Started nw1-cl-0
rsc_sap_NW1_ERS02 (ocf::heartbeat:SAPInstance): Started nw1-cl-0
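The sbd command in this test lists every device from SBD_DEVICE by hand. The following minimal sketch builds the -d arguments from /etc/sysconfig/sbd instead. It assumes SBD_DEVICE holds a semicolon-separated list, optionally in double quotes, as in the example output. The function name sbd_device_args is hypothetical.

```shell
# Hypothetical helper: reads an /etc/sysconfig/sbd-style file on stdin and prints
# "-d <device> " for every entry of SBD_DEVICE (semicolon-separated, optionally quoted).
sbd_device_args() {
  local line devices dev
  # Use the last SBD_DEVICE= assignment, as a shell sourcing the file would.
  line="$(grep -E '^SBD_DEVICE=' | tail -n 1)"
  devices="${line#SBD_DEVICE=}"
  devices="${devices#\"}"
  devices="${devices%\"}"
  # Split the device list on semicolons only.
  local IFS=';'
  for dev in $devices; do
    printf '%s %s ' -d "$dev"
  done
}
```

Example, as root on the killed node:
sbd $(sbd_device_args < /etc/sysconfig/sbd) message nw1-cl-0 clear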
Block network communication
Resource state before starting the test:
stonith-sbd (stonith:external/sbd): Started nw1-cl-1
Resource Group: g-NW1_ASCS
fs_NW1_ASCS (ocf::heartbeat:Filesystem): Started nw1-cl-1
nc_NW1_ASCS (ocf::heartbeat:azure-lb): Started nw1-cl-1
vip_NW1_ASCS (ocf::heartbeat:IPaddr2): Started nw1-cl-1
rsc_sap_NW1_ASCS00 (ocf::heartbeat:SAPInstance): Started nw1-cl-1
Resource Group: g-NW1_ERS
fs_NW1_ERS (ocf::heartbeat:Filesystem): Started nw1-cl-0
nc_NW1_ERS (ocf::heartbeat:azure-lb): Started nw1-cl-0
vip_NW1_ERS (ocf::heartbeat:IPaddr2): Started nw1-cl-0
rsc_sap_NW1_ERS02 (ocf::heartbeat:SAPInstance): Started nw1-cl-0
Add a firewall rule to block the communication on one of the nodes.
# Execute iptable rule on nw1-cl-0 (10.0.0.5) to block the incoming and outgoing traffic to nw1-cl-1 (10.0.0.6)
iptables -A INPUT -s 10.0.0.6 -j DROP; iptables -A OUTPUT -d 10.0.0.6 -j DROP
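When cluster nodes can't communicate with each other, there's a risk of a split-brain scenario. In that situation, the cluster nodes try to fence each other at the same time, which results in a fence race.
When you configure the fencing device, we recommend setting the pcmk_delay_max property. In a split-brain scenario, the cluster then introduces a random delay, up to the pcmk_delay_max value, before the fencing action on each node. The node whose delay expires first fences the other node and wins the fence race.
Additionally, in an ENSA2 configuration, we recommend configuring the priority-fencing-delay cluster property so that the node hosting the ASCS resource takes precedence over the other node in a split-brain scenario. With priority-fencing-delay enabled, the cluster adds an extra delay to fencing actions that target the node hosting the ASCS resource, so the ASCS node wins the fence race.
Run the following command to remove the firewall rule.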
# If the iptables rule set on the server gets reset after a reboot, the rules will be cleared out. In case they have not been reset, please proceed to remove the iptables rule using the following command.
iptables -D INPUT -s 10.0.0.6 -j DROP; iptables -D OUTPUT -d 10.0.0.6 -j DROP
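To confirm that the fencing delays described in this test are actually configured, you can look for them in the cluster configuration. The following sketch assumes that crm configure show prints them as name=value pairs, optionally quoted. The helper name fencing_delays is hypothetical, and the sample values in the test are illustrative only, not recommendations.

```shell
# Hypothetical helper: reads "crm configure show" output on stdin and prints the
# configured pcmk_delay_max and priority-fencing-delay values, one per line.
# Prints nothing for a setting that is not configured.
fencing_delays() {
  grep -oE '(pcmk_delay_max|priority-fencing-delay)="?[0-9]+[a-z]*"?' | tr -d '"'
}
```

Example: sudo crm configure show | fencing_delays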
Test manual restart of the ASCS instance
Resource state before starting the test:
stonith-sbd (stonith:external/sbd): Started nw1-cl-1
Resource Group: g-NW1_ASCS
fs_NW1_ASCS (ocf::heartbeat:Filesystem): Started nw1-cl-1
nc_NW1_ASCS (ocf::heartbeat:azure-lb): Started nw1-cl-1
vip_NW1_ASCS (ocf::heartbeat:IPaddr2): Started nw1-cl-1
rsc_sap_NW1_ASCS00 (ocf::heartbeat:SAPInstance): Started nw1-cl-1
Resource Group: g-NW1_ERS
fs_NW1_ERS (ocf::heartbeat:Filesystem): Started nw1-cl-0
nc_NW1_ERS (ocf::heartbeat:azure-lb): Started nw1-cl-0
vip_NW1_ERS (ocf::heartbeat:IPaddr2): Started nw1-cl-0
rsc_sap_NW1_ERS02 (ocf::heartbeat:SAPInstance): Started nw1-cl-0
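Create an enqueue lock, for example by editing a user in transaction su01. Run the following commands as <sapsid>adm on the node where the ASCS instance is running. The commands stop the ASCS instance and then start it again. If you use the enqueue server 1 architecture, the enqueue lock is expected to be lost in this test. If you use the enqueue server 2 architecture, the enqueue lock is retained.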
nw1-cl-1:nw1adm 54> sapcontrol -nr 00 -function StopWait 600 2
The ASCS instance should now be disabled in Pacemaker.
rsc_sap_NW1_ASCS00 (ocf::heartbeat:SAPInstance): Stopped (disabled)
Start the ASCS instance again on the same node.
nw1-cl-1:nw1adm 54> sapcontrol -nr 00 -function StartWait 600 2
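With enqueue server 1, the enqueue lock of transaction su01 should be lost, and the back end should have been reset. Resource state after the test: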
stonith-sbd (stonith:external/sbd): Started nw1-cl-1
Resource Group: g-NW1_ASCS
fs_NW1_ASCS (ocf::heartbeat:Filesystem): Started nw1-cl-1
nc_NW1_ASCS (ocf::heartbeat:azure-lb): Started nw1-cl-1
vip_NW1_ASCS (ocf::heartbeat:IPaddr2): Started nw1-cl-1
rsc_sap_NW1_ASCS00 (ocf::heartbeat:SAPInstance): Started nw1-cl-1
Resource Group: g-NW1_ERS
fs_NW1_ERS (ocf::heartbeat:Filesystem): Started nw1-cl-0
nc_NW1_ERS (ocf::heartbeat:azure-lb): Started nw1-cl-0
vip_NW1_ERS (ocf::heartbeat:IPaddr2): Started nw1-cl-0
rsc_sap_NW1_ERS02 (ocf::heartbeat:SAPInstance): Started nw1-cl-0
Kill the message server process
Resource state before starting the test:
stonith-sbd (stonith:external/sbd): Started nw1-cl-1
Resource Group: g-NW1_ASCS
fs_NW1_ASCS (ocf::heartbeat:Filesystem): Started nw1-cl-1
nc_NW1_ASCS (ocf::heartbeat:azure-lb): Started nw1-cl-1
vip_NW1_ASCS (ocf::heartbeat:IPaddr2): Started nw1-cl-1
rsc_sap_NW1_ASCS00 (ocf::heartbeat:SAPInstance): Started nw1-cl-1
Resource Group: g-NW1_ERS
fs_NW1_ERS (ocf::heartbeat:Filesystem): Started nw1-cl-0
nc_NW1_ERS (ocf::heartbeat:azure-lb): Started nw1-cl-0
vip_NW1_ERS (ocf::heartbeat:IPaddr2): Started nw1-cl-0
rsc_sap_NW1_ERS02 (ocf::heartbeat:SAPInstance): Started nw1-cl-0
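Run the following command as root to identify the message server process and kill it.
nw1-cl-1:~ # pgrep -f ms.sapNW1 | xargs -r kill -9
If you kill the message server only once, sapstart restarts it. If you kill it often enough, Pacemaker eventually moves the ASCS instance to the other node (when you use ENSA1). Run the following commands as root to clean up the resource state of the ASCS and ERS instances after the test.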
nw1-cl-0:~ # crm resource cleanup rsc_sap_NW1_ASCS00
nw1-cl-0:~ # crm resource cleanup rsc_sap_NW1_ERS02
Resource state after the test:
stonith-sbd (stonith:external/sbd): Started nw1-cl-1
Resource Group: g-NW1_ASCS
fs_NW1_ASCS (ocf::heartbeat:Filesystem): Started nw1-cl-0
nc_NW1_ASCS (ocf::heartbeat:azure-lb): Started nw1-cl-0
vip_NW1_ASCS (ocf::heartbeat:IPaddr2): Started nw1-cl-0
rsc_sap_NW1_ASCS00 (ocf::heartbeat:SAPInstance): Started nw1-cl-0
Resource Group: g-NW1_ERS
fs_NW1_ERS (ocf::heartbeat:Filesystem): Started nw1-cl-1
nc_NW1_ERS (ocf::heartbeat:azure-lb): Started nw1-cl-1
vip_NW1_ERS (ocf::heartbeat:IPaddr2): Started nw1-cl-1
rsc_sap_NW1_ERS02 (ocf::heartbeat:SAPInstance): Started nw1-cl-1
Kill the enqueue server process
Resource state before starting the test:
stonith-sbd (stonith:external/sbd): Started nw1-cl-1
Resource Group: g-NW1_ASCS
fs_NW1_ASCS (ocf::heartbeat:Filesystem): Started nw1-cl-0
nc_NW1_ASCS (ocf::heartbeat:azure-lb): Started nw1-cl-0
vip_NW1_ASCS (ocf::heartbeat:IPaddr2): Started nw1-cl-0
rsc_sap_NW1_ASCS00 (ocf::heartbeat:SAPInstance): Started nw1-cl-0
Resource Group: g-NW1_ERS
fs_NW1_ERS (ocf::heartbeat:Filesystem): Started nw1-cl-1
nc_NW1_ERS (ocf::heartbeat:azure-lb): Started nw1-cl-1
vip_NW1_ERS (ocf::heartbeat:IPaddr2): Started nw1-cl-1
rsc_sap_NW1_ERS02 (ocf::heartbeat:SAPInstance): Started nw1-cl-1
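Run the following command as root on the node where the ASCS instance is running to kill the enqueue server.
#If using ENSA1
nw1-cl-0:~ # pgrep -f en.sapNW1 | xargs -r kill -9
#If using ENSA2
nw1-cl-0:~ # pgrep -f enq.sapNW1 | xargs -r kill -9
If you use ENSA1, the ASCS instance should fail over to the other node immediately. After the ASCS instance has started, the ERS instance also fails over. Run the following commands as root to clean up the resource state of the ASCS and ERS instances after the test.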
nw1-cl-0:~ # crm resource cleanup rsc_sap_NW1_ASCS00
nw1-cl-0:~ # crm resource cleanup rsc_sap_NW1_ERS02
Resource state after the test:
stonith-sbd (stonith:external/sbd): Started nw1-cl-1
Resource Group: g-NW1_ASCS
fs_NW1_ASCS (ocf::heartbeat:Filesystem): Started nw1-cl-1
nc_NW1_ASCS (ocf::heartbeat:azure-lb): Started nw1-cl-1
vip_NW1_ASCS (ocf::heartbeat:IPaddr2): Started nw1-cl-1
rsc_sap_NW1_ASCS00 (ocf::heartbeat:SAPInstance): Started nw1-cl-1
Resource Group: g-NW1_ERS
fs_NW1_ERS (ocf::heartbeat:Filesystem): Started nw1-cl-0
nc_NW1_ERS (ocf::heartbeat:azure-lb): Started nw1-cl-0
vip_NW1_ERS (ocf::heartbeat:IPaddr2): Started nw1-cl-0
rsc_sap_NW1_ERS02 (ocf::heartbeat:SAPInstance): Started nw1-cl-0
Kill the enqueue replication server process
Resource state before starting the test:
stonith-sbd (stonith:external/sbd): Started nw1-cl-1
Resource Group: g-NW1_ASCS
fs_NW1_ASCS (ocf::heartbeat:Filesystem): Started nw1-cl-1
nc_NW1_ASCS (ocf::heartbeat:azure-lb): Started nw1-cl-1
vip_NW1_ASCS (ocf::heartbeat:IPaddr2): Started nw1-cl-1
rsc_sap_NW1_ASCS00 (ocf::heartbeat:SAPInstance): Started nw1-cl-1
Resource Group: g-NW1_ERS
fs_NW1_ERS (ocf::heartbeat:Filesystem): Started nw1-cl-0
nc_NW1_ERS (ocf::heartbeat:azure-lb): Started nw1-cl-0
vip_NW1_ERS (ocf::heartbeat:IPaddr2): Started nw1-cl-0
rsc_sap_NW1_ERS02 (ocf::heartbeat:SAPInstance): Started nw1-cl-0
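Run the following command as root on the node where the ERS instance is running to kill the enqueue replication server process.
#If using ENSA1
nw1-cl-0:~ # pgrep -f er.sapNW1 | xargs -r kill -9
#If using ENSA2
nw1-cl-0:~ # pgrep -f enqr.sapNW1 | xargs -r kill -9
If you run the command only once, sapstart restarts the process. If you run it often enough, sapstart doesn't restart the process, and the resource stays in a stopped state. Run the following command as root to clean up the resource state of the ERS instance after the test.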
nw1-cl-0:~ # crm resource cleanup rsc_sap_NW1_ERS02
Resource state after the test:
stonith-sbd (stonith:external/sbd): Started nw1-cl-1
Resource Group: g-NW1_ASCS
fs_NW1_ASCS (ocf::heartbeat:Filesystem): Started nw1-cl-1
nc_NW1_ASCS (ocf::heartbeat:azure-lb): Started nw1-cl-1
vip_NW1_ASCS (ocf::heartbeat:IPaddr2): Started nw1-cl-1
rsc_sap_NW1_ASCS00 (ocf::heartbeat:SAPInstance): Started nw1-cl-1
Resource Group: g-NW1_ERS
fs_NW1_ERS (ocf::heartbeat:Filesystem): Started nw1-cl-0
nc_NW1_ERS (ocf::heartbeat:azure-lb): Started nw1-cl-0
vip_NW1_ERS (ocf::heartbeat:IPaddr2): Started nw1-cl-0
rsc_sap_NW1_ERS02 (ocf::heartbeat:SAPInstance): Started nw1-cl-0
Kill the sapstartsrv process of the ASCS (enqueue) instance
Resource state before starting the test:
stonith-sbd (stonith:external/sbd): Started nw1-cl-1
Resource Group: g-NW1_ASCS
fs_NW1_ASCS (ocf::heartbeat:Filesystem): Started nw1-cl-1
nc_NW1_ASCS (ocf::heartbeat:azure-lb): Started nw1-cl-1
vip_NW1_ASCS (ocf::heartbeat:IPaddr2): Started nw1-cl-1
rsc_sap_NW1_ASCS00 (ocf::heartbeat:SAPInstance): Started nw1-cl-1
Resource Group: g-NW1_ERS
fs_NW1_ERS (ocf::heartbeat:Filesystem): Started nw1-cl-0
nc_NW1_ERS (ocf::heartbeat:azure-lb): Started nw1-cl-0
vip_NW1_ERS (ocf::heartbeat:IPaddr2): Started nw1-cl-0
rsc_sap_NW1_ERS02 (ocf::heartbeat:SAPInstance): Started nw1-cl-0
Run the following commands as root on the node where the ASCS instance is running.
nw1-cl-1:~ # pgrep -fl 'ASCS00.*sapstartsrv'
# 59545 sapstartsrv
nw1-cl-1:~ # kill -9 59545
The Pacemaker resource agent should always restart the sapstartsrv process. Resource state after the test:
stonith-sbd (stonith:external/sbd): Started nw1-cl-1
Resource Group: g-NW1_ASCS
fs_NW1_ASCS (ocf::heartbeat:Filesystem): Started nw1-cl-1
nc_NW1_ASCS (ocf::heartbeat:azure-lb): Started nw1-cl-1
vip_NW1_ASCS (ocf::heartbeat:IPaddr2): Started nw1-cl-1
rsc_sap_NW1_ASCS00 (ocf::heartbeat:SAPInstance): Started nw1-cl-1
Resource Group: g-NW1_ERS
fs_NW1_ERS (ocf::heartbeat:Filesystem): Started nw1-cl-0
nc_NW1_ERS (ocf::heartbeat:azure-lb): Started nw1-cl-0
vip_NW1_ERS (ocf::heartbeat:IPaddr2): Started nw1-cl-0
rsc_sap_NW1_ERS02 (ocf::heartbeat:SAPInstance): Started nw1-cl-0