New Cluster Quorum Models in Windows 2008
We Highly Recommend the New Quorum Models in Windows 2008
Microsoft Cluster Services has been completely redesigned in Windows 2008. Windows 2008 offers new Quorum Models that are far superior to the Quorum Models in Windows 2003.
1. What is Quorum and why is it so important?
Quorum is extremely important for any high availability solution, particularly when the cluster nodes are in different data centers. Consider a cluster whose nodes are located several office blocks apart. If the network between the two data centers were to fail and isolate or "partition" the cluster nodes, a cluster "split" can occur. If the nodes in each data center were to believe that they should run, the cluster has "split brain syndrome" - meaning the highly available single SAP system has split into two systems. In the worst case scenario users would log into each of the two "split" systems and start entering data.
To avoid split brain syndrome most cluster implementations use some form of voting in order to "elect" by majority an "owner" of a cluster service (such as SAP, SQL or any other service). The concept is similar to a parliament, house or committee. If there are too few members present (fewer than (n/2) + 1 nodes) then the committee does not have a quorum (the required minimum number of votes) and cannot hold an election. In cases where there are only two cluster nodes, a "tie breaker" vote can be cast by a Witness or arbitrator.
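The majority rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how Microsoft Cluster Services is implemented. The function names are hypothetical, and the sketch assumes that every voter carries exactly one vote and that (n/2) is rounded down.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority: (n/2) + 1, n/2 rounded down."""
    return total_votes // 2 + 1


def has_quorum(votes_present: int, total_votes: int) -> bool:
    """True if the voters that can still see each other form a majority."""
    return votes_present >= votes_needed(total_votes)


if __name__ == "__main__":
    # Two nodes, no witness: a single surviving node holds 1 of 2 votes.
    print(has_quorum(1, 2))  # False
    # Two nodes plus a witness: a node that can reach the witness holds 2 of 3 votes.
    print(has_quorum(2, 3))  # True
```

The second case shows why a Witness matters for two-node clusters: without the extra vote, neither node can ever reach a majority on its own.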
2. What happens if a cluster loses Quorum?
Many customers and partners are surprised to learn that if a healthy running cluster loses Quorum, Microsoft Cluster Services is deliberately designed to stop the cluster services (meaning SAP and SQL will be shut down). The reason for this relates to topic #1 above. The cluster software must protect against a "split brain" and this can only be guaranteed if (n/2) + 1 nodes are available and cast a "vote".
Therefore it is critical to ensure that Quorum is always maintained or the cluster service will be stopped. Majority Node Set clusters can be manually forced to start without Quorum - see ForceQuorum.
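To make the consequence of losing Quorum concrete, here is a small sketch of a network partition between two data centers. It is a simplified model under stated assumptions: one vote per voter, hypothetical voter names, and a hypothetical helper `partitions_that_keep_running`. It does not reflect the actual cluster service code.

```python
from typing import List


def votes_needed(total_votes: int) -> int:
    """Strict majority of all configured votes."""
    return total_votes // 2 + 1


def partitions_that_keep_running(partitions: List[List[str]], total_votes: int) -> List[List[str]]:
    """Return only the partitions that still hold a majority of all votes."""
    needed = votes_needed(total_votes)
    return [members for members in partitions if len(members) >= needed]


# Hypothetical layout: two nodes per data center, witness reachable from site A.
with_witness = [["node-a1", "node-a2", "witness"], ["node-b1", "node-b2"]]
# Same layout without any witness: the votes split 2 against 2.
without_witness = [["node-a1", "node-a2"], ["node-b1", "node-b2"]]

if __name__ == "__main__":
    print(partitions_that_keep_running(with_witness, total_votes=5))
    print(partitions_that_keep_running(without_witness, total_votes=4))
```

In the first case only the site that can reach the witness keeps running. In the second case neither side has a majority, so the whole cluster stops, which is the behaviour that surprises many customers. Because two disjoint partitions can never both hold a majority, a split brain cannot occur in this model.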
3. Which Quorum Models are Available in Windows 2003?
The most commonly deployed Quorum Model is the Shared Disk Quorum Model. This is very often the "Q: Drive" on Windows 2003 clusters (though there is no requirement for "Q:"). This Quorum Model uses SCSI RESERVE commands to establish possession of a shared Quorum disk.
The biggest drawback with having a single disk as the Quorum Model is that the disk is, in and of itself, a Single Point of Failure (SPOF). The design premise of the HA solution is to eliminate all SPOF in an infrastructure. For this reason Microsoft released an enhancement in Service Pack 1 of Windows 2003 to support Majority Node Set with a File Share Witness. This Quorum Model does not require a "Q: Drive" and the Quorum state is replicated into the %SystemRoot%\Cluster directory of each node. This Quorum Model is highly recommended for mission critical clusters or geographically dispersed clusters.
4. Which Quorum Models are Available in Windows 2008 or higher?
Windows 2008 and higher offer the Quorum Models listed below.
The Quorum Models are discussed in detail here.
- Node Majority quorum mode - this model is intended for clusters with an odd number of nodes. Uncommon for SAP systems.
- Node and Disk Majority quorum mode - this is a combination of Node and Quorum disk. This Quorum Model can be used for clusters where the nodes are all in one data center.
- Node and File Share Majority quorum mode - Common for SAP systems and can also be used for Geographically Dispersed Clusters.
- No Majority: Disk Only quorum mode - Traditional Windows 2003 Quorum Disk Model. We recommend discontinuing use of this Model.
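The difference between the four models comes down to who holds a vote. The sketch below compares them, assuming one vote per node and one vote for the disk or file share witness. The class and function names are hypothetical and the model is deliberately simplified. It is not taken from the Windows documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumModel:
    nodes_vote: bool    # do cluster nodes hold votes?
    witness_votes: int  # 1 if a disk or file share witness holds a vote, else 0


# Assumed vote layout for each quorum mode named in the list above.
MODELS = {
    "Node Majority": QuorumModel(nodes_vote=True, witness_votes=0),
    "Node and Disk Majority": QuorumModel(nodes_vote=True, witness_votes=1),
    "Node and File Share Majority": QuorumModel(nodes_vote=True, witness_votes=1),
    "No Majority: Disk Only": QuorumModel(nodes_vote=False, witness_votes=1),
}


def total_votes(model: QuorumModel, node_count: int) -> int:
    """Number of votes configured in the cluster."""
    return (node_count if model.nodes_vote else 0) + model.witness_votes


def tolerated_vote_losses(model: QuorumModel, node_count: int) -> int:
    """How many votes can be lost before the majority (n/2) + 1 is gone."""
    total = total_votes(model, node_count)
    return total - (total // 2 + 1)


if __name__ == "__main__":
    for name, model in MODELS.items():
        for nodes in (2, 3, 4):
            print(name, nodes, "nodes:", tolerated_vote_losses(model, nodes), "vote(s) may be lost")
```

Under these assumptions a four-node Node Majority cluster tolerates the loss of only one vote, the same as a three-node cluster, which is why that mode suits an odd number of nodes. In the Disk Only model the disk holds the single vote, so losing the disk means losing Quorum regardless of how many nodes are healthy.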
In addition to the above there are configurations using Node Majority where the cluster is stretched across two datacenters. These "stretch clusters" are called Geoclusters. Please contact Microsoft if you are planning to implement a Geocluster for SAP. Also please note that although Windows 2008 supports cluster nodes on different IP Subnets, SAP does not. Therefore it is still mandatory to span a VLAN across multiple data centers. Technically the SAP application server cannot handle the Message Server changing its IP address suddenly. Today we have many customers running SAP on SQL Server with a Geocluster. An example is Queensland Railways.
5. Which Quorum Model is recommended for SAP Systems?
Each customer environment is different; however, in general we would encourage customers to evaluate Node and File Share Majority or Node and Disk Majority on Windows 2008 or higher. Further information can be found in the section "Choosing the quorum mode for a particular cluster".
6. Do SAP Support the new Quorum Models?
Yes, in fact SAP support the Majority Node Set and File Share Witness even for Windows 2003. Today some of our largest customers are running with this Quorum Model on Windows 2003 or Windows 2008. Quanta in Taiwan are using a Majority Node Set and File Share Witness Quorum Model.
In the back pages of the SAP Windows SQL Installation Guide there are sections dealing with MSCS clustering. Majority Node configuration is discussed in this SAP document.
7. What are Multi-SID SAP Clusters?
SAP support installing multiple SAP systems onto a single set of servers in a cluster. SAP Benchmarks show that Intel Servers are becoming extremely powerful, with a single 4CPU Intel Server supporting over 57,000 SAPS and 10,000 users, so it no longer makes sense to create separate Active/Passive clusters for each SAP component (such as ECC, BW, EP etc).
The Best Practices for installing a Multi-SID SAP cluster will be the topic of a Blog coming soon. We will also provide some examples of successful Multi-SID customer deployments. The SAP Installation Guide for Multi-SID Clusters is available for download.
8. What are some of the other changes in Windows 2008 Clustering?
Windows 2008 clustering has changed so dramatically from Windows 2003 that a direct upgrade is not possible. There are some options discussed in this blog, however for SAP systems the only supported procedure is a SAP Homogeneous System Copy (this involves a complete reinstallation of the operating system and database). This is very simple and quick for SAP on SQL Server. Refer to the SAP System Copy Guide and the notes listed in section 3 of this blog.
Other changes include a redesigned cluster admin tool, cluster validation tool and an increase of the maximum number of nodes in a single cluster from 8 to 16. Today many SAP customers are deploying 3-4 node Multi-SID clusters.
9. List of some important Notes, KB articles and Documents for Clustering
Below is a list of interesting links:
This blog is highly recommended: https://blogs.msdn.com/b/clustering/
https://blogs.msdn.com/b/clustering/archive/2008/05/10/8483427.aspx
This webcast is strongly recommended for those evaluating a Geographically Dispersed Cluster:
https://msmvps.com/blogs/jtoner/default.aspx
- https://support.microsoft.com/kb/952886
- https://support.microsoft.com/kb/921181
- https://download.microsoft.com/download/4/4/c/44ce4af4-1f41-4e2d-a404-8f50b315a59c/WINDOWS%20SERVER%202008/Windows%20Server%202008%20Building%20High%20Availability%20Infrastructures-%20Ramnish%20%20Singh.pdf
- www.microsoft.com/windowsserver2008/failover-clusters.mspx
- https://download.microsoft.com/download/3/B/5/3B51A025-7522-4686-AA16-8AE2E536034D/Overview%20of%20Failover%20Clustering%20with%20Windows%20Server%202008.doc
- https://h20195.www2.hp.com/V2/getdocument.aspx?docname=4AA2-2644ENW.pdf