Implementing Remote Mirroring and Stretch Clustering
Updated: November 14, 2002
This chapter teaches you how to increase the availability of a Microsoft SQL Server data center by using remote mirroring and stretch clustering with geographically separated nodes. After reading this chapter, you will be able to configure each component in these high availability solutions, including the storage device, network, Microsoft Cluster Service (MSCS), Microsoft Distributed Transaction Coordinator (MS DTC), and Microsoft SQL Server 2000.
The implementation steps are annotated to help you understand the selected configuration choices. The configuration options presented in this chapter have been designed using input from the field, reviewed thoroughly by Microsoft Product Support Services (PSS), and tested by an independent team to ensure accuracy and functionality.
Remote mirroring and stretch clustering are only part of achieving a highly available data center. You must also follow the processes outlined in the Planning Guide.
On This Page
Understanding Remote Mirroring and Stretch Clustering
Deploying a Remote Mirror
Failing Over to a Remote Mirror
Deploying Stretch Clustering
Failing Over to a Secondary Node in a Stretch Cluster
Failing Back to the Original Node
Restoring a Node After a Catastrophic Failure
References
Understanding Remote Mirroring and Stretch Clustering
Remote mirroring is a solution offered by third-party vendors that allows you to maintain a real-time mirror of your primary storage device at a remote site, protecting your data from site destruction. In a remote mirror, redundant server hardware and a redundant storage system are maintained at the remote site. SQL Server is installed at both the local and remote sites. SQL Server transactions affecting one or more production databases are concurrently written to the local storage system and the remote storage system. A remote mirroring solution ensures absolute transactional currency and consistency for the mirrored production databases. This solution does not, however, provide any mechanism for mirroring objects stored in the master or msdb databases. For more information on ensuring these objects are updated from the primary site, see Solution Guide Chapter 3, "Implementing Transactional Replication."
Stretch clustering is a high-availability solution offered by third-party vendors that uses a remote mirror and that extends the capabilities of the failover clustering solution provided by SQL Server 2000 Enterprise Edition and the clustering services provided by either Microsoft Windows 2000 Advanced Server or Microsoft Windows 2000 Datacenter Server. With stretch clustering, SQL Server transactions are concurrently written to the local storage system and the remote storage system.
Stretch clustering includes most of the benefits of standard failover clustering but also protects the data center from site destruction. Stretch clustering ensures absolute transactional currency and consistency and makes failover virtually transparent to the client. Consider using stretch clustering to protect against site destruction if the risk of site destruction is high, the cost of an unavailable data center is high, and you require transactional currency.
Stretch clustering maintains at least one standby server in an MSCS cluster in case the primary data center server fails. Windows 2000 Advanced Server supports two-server clusters, and Windows 2000 Datacenter Server supports clusters containing up to four servers. With stretch clustering, at least one standby server is in a remote site. If you use Windows 2000 Datacenter Server, you can have an MSCS cluster with two local servers and two remote servers providing server redundancy in each site and failover in the event of site destruction.
When MSCS detects that the primary server has failed, it automatically starts the cluster resources that were running on the failed primary server on a standby node and redirects all client traffic to the standby server. MSCS does not, however, control the storage area network (SAN) and cannot automatically promote the mirror of the primary SAN device. This task must be performed manually. With stretch clustering, committed transactions are always available through the standby server at a remote site after the primary server fails.
To implement a remote mirror or stretch-clustering solution, you need to have certain things in place:
You need people in the following roles:
Database administrator
SAN storage administrator
Network administrator
You need the following software:
Windows 2000 Server (remote mirroring solution only) or either Windows 2000 Advanced Server or Windows 2000 Datacenter Server (remote mirroring or stretch clustering)
Windows 2000 Service Pack 3 or later
SQL Server 2000 Standard Edition (remote mirroring only) or SQL Server Enterprise Edition (remote mirroring or stretch clustering)
You need two identical SAN units capable of synchronous mirroring.
You need systems and system components that are certified to work together and with the chosen Windows 2000 operating system edition. If you are using failover clustering at either the local site or the remote site, the systems and system components must be certified for failover clustering. Certification of systems and system components for failover clustering is more stringent than operating system certification. In addition, Windows 2000 Datacenter Server has more stringent certification requirements than does Windows 2000 Advanced Server. To find systems and system components that are certified by Microsoft, search the "Hardware Compatibility List" on the Microsoft Web site at https://support.microsoft.com/kb/131900.
For more detailed information about failover clustering, see "SQL Server 2000 Failover Clustering" on the Microsoft Web site at https://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/failclus.mspx.
Deploying a Remote Mirror
You will succeed at implementing a remote mirroring solution if you avoid shortcuts. This chapter assumes the following with respect to each participating server in the remote mirroring solution:
Windows 2000 Server, Windows 2000 Advanced Server, or Windows 2000 Datacenter Server is installed on a mirrored volume.
Windows 2000 Service Pack 3 or later is installed.
Note: Restoring disks to a cluster in the event of SAN failure requires at least Windows 2000 Service Pack 3.
The server is a member of the same Windows domain as each other server.
A SAN is connected.
All disks are formatted for New Technology File System (NTFS) and are not configured as dynamic disks.
At least one network interface card (NIC) is installed.
The server is connected to the corporate network and is connected to each participating server on a private network.
No version of SQL Server is installed.
Figure 5.1 shows the logical design of a remote mirror solution, with a single server in each site. The steps required to build this high-availability solution appear later in this topic.
You can deploy SQL Server onto an independent MSCS cluster within each site to provide redundancy within the site, while using a remote mirror to protect against site-level disasters.
Figure 5.1: Logical design diagram of a remote mirror solution
Each site in the remote mirror solution runs Windows 2000 Server, Windows 2000 Advanced Server, or Windows 2000 Datacenter Server. If an independent MSCS cluster is deployed within each site, you must use either Windows 2000 Advanced Server or Windows 2000 Datacenter Server.
A SAN is used in each site. The SANs are connected with fibre (if the sites are separated by less than 100 kilometers (km)) or with conventional Internet Protocol (IP) circuits using Fibre Channel-to-IP gateways. At any time, one of the SANs is active, updating its own local store and forwarding updates to the other SAN (the mirror). Only the server (or cluster) in the active site is active.
For distances over 100 km, there are two fundamental problems. First, it is frequently impossible to assemble a complete, dedicated fibre circuit between the sites separated by more than 100 km. Second, because of the speed of light, the signal on the fibre takes too long to travel between sites to be used for control signaling between the SAN devices.
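The latency constraint can be made concrete with a little arithmetic. The following sketch is illustrative; the 200,000 km/s figure is an approximation for light in optical fibre (about two-thirds of its speed in a vacuum):

```python
# Propagation delay for synchronous mirroring over fibre.
# Light in optical fibre travels at roughly 200,000 km/s, and a
# synchronous write needs at least one full round trip between the
# sites before it can be acknowledged.

SPEED_IN_FIBRE_KM_PER_MS = 200.0  # ~200,000 km/s, expressed per millisecond

def round_trip_ms(distance_km):
    """Minimum round-trip propagation delay between the sites."""
    return 2 * distance_km / SPEED_IN_FIBRE_KM_PER_MS

print(round_trip_ms(100))   # 1.0 ms added to every synchronous write
print(round_trip_ms(1000))  # 10.0 ms -- prohibitive for SAN control signaling
```

At 100 km, each synchronous write pays roughly a millisecond of propagation delay before any SAN processing at all; beyond that distance the delay, multiplied across every write, becomes prohibitive for control signaling between the SAN devices.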
To overcome the lack of fibre connectivity between the two sites, you can install fibre-channel-to-IP (FC/IP) converters at both sites and use conventional wide area network (WAN) facilities to link the sites.
The delays introduced by using a relatively slow connection (compared to local fibre) require you to operate the SAN mirror in asynchronous mode. This means that the primary site will receive confirmation of a write before the mirror has written — or possibly even received — the data. This creates a very small window of time in which a confirmed transaction may not be committed at the secondary site. If a site failure occurs within this short window, the transaction caught in this window will not appear when the secondary site is brought online. SQL Server manages the transaction log well, however, and it is highly unlikely that a torn page will result from a primary SAN failure, even when it is operating in asynchronous mode.
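The exposure window can be illustrated with a short conceptual sketch. This is purely illustrative; real SAN replication happens in the storage firmware, and the function names here are invented:

```python
# Conceptual sketch of synchronous vs. asynchronous SAN mirroring.
# In synchronous mode the acknowledgement is withheld until both
# arrays hold the data; in asynchronous mode it is returned as soon
# as the primary array does, leaving a small window in which a
# committed transaction exists only at the primary site.

def write_transaction(txn, primary, mirror, synchronous):
    primary.append(txn)       # the local write always completes first
    if synchronous:
        mirror.append(txn)    # replicate before acknowledging
    return "ack"              # in async mode the mirror catches up later

primary, mirror = [], []
write_transaction("T1", primary, mirror, synchronous=True)
write_transaction("T2", primary, mirror, synchronous=False)

# If the primary site is lost now, T2 was acknowledged (and committed)
# but never reached the mirror -- the window described above.
exposed = [t for t in primary if t not in mirror]
print(exposed)  # ['T2']
```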
The following procedure describes generalized steps for implementing this logical design.
To implement a remote mirror solution with SQL Server
Set up the disk.
Configure the SAN to allow only the server in the primary site to connect to it. You must limit access to the store to prevent the disk corruption that would occur if both servers connected concurrently. In a stretch cluster, the Windows Clustering service performs this function. If the sites have substantial latency between them, configure the SAN mirror to execute the mirroring process in asynchronous mode for better performance. Running the SAN mirror in synchronous mode ensures that all transactions are written to both the primary SAN and the remote SAN mirror before SQL Server receives notification that a successful write occurred. Running the SAN mirror in asynchronous mode guarantees only that a successful write occurred on the primary SAN. As a result, running in asynchronous mode exposes the data center to a slightly higher risk that a committed transaction that was successfully written to the primary SAN is not written to the remote SAN mirror if the primary site fails.
Note: For more information about setting up a SAN, see "Configuring Storage Devices" later in this chapter.
Install SQL Server.
In each site, install SQL Server onto a single computer or into an MSCS cluster. Do not install your production database yet.
Install the production database in the active site.
Install your production database or databases onto the drives in the SAN that are being mirrored. The SAN mirror concurrently writes the .mdf and .ldf files for the production database to the SAN in each site.
Initiate a SAN role reversal.
Shut down the servers in each site, initiate a role reversal between the SANs according to your vendor's procedure, and then start the server or servers in the newly active site. Do not restart the servers in the primary site.
Configure the production database in the standby site.
Using SQL Server Enterprise Manager in the newly activated site, connect to the SQL Server instance that will manage the production database. Attach the production database on the local SAN, and then shut down the server or servers.
Initiate a SAN role reversal.
Shut down the servers in each site, initiate a role reversal between the SANs according to your vendor's procedure, and then start the server or servers in both sites.
Failing Over to a Remote Mirror
When the primary site goes down, you must activate the SAN at the alternate site, bring SQL Server online, and redirect client traffic to the new site. You must also ensure that logins, jobs, alerts, operators, and Data Transformation Services (DTS) packages are synchronized between the primary site and the standby site before you bring the standby SQL Server online.
To reverse the roles of the sites
Shut down the primary site.
Reverse the roles of the SANs according to your vendor's procedure.
Start SQL Server in the standby site.
Ensure that logins, jobs, alerts, operators, and DTS packages are synchronized between the primary site and the standby site. For more information, see Chapter 3, "Implementing Transactional Replication."
Redirect the client network traffic to the standby site. For more information on redirecting client network traffic, see "Redirecting Client Network Traffic to a Promoted Secondary Server," in Chapter 5 of the Planning Guide.
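The failover steps above can be sketched as a simple ordered checklist. This is illustrative only; the step names paraphrase the procedure and are not commands:

```python
# The documented failover order as a checklist. The SAN role reversal
# in particular is a vendor-specific operation, not an API call.

FAILOVER_STEPS = [
    "shut down the primary site",
    "reverse SAN roles (vendor procedure)",
    "start SQL Server in the standby site",
    "synchronize logins, jobs, alerts, operators, and DTS packages",
    "redirect client network traffic to the standby site",
]

def next_step(completed):
    """Return the next step that must be performed, or None when done."""
    for step in FAILOVER_STEPS:
        if step not in completed:
            return step
    return None

print(next_step(["shut down the primary site"]))
# 'reverse SAN roles (vendor procedure)'
print(next_step(FAILOVER_STEPS))  # None -- failover complete
```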
Deploying Stretch Clustering
You will succeed at implementing stretch clustering if you avoid shortcuts and parallel installations. If the setup program fails or you cancel the setup program for any reason, completely remove the previous setup before attempting another. Remove any Domain Name System (DNS) entries relating to the failed setup. Also, verify that the service names and IP addresses you plan to use for MSCS and SQL Server are unused immediately before you install MSCS and SQL Server.
This chapter assumes the following with respect to each participating server in the failover clustering solution:
Windows 2000 Advanced Server or Windows 2000 Datacenter Server is installed on a mirrored volume.
Windows 2000 Service Pack 3 or later is installed.
Note: Restoring disks to a cluster in the event of SAN failure requires at least Windows 2000 Service Pack 3.
MSCS has not been installed.
The server is a member of the same Windows domain as each other server.
A SAN is connected.
All disks are formatted for NTFS and are not configured as dynamic disks.
At least two network interface cards (NICs) are installed.
The server is connected to the corporate network and is connected to each participating server on a private network.
No version of SQL Server is installed.
The logical design of the stretch cluster built in this chapter is described later in this chapter, followed by the steps required to build this high-availability solution.
Understanding the Logical Design
This chapter describes how to build a two-node, three-node, or four-node stretch cluster. Figure 5.2 shows the logical design of a four-node stretch cluster.
Figure 5.2: Logical design diagram of a four-node cluster using a SAN and a SAN mirror
Each node in the Windows 2000 failover cluster runs Windows 2000 Advanced Server or Windows 2000 Datacenter Server. MSCS maintains a heartbeat between the nodes to verify that cluster resources are still alive. A dedicated network is maintained between the sites because the heartbeat NICs must all be on the same subnet. When you have geographically separate sites, you must use a private network or virtual local area network (VLAN) to link the two sites so that you can deploy the same subnet at both sites.
One or more SQL Server virtual servers run within the Windows 2000 failover cluster. The active node for the virtual server exclusively accesses and maintains the data and the log files for that virtual server.
The SAN is connected to each node by using fibre, and the distance between the sites cannot exceed 100 km. Increasing the distance between the sites increases the level of disaster protection, but at the expense of the speed and reliability of the connection.
Within the SAN, the SAN software maintains local mirrored copies of the data and log files as well as the temporary and backup files. The SAN software simultaneously sends new data to the SAN mirror at the remote site. Torn pages in the SQL Server database are a risk when deploying a stretch cluster. To mitigate the risk of a SAN causing torn pages, run the SAN in synchronous mode to ensure that writes complete in both locations before SQL Server receives acknowledgement of a successful write.
In a multiple instance cluster, each virtual SQL Server requires a separate group of disks in the SAN for the data, log, temporary, and backup files. Each virtual SQL Server requires exclusive control of its disks. MSCS assigns control to the active node for a virtual SQL Server.
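The exclusive-control rule can be sketched as follows. This is a conceptual model, not the MSCS API; the class and method names are invented for illustration:

```python
# Conceptual model of exclusive disk-group control: a cluster disk
# group can be online on at most one node at a time, and failover
# moves ownership rather than sharing it.

class DiskGroupOwnership:
    def __init__(self):
        self.owner = {}  # disk group name -> node currently holding it

    def bring_online(self, group, node):
        current = self.owner.get(group)
        if current is not None and current != node:
            raise RuntimeError(f"{group} is already online on {current}")
        self.owner[group] = node

    def fail_over(self, group, new_node):
        # MSCS takes the group offline on the failed node first,
        # then brings it online on the standby node.
        self.owner[group] = new_node

cluster = DiskGroupOwnership()
cluster.bring_online("SQLVS1-disks", "NodeA")  # NodeA is the active node
cluster.fail_over("SQLVS1-disks", "NodeB")     # NodeA fails; NodeB takes over
print(cluster.owner["SQLVS1-disks"])  # 'NodeB'
```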
The generalized steps required to implement this logical design are below, followed by the steps in detail.
To implement a failover cluster with SQL Server
Create and configure domain user accounts, determine unique server and instance names, and obtain an allocation of corporate IP addresses.
Configure the network.
Configure the storage devices.
Install and configure MSCS.
Upgrade MS DTC.
Install and configure a SQL Server 2000 virtual server instance.
Apply the most recent SQL Server 2000 service pack.
Perform additional configuration tasks.
Configuring Accounts and Names and Obtaining IP Addresses
Before configuring a Windows 2000 failover cluster with SQL Server, you must create and/or configure domain user accounts for SQL Server, configure domain user account permissions for MSCS, obtain server names for MSCS and SQL Server, and allocate IP addresses for the heartbeat and the corporate network NICs. Use the following information to configure accounts and names, and obtain IP addresses.
Domain user accounts — Create and configure the following domain user accounts:
A domain user account to administer the failover cluster — Make this account a member of the Administrators local group on each node of the failover cluster.
A domain user account for the SQL Server and the SQL Server Agent services — This account does not need to be a member of the Administrators local group on any node of the failover cluster.
Server names — Have the IT department approve the following names:
A virtual cluster name — This name must be unique within the domain.
A virtual SQL Server name for each SQL Server instance — These names must be unique within the domain and on the local subnet if multiple domains reside together on the same subnet.
An instance name for each SQL Server instance — These names must be unique within the failover cluster.
New static IP addresses — Have the IT department allocate static IP addresses for each of the following:
A virtual cluster IP address
A virtual IP address for each SQL Server instance
An IP address for each corporate network NIC
These addresses must be unique on the corporate network and must be on the same subnet.
Note: In a cluster, each SQL Server instance must have its own virtual IP address, virtual server name, and SQL Server instance name. This enables a SQL Server instance to fail over to a secondary node without affecting other SQL Server instances on the same physical node.
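As a pre-installation sanity check, you can verify that the planned names and addresses are unique before running setup. The following sketch is illustrative; the server names and IP addresses shown are invented examples:

```python
# Check that planned virtual server names, virtual IP addresses, and
# instance names contain no duplicates, as the note above requires.

def validate_instances(instances):
    """instances: list of (virtual_name, virtual_ip, instance_name) tuples."""
    problems = []
    fields = ["virtual server name", "virtual IP address", "instance name"]
    for i, field in enumerate(fields):
        values = [inst[i] for inst in instances]
        for dup in sorted({v for v in values if values.count(v) > 1}):
            problems.append(f"duplicate {field}: {dup}")
    return problems

plan = [
    ("SQLVS1", "10.0.0.21", "INST1"),
    ("SQLVS2", "10.0.0.22", "INST2"),
    ("SQLVS3", "10.0.0.22", "INST3"),  # IP address collides with SQLVS2
]
print(validate_instances(plan))  # ['duplicate virtual IP address: 10.0.0.22']
```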
Configuring the Network
Before you install MSCS, you must assign the allocated IP addresses for each corporate network NIC, including the default gateway and preferred DNS server for each NIC. You must also assign static IP addresses to the NICs in each server that will carry the heartbeat among the servers in the failover cluster.
To configure a NIC that connects to the corporate network
On the desktop, right-click My Network Places, and then click Properties.
In Network and Dial-up Connections, right-click the network connection for the NIC that connects to the corporate network, and then click Properties.
In Connection Properties for this corporate NIC, click Internet Protocol (TCP/IP), and then click Properties.
In Internet Protocol (TCP/IP) Properties, type the IP address allocated to this corporate NIC.
If your network uses subnetting, change the subnet mask only on the advice of a network specialist.
Type the IP address of the default gateway.
Click Use the Following DNS Server Addresses, and then type the DNS server address.
Click OK to save the new TCP/IP properties.
Click OK to save the new connection properties for this NIC.
In Network and Dial-up Connections, rename the connection to Corporate to clearly identify its function.
Note: Repeat this process if you use a second corporate NIC that provides redundant connectivity to the corporate network. Name this redundant corporate connection Corporate2.
To configure the NIC that connects to the private network
On the desktop, right-click My Network Places, and then click Properties.
In Network and Dial-up Connections, right-click the network connection for the NIC that connects to the private network, and then click Properties.
In Connection Properties for this private NIC, click Internet Protocol (TCP/IP), and then click Properties.
In Internet Protocol (TCP/IP) Properties, type an IP address for this private NIC.
You must use an IP address for each private NIC that is on the same subnet as the IP address you use for the other private NICs in the cluster. Use the automatically completed subnet mask. It is common to use an IP address of the form 10.1.0.x for the private segment of the network.
Click OK to save the new TCP/IP properties.
Click OK to save the new connection properties for this NIC.
In Network and Dial-up Connections, rename the connection to Heartbeat to clearly identify its function.
Note: Repeat this process if you use a second private NIC that provides a redundant heartbeat. Name this redundant private connection Heartbeat 2.
Configuring Storage Devices
Each vendor has different procedures for configuring its storage devices. Follow the procedures provided by the vendor and configure the SAN as follows:
Use the SAN for SQL Server only, not for Windows 2000.
Create a mirrored volume for the quorum.
Create three mirrored volumes for each SQL Server instance — one each for data files, transaction log files, and temporary files.
Configure the SAN in the remote site to mirror the SAN activity in the local site. Figure 5.2 illustrates a typical interconnection configuration for a two-node stretch cluster that provides redundant paths from both servers to both SANs. Because the configuration differs among vendors, discuss the most appropriate configuration for your environment with your storage vendor.
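The volume layout above implies a simple count of mirrored volumes to request from the storage administrator: one quorum volume plus three per SQL Server instance. A small helper, purely for illustration:

```python
# Mirrored volumes implied by the layout above: one quorum volume for
# the cluster plus three (data, transaction log, temporary) for each
# SQL Server instance.

def mirrored_volumes_required(instance_count):
    QUORUM = 1
    PER_INSTANCE = 3  # data, transaction log, and temporary files
    return QUORUM + PER_INSTANCE * instance_count

print(mirrored_volumes_required(1))  # 4
print(mirrored_volumes_required(2))  # 7
```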
Before configuring the storage devices for MSCS to use, turn off all but one node to avoid configuration errors.
Preparing the Drives
To prepare the storage for use, write disk signatures, create partitions, and format drives. Initialize drives used in a cluster as basic drives.
To initialize a drive for use in a cluster
Right-click My Computer, and then click Manage.
Expand Storage, and then click Disk Management.
Right-click a drive that you want to initialize, and then click Write Signature.
In Write Signature, select the disk(s) to initialize, and then click OK.
To create a partition and format a drive
In Computer Management, right-click the drive that you want to partition, and then click Create Partition.
On the Welcome to the Create Partition Wizard page, click Next.
On the Select Partition Type page, click Primary Partition, and then click Next.
On the Specify Partition Size page, click Next to use all of the disk space on the drive for this partition.
On the Assign Drive Letter or Path page, assign a drive letter, and then click Next.
On the Format Partition page, click Format Partition.
In Formatting, set the formatting options as follows:
In File System to Use, specify NTFS.
In Allocation Unit Size, specify Default.
In Volume Label, specify a meaningful name.
Do not select the Enable File and Folder Encryption check box.
Click Next.
On the Completing the Create Partition Wizard page, click Finish.
To format a partitioned but unformatted drive
Right-click My Computer, and then click Manage.
Expand Storage, and then click Disk Management.
Right-click a drive that you want to format, and then click Format.
In Format Disk, click OK.
Note: Repeat these procedures for each drive.
Specifying Drive Letters
Specify drive letters for each drive in the cluster to ensure that they do not change dynamically based on the drives that are online. If a drive letter that SQL Server uses for a data or a log file changes after SQL Server is installed, SQL Server cannot attach a database when it starts.
Note: If you created a new partition in the preceding procedures and assigned a drive letter, you do not need to complete the following procedure.
To specify drive letters
Right-click My Computer, and then click Manage.
Expand Storage, and then click Disk Management.
Right-click a drive that the cluster uses, and then click Change Drive Letter and Paths.
In Change Drive Letter and Paths, click Edit.
In Edit Drive Letter or Path, assign a drive letter from the list of available drive letters, and then click OK.
Click Yes to confirm that you want to change the drive letter.
Repeat this process for each drive that the cluster uses.
When you add more nodes to the cluster, they inherit these drive letters from the cluster configuration information stored on the quorum drive.
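Before adding a node, you may want to confirm that none of the cluster drive letters are already in use locally on that node, since a collision would force a letter to change and prevent SQL Server from attaching its databases. An illustrative check; the letters shown are invented examples:

```python
# Check that the drive letters recorded for the cluster disks are
# still free on a node being added to the cluster.

def find_conflicts(cluster_letters, local_letters):
    """Return cluster drive letters the node already uses locally."""
    return sorted(set(cluster_letters) & set(local_letters))

cluster_letters = {"Q", "S", "T"}  # e.g. quorum, data, and log volumes
node_local = {"C", "D", "S"}       # volumes already on the new node
print(find_conflicts(cluster_letters, node_local))  # ['S']
```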
Installing and Configuring MSCS
When you install and configure MSCS, set up only one node at a time. Turn off all other nodes that are connected to the shared storage device while you install MSCS on the first node. You risk disk configuration errors if you attempt to configure MSCS on the first node while other nodes are connected and running. After you install MSCS on the first node and the drives in the shared storage device are allocated, you can install MSCS on additional nodes without turning off nodes on which MSCS is running.
Note: If you are reinstalling MSCS or recovering from a failed MSCS installation, remove any entries in the DNS server that relate to the failed MSCS installation before you reinstall. Stray or duplicate IP addresses or names can prevent a clean installation.
Install MSCS on all nodes in the cluster. The first node of an MSCS installation requires more setup than an additional node in the same cluster.
To add MSCS on the first node
On the first server on which you are installing MSCS, click Start, point to Settings, and then click Control Panel.
In Control Panel, double-click Add/Remove Programs.
In Add/Remove Programs, click Add/Remove Windows Components.
On the Windows Components page, select the Cluster Service check box, and then click Next.
The Internet Information Services check box is automatically selected with some Internet Information Services (IIS) components.
If Terminal Services is installed, click Next on the Terminal Services Setup page to run Terminal Services in remote administration mode.
If Terminal Services is not installed, this page will not appear.
When prompted, insert your Windows 2000 Advanced Server or Windows 2000 Datacenter Server CD, and then click OK.
After the clustering components install, click Next on the Welcome to the Cluster Service Configuration Wizard page.
On the Hardware Configuration page, click I Understand to acknowledge that only certified configurations listed in the Cluster category of the HCL are supported by MSCS; then click Next.
To define the cluster
On the Create or Join a Cluster page, click The First Server in the Cluster, and then click Next.
If the Cluster Service Configuration Wizard does not detect disks that meet the criteria for clustering, you cannot continue.
On the Cluster Name page, type the name for the new cluster (up to 15 characters), and then click Next. Use a name that is unique on the network and within the Windows domain.
To specify the setup program administrator account
On the Select an Account page, type the domain user account, password, and domain name in the appropriate text boxes, and then click Next.
Use the domain user account you previously added to the Administrators local group on this node.
To define the cluster storage
On the Add or Remove Managed Disks page, specify the disks that you want to belong to this cluster, and then click Next.
By default, all disks that meet the criteria for clustering are selected.
Remove any disks that you do not want this cluster to use.
Do not remove any disks that you plan to use for any SQL Server instance on this cluster.
On the Cluster File Storage page, click the drive belonging to the cluster that you want to use as the quorum drive.
Select a mirrored drive with a minimum size of 500 megabytes (MB). If the quorum drive fails, the cluster fails. Do not use this drive for SQL Server.
To define the networks
In this step, you tell Cluster Services which NICs are to be used for internal cluster communication, such as the heartbeat, and which NICs are to be used for client access. The Network Connections page displays once for each NIC.
On the Network Connections page, select the Enable This Network for Cluster Use check box if this NIC is used in the cluster. Select the Internal Cluster Communications Only (Private Network) check box for each heartbeat NIC, or select the Client Access Only (Public Network) check box for each corporate NIC. Then click Next.
On the Cluster IP Address page, type the IP address in the IP Address text box, and confirm that the subnet mask is correct for your configuration.
If your network employs subnetting, change the subnet mask only on the advice of a network specialist.
Click the corporate network, and then click Next.
To finish the installation
Click Finish to apply the settings for the cluster, and start MSCS on this first node.
When the message informs you that MSCS has started, click OK.
Click Finish to close the Cluster Configuration Wizard.
Close Add/Remove Programs and Control Panel.
Installing and Configuring MSCS on Additional Nodes in the Cluster
Installing MSCS on additional nodes in a cluster requires less setup than installing MSCS on the original node because additional nodes inherit many settings from the first node.
To install MSCS on an additional node in the cluster
On each additional server on which you want to install MSCS, click Start, point to Settings, and then click Control Panel.
In Control Panel, double-click Add/Remove Programs.
In Add/Remove Programs, click Add/Remove Windows Components.
On the Windows Components page, select the Cluster Service check box, and then click Next.
The Internet Information Services check box is automatically selected with some IIS components.
If Terminal Services is installed, click Next on the Terminal Services Setup page to run Terminal Services in remote administration mode.
If Terminal Services is not installed, this page will not appear.
When prompted, insert the Windows 2000 Advanced Server or Windows 2000 Datacenter Server CD, and then click OK.
After the clustering components install, click Next on the Welcome to the Cluster Service Configuration Wizard page.
On the Hardware Configuration page, click I Understand to acknowledge that only certified configurations listed in the Cluster category of the HCL are supported by MSCS; then click Next.
To join an existing cluster
On the Create or Join a Cluster page, click The Second or Next Node in the Cluster, and then click Next.
If the Cluster Service Configuration Wizard does not detect disks that meet the criteria for clustering, you cannot continue. Disks must be formatted for NTFS and must not be configured as dynamic disks.
On the Cluster Name page, type the name of the existing cluster that you want this node to join. Leave the Connect to cluster as check box cleared, and then click Next.
The Cluster Service Configuration Wizard verifies that a heartbeat is detected from the first node and then configures each NIC on the node as either a corporate NIC or a heartbeat NIC.
To specify the setup program administrator account
On the Select an Account page, type the domain user account, password, and domain name in the appropriate text boxes, and then click Next.
Use the domain user account you created earlier that is an administrator on each node.
To finish the installation
Click Finish to apply the settings for the cluster, and then start MSCS on this additional node.
When the message box informs you that MSCS has started, click OK.
Click Finish to close the Cluster Configuration Wizard.
Close Add/Remove Programs and Control Panel.
Upgrading the MS DTC
You must upgrade MS DTC to work in a cluster before installing SQL Server in the cluster. Distributed queries, two-phase commits, and certain types of replication use DTC. Even if you do not currently use these features, upgrade DTC because you may need to use the features in the future. Upgrading now avoids errors later.
To upgrade DTC
At one node in the cluster, open a command prompt.
At the command prompt, type comclust, and then press ENTER.
Close the command prompt.
Note: Repeat this procedure on all other nodes participating in the SQL Server virtual server.
Installing and Configuring SQL Server on the Cluster
When you install SQL Server on a cluster, the setup program creates a virtual server. A virtual server consists of a virtual name, an IP address, and a port. Clients connect to these virtual resources, and MSCS points to the actual resources on the node in the cluster that is currently active. The SQL Server setup program installs SQL Server program files on each node and SQL Server system databases on a cluster resource disk. To successfully install a virtual server, you must properly configure and start MSCS.
Note: If you are reinstalling SQL Server or recovering from a failed SQL Server installation, remove any entries in the DNS server that relate to the failed MSCS installation before you reinstall. Stray or duplicate IP addresses or names can prevent a clean installation.
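Because clients address the virtual name rather than a physical node, connection strings never change when the active node changes. The following sketch shows how a client-side connection string might be built against a virtual server; the server and instance names (VIRTUALSQL1, INST2) are hypothetical placeholders, not names from this guide.

```python
def build_connection_string(virtual_server, instance=None, database="master"):
    """Build an ODBC-style connection string targeting a virtual server.

    For a default instance, clients connect to the virtual name alone;
    for a named instance, they append \\InstanceName. The node currently
    hosting the resources is invisible to the client.
    """
    server = virtual_server if instance is None else f"{virtual_server}\\{instance}"
    return (f"Driver={{SQL Server}};Server={server};"
            f"Database={database};Trusted_Connection=yes")

# Default instance on a virtual server, then a named instance.
print(build_connection_string("VIRTUALSQL1"))
print(build_connection_string("VIRTUALSQL1", instance="INST2"))
```

After a failover, this same string resolves to whichever node currently owns the SQL Server resources.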
Creating a Single Cluster Group
Before you install SQL Server, place the disk resources for the SQL Server instance into a single cluster group. The group should contain a mirrored volume for the data files, a volume for the log files, and a volume for the backup and temporary files. The setup program adds additional resources to this cluster group when the virtual server installs.
Note: Create a separate cluster group for each SQL Server instance you plan to install.
To create a single cluster group
Click Start, point to Administrative Tools, and then click Cluster Administrator.
In Cluster Administrator, expand Groups, and then locate the disk groups containing the disks that you plan to use for this SQL Server instance.
Initially, each disk (or set of mirrored disks) belonging to the cluster is in a numbered disk group, such as Disk Group 1, Disk Group 2, and so on. The quorum drive is in the Cluster Group.
Right-click one of the disk groups, and then click Rename.
Type a descriptive name for this resource group (such as VirtualServer1Group).
Click the Resources node to display the additional drives owned by the cluster.
Right-click a drive for SQL Server, point to Change Group, and then click the newly renamed cluster group.
Close Cluster Administrator.
Note: Repeat this procedure for each additional drive that this SQL Server instance will use.
Installing SQL Server
After creating the cluster group, you can begin installing SQL Server. Use the server name and IP address allocated by your IT department when you create the virtual server. Follow these procedures for each instance you install. Before you begin, create a folder on each clustered drive to easily identify all files belonging to SQL Server.
Note: You must install SQL Server from the server that owns the shared disks. To change the server that owns the shared disks, use Cluster Administrator.
To start the installation and define the virtual server name
Insert the SQL Server 2000 Enterprise Edition CD into the CD-ROM drive.
On the SQL Server 2000 Enterprise Edition page, click SQL Server Components, and then click Install Database Server.
On the Welcome page, click Next.
On the Computer Name page, type a virtual name, click Virtual Server, and then click Next.
On the Product Key page, enter the product key for SQL Server 2000 Enterprise Edition, and then click Next.
On the Software License Agreement page, click Yes to accept the terms of the License Agreement.
On the Virtual Server Information page, enter the IP address allocated for the virtual server, click the corporate network, click Add, and then click Next.
To define disk resources
On the Select the Cluster Disk Where the Data Files Will Be Placed page, select the cluster group (or a drive within that cluster group) for this SQL Server.
This step defines the cluster group that will contain the resources for the virtual SQL Server. The actual drive locations for executables and system databases are selected later. Do not select the cluster group containing the quorum drive.
On the Cluster Definition page, specify the nodes supporting the virtual server, and then click Next.
By default, all nodes in the cluster are selected. Remove any nodes that you do not want to support SQL Server.
On the Remote Information page, type the domain administrator account name, password, and domain name in the appropriate text boxes.
The setup program uses this user account to install the program files on each node. Use the administrator account you configured to administer the failover cluster. For more information, see "Configuring Accounts and Names and Obtaining IP Addresses" earlier in this chapter.
On the Instance Name page, click Next to install a default instance. To install a named instance, clear the Default Instance check box, type the name for the named instance, and then click Next.
On the Setup Type page, click the appropriate setup type for your installation.
Specify the location for the program files on each node.
The default location is the system volume. Change this default if appropriate, and ensure that sufficient space exists for these files. The setup program installs the program files into this location on each node.
Specify the location for the data files on the cluster.
The data files must be on one of the cluster disks and should not be on the quorum disk. The setup program installs the system databases on this drive.
If you have created a folder for this SQL Server instance on the data drive, browse to this folder, and then click Next.
To complete the installation
On the Services Accounts page, type the domain user account, password, and domain name in the appropriate text boxes, and then click Next.
The SQL Server and SQL Server Agent services use this account. Use the domain user account you configured for the SQL Server and SQL Server Agent services. For more information, see "Configuring Accounts and Names and Obtaining IP Addresses" earlier in this chapter.
On the Authentication Mode page, click the authentication mode for SQL Server, and then click Next.
Windows Authentication Mode is more secure than Mixed Mode. If you select Mixed Mode, you must provide a password for the sa account or agree to use a blank password. Do not use a blank password because it is highly insecure, even in a development environment, and can lead to poor coding practices.
On the Licensing Mode page, specify the appropriate licensing mode for your environment, and then click Next.
Click Finish to install SQL Server. When installation is complete, restart all nodes in the cluster.
Adding Cluster Disks as SQL Server Dependencies
To ensure that the failover process operates properly, manually add each disk on which SQL Server depends to the list of SQL Server dependencies. If a disk is not included in the SQL Server dependencies, SQL Server cannot use it after a failover. Add the mirrored volumes used for the transaction log files, backup files, and additional data files. This procedure requires all disk resources to be members of a single cluster group.
To add additional cluster disks as dependencies
Click Start, point to Programs, and then click Cluster Administrator.
In Cluster Administrator, click the cluster group containing the SQL Server virtual server.
Right-click SQL Server, and then click Take Offline.
SQL Server must be offline to change SQL Server dependencies.
Right-click SQL Server, and then click Properties.
On the Dependencies tab in SQL Server Properties, review the default resource dependencies.
The Network Name resource and at least one disk resource appear.
Click Modify.
In the Available Resources list, double-click each additional disk resource on which SQL Server depends, and then click OK.
Click OK to close SQL Server Properties.
Right-click the group containing the SQL Server resources, and then click Bring Online.
Applying the Most Recent SQL Server Service Pack
Before using SQL Server, install the most recent SQL Server service pack.
To install the most recent SQL Server service pack
Log on to one of the nodes in the cluster by using a domain account that is a member of the Administrators local group on that node.
Insert the CD containing the service pack into the CD-ROM drive.
If the setup program does not start automatically, navigate to the \X86\Setup directory, and then double-click setupsql.exe.
On the Welcome page, click Next.
On the SQL Server Name page, type the name of the SQL Server virtual server, and then click Next.
On the Authentication Mode page, click Next to connect to SQL Server using Windows authentication.
On the Domain Administrator page, type a domain administrator name, password, and domain name, and then click Next.
The setup program must use a domain administrator account to install the service pack on all nodes of the virtual server.
After the service pack installs, click OK.
Click Yes, and then click Finish to restart the node.
Manually restart each additional node on which the service pack has been installed.
Back up the master and msdb databases.
They have been modified by the service pack installation.
Note: This procedure is for SQL Server 2000 Service Pack 2. Check the ReadMe file on the CD for later service pack procedures.
Performing Additional Configuration Tasks
After you have installed the virtual server, review these additional configuration options, and apply the settings that are appropriate for your environment. These options include the following:
Setting memory allocation between virtual servers
Setting new database properties
Setting the TCP/IP port that SQL Server uses
Setting advanced failover options
Setting SQL Server properties in the cluster
Use the following guidelines when performing these tasks.
Allocating Memory
If you are running multiple instances of SQL Server on separate nodes in the cluster, you must determine the total amount of physical memory required for each server, and determine the appropriate allocation of memory for each SQL Server instance. Performance of SQL Server is directly related to the amount of physical memory allocated to it by the operating system. SQL Server provides two modes for memory management: dynamic and fixed. Dynamic memory allocation lets SQL Server acquire and release memory from the operating system as needed. Dynamic memory allows you to set a maximum amount of physical memory that the operating system can allocate to SQL Server. This allows the remaining memory to be reserved for other purposes, such as the operating system and other SQL Server instances. Fixed memory allows you to set the amount of physical memory that SQL Server can use. This amount neither grows nor shrinks.
Configure each server with the same amount of physical memory, and then allocate memory to each SQL Server instance to ensure that sufficient memory is available to support the failover of an instance of SQL Server from another node. If a server does not use a lot of memory compared to the available memory on the server, specify dynamic memory and set the maximum amount of memory that the operating system can allocate. By doing so, you leave sufficient memory for the failover to occur quickly and successfully. If an instance uses all of the available physical memory, failover occurs more slowly and can fail if memory is not released quickly enough.
To determine the total amount of memory to install on each server, test your applications by setting the maximum amount of memory available to SQL Server, and evaluate the performance. Install enough memory on each server to enable all instances to operate on that server and to meet the minimum service level for each instance. Using dynamic memory allows the SQL Server instance on each node to use most of the memory on that node. If you use fixed memory, the additional memory is used only during a failover.
In a system under continuous heavy load, install sufficient memory to ensure that each SQL Server instance can meet its required minimum allocation. Use a fixed memory allocation to ensure that the SQL Server instance will not expand into memory reserved for the failover of other instances. To determine the appropriate setting, initiate a failover, and test the performance level of each virtual server. If the operating system is unstable after a failover, decrease the memory allocated to each instance to increase the memory available to the operating system.
To specify a fixed amount of memory for each instance, use SQL Server Enterprise Manager or the sp_configure system stored procedure. When determining the amount of memory to allocate to SQL Server, remember to allow sufficient memory for the operating system. With large databases, allow 2 gigabytes (GB) of memory for the operating system.
If the amount of memory installed on each node is greater than 4 GB, add the /PAE switch to the Boot.ini file, and enable Address Windowing Extensions (AWE) in SQL Server. For more information, see article 268363, "Intel Physical Addressing Extensions (PAE) in Windows 2000," in the Microsoft Knowledge Base at https://support.microsoft.com/default.aspx?scid=kb;en-us;268363&sd=tech. Also see article 274750, "HOW TO: Configure Memory for More Than 2 GB in SQL Server," in the Microsoft Knowledge Base at https://support.microsoft.com/default.aspx?scid=kb;en-us;274750&sd=tech.
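The memory guidance above can be sketched as a small planner that divides physical memory among instances after reserving memory for the operating system. This is an illustrative model, not a Microsoft sizing formula; the 2 GB operating system reserve follows the guideline in the preceding paragraph, and the actual cap is applied per instance with the sp_configure 'max server memory' option.

```python
def plan_instance_memory(total_ram_mb, instance_count, os_reserve_mb=2048):
    """Split physical memory evenly among SQL Server instances after
    reserving memory for the operating system, so that every instance
    can still run on a single node after a failover.

    Figures are illustrative only; validate with failover testing.
    """
    usable = total_ram_mb - os_reserve_mb
    if usable <= 0:
        raise ValueError("not enough memory left for the operating system reserve")
    return usable // instance_count

# Two instances on nodes with 8 GB each: cap each at about 3 GB so
# both instances fit on one node after a failover.
per_instance_mb = plan_instance_memory(8192, 2)
print(per_instance_mb)  # 3072
```

The result becomes the maximum (dynamic mode) or fixed memory setting for each instance.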
Setting New Database Properties
You can specify the database location for all new databases on the cluster disks you created for the data and log files.
To set new database properties
On one of the nodes, click Start, point to Programs, point to Microsoft SQL Server, and then click Enterprise Manager.
Expand Microsoft SQL Servers, and then expand SQL Server Group.
Right-click the SQL Server instance, and then click Properties.
On the Database Settings tab in the Default Data Directory text box, type or browse to the drive and folder you want for the data files.
In the Default Log Directory text box, type or browse to the drive and folder you want for the log files, and then click OK.
Close SQL Server Enterprise Manager.
Setting the TCP/IP Port SQL Server Uses
The default instance of SQL Server usually listens on port 1433. If you are configuring a multiple instance cluster, you must use a different port for each additional instance. Although SQL Server can set this port dynamically at startup, you might want to set each named instance to listen on a specific, unused port. Check with a network administrator for a recommendation. Specify a static port if you have to open ports on a firewall and do not want the SQL Server ports to change.
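The port-assignment rule above (one unused, non-default port per named instance) can be sketched as follows. The starting port and instance names are hypothetical; confirm actual port availability with your network administrator before opening firewall rules.

```python
# Ports that must not be reassigned: the default instance's TCP port
# and the UDP port used by the SQL Server Resolution Service.
RESERVED = {1433, 1434}

def pick_static_ports(instance_names, start=2433):
    """Assign a distinct static TCP port to each named instance,
    skipping reserved defaults and already-assigned ports.

    The starting port 2433 is an arbitrary illustration, not a
    recommendation from this guide.
    """
    ports, candidate = {}, start
    for name in instance_names:
        while candidate in RESERVED or candidate in ports.values():
            candidate += 1
        ports[name] = candidate
        candidate += 1
    return ports

print(pick_static_ports(["INST2", "INST3"]))  # {'INST2': 2433, 'INST3': 2434}
```

Each assigned port is then entered in the Server Network Utility on every node, as described in the procedure that follows.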
To specify the TCP/IP port for an instance
Click Start, point to Programs, point to Microsoft SQL Server, and then click Server Network Utility.
In SQL Server Network Utility, click the instance of SQL Server for which you want to set the port.
In the Enabled Protocols list, click TCP/IP, and then click Properties.
In the Default Port text box, type the appropriate port, and then click OK.
Click OK to close the SQL Server Network Utility dialog box.
Repeat these steps for each node participating in the SQL Server virtual server.
Setting Advanced Failover Options
Advanced failover options determine the failover behavior of SQL Server in the MSCS cluster.
To specify failover behavior
Click Start, point to Programs, and then click Cluster Administrator.
In Cluster Administrator, click the cluster group containing SQL Server.
Right-click SQL Server, and then click Properties.
On the Advanced tab, use the following information to configure the advanced cluster failover properties. The defaults should not be changed without careful consideration and testing:
Do Not Restart/Restart — Click Restart to fail over to a secondary node if a failure is detected. This is the default.
Affect the Group — Select this check box to have the failover of SQL Server cause a failover of all resources in the cluster group.
Threshold — Type the number of attempts that MSCS should make to restart SQL Server before failing over to a secondary node.
Period — Type the seconds between retry attempts.
This value is ignored if the threshold is set to 0.
Looks Alive Poll Interval — Click Use Value from Resource Type to use the SQL Server default of 5,000 milliseconds (5 seconds).
Is Alive Poll Interval — Click Use Value from Resource Type to use the SQL Server default of 60,000 milliseconds (60 seconds).
Pending Timeout — Specify the time SQL Server has in either the Offline Pending or the Online Pending state before MSCS places the resource in Offline or Failed status. The default is 180 seconds.
When you are finished, click OK to close SQL Server Properties in the Cluster Administrator.
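The interaction between the Restart, Threshold, and Affect the Group settings can be sketched as a simplified decision model. This is an illustrative approximation of MSCS behavior, not an exact reimplementation of the cluster service's internal logic.

```python
def next_action(restart_attempts, threshold, affect_group):
    """Approximate what MSCS does after a SQL Server resource failure.

    While the number of restart attempts is below Threshold, MSCS
    restarts the resource on the same node. Once Threshold is reached,
    the Affect the Group setting determines whether the whole group
    fails over to a secondary node or the resource is left failed.
    """
    if restart_attempts < threshold:
        return "restart on the same node"
    return "fail over the group" if affect_group else "leave the resource failed"

print(next_action(1, 3, True))   # restart on the same node
print(next_action(3, 3, True))   # fail over the group
print(next_action(3, 3, False))  # leave the resource failed
```

This is why a Threshold of 0 combined with Affect the Group produces an immediate failover on the first detected failure.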
Configuring Service Properties in a Cluster
In a cluster environment, you must use the correct management tool to configure SQL Server services, as follows:
Use Cluster Administrator to start and stop the services.
Use the Properties dialog box for the SQL Server instance in Enterprise Manager to set the service accounts and passwords for the services.
Failing to use the appropriate tools to configure SQL Server properties can cause SQL Server startup problems. In addition, in a cluster, do not set SQL Server Agent to restart automatically. If SQL Server Agent is set to restart automatically, it can conflict with the cluster services during a failover.
Failing Over to a Secondary Node in a Stretch Cluster
Failover is automatic when using MSCS and SQL Server. You can control how long MSCS takes to detect a failure, and you can take steps in your applications to reduce the time to bring up the standby service. Before making any changes to MSCS or to a client application, test the failover cluster to become familiar with how it fails over under different conditions.
Testing Failover
Use Cluster Administrator to initiate a planned failover. This simulates taking a node offline for planned maintenance.
To initiate a planned failover
Click Start, point to Programs, and then click Cluster Administrator.
Click the cluster group containing the SQL Server virtual server.
Note which node owns each resource in this cluster group.
Right-click the cluster group containing the SQL Server virtual server, and then click Move Group.
Watch each resource in the group switch from online to offline to online pending, and then to online on the new node. The entire process takes 1 to 2 minutes in a typical environment. Also notice that the resource owner in this cluster group changes to the secondary node.
Use one of the following options to simulate an unplanned failover.
To simulate an unplanned failover
Do one of the following:
Unplug the corporate network cable from the node that owns SQL Server.
Reboot the owning node.
Unplug the Fibre Channel cable connected to the active SAN.
Turn the power off on the owning node.
Use Cluster Administrator on the secondary node to watch each resource in the group switch from online to offline to online pending, and then to online.
The entire process takes less than 1 minute. Notice that the resource owner in this cluster group changes to the secondary node.
Minimizing Failover Time
If you need to decrease the time it takes SQL Server to fail over from one node to another, use the following information to reduce the time to availability.
Understanding Time to Availability
The time it takes from the occurrence of a fault to SQL Server being available on the standby node consists of the following:
The time to detect and confirm the fault, which you can control
The time to take the resources offline, which is controlled by MSCS and which you cannot control
The time to bring the resources online, which MSCS controls and which you can control by ensuring that sufficient SQL Server memory and processor resources are allocated
The time for SQL Server to initialize, which you can control by limiting long transactions
SQL Server is usually available within 1 minute or less.
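The four components above can be summed into a rough time-to-availability estimate. The sample values below are illustrative assumptions, not measurements from this guide; actual figures depend on your poll intervals, resource allocation, and transaction lengths.

```python
def estimate_time_to_availability(detect_s, offline_s, online_s, recovery_s):
    """Sum the four phases of a failover, in seconds:
    fault detection, taking resources offline, bringing resources
    online, and SQL Server database recovery (roll forward/roll back).
    """
    return detect_s + offline_s + online_s + recovery_s

# Hypothetical values for a hard node failure with short transactions:
total_s = estimate_time_to_availability(
    detect_s=10,    # heartbeat/poll-based fault confirmation
    offline_s=5,    # MSCS takes resources offline
    online_s=15,    # MSCS brings resources online on the standby node
    recovery_s=20,  # SQL Server rolls transactions forward/back
)
print(total_s)  # 50 seconds
```

Measure each phase in your own failover tests to see where the time actually goes.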
Controlling the Time Required to Detect and Confirm the Fault
The LooksAlive and IsAlive poll intervals for SQL Server determine the time that it takes MSCS to detect a failure. Smaller intervals create more overhead and increase the possibility of false failure detections, and therefore can result in unnecessary failovers. For SQL Server, the default IsAlive interval is 60,000 milliseconds (60 seconds), and the default LooksAlive interval is 5,000 milliseconds (5 seconds). Change these values on the test platform first, and test them thoroughly under a realistic load. If you set the intervals lower, be prepared to measure the impact on the application.
Controlling the Time Required to Bring the Resources Online
If the failover node has insufficient available memory at the time of failover, it will take longer to bring SQL Server online after a failover. Although this condition should not occur in a single-instance cluster, this can be a problem in a multiple-instance cluster. In normal operation, each node runs one instance of SQL Server, and each instance uses memory as needed. If a node does not have sufficient memory to run multiple instances simultaneously during a failover, use a lower maximum memory setting to reserve memory for failover.
If at least 30 MB of unallocated physical memory is available, the instance starts in dynamic memory allocation mode. For monitoring purposes, a low-memory condition creates an entry in the SQL Server event log warning you that the operating system did not fulfill the SQL Server memory request. No entry is made in the Application Log. This does not mean that you should attempt to operate SQL Server with only 30 MB of available physical memory.
Controlling the Time Required to Initialize
At startup, SQL Server must verify the condition of each database, rolling forward completed transactions and rolling back any uncommitted transactions that it finds. If client applications use very long transactions, the incomplete transactions can take a long time to roll back. Use a series of short transactions whenever possible. Determine each application's worst-case transaction size, and test the recovery time when this transaction is rolled back in a failover.
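One common way to keep transactions short, sketched below, is to split a large unit of work into small batches and commit each batch separately. The batch size of 1,000 rows is an arbitrary illustration; tune it against your own recovery-time tests.

```python
def batched(rows, batch_size=1000):
    """Yield rows in small batches so that each batch can be committed
    as its own short transaction. If a failover interrupts the work,
    only the single in-flight batch must be rolled back at recovery,
    instead of one very long transaction.
    """
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

# A 10,500-row update becomes 11 short transactions instead of one long one.
batches = list(batched(list(range(10500))))
print(len(batches))  # 11
```

The application must still track which batches committed, so that interrupted work can be resumed rather than repeated after a failover.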
Performing a Manual SAN Failover
MSCS does not control the SAN devices, and the failover of the SAN devices is not automatic. Follow the procedures each storage vendor provides to reverse the role of the SAN devices when the active SAN fails. These procedures include additional steps to restore MSCS when the backup SAN is brought online.
When a failure is detected, the quorum drive is generally affected and MSCS stops.
To restart MSCS
Bring the backup SAN online.
Stop all nodes except one.
Stop and restart the remaining node.
Open Computer Management, right-click Disk Management, and then click Rescan.
Open Cluster Administrator, and verify that all disks are online.
To restart disks if MSCS cannot start all disks
Delete each disk resource from its cluster group.
Right-click the cluster group, click New, and then click Resource.
Assign the disk the same name, and click Physical Disk as the resource type.
Assign the correct physical disk to the resource name, and accept the defaults for the remaining settings.
Note: Repeat this procedure for each disk.
Reconnecting Clients
When the SQL Server service activates on the failover node, the application must reconnect to continue operating. How clients reconnect depends on whether an application maintains state or not, and on the ability of the application to reissue the steps in a transaction. Clients reconnect in the following ways:
In a Web application, application state is usually not maintained. When a user submits a request to the Web server, the application usually makes a new connection. In this environment, no additional retry logic is required. However, the Web application must gracefully handle the error response when it cannot connect to SQL Server.
In a non-Web application, a connection is often held for a whole session. You must implement logic in the application that detects a broken connection and that attempts to reconnect at appropriate intervals (for example, 30 seconds). The application should be prepared to restart the transaction that was in progress when the connection broke. In some cases, you may have to rely on the application user to retry the submission.
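The reconnect logic described for non-Web applications might be sketched as follows. The retry count and callable names are hypothetical; the 30-second interval comes from the example in the text above.

```python
import time

def connect_with_retry(connect, attempts=10, interval_s=30, sleep=time.sleep):
    """Retry a broken connection at fixed intervals.

    `connect` is any zero-argument callable that returns a connection
    or raises on failure. After reconnecting, the caller is responsible
    for restarting the transaction that was in progress when the
    connection broke.
    """
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except Exception:
            if attempt == attempts:
                raise
            sleep(interval_s)

# Simulate a failover that completes after two failed connection attempts.
state = {"calls": 0}
def fake_connect():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("server unavailable during failover")
    return "connected"

print(connect_with_retry(fake_connect, sleep=lambda s: None))  # connected
```

In a real application, the retry loop would wrap the data-access layer's connection call, and a user-visible message would explain the delay.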
Failing Back to the Original Node
In a single-instance cluster, little reason exists to fail back to the original node. If you have symmetrical servers that are dedicated to the same application, performance is identical on each. Failing back only causes more downtime.
In a multiple-instance configuration, after a failover, two instances of SQL Server run on the same node, so performance decreases for both. Choose a failback time to minimize the impact on clients.
You can set MSCS to automatically fail back to the original node; however, doing so incurs another service outage when the service is moved to the other node. In addition, the failed node may not be ready for the service to fail back, causing more downtime; therefore, enabling automatic failback is not always effective in maintaining high availability.
Optionally, you can set up automatic failback to occur during standard times set for when an outage is acceptable.
To configure automatic failback in a specific time slot
Click Start, point to Programs, and then click Cluster Administrator.
Right-click the group containing SQL Server.
Choose Properties.
On the Failback tab, set the time of day during which you want to allow failback.
Manual failback enables you to control when to stop the service and move it back to the original node when doing so is least disruptive to your clients.
To initiate manual failback
Click Start, point to Programs, and then click Cluster Administrator.
Right-click the group containing SQL Server.
Choose Move Group.
Restoring a Node After a Catastrophic Failure
If one of the cluster nodes suffers a catastrophic failure, you might have to rebuild the server. Use the steps in the following procedures to recover from a catastrophic failure.
To rebuild the server
On the surviving node, remove the failed node from the virtual server. If you have more than one virtual server on a node, doing so updates all instances.
On the surviving node, evict the failed cluster node.
Rebuild the failed node.
Install SQL Server.
To remove a node from a virtual server
Insert the SQL Server 2000 Enterprise Edition CD into the CD-ROM drive.
On the SQL Server 2000 Enterprise Edition page, click SQL Server Components, and then click Install Database Server.
On the Welcome page for the Microsoft SQL Server Installation Wizard, click Next.
On the Computer Name page, type the virtual server name, click Virtual Server, and then click Next.
Leave Advanced Options selected, and then click Next.
Leave Maintain a Virtual Server for Failover Clustering selected, and then click Next.
Note: If the server is unavailable, you will receive the following error message: "One or more nodes of Virtual Server are unavailable. Setup cannot modify the virtual server's IP address resource." If this error message appears, click OK.
On the Virtual Server Information page, click Next.
On the Cluster Management page, click the failed node, and then click Remove. Click Next.
On the Remote Information page, enter the login name and password of an account that is an administrator for all nodes in the cluster. Click Next.
In the message box, click Yes.
Click Finish.
To evict a node from a cluster
Click Start, point to Programs, and then click Cluster Administrator.
In Cluster Administrator, right-click the failed node, and then click Evict Node. Click OK.
Close Cluster Administrator.
To rebuild a node
Reinstall Windows, join the server to the domain, and apply the most recent Windows service pack on the new hardware.
Configure the domain user account to administer the cluster. For more information, see "Configuring Accounts and Names and Obtaining IP Addresses" earlier in this chapter.
Configure the corporate and private NICs.
Install and configure MSCS as an additional node. For more information, see "Installing and Configuring MSCS on Additional Nodes in the Cluster," earlier in this chapter.
To add the rebuilt node to the virtual server
Insert the SQL Server 2000 Enterprise Edition CD into the CD-ROM drive of any node in the cluster.
You can add a rebuilt node to the virtual server from any node in the cluster.
On the SQL Server 2000 Enterprise Edition page, click SQL Server Components, and then click Install Database Server.
On the Welcome page for the Microsoft SQL Server Installation Wizard, click Next.
On the Computer Name page, type the virtual server name, click Virtual Server, and then click Next.
On the Installation Selection page, click Advanced Options, and then click Next.
Leave Maintain a Virtual Server for Failover Clustering selected, and then click Next.
On the Virtual Server Information page, click Next.
On the Cluster Management page, click the new node, click Add, and then click Next.
In the Username text box, specify a domain user account that is an administrator on each node participating in the virtual server configuration. Type the password in the Password text box, type the domain name in the Domain text box, and then click Next.
On the Services Accounts page, type the domain user account, password, and domain name in the appropriate text boxes, and then click Next.
Click Finish when setup completes. Repeat this process for each instance.
Note: After installing SQL Server 2000 and joining it to the virtual SQL Server, apply the appropriate SQL Server service pack to the rebuilt node. All nodes must run at the same service pack level.
References
For more information about topics covered in this chapter, see the following:
Article 254321, "INF: Clustered SQL Server Do's, Don'ts, and Basic Warnings," in the Microsoft Knowledge Base at https://support.microsoft.com/default.aspx?scid=kb;en-us;254321&sd=tech
Article 259267, "Microsoft Cluster Service Installation Resources," in the Microsoft Knowledge Base at https://support.microsoft.com/default.aspx?scid=kb;en-us;259267&sd=tech
Article 239885, "INF: How to Change Service Accounts on a SQL Virtual Server," in the Microsoft Knowledge Base at https://support.microsoft.com/default.aspx?scid=kb;en-us;239885&sd=tech
Article 273673, "INF: SQL Virtual Server Client Connections Must Be Controlled by Clients," in the Microsoft Knowledge Base at https://support.microsoft.com/default.aspx?scid=kb;en-us;273673&sd=tech
"Windows Server System Reference Architecture" on the Microsoft TechNet Web site at https://www.microsoft.com/technet/solutionaccelerators/wssra/raguide/default.mspx
For more information on third-party vendor products used in testing, see the following:
Compaq servers and SAN devices
SANCastle Fibre Channel Gateway at https://sancastle.com/products/