Backing up and restoring server clusters
Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP1, Windows Server 2003 with SP2
Performing regular backups of your server cluster is imperative for high availability. This topic explains how you can use the Backup or Restore Wizard to back up cluster nodes, describes ten cluster failure scenarios, and offers data restore solutions for each scenario using the Backup or Restore Wizard and recovery utilities from the Microsoft Windows Server 2003 Resource Kit.
For more information on backup and restore procedures, see Backing up and restoring data.
Backing up cluster data
In a server cluster, four groups of data are critical to the proper operation of the cluster: the disk signatures and partitions of the cluster disks, the cluster quorum data, the data on the cluster disks, and the data on the individual cluster nodes.
Cluster disk signatures and partitions
Cluster quorum data
Data on the cluster disks
Data on the individual cluster nodes
Cluster disk signatures and partitions
Before you begin to back up any data on the server cluster nodes, make sure you back up the cluster disk signatures and partitions using Automated System Recovery in the Backup Wizard. This step is necessary if you later need to restore the signature of the quorum disk; for example, if you experience a complete system failure and the signature of the quorum disk has changed since your last backup.
Note
- By default, Backup Operators do not have the user rights necessary to create an Automated System Recovery (ASR) backup on a cluster node. However, Backup Operators can perform this procedure if that group is added to the security descriptor for the Cluster service. You can do that using Cluster Administrator or cluster.exe. For more information, see Give a user permissions to administer a cluster and Cluster.
For information, see Back up cluster disk signatures and partition layouts.
Cluster quorum data
When you back up data on a server cluster node, make sure you also back up the cluster quorum. The cluster quorum is important because it contains the current cluster configuration, application registry checkpoints, and the cluster recovery log.
You can use the Backup Wizard to back up the cluster quorum data by performing a System State backup from any node, provided the Cluster service is running on that node.
For information, see Back up the cluster quorum.
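The same System State backup can also be started from the command line with ntbackup. The following is a minimal sketch in Python, not a tested procedure: it assembles an ntbackup System State command and checks the text output of sc query clussvc to see whether the Cluster service is running. The function names, the job name, and the backup file path are illustrative assumptions.

```python
def system_state_backup_command(job_name, backup_file):
    """Build the argument list for an ntbackup System State backup to a file.

    job_name and backup_file are placeholders chosen by the caller.
    """
    return ["ntbackup", "backup", "systemstate", "/J", job_name, "/F", backup_file]


def cluster_service_running(sc_query_output):
    """Return True if the text printed by 'sc query clussvc' reports RUNNING."""
    for line in sc_query_output.splitlines():
        if "STATE" in line and "RUNNING" in line:
            return True
    return False
```

On a node, you might capture the output of sc query clussvc, pass it to cluster_service_running, and start the backup only if the result is True, because the quorum data is included only when the Cluster service is running.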
Data on the cluster disks
To back up all cluster disks owned by a node, perform a full backup from that node.
You can also back up this data through a network connection to a hidden administrative file share. For example, you might use the New Resource Wizard to create FBackup$, GBackup$, and HBackup$ file shares for the root of drives F, G, and H, respectively. These shares would not appear in the browse list and could be configured to allow access only to members of the Backup Operators group.
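The naming pattern in this example can be sketched as follows. This Python fragment only derives the share names and root paths used in the example above; it does not create the shares, and the function name is hypothetical.

```python
def hidden_backup_shares(drive_letters):
    """Map hidden share names to drive roots, e.g. FBackup$ to F:\\.

    The trailing $ is what keeps a share out of the browse list.
    """
    shares = {}
    for letter in drive_letters:
        letter = letter.upper()
        shares[f"{letter}Backup$"] = f"{letter}:\\"
    return shares
```

Calling hidden_backup_shares("FGH") yields the three share names from the example, each paired with the root of its drive.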
For information on backing up data on the cluster disks, see Back up data on cluster nodes.
Important
- If a cluster disk owned by the node being backed up fails over to another node during the backup process, the backup set will not contain a full backup of that disk.
Important
- You can back up a cluster disk only on a local node. You cannot back up a cluster disk on a remote computer.
Data on the individual cluster nodes
After you back up the cluster quorum disk on one node, it is not necessary to back up the quorum on the remaining cluster nodes. However, you may want to back up the clustering software, cluster administrative software, system state, and application data on the remaining nodes.
Important
- If you back up the system state for a node, you will also automatically back up the quorum data as long as the Cluster service is running on that node.
For information on backing up data on individual cluster nodes, see Back up data on cluster nodes.
Cluster failure and restore scenarios
This section describes ten failure scenarios that will require restoring your cluster. The type of failure you experience determines the steps you must follow.
Scenario 1—Cluster Disk Data Loss
Scenario 2—Cluster Quorum Corruption
Scenario 3—Cluster Quorum Loses Checkpoints
Scenario 4—Cluster Disk Corruption or Failure
Scenario 5—Cluster Quorum Disk Failure
Scenario 6—Single Cluster Node Corruption or Failure
Scenario 7—Cluster Quorum Rollback
Scenario 8—Complete Cluster Failure
Scenario 9—Majority Node Set Cluster Failure
Scenario 10—Application Data Loss in a Server Cluster
Scenario 1—Cluster Disk Data Loss
If you have lost files and folders on one of your cluster disks, but not on the disk containing the cluster quorum, you can use the Backup or Restore Wizard to restore that data.
Important
- You must restore the cluster disk data from the node that owns the cluster disk.
For information, see Restore files from a file or a tape.
Scenario 2—Cluster Quorum Corruption
Symptom: The cluster nodes can boot up, but the Cluster service fails to start because the quorum resource cannot come online.
If this problem results from corrupted files on the quorum disk, try starting the Cluster service by opening a command prompt and typing net start clussvc /resetquorumlog. This creates a new quorum log file, using information stored in the cluster database on the local node. For additional information about recovering from cluster quorum corruption, see Recover from a corrupted quorum log or quorum disk. If the cluster quorum disk needs to be replaced, see Scenario 5, below. For a majority node set cluster, see Scenario 9, below.
Scenario 3—Cluster Quorum Loses Application Checkpoints
Symptom: Some resources fail to come online and the application checkpoints are out of date.
If you have recovered from quorum corruption by creating a new quorum log as described in Scenario 2 above, you may need to restore the matching checkpoints before the quorum resource can come back online.
Using the Microsoft Windows Server 2003 Resource Kit tools
- Use the ClusterRecovery utility. For more information, see the Microsoft Windows Server 2003 Resource Kit and the Help for the ClusterRecovery utility.
Using Windows Server 2003 family tools
Locate backup sets of the cluster quorum disk and system disk. The cluster quorum disk backup will contain other system state data. These backups must be a matched set; that is, they must have been taken at the same time.
Restore data on the system disk. For information, see Restore files from a file or a tape.
Restore the cluster quorum. For information, see Restore the contents of a cluster quorum disk for all nodes in a cluster.
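Before you restore, it can help to confirm that the two backup sets really are a matched set. The following Python sketch compares the creation times of the two sets. The one-hour tolerance is an assumption chosen for illustration, not a documented threshold, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def is_matched_set(quorum_backup_time, system_backup_time,
                   tolerance=timedelta(hours=1)):
    """Return True if the two backups were taken within 'tolerance' of each other.

    The default tolerance is an illustrative assumption; choose a value
    that fits your own backup schedule.
    """
    return abs(quorum_backup_time - system_backup_time) <= tolerance
```

If the function returns False, look for a pair of backup sets taken closer together before you restore.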
Scenario 4—Cluster Disk Corruption or Failure
Symptom: A cluster disk cannot come online. Resources that depend on that cluster disk will not be able to come online.
First, see if you can run a diagnostic utility from the disk manufacturer to determine the condition of the disk. If the cluster disk is corrupted or the disk hardware fails, you can restore the disk more quickly by using utilities in the Microsoft Windows Server 2003 Resource Kit. If you do not have access to these tools, you can still restore your cluster disk using the backup and restore utilities included with Windows Server 2003 family operating systems.
Using the Microsoft Windows Server 2003 Resource Kit tools
Use the ClusterRecovery utility. For more information, see the Microsoft Windows Server 2003 Resource Kit and the Help for the ClusterRecovery utility.
Use NTBackup along with the Confdisk utility (from the Microsoft Windows Server 2003 Resource Kit) to restore the data on the cluster disk. For more information, see the Microsoft Windows Server 2003 Resource Kit.
Using Windows Server 2003 tools
If necessary, replace the cluster disk. For information, see Install local storage buses and devices.
Stop the Cluster service on all nodes of the cluster.
Locate the data backup set for the node that owns the cluster disk. Also, locate the Automated System Recovery backup set for that node, if it is available. Perform an Automated System Recovery restore on a node. Use ASR as a last resort in system recovery, only after you have exhausted other options. For more information, see Restore a damaged cluster node using Automated System Recovery.
After the restored node comes back online, restart the Cluster service on the remaining nodes.
Scenario 5—Cluster Quorum Disk Failure
Symptom: The cluster nodes can boot up, but the Cluster service fails to start because the quorum resource cannot come online. Entries in the Event Log indicate hardware failures.
First, try starting the Cluster service by opening a command prompt and typing net start clussvc /fixquorum. This starts the Cluster service with all resources offline, including the quorum resource. Then you can try switching to a new quorum resource, with or without using the ClusterRecovery utility in the Windows Server 2003 Resource Kit. For more information, see Fixquorum command.
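If you script recovery steps, the two start switches mentioned in this topic (/resetquorumlog in Scenario 2 and /fixquorum here) can be wrapped in a small helper. This is a minimal sketch in Python. The function names are hypothetical, and the helper accepts only the two switches named in this topic.

```python
import subprocess

# The only switches this sketch accepts are the two named in this topic.
ALLOWED_SWITCHES = ("/resetquorumlog", "/fixquorum")


def cluster_start_command(switch=None):
    """Build the 'net start clussvc' argument list, optionally with one switch."""
    argv = ["net", "start", "clussvc"]
    if switch is not None:
        if switch.lower() not in ALLOWED_SWITCHES:
            raise ValueError(f"unsupported switch: {switch}")
        argv.append(switch)
    return argv


def start_cluster_service(switch=None, runner=subprocess.run):
    """Run the command on the local node; 'runner' can be replaced for testing."""
    return runner(cluster_start_command(switch), check=False)
```

Restricting the switches is a design choice for the sketch: it prevents a typing error from being passed to the Cluster service as a start parameter.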
If the cluster quorum disk (the disk containing the quorum resource) fails, you can replace it more quickly by using utilities in the Microsoft Windows Server 2003 Resource Kit. If you do not have access to these tools, you can still replace your cluster quorum disk using the backup and restore utilities shipped with Windows Server 2003 family operating systems.
Using the Microsoft Windows Server 2003 Resource Kit tools
Use NTBackup along with the Confdisk utility (from the Microsoft Windows Server 2003 Resource Kit) to restore the data on the cluster disk.
Use the ClusterRecovery utility. For more information, see the Microsoft Windows Server 2003 Resource Kit and the Help for the ClusterRecovery utility.
Using Windows Server 2003 family tools
If necessary, replace the cluster quorum disk. For information, see Install local storage buses and devices.
Locate the data backup set for the cluster quorum. Also, locate the Automated System Recovery backup set for the node you used to create the cluster quorum backup set, if it is available. Perform an Automated System Recovery restore on any node in the cluster. Use ASR as a last resort in system recovery, only after you have exhausted other options. For more information, see Restore a damaged cluster node using Automated System Recovery.
Restore data on the node. For information, see Restore files from a file or a tape.
Scenario 6—Single Cluster Node Corruption or Failure
Symptom: The node cannot join the cluster.
If the Event Log indicates that the cluster database on the local node is corrupted, you can perform a System State restore on that node to replace the local cluster database. For information, see Restore the cluster database on a local node. Alternatively, you can copy the latest checkpoint file (CHKxxx.TMP) from the quorum disk to the %systemroot%\Cluster\ directory, rename it as file CLUSDB, and restart the Cluster service on that node.
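The checkpoint copy described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a supported tool: it treats the checkpoint file with the newest modification time as the latest one, it takes the quorum directory and the local Cluster directory as parameters rather than assuming their locations, and it does not stop or restart the Cluster service. The function name is hypothetical.

```python
import shutil
from pathlib import Path


def restore_clusdb_from_checkpoint(quorum_dir, cluster_dir):
    """Copy the newest CHK*.TMP file in quorum_dir to cluster_dir as CLUSDB.

    Assumption: "latest" means the file with the newest modification time.
    Returns the path of the checkpoint file that was copied.
    """
    checkpoints = sorted(Path(quorum_dir).glob("CHK*.TMP"),
                         key=lambda p: p.stat().st_mtime)
    if not checkpoints:
        raise FileNotFoundError(f"no checkpoint files found in {quorum_dir}")
    latest = checkpoints[-1]
    # Overwrites any existing CLUSDB; keep a copy of the old file first if needed.
    shutil.copy2(latest, Path(cluster_dir) / "CLUSDB")
    return latest
```

After the copy completes, restart the Cluster service on that node as described above.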
If a single node fails in the cluster due to system disk or other hardware failure, follow these steps to rebuild the node and rejoin the cluster:
After verifying that all cluster resource groups have been successfully moved to other nodes, repair or replace the failed hardware. For information, see Move a group to another node and Manage Cluster Hardware.
Perform an Automated System Recovery restore on the failed node to rebuild the node. For information, see Restore a damaged cluster node using Automated System Recovery.
If you have other files or application data for that node backed up on tape or another backup medium, you can restore them now. For information, see Restore files from a file or a tape and Scenario 8 below.
For each cluster group and resource, verify that the newly recovered node appears as a possible owner in Cluster Administrator, then move a resource group to the newly recovered node and verify that the move is successful. For information, see Test whether group resources can fail over.
Note
- If you do not have an Automated System Recovery backup of the node, you can evict that node and add a new node to the cluster. For more information, see Evict a node from the cluster and Add additional nodes to the cluster.
Scenario 7—Cluster Quorum Rollback
If recent changes to your cluster have resulted in the cluster not functioning as expected, you can use the Backup or Restore Wizard to roll back your cluster to a previous configuration. For example, if a number of resources have mistakenly been deleted from the cluster configuration, you can roll it back, using a backup that contains those resources.
For information, see Restore the contents of a cluster quorum disk for all nodes in a cluster.
Scenario 8—Complete Cluster Failure
Symptom: None of the nodes can boot up.
If all nodes fail in a cluster and the quorum disk cannot be repaired, follow these steps:
Use Automated System Recovery on one node in the original cluster, choosing a node that was backed up recently and that was active in the cluster at the time it was backed up. This restores the disk signatures, the partition layout of the cluster disks (quorum and nonquorum), and the cluster configuration data. Do not start other nodes until the first node is restored. For more information, see Restore a damaged cluster node using Automated System Recovery.
Restore other nodes. For more information, see Restore a damaged cluster node using Automated System Recovery.
Restore your applications and application data from backup data sets.
Important
- If you do not have an Automated System Recovery backup of each node, you cannot restore the cluster. Instead, you must recreate your cluster from scratch. For more information, see Checklist: Planning and creating a server cluster.
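The choice of the first node to restore can be sketched as a simple selection over your backup records. The following Python fragment is an illustration only. The AsrBackup record and its fields are hypothetical, and the sketch reads "backed up recently" as "most recently backed up", which is one reasonable interpretation.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AsrBackup:
    """Hypothetical record describing one node's ASR backup."""
    node: str
    taken_at: datetime
    active_in_cluster: bool  # was the node active in the cluster at backup time?


def restore_order(backups):
    """Return node names in restore order: the best first node, then the rest."""
    candidates = [b for b in backups if b.active_in_cluster]
    if not candidates:
        raise ValueError("no backup was taken while its node was active in the cluster")
    first = max(candidates, key=lambda b: b.taken_at)
    return [first.node] + [b.node for b in backups if b is not first]
```

The first name in the returned list is the node to restore with Automated System Recovery before any other node is started.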
Scenario 9—Majority Node Set Cluster Failure
The methods for restoring a majority node set cluster are the same as for restoring other clusters. However, in a majority node set cluster, if some of the nodes fail and the cluster loses quorum, you can force the remaining nodes to form a quorum and restart the cluster. For more information, see Force quorum in a majority node set server cluster.
Note
- On a majority node set cluster, the cluster database is not stored on a cluster disk central to all nodes, but is instead stored locally on each node at %systemroot%\Cluster\MNS.%ResourceGUID%$\%ResourceGUID%$\MSCS\.
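To make the path pattern in this note concrete, the following Python sketch substitutes a system root and a resource GUID into it. The function name is hypothetical, and the GUID must come from your own majority node set resource; the sketch only joins the path components and does not check that the directory exists.

```python
import ntpath


def mns_database_dir(system_root, resource_guid):
    """Build the local majority node set data directory from the documented pattern."""
    return ntpath.join(system_root, "Cluster",
                       f"MNS.{resource_guid}$",  # first component: MNS.<GUID>$
                       f"{resource_guid}$",      # second component: <GUID>$
                       "MSCS")
```

ntpath is used so that the result uses Windows path separators even when the sketch runs on another platform.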
Scenario 10—Application Data Loss in a Server Cluster
When restoring application data in a server cluster, follow the instructions provided in the documentation that shipped with your application.
Important
- If you are backing up Microsoft Exchange Server, a newer version of NTBackup.exe may be available from the Exchange Server section of the Microsoft Web site. Otherwise, you can use the version of NTBackup.exe that is included with Windows Server 2003 family operating systems.