Cluster disk and drive connection problems

Applies To: Windows Server 2003, Windows Server 2003 R2, Windows Server 2003 with SP1, Windows Server 2003 with SP2

What problem are you having?

  • When the physical disks are not powering up or spinning, Cluster service cannot initialize any quorum resources.

  • The Cluster service fails to start and generates an Event ID 1034 in the Event log after you replace a failed hard disk, or change drives for the quorum resource.

  • Drive on the shared storage bus is not recognized.

  • Configuration cannot be accessed through Disk Management.

  • SCSI or fibre channel storage devices do not respond.

  • Disk groups do not move, or stay in the online pending state after a move.

  • Disks do not come online or Cluster service does not start when a node is turned off.

  • Drives do not fail over or come online.

  • Mounted drives disappear, do not fail over, or do not come online.

  • The cluster quorum disk (containing the quorum resource) becomes disconnected from all nodes in a cluster and you are later unable to add the nodes back to the cluster.

When the physical disks are not powering up or spinning, Cluster service cannot initialize any quorum resources.

Cause:  Cables are not correctly connected, or the physical disks are not configured to spin when they receive power.

Solution:  After checking that the cables are correctly connected, check that the physical disks are configured to spin when they receive power.

The Cluster service fails to start and generates an Event ID 1034 in the Event log after you replace a failed hard disk, or change drives for the quorum resource.

Cause:  If a hard disk is replaced, or the bus is reenumerated, the Cluster service may not find the expected disk signatures, and consequently may fail to mount the disk.

Solution:  Write down the expected signature from the Description section of the Event ID 1034 error message. Then follow these steps:

  1. Back up the server cluster.

  2. Set the Cluster service to start manually on all nodes, and then turn off all but one node.

  3. If necessary, partition the new disk and assign a drive letter.

  4. Use the confdisk.exe tool (available in the Microsoft Windows Server 2003 Resource Kit) to write that signature to the disk.

  5. Start the Cluster service and bring the disk online.

  6. If necessary, restore the cluster configuration information.

  7. Turn on each node, one at a time.

For information on replacing disks in a server cluster, see article Q305793, "How to Replace a Disk with Windows 2000 or Windows Server 2003 family Clusters," in the Microsoft Knowledge Base.
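As a rough illustration of the procedure above, the following Python sketch extracts the expected signature from the description text of an Event ID 1034 message and builds the command lines for steps 2 and 5. The wording of the description that the regular expression matches is an assumption, so check it against the text in your own Event log. The function names are hypothetical, and the service name ClusSvc is taken from the Source column that the Event Log shows for Cluster service events. The sketch only builds the sc and net command lines; it does not run them, and it does not call confdisk.exe.

```python
import re

# Assumed wording of the Event ID 1034 description; adjust to match your log.
SIGNATURE_PATTERN = re.compile(
    r"expected signature of the disk was\s+(?:0x)?([0-9A-Fa-f]{1,8})\b",
    re.IGNORECASE,
)


def extract_expected_signature(description):
    """Return the expected disk signature as 8 uppercase hex digits, or None."""
    match = SIGNATURE_PATTERN.search(description)
    if match is None:
        return None
    return match.group(1).upper().zfill(8)


def manual_start_command(service="ClusSvc"):
    """Build the command line that sets a service to start manually (step 2)."""
    # sc.exe takes the start type as a separate argument after "start=".
    return ["sc", "config", service, "start=", "demand"]


def start_service_command(service="ClusSvc"):
    """Build the command line that starts a service (step 5)."""
    return ["net", "start", service]


if __name__ == "__main__":
    # Made-up sample text that follows the assumed wording.
    sample = (
        "The disk associated with cluster disk resource 'Disk Q:' could not "
        "be found. The expected signature of the disk was 1A2B3C4D."
    )
    print(extract_expected_signature(sample))
    print(" ".join(manual_start_command()))
    print(" ".join(start_service_command()))
```

Keeping the command builders separate from any code that runs them makes it easy to review the exact command lines on each node before anything changes.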

Drive on the shared storage bus is not recognized.

Cause:  Scanning for storage devices is not disabled on each controller on the shared storage bus.

Solution:  Verify that scanning for storage devices is disabled on each controller on the shared storage bus.

Often, the second computer that you turn on does not recognize the shared storage bus during the BIOS scan if the first computer is already running. This can result in a "Device not ready" error from the controller, or in substantial delays during startup.

To correct this, disable the option to scan for devices on the shared controller.

Note

  • This symptom can manifest itself as one of several errors, depending on the attached controller. It is normally accompanied by a one- to two-minute startup delay and an error indicating the failure of some device.

Configuration cannot be accessed through Disk Management.

Under normal cluster operations, the node that owns a quorum resource locks the drive storing the quorum resource, preventing the other nodes from using the device. If you find that the cluster node that owns a quorum resource cannot access configuration information through Disk Management, the source of the problem and the solution might be one of the following:

Cause:  A device does not have physical connectivity and power.

Solution:  Reseat controller cards, reseat cables, and make sure the drive spins up when you start the computer.

Cause:  You attached the cluster storage device to all nodes and started all the nodes before installing the Cluster service on any node.

Solution:  After you attach all servers to the cluster drives, you must install the Cluster service on one node before starting all the nodes. Attaching the drive to all the nodes before you have the cluster installed can corrupt the file system on the disk resources on the shared storage bus.

SCSI or fibre channel storage devices do not respond.

Cause:  The SCSI bus is not properly terminated.

Solution:  Make sure that the SCSI bus is not terminated early and that the SCSI bus is terminated at both ends.

Cause:  The SCSI or fibre channel cable is longer than the specification allows.

Solution:  Make sure that the SCSI or fibre channel cable is not longer than the cable specification allows.

Cause:  The SCSI or fibre channel cable is damaged.

Solution:  Make sure that the SCSI or fibre channel cable is not damaged. (For example, check for bent pins and loose connectors on the cable and replace it if necessary.)

Disk groups do not move, or stay in the online pending state after a move.

Cause:  Cables are damaged or not properly installed.

Solution:  Check for bent pins on cables and make sure that all cables are firmly anchored to the chassis of the server and drive cabinet.

Disks do not come online or Cluster service does not start when a node is turned off.

Cause:  If the quorum log is corrupted, the Cluster service cannot start.

Solution:  If you suspect the quorum resource is corrupted, see the information on the problem "Quorum log becomes corrupted" in Node-to-node connectivity problems.

Drives do not fail over or come online.

Cause:  The drive is not on a shared storage bus.

Solution:  If drives do not fail over or come online, make sure that each disk is on a shared storage bus and not on a local storage bus.

Cause:  If you have more than one local storage bus, some drives in Shared cluster disks will not be on a shared storage bus.

Solution:  Remove these drives from Shared cluster disks. If you do not remove them, the drives do not fail over, even though you can configure them as resources.

Shared cluster disks is in the Cluster Application Wizard.

Mounted drives disappear, do not fail over, or do not come online.

Cause:  The clustered mounted drive was not configured correctly.

Solution:  Look at the Cluster service errors in the Event Log (ClusSvc under the Source column). You need to recreate or reconfigure the clustered mounted drive if the description of any Cluster service error is similar to the following:

Cluster disk resource "disk resource": Mount point "mount drive" for target volume "target volume" is not acceptable for a clustered disk because reason. This mount point will not be maintained by the disk resource.
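To pick such errors out of exported Event Log text, a pattern can be built from the message template above. The Python sketch below is one way to do that. It assumes the description appears on one line with the double quotation marks and spacing shown in the template, which real entries may not match exactly. The function name and the sample values are hypothetical.

```python
import re

# Pattern built from the message template quoted above. The quoting style and
# spacing of real Event Log entries are assumptions and may differ.
MOUNT_POINT_ERROR = re.compile(
    r'Cluster disk resource "(?P<resource>[^"]*)": '
    r'Mount point "(?P<mount_point>[^"]*)" '
    r'for target volume "(?P<target_volume>[^"]*)" '
    r"is not acceptable for a clustered disk because (?P<reason>.*?)\. "
    r"This mount point will not be maintained by the disk resource\.",
    re.DOTALL,
)


def parse_mount_point_error(description):
    """Return the fields of a mount point error as a dict, or None if no match."""
    match = MOUNT_POINT_ERROR.search(description)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # Made-up sample; the reason text is a placeholder, not a real message.
    sample = (
        'Cluster disk resource "Disk R:": Mount point "R:\\Mount" for target '
        'volume "S:" is not acceptable for a clustered disk because example '
        "reason text. This mount point will not be maintained by the disk "
        "resource."
    )
    print(parse_mount_point_error(sample))
```

The reason field is the part to read first, because it states why the Cluster service rejected the mounted drive.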

When recreating or reconfiguring the mounted drive(s), follow these guidelines:

  • Make sure that you create unique mounted drives so that they do not conflict with existing local drives on any node in the cluster.

  • Do not create mounted drives between disks on the cluster storage device (cluster disks) and local disks.

  • Do not create a mounted drive from a clustered disk to the cluster disk that contains the quorum resource (the quorum disk). You can, however, create a mounted drive from the quorum disk to a clustered disk.

  • Mounted drives from one cluster disk to another must be in the same cluster resource group, and must be dependent on the root disk.
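The guidelines above can be expressed as a simple checklist. The Python sketch below is illustrative only: the Disk and MountedDrive types and the check_mounted_drive function are hypothetical and do not correspond to any cluster API. It assumes that "from disk A to disk B" means that disk A (the root disk) hosts the mount point folder and disk B is the volume mounted there, and it treats a path conflict as an exact, case-insensitive match with a path already in use on a node.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Disk(NamedTuple):
    name: str
    clustered: bool                      # True if on the cluster storage device
    is_quorum: bool = False              # True for the quorum disk
    group: str = ""                      # cluster resource group, if clustered
    depends_on: FrozenSet[str] = frozenset()  # disk resources this one depends on


class MountedDrive(NamedTuple):
    path: str      # mount point path
    root: Disk     # disk that hosts the mount point folder
    target: Disk   # disk whose volume is mounted at that path


def check_mounted_drive(drive: MountedDrive,
                        local_paths_in_use: Iterable[str]) -> List[str]:
    """Return the guidelines that a planned mounted drive would violate."""
    problems = []
    in_use = {p.lower() for p in local_paths_in_use}
    if drive.path.lower() in in_use:
        problems.append("path conflicts with an existing local drive on a node")
    if drive.root.clustered != drive.target.clustered:
        problems.append("mixes a cluster disk with a local disk")
    if drive.root.clustered and drive.target.clustered:
        if drive.target.is_quorum and not drive.root.is_quorum:
            problems.append("mounts the quorum disk under another clustered disk")
        if drive.root.group != drive.target.group:
            problems.append("root and target are in different resource groups")
        if drive.root.name not in drive.target.depends_on:
            problems.append("target disk does not depend on the root disk")
    return problems


if __name__ == "__main__":
    root = Disk("Disk R:", clustered=True, group="Data")
    target = Disk("Disk S:", clustered=True, group="Data",
                  depends_on=frozenset({"Disk R:"}))
    print(check_mounted_drive(MountedDrive(r"R:\Mounts\S", root, target), []))
```

An empty list means that none of the four guidelines is violated under these assumptions. It does not guarantee that the Cluster service will accept the mounted drive.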

For more information on viewing the Event Log, see View Event Logs.

For more information on creating mounted drives in a server cluster, see Add drives on the shared storage bus.

The cluster quorum disk (containing the quorum resource) becomes disconnected from all nodes in a cluster and you are later unable to add the nodes back to the cluster.

See the solution for the identically titled problem in Node-to-node connectivity problems.

For information about how to obtain product support, see Technical support options.

It is important that you correctly configure the storage topology (for example, SCSI, Fibre Channel, Storage Area Networks) and the storage interconnects (for example, multiple paths) used in your server cluster. Before deploying your server cluster, contact your hardware vendors to ensure that your particular cluster storage configuration is supported at the hardware level. For descriptions of supported cluster storage topologies, best practices for deploying and managing cluster storage, and a list of cluster storage-related Knowledge Base articles, see the cluster storage information at the Microsoft Web site.