I am experiencing an issue with a 2-node Windows Failover Cluster where automatic failover works perfectly during hard failures, but completely fails during a graceful OS shutdown or restart.
Here are the details of my environment and the specific symptoms:
Environment Details:
OS: Windows Server 2025 Standard (version 24H2)
Cluster Setup: 2-Node Cluster with a Disk Witness
Workload: SQL Server Always On Availability Groups (AG)
Storage: Dell ME Series SAN via iSCSI
Multipathing: Dell-specific DSM
The Issue:
The cluster successfully handles unexpected failures, but when I attempt a standard restart or shutdown of the active node, the cluster fails to move the resources to the secondary node. The disk resource seems to hang or fail during the termination process.
Testing Matrix:
Hard Power Pull (Active Node): Failover is SUCCESSFUL.
Network Disconnection (Active Node): Failover is SUCCESSFUL.
Manual Resource Move (via FCM): Move is SUCCESSFUL.
Graceful Restart / Shutdown (Active Node): Failover FAILS; resources are not moved to the other node.
Error Logs:
In the Cluster Events / System Event logs, I am seeing the following error during the shutdown process:
"Cluster physical disk resource encountered attempting to terminate, error code 1168"
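For reference, Win32 error 1168 is ERROR_NOT_FOUND ("Element not found"). When searching the cluster log for the same failure, it may help to know the other forms the code can take. The following is a minimal Python sketch, not a diagnostic tool: the helper names are hypothetical, and it is only an assumption that the log reports the code in decimal, hex, or HRESULT form.

```python
# Hypothetical helpers that turn a Win32 error code into log search strings.
FACILITY_WIN32 = 7  # facility value used by the HRESULT_FROM_WIN32 macro


def hresult_from_win32(code: int) -> int:
    """Mirror the HRESULT_FROM_WIN32 macro for a positive Win32 error code."""
    if code <= 0:
        return code  # zero (success) or already an HRESULT: left unchanged
    return (code & 0xFFFF) | (FACILITY_WIN32 << 16) | 0x80000000


def search_terms(code: int) -> list[str]:
    """Return the decimal, hex, and HRESULT spellings of a Win32 error code."""
    return [str(code), f"0x{code:x}", f"0x{hresult_from_win32(code):08x}"]


if __name__ == "__main__":
    print(search_terms(1168))  # ['1168', '0x490', '0x80070490']
```

Searching the generated cluster log for any of these strings around the shutdown timestamp should narrow down which disk resource call returned the error.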
Troubleshooting Performed:
Validated the cluster configuration (Cluster Validation passes).
Verified that manual drain/pause of the node works correctly.
Checked that the Dell ME SAN firmware and drivers are up to date.
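Since a manual drain succeeds while a direct restart does not, one possible interim workaround is to script the drain before the restart and collect the cluster log afterwards. This is a minimal Python sketch and does not address the root cause. It assumes it runs elevated on the other cluster node with the FailoverClusters PowerShell module installed and remote restart permitted. The node name, log folder, and helper names are placeholders, and the default is a dry run that only prints the commands.

```python
import subprocess


def ps(command: str) -> list[str]:
    """Wrap a PowerShell command line for subprocess (Windows PowerShell assumed)."""
    return ["powershell.exe", "-NoProfile", "-Command", command]


def maintenance_steps(node: str, log_dir: str, minutes: int = 15) -> list[list[str]]:
    """Build the drain, restart, resume, and log-collection steps for one node."""
    return [
        # Move roles off the node first; this is the path that already works manually.
        ps(f"Suspend-ClusterNode -Name '{node}' -Drain -Wait"),
        # Restart only after the drain has completed, then wait for the node to return.
        ps(f"Restart-Computer -ComputerName '{node}' -Force -Wait -For PowerShell"),
        # Bring the node back into the cluster without forcing roles to fail back.
        ps(f"Resume-ClusterNode -Name '{node}' -Failback NoFailback"),
        # Collect the last N minutes of cluster log for later review.
        ps(f"Get-ClusterLog -Destination '{log_dir}' -TimeSpan {minutes} -UseLocalTime"),
    ]


def run(steps: list[list[str]], dry_run: bool = True) -> None:
    """Print each step; execute it only when dry_run is False."""
    for step in steps:
        print(" ".join(step))
        if not dry_run:
            subprocess.run(step, check=True)  # stop at the first failing step


if __name__ == "__main__":
    # "NODE1" and the log folder are placeholders, not values from this environment.
    run(maintenance_steps("NODE1", r"C:\Temp\ClusterLogs"))
```

The resume step may need a retry if the Cluster service on the restarted node has not rejoined by the time PowerShell remoting becomes available.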