Azure Storage Disks Disconnected from Virtual Machine When Running Test-Cluster Command

Jack 0 Reputation points
2025-03-12T13:00:32.1733333+00:00

Description:

I am experiencing unexpected behaviour when running the Test-Cluster command in an Azure-based Windows Server 2016 environment. When executing the command, I observed the following issue:

Error Event 157: "Disk X has been surprise removed."

The affected disk automatically reconnected after the test completed.

This behaviour did not occur on all nodes—only 2 out of 3 servers had disks disconnect.
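
To confirm which nodes logged the event and when, the System log can be queried on each node. This is a minimal sketch, not part of the original post: the node names and the 24-hour look-back window are assumptions, and the filter matches on event ID 157 alone rather than on a specific provider.

```powershell
# Hypothetical node names; replace with the actual cluster nodes.
$nodes = 'node1', 'node2', 'node3'

# Assumed look-back window; widen it to cover the validation run.
$since = (Get-Date).AddHours(-24)

foreach ($node in $nodes) {
    # -ErrorAction SilentlyContinue suppresses the error Get-WinEvent
    # raises when a node has no matching events.
    Get-WinEvent -ComputerName $node -FilterHashtable @{
        LogName   = 'System'
        Id        = 157
        StartTime = $since
    } -ErrorAction SilentlyContinue |
        Select-Object MachineName, TimeCreated, ProviderName, Message
}
```

Comparing the `TimeCreated` values with the start and end of the Test-Cluster run should show whether the removals line up with the validation window.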

Environment Details:

  • OS: Windows Server 2016
  • Disks: Azure Managed Disks (not added to the failover cluster)
  • Cluster Size: 3-node cluster
  • Previous Behaviour: The Test-Cluster command has been run in the same environment before without this issue.
  • Comparison: I have also run Test-Cluster in on-premises and AWS environments, and this behaviour was not observed there.

Questions:

Any guidance or official documentation reference on this issue would be greatly appreciated.

Azure Virtual Machines

2 answers

  1. Marcin Policht 50,570 Reputation points MVP Volunteer Moderator
    2025-03-12T13:13:27.4266667+00:00

    Refer to https://learn.microsoft.com/en-us/troubleshoot/windows-server/high-availability/validate-hardware-failover-cluster#considerations-when-you-include-storage-tests. The relevant section is quoted below.

    Considerations when you include storage tests

    The Validate a Configuration Wizard runs all storage tests by default. All or some of the storage tests can be unselected by choosing the Run only tests I select option on the Testing Options page of the wizard. When storage tests are included, the Review Storage Status page of the wizard shows all of the disks and storage pools in the cluster and allows you to select the disks and storage pools to include in the storage tests. Storage tests require that a disk or storage pool that is assigned to a clustered role or Cluster Shared Volume be taken offline first. Therefore, anything using the storage won't have access to it during the storage tests. We recommend that any clustered role or other process that might be using the disk or storage pool is taken offline before the storage is included in the storage validation tests.

    The Test-Cluster Windows PowerShell cmdlet runs all storage tests by default. You can specify the -Include parameter to run only storage tests or a specific storage test. You can use the -Disk and -Pool parameters to enable targeted storage validation. The -Disk parameter or the -Pool parameter allows specifying, respectively, one or more disks or storage pools to be included in the storage validation testing. If the -Disk parameter or the -Pool parameter is used to specify a disk or storage pool that is currently online and is assigned to a clustered role or Cluster Shared Volume, you must also specify the -Force parameter to validate the corresponding disk or storage pool; otherwise, you must ensure that the clustered disk or storage pool is offline before running the tests. If the -Disk parameter or the -Pool parameter isn't specified, Test-Cluster runs storage tests on all disks and storage pools that are available for cluster use or that are in the cluster resource offline or failed state. We recommend that any clustered role or other process that might be using the disk or storage pool is taken offline before the storage is included in the validation testing.
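
    As a rough illustration of the parameters described above, the sketch below lists the available tests and then runs validation with and without the storage category. The node names are hypothetical, and the category names passed to -Ignore and -Include are assumed to match what -List reports in the environment; verify them there before relying on this.

    ```powershell
    Import-Module FailoverClusters

    # Hypothetical node names; replace with the actual cluster nodes.
    $nodes = 'node1', 'node2', 'node3'

    # Show the tests and categories available, without running anything.
    Test-Cluster -Node $nodes -List

    # Run validation but skip the whole storage category.
    Test-Cluster -Node $nodes -Ignore 'Storage'

    # Alternatively, run only the named categories (storage not included).
    Test-Cluster -Node $nodes -Include 'Inventory', 'Network', 'System Configuration'
    ```

    Because the default run covers disks that are "available for cluster use", skipping the storage category is one way to keep validation from touching disks that were never added to the cluster.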



    hth

    Marcin


  2. Jack 0 Reputation points
    2025-03-20T12:21:46.1333333+00:00

    Everything points to ensuring that the storage tests are not run as part of cluster validation when using Azure Managed Disks. Confirming that this resolves the issue will be tricky, however, because I have run cluster validation with storage checks in Azure environments for several years without issue. I will carry out further testing with and without storage tests to try to get some peace of mind.

    Thanks for the responses on the question. I will update this thread if I find anything further from testing.

