VMM2016/Hyper-V, IncompleteVMConfig VMM state - potential vmwp.exe file handle refresh issue

Tony Gedge 241 Reputation points

I have a VMM 2016 managed Hyper-V environment (multiple failover clusters on Windows Server 2016). The VM configuration and storage are located on SMB file shares hosted by an external NetApp NAS.

Periodically the VMM status of some VMs changes to IncompleteVMConfig. The number of VMs affected varies, and I have not found a pattern to the behaviour or a correlation with any other event. When a VM is in the IncompleteVMConfig state, refreshing the VM generates an error of the form:

Error (2912)
An internal error has occurred trying to contact the 'HYPERV_HOST' server: NO_PARAM: NO_PARAM.

WinRM: URL: [http://HYPERV_HOST:5985], Verb: [GET], Resource: [http://schemas.microsoft.com/wbem/wsman/1/wmi/root/scvmm/FileInformation?Filename=\\FILE_SHARE\VM_FOLDER\VHD.vhdx]

The requested resource is in use (0x800700AA)

A temporary workaround to recover is to live migrate the VM to another host, then refresh the VM in VMM.

This error looks very similar to one described in an article that in turn refers to KB2504962, which states:

In a Hyper-V Failover Cluster, a virtual machine can have its configuration information stored on cluster shared storage (a Physical disk resource or a Cluster Shared Volume (CSV)). If the physical disk resource or the Cluster Shared Volume (CSV) goes Offline or Fails, the VM is placed in a critical state. Once the storage is reconnected, the VM should no longer be in a critical state. However, virtual machine worker process (vmwp.exe) does not refresh all of its file handles.

The suggested recovery process is the same:

Execute a Live Migration of the virtual machine(s) experiencing the problem to another node in the cluster. This will re-establish the connections to the required files on the storage.

The KB2504962 article referenced above relates to Windows Server 2008 R2 Service Pack 1, which is quite old. I would expect this issue to have been fixed by Windows Server 2016/VMM 2016, though I guess all things are possible!

Has anyone else seen this behaviour on Windows Server 2016/VMM 2016?

Does anyone have an idea of how I can determine whether this is the same kind of issue referenced in the KB article, that is, file handles not being refreshed in the VMWP process?


1 answer

  1. Tony Gedge 241 Reputation points

    Well, this seems to be a little more complex than I first thought, and it appears to relate to behaviour triggered in Hyper-V rather than in VMM specifically.

    After much digging through the VMM debug log, I can see that a WMI call to Msvm_ImageManagementService.GetVirtualHardDiskState is failing with error 2147942570 (which is 0x800700AA). I can reproduce this manually using WMI on any VHD attached to the affected VM. Querying the VHDs attached to other VMs on the same hypervisor works fine.
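    To make the link between the decimal value in the debug log and the hex value in the VMM error explicit, here is a minimal Python sketch that splits an HRESULT into its standard fields. The function name `decode_hresult` is my own, not part of any Microsoft tooling. Facility 7 is FACILITY_WIN32, and Win32 error 170 is ERROR_BUSY ("The requested resource is in use").

```python
def decode_hresult(value: int) -> dict:
    """Split a 32-bit HRESULT into its severity, facility, and code fields."""
    value &= 0xFFFFFFFF  # accept either the signed or the unsigned representation
    return {
        "hex": f"0x{value:08X}",
        "failure": bool(value >> 31),       # bit 31: severity (1 = failure)
        "facility": (value >> 16) & 0x7FF,  # bits 16-26: facility
        "code": value & 0xFFFF,             # bits 0-15: error code
    }


if __name__ == "__main__":
    # Value taken from the VMM debug log quoted above.
    print(decode_hresult(2147942570))
```

    The same function accepts the signed form of the value (-2147024726), which is how some tools and logs display it.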

    This WMI provider is implemented by the Hyper-V Virtual Machine Management service (VMMS, vmms.exe). Restarting VMMS has no effect. Live migrating or restarting the VM fixes the issue.

    Running the Sysinternals Process Monitor (ProcMon) tool against VMMS shows that querying VHDs attached to other VMs on the same hypervisor host works, but for the affected VM a SHARING_VIOLATION is generated. This matches the error code being produced at the WMI level. The requested open was for a generic shared read.

    As restarting VMMS has no effect, but live migrating or restarting the VM does, I would guess that another process is the culprit. According to the Sysinternals Handle tool, only the VMWP process (the Hyper-V worker process representing the VM) and the SYSTEM process have file handles open to the VHD files. As live migrating or restarting the VM causes a new VMWP process to be started, it's likely that VMWP is the one preventing shared access, though it could be something funky with how the SYSTEM process has the file open as well.
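    To illustrate why the open flags matter, here is a rough Python model of the Windows share-mode rule as I understand it: a new open fails with a sharing violation if it asks for access that an existing handle does not share, or if an existing handle uses access that the new open does not share. The bit values, the `Handle` type, and the flags assigned to each process below are all hypothetical; I have not yet been able to read the real flags.

```python
from dataclasses import dataclass

# Simplified access/share bits (illustrative values, not the Win32 constants).
READ, WRITE, DELETE = 1, 2, 4


@dataclass(frozen=True)
class Handle:
    owner: str
    access: int  # what this handle does with the file
    share: int   # what this handle allows other handles to do


def conflicts(existing: Handle, access: int, share: int) -> bool:
    """True if a new open (access, share) would be refused because of `existing`."""
    new_access_denied = access & ~existing.share  # we want something the holder forbids
    old_access_denied = existing.access & ~share  # the holder does something we forbid
    return bool(new_access_denied or old_access_denied)


def blockers(handles, access: int, share: int) -> list:
    """Owners of the existing handles that would cause a sharing violation."""
    return [h.owner for h in handles if conflicts(h, access, share)]


if __name__ == "__main__":
    # Hypothetical flags: vmwp.exe holding the VHD with no sharing at all,
    # System holding it read-only while sharing read and write.
    open_handles = [
        Handle("vmwp.exe", READ | WRITE, 0),
        Handle("System", READ, READ | WRITE),
    ]
    print(blockers(open_handles, access=READ, share=READ | WRITE))
```

    With real flags from each process substituted in, the same check would point to whichever handle is refusing the shared read that VMMS attempts.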

    Is there any way to determine which flags were used to open existing file handles? I'd like to query the handles in the different processes and see which flags each is using, to determine which one is the potential culprit.