Can't start, stop, restart, redeploy, or connect to a running Windows VM in Azure...

Boris Shpungin 16 Reputation points
2019-12-21T00:43:17.763+00:00

I have a VM running Windows 7 SP1 in Azure, that we've been using sporadically (every couple of months) for the last several years. When we don't need it, we shut it down so it doesn't keep running up resource usage charges...

Several days ago, we suddenly found we couldn't RDP into this VM after starting it through the Azure Portal. On closer examination, it turns out the VM never properly shut down the last time we tried to Stop it: it's still running, and in the Azure Portal I can see that its CPU activity reflects some sort of ongoing 'heartbeat' process. The current boot-up screenshot shows the machine sitting at the login (Ctrl+Alt+Del) screen. So it seems to be "alive", after a fashion.

However, attempts to RDP to it fail. Trying to Stop, Restart, Redeploy, or do basically anything else to it, through either the Azure Portal or Azure Cloud Shell, results in failure, either immediately or after about an hour of 'operation in progress' (I'm guessing that's a timeout). Here's the current status for this VM, as shown in the Azure Portal:

Additional error information is available for this virtual machine:
GENERAL
Provisioning state Provisioning failed. An unexpected error occured while processing the network profile of the VM. Please retry later.. NetworkingInternalOperationError
Provisioning state error code ProvisioningState/failed/NetworkingInternalOperationError
Guest agent Unknown

At one point, I tried adding a new virtual NIC to this VM (trying to get around the "unexpected error" that seems to be related to networking): no dice - operation fails with error.

This has been going on for several days now, and despite all I've tried, I can't seem to make a dent in this problem. How do you shut down, reconfigure, or otherwise do anything at all to this kind of apparently immortal and invulnerable 'zombie' VM in Azure?

In the meantime, we're being charged for this VM's resource usage... The whole situation is sort of incredible, and more than a little absurd. Any ideas, anyone?

Azure Virtual Machines

3 answers

  1. Ronen Ariely 15,096 Reputation points
    2019-12-21T21:28:53.613+00:00

    Good day Boris,

    Without a way to examine your specific VM from the host side, I think we are playing a guessing game here. Your best option is to open a support case from your Azure account, so the Azure team can collect your account details and examine the VM directly.

    Note: This might be a platform issue, which only the Azure team can handle.

    In the meantime,

    (a) Use PowerShell to get the status of your Azure VM and post the information here (after removing any information that should not be public)

    ✔ you can use: Get-AzureRmVM, or Get-AzVM

    (b) Try to stop the VM from PowerShell using Stop-AzureRmVM (or Stop-AzVM if you use the newer Az module)

    (c) There is a new option to repair a Windows VM by using the Azure Virtual Machine repair commands. Try following this document:
    https://learn.microsoft.com/en-us/azure/virtual-machines/troubleshooting/repair-windows-vm-using-azure-virtual-machine-repair-commands
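
    As a rough sketch of steps (a) and (b) with the Az module: the -Status switch on Get-AzVM returns the instance view, including the power state and provisioning status codes, and Stop-AzVM with -Force and -NoWait requests a stop without blocking the shell. The function names Get-VmStatusCodes and Request-VmDeallocate and the names in the usage comment are hypothetical placeholders, and nothing here guarantees that the stop request succeeds on a VM stuck in a failed provisioning state.

    ```powershell
    # Sketch only: requires the Az.Compute module and a signed-in session (Connect-AzAccount).
    # Function names are illustrative, not part of the Az module.

    function Get-VmStatusCodes {
        param(
            [Parameter(Mandatory)] [string] $ResourceGroupName,
            [Parameter(Mandatory)] [string] $Name
        )
        # -Status returns the instance view; its Statuses list carries codes such as
        # PowerState/running or ProvisioningState/failed/<error code>.
        $view = Get-AzVM -ResourceGroupName $ResourceGroupName -Name $Name -Status
        $view.Statuses | Select-Object Code, DisplayStatus, Message
    }

    function Request-VmDeallocate {
        param(
            [Parameter(Mandatory)] [string] $ResourceGroupName,
            [Parameter(Mandatory)] [string] $Name
        )
        # -Force skips the confirmation prompt; -NoWait returns immediately
        # instead of blocking until the operation finishes or times out.
        Stop-AzVM -ResourceGroupName $ResourceGroupName -Name $Name -Force -NoWait
    }

    # Usage (placeholder names):
    # Get-VmStatusCodes    -ResourceGroupName 'MyResourceGroup' -Name 'MyVm'
    # Request-VmDeallocate -ResourceGroupName 'MyResourceGroup' -Name 'MyVm'
    ```

    Because -NoWait returns right away, re-run the status function afterwards to see whether the power state actually changed.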

    Any more information that you have might help, but again, I am pretty sure that in this case the best option, and maybe the only one, is to open a support case.


  2. Boris Shpungin 16 Reputation points
    2019-12-23T20:27:42.69+00:00

    Hi Ronen,

    Thanks for your suggestions. Here's as you requested:

    (a) Get-AzVM

    ResourceGroupName Name      Location      VmSize       OsType  NIC            ProvisioningState Zone
    ----------------- ----      --------      ------       ------  ---            ----------------- ----
    ...
    XXXXXXX           XXXXXXXXX westcentralus Standard_F4s Windows XXXXXXXXXXXXXX Failed
    

    (b) Stop-AzureRmVM: hangs until Azure Cloud Shell times out and closes the connection...

    (c) I will try the repair option a bit later (thanks for this info!), and will post the results

    I actually did try to create a support ticket for this, but we currently don't have a support subscription (can't create a ticket without one) - and oddly, Azure is refusing to let me purchase one. It might be because we used to have a "developer" support subscription until about 2 months ago, when I cancelled it (because we didn't seem to need support anymore, and I didn't want to keep paying for it...) - shows what I know! :-/

    Edit: I tried the repair option. I created a repair VM as instructed (at the link in your post under (c)), RDP'd into it, and ran sfc on the target VM's OS drive copy - which failed to find (or fix) any problems, whatsoever. I then tried to update the target VM's system drive from the copy (hoping this would trigger some sort of a system resource reset, to fix my issues) - but no luck. The target VM is still exactly as unresponsive and unreachable as it was before. Its "ProvisioningState" still reports as "Failed" in Get-AzVM, and the error code as reported by the Azure Portal is still ProvisioningState/failed/NetworkingInternalOperationError.

    Maybe it's a problem not so much with the VM itself as with the virtual NIC linked to it? If that's the case - if the VM's network interface is shot - then there'd be no way for me (or any PowerShell script) to reach that VM; by the same token, there's no way to reconfigure that VM's NIC, because the VM is still running (yet can't be stopped...) That said, I have no insight into the Azure VM internals or whether, for instance, Azure PowerShell commands interact with the hypervisor directly (e.g. to force-kill the VM) or talk to the VM itself (nicely asking it to shut itself down, while it blithely ignores the request... if that's the case here). So I'm pretty lost for options, and out of ideas...


  3. Thuan 26 Reputation points
    2020-10-15T03:05:39.447+00:00

    I've had this exact problem with some of my VMs. The only "fix" I could find was to re-create the VMs using the existing disks.
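
    A minimal sketch of that re-create approach with the Az module might look like the following. It makes several assumptions the thread does not confirm: that the OS disk is a managed disk, that removing the stuck VM resource succeeds, that the disk and NIC are not set to be deleted along with the VM, and that the existing NIC is still usable (if the NIC itself is the problem, a fresh one would be needed instead). The function name New-VmFromExistingDisk and the names in the usage comment are hypothetical.

    ```powershell
    # Sketch only: requires the Az.Compute module and a signed-in session (Connect-AzAccount).
    # Assumes a managed OS disk and a reusable NIC; the function name is illustrative.

    function New-VmFromExistingDisk {
        param(
            [Parameter(Mandatory)] [string] $ResourceGroupName,
            [Parameter(Mandatory)] [string] $OldVmName,
            [Parameter(Mandatory)] [string] $NewVmName
        )

        # Capture what the replacement needs before the old VM resource is removed.
        $old      = Get-AzVM -ResourceGroupName $ResourceGroupName -Name $OldVmName
        $osDiskId = $old.StorageProfile.OsDisk.ManagedDisk.Id
        $nicId    = $old.NetworkProfile.NetworkInterfaces[0].Id

        # Remove the VM resource. Check the disk and NIC delete options first,
        # since this sketch relies on both surviving the removal.
        Remove-AzVM -ResourceGroupName $ResourceGroupName -Name $OldVmName -Force

        # Build a new VM definition that attaches the existing OS disk
        # instead of creating one from an image.
        $cfg = New-AzVMConfig -VMName $NewVmName -VMSize $old.HardwareProfile.VmSize
        $cfg = Set-AzVMOSDisk -VM $cfg -ManagedDiskId $osDiskId -CreateOption Attach -Windows
        $cfg = Add-AzVMNetworkInterface -VM $cfg -Id $nicId -Primary

        New-AzVM -ResourceGroupName $ResourceGroupName -Location $old.Location -VM $cfg
    }

    # Usage (placeholder names):
    # New-VmFromExistingDisk -ResourceGroupName 'MyResourceGroup' -OldVmName 'MyVm' -NewVmName 'MyVm2'
    ```

    Taking a snapshot of the OS disk before removing anything is a sensible precaution, since the removal step cannot be undone.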
