Redeploying due to host failure

nimi 91 Reputation points
2022-03-16T10:39:31.43+00:00

We got a resource health event for our Azure VM: "Redeployed due to host failure".
Please answer the questions below one by one. I would be grateful.

What is the cause of this issue?

After redeploying, what happens to our VM?

Our VM was rebooted and it took 30 minutes to complete. What is the reason for that?

What is Microsoft doing to avoid this type of issue, given that it does not happen in AWS?

How can we mitigate this type of issue?

Azure Virtual Machines
An Azure service that is used to provision Windows and Linux virtual machines.

Accepted answer
  1. Alan Kinane 16,951 Reputation points MVP Volunteer Moderator
    2022-03-16T11:19:09.363+00:00

    What is the cause of this issue?
    This should be listed under Resource Health in Azure Service Health; it may take some time for the report to appear. https://learn.microsoft.com/en-us/azure/service-health/resource-health-overview

    After redeploying, what happens to our VM?
    It just gets moved to a healthy node (physical host). You will experience some downtime while it is moved, but that should be all. https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/redeploy-to-new-node-windows

    Our VM was rebooted and it took 30 minutes to complete. What is the reason for that?
    The fabric controller automatically moves your VM to a healthy physical node, which requires some downtime. The time it takes can depend on what you have deployed.

    What is Microsoft doing to avoid this type of issue, given that it does not happen in AWS?
    Hardware components are always subject to failure, so in this instance Microsoft has automatically moved your VM to a healthy host. I'm quite sure AWS has a very similar process, but I can't comment on how AWS manages fault tolerance. Something like this can happen to anyone; likewise, I know customers who have never experienced this after many years of usage.

    How can we mitigate this type of issue?
    If you can't afford to risk any downtime, then you would need to deploy multiple instances of your VMs using either availability sets or availability zones, in order to spread your VMs across separate physical hosts. https://learn.microsoft.com/en-us/azure/architecture/example-scenario/infrastructure/iaas-high-availability-disaster-recovery
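
To illustrate why spreading instances across separate hosts helps, here is a minimal Python sketch. It is a toy model only, not Azure SDK code: the function names `place_round_robin` and `surviving_vms`, the VM names, and the round-robin placement are all assumptions made for illustration, and real availability set or zone placement is decided by the platform.

```python
def place_round_robin(vm_names, fault_domain_count):
    """Assign each VM to a fault domain index in round-robin order (toy model)."""
    placement = {}
    for index, name in enumerate(vm_names):
        placement[name] = index % fault_domain_count
    return placement


def surviving_vms(placement, failed_domain):
    """Return the sorted names of VMs that are NOT in the failed fault domain."""
    return sorted(
        name for name, domain in placement.items() if domain != failed_domain
    )


# Hypothetical deployments: one standalone VM versus three spread instances.
single = place_round_robin(["web-0"], 1)
spread = place_round_robin(["web-0", "web-1", "web-2"], 3)

# Simulate a host failure in fault domain 0.
print(surviving_vms(single, 0))  # → []
print(surviving_vms(spread, 0))  # → ['web-1', 'web-2']
```

With a single VM, the one host failure takes the whole service down until the redeploy finishes; with three instances on separate hosts, two keep serving traffic while the affected one is moved.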

    4 people found this answer helpful.

1 additional answer

  1. nimi 91 Reputation points
    2022-03-16T14:48:33.91+00:00

    Thank you for your answers.
    "The fabric controller automatically moves your VM to a healthy physical node which requires some downtime. The time it takes can depend on what you have deployed."

    Could you please elaborate on this?

    What exactly does the time depend on?

    Also, how can we find out which hardware component caused the failure? It is not mentioned in the resource health event.

