Understand allocation of VMs in Availability Sets

Apurva Pathak 885 Reputation points
2025-12-14T15:18:25.6566667+00:00

Hello all,

Could someone please help me understand exactly how Azure maps VMs deployed in an availability set to the different update and fault domains?

I've gone through this doc, which gave me some idea, but I'm still not 100% sure.

Based on my current understanding of the above doc, Azure automatically assigns an update domain and a fault domain to each machine in the availability set. Once the number of VMs exceeds the number of update domains in the set, Azure starts placing machines in the 1st update domain again.
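The round-robin model described above can be sketched in a few lines. This is a minimal illustration of the questioner's assumption, not Azure's documented allocator: it assumes VMs are numbered from 1 in creation order and that each axis simply wraps around independently. `round_robin_placement` is a hypothetical helper name.

```python
def round_robin_placement(vm_number: int, fault_domains: int, update_domains: int) -> tuple[int, int]:
    """Return (fault_domain, update_domain), both 1-based, for a 1-based VM number.

    Hypothetical model of the question's assumption, not Azure's real allocator:
    each axis wraps around independently using modulo arithmetic.
    """
    index = vm_number - 1  # switch to 0-based for the modulo arithmetic
    fd = index % fault_domains + 1
    ud = index % update_domains + 1
    return fd, ud


# With 3 fault domains and 20 update domains, VM 21 wraps back to update domain 1.
print(round_robin_placement(1, fault_domains=3, update_domains=20))   # (1, 1)
print(round_robin_placement(21, fault_domains=3, update_domains=20))  # (3, 1)
```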

My doubt is:

What if I create an availability set with 3 fault domains and 20 update domains and need to place 20 VMs in it?

In theory, each VM will get its own unique update domain, but what about fault domains?

Because fault domains are limited to a maximum of three, every 4th machine will end up in the same fault domain as the 1st one. In total:

7x VMs will be in fault domain 1

7x VMs will be in fault domain 2

6x VMs will be in fault domain 3, while maintaining their unique update domains.
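The tally above can be checked under the same hypothetical round-robin assumption (VM i goes to fault domain i mod 3 and update domain i mod 20, counting from 0). This is only an arithmetic check of the question's model; the helper names are illustrative, not part of any Azure API.

```python
from collections import Counter


def fault_domain_counts(vm_count: int, fault_domains: int) -> Counter:
    """Count VMs per fault domain (1-based), assuming hypothetical round-robin placement."""
    return Counter((i % fault_domains) + 1 for i in range(vm_count))


def update_domains_are_unique(vm_count: int, update_domains: int) -> bool:
    """True if no two VMs share an update domain under the same round-robin assumption."""
    uds = [(i % update_domains) + 1 for i in range(vm_count)]
    return len(set(uds)) == len(uds)


print(dict(sorted(fault_domain_counts(20, 3).items())))  # {1: 7, 2: 7, 3: 6}
print(update_domains_are_unique(20, 20))  # True
print(update_domains_are_unique(21, 20))  # False: VM 21 shares update domain 1 with VM 1
```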

Similarly, if I try to put more than 20 VMs in this availability set, the 21st machine will be put in the same update domain as the 1st machine.

Below is a snip illustrating my understanding:

[Image: diagram of the VM-to-domain mapping described above]

Does this put the resiliency of the infrastructure at risk for VMs that end up sharing update and fault domains? If so, how should we plan our infrastructure to mitigate this?

Thanks in advance!

Azure Virtual Machines

Answer accepted by question author
  1. Andreas Baumgarten 129.5K Reputation points MVP Volunteer Moderator
    2025-12-14T15:32:48.3833333+00:00

    Hi @Apurva Pathak ,

    Based on your scenario:

    20 VMs in 3 fault domains and 20 update domains

    2x 7 VMs and 1x 6 VMs distributed across 3 fault domains.

    This means that if a power or network outage hits one fault domain, only 7 (or 6) VMs are affected. Depending on your performance requirements, one option is to size the VMs larger so that the remaining VMs can absorb the load during a fault domain outage.
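To put numbers on the sizing suggestion: if one fault domain is lost, the surviving VMs must absorb its share of the load. A minimal sketch, assuming load is spread evenly across identical VMs and that the largest fault domain (7 VMs) is the one that fails; `required_capacity_factor` is a hypothetical helper name.

```python
def required_capacity_factor(vms_per_fault_domain: list[int]) -> float:
    """How much larger each VM must be so that losing the biggest fault domain
    still leaves the same total capacity (assumes evenly spread load)."""
    total = sum(vms_per_fault_domain)
    survivors = total - max(vms_per_fault_domain)  # worst case: the largest FD fails
    if survivors == 0:
        raise ValueError("all VMs are in a single fault domain")
    return total / survivors


factor = required_capacity_factor([7, 7, 6])
print(f"{factor:.2f}")  # 1.54: each surviving VM carries about 54% more than its normal share
```

The result is simply 20 / 13: 20 VMs' worth of load spread over the 13 VMs left after the worst single fault-domain outage.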

    Another option might be to distribute the VMs across different availability sets, for example 20 VMs in 2 availability sets, each with 3 fault domains and 10 update domains:

    • 3x VMs will be in fault domain 1 in Availability Set 1
    • 3x VMs will be in fault domain 2 in Availability Set 1
    • 4x VMs will be in fault domain 3 in Availability Set 1
    • 3x VMs will be in fault domain 1 in Availability Set 2
    • 3x VMs will be in fault domain 2 in Availability Set 2
    • 4x VMs will be in fault domain 3 in Availability Set 2
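One way to compare the two layouts is the largest group of VMs that shares a single fault domain. A minimal sketch, assuming VMs are split evenly across sets and within each set, and, importantly, that fault domains in different availability sets fail independently; treat that last point as an assumption to verify, not a documented guarantee. `even_split` and `largest_fault_domain_group` are hypothetical helpers.

```python
def even_split(count: int, buckets: int) -> list[int]:
    """Spread `count` items across `buckets` as evenly as possible (sizes differ by at most 1)."""
    base, extra = divmod(count, buckets)
    return [base + 1 if b < extra else base for b in range(buckets)]


def largest_fault_domain_group(vm_count: int, sets: int, fault_domains: int) -> int:
    """Largest number of VMs sharing one fault domain in one set.

    Assumes an even split across sets and, as an unverified assumption,
    that fault domains of different sets are physically independent.
    """
    per_set = even_split(vm_count, sets)
    return max(max(even_split(n, fault_domains)) for n in per_set)


print(largest_fault_domain_group(20, sets=1, fault_domains=3))  # 7
print(largest_fault_domain_group(20, sets=2, fault_domains=3))  # 4
```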

    (If the reply was helpful please don't forget to upvote and/or accept as answer, thank you)

    Regards

    Andreas Baumgarten


Answer accepted by question author
  1. Marcin Policht 68,850 Reputation points MVP Volunteer Moderator
    2025-12-14T15:31:44.0866667+00:00

    Your mental model is pretty close, but a few important nuances about how Azure actually distributes VMs across update domains (UDs) and fault domains (FDs) will clear up the remaining confusion.

    First, Azure treats update domains and fault domains as two independent axes of placement. When you deploy a VM into an availability set, Azure assigns it exactly one fault domain and one update domain. The assignment is automatic and done at creation time. You do not control the specific numbers, only the maximum counts defined on the availability set.

    With an availability set configured for 3 fault domains and 20 update domains, Azure will try to spread VMs as evenly as possible across both dimensions, but fault domains take priority from a resiliency perspective. This is because fault domains represent physical isolation (power, network, rack), whereas update domains are a logical construct used only during planned maintenance.

    So in your scenario with 20 VMs, each VM will indeed land in a distinct update domain, because you have exactly 20 UDs. However, those 20 VMs must still be mapped onto only 3 fault domains. Azure will distribute them as evenly as possible across the 3 FDs. In practice, that means something very close to what you described: roughly one third of the VMs per fault domain, for example 7 in FD0, 7 in FD1, and 6 in FD2. Each of those VMs still has its own unique update domain number. There is no requirement that update domains and fault domains “line up” in a grid-like way; Azure simply assigns both independently.

    When you go beyond 20 VMs, the 21st VM will reuse an update domain number, but not arbitrarily. Azure again tries to balance the load, so the reused update domain will typically be paired with a fault domain that keeps the overall distribution as even as possible. You should not assume that “VM 21 goes to UD1 and FD1”; the platform’s goal is balance, not strict modulo arithmetic.
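The difference between "balance" and "strict modulo" shows up once a set is no longer filled in perfect order. The sketch below is a hypothetical greedy balancer, not Azure's actual allocator (whose internals this answer does not describe): each new VM goes to the least-loaded fault domain and the least-loaded update domain, with ties broken by the lowest index. The deletion scenario is also hypothetical; domains are 0-based here.

```python
def place_next(fd_load: list[int], ud_load: list[int]) -> tuple[int, int]:
    """Pick a 0-based (fault_domain, update_domain) for one new VM by choosing the
    least-loaded domain on each axis independently (ties -> lowest index),
    then record the placement in the load lists. Hypothetical policy."""
    fd = min(range(len(fd_load)), key=lambda d: fd_load[d])
    ud = min(range(len(ud_load)), key=lambda d: ud_load[d])
    fd_load[fd] += 1
    ud_load[ud] += 1
    return fd, ud


fd_load, ud_load = [0] * 3, [0] * 20
placements = [place_next(fd_load, ud_load) for _ in range(20)]
print(fd_load)        # [7, 7, 6]
print(placements[3])  # (0, 3): the 4th VM

# Hypothetical: the 4th VM is deleted, then a 21st VM is created.
fd_load[0] -= 1
ud_load[3] -= 1
print(place_next(fd_load, ud_load))  # (0, 3), whereas strict modulo would give (2, 0)
```

Filling an empty set, this greedy policy happens to produce the same result as modulo; the two only diverge once the existing placement is uneven.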

    Regarding the resiliency concerns, reusing update domains and fault domains does not inherently put your infrastructure at risk, as long as you understand what each domain protects you from. Fault domains protect you from unplanned physical failures. With 3 fault domains, you are protected against the loss of an entire rack or power unit, but a single physical failure can still take out roughly one third of your VMs. That limitation exists regardless of whether you deploy 3 VMs or 300 VMs into the availability set. Adding more VMs does not reduce fault-domain resilience; it just means more VMs share the same physical isolation boundary.
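The "roughly one third regardless of VM count" point can be checked numerically. A minimal sketch, assuming VMs are spread as evenly as possible over 3 fault domains and that the most-populated one fails; `worst_case_fd_loss` is a hypothetical helper.

```python
import math


def worst_case_fd_loss(vm_count: int, fault_domains: int = 3) -> float:
    """Fraction of VMs lost if the most-populated fault domain fails,
    assuming VMs are spread as evenly as possible across fault domains."""
    largest_fd = math.ceil(vm_count / fault_domains)
    return largest_fd / vm_count


for n in (3, 20, 300):
    print(n, f"{worst_case_fd_loss(n):.2f}")
# 3 0.33
# 20 0.35
# 300 0.33
```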

    Update domains protect you from planned maintenance. During planned maintenance, Azure updates one update domain at a time, so only the VMs in that domain are rebooted simultaneously. If you have more VMs than update domains, then yes, multiple VMs will reboot together during maintenance. However, this is expected and accounted for in availability set design. The SLA assumes that your application can tolerate losing all VMs in a single update domain at once. If you exceed the number of update domains, you must design your application tier so that losing one update domain that contains multiple VMs is still acceptable.
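The same reasoning applies to planned maintenance. A minimal sketch, assuming VMs are spread evenly over update domains and exactly one update domain reboots at a time; `min_running` stands for a hypothetical application requirement, not an Azure setting, and both helper names are illustrative.

```python
import math


def max_rebooting_together(vm_count: int, update_domains: int) -> int:
    """Largest number of VMs in one update domain, assuming an even spread."""
    return math.ceil(vm_count / update_domains)


def survives_maintenance(vm_count: int, update_domains: int, min_running: int) -> bool:
    """True if at least `min_running` VMs stay up while any one update domain reboots."""
    return vm_count - max_rebooting_together(vm_count, update_domains) >= min_running


print(max_rebooting_together(20, 20))  # 1
print(max_rebooting_together(25, 20))  # 2
print(survives_maintenance(25, 20, min_running=22))  # True (23 remain)
```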

    The key planning takeaway is that availability sets are meant to protect application tiers, not individual VMs. You plan resiliency by ensuring that each tier has enough instances such that losing one fault domain or one update domain does not take the service down. If you need stronger guarantees, such as isolation across datacenters or protection against zone-level failures, availability sets are no longer sufficient and you should use availability zones instead.
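The planning rule above can be turned into a small sizing check: find the smallest tier size where losing one fault domain or one update domain still leaves enough instances. A sketch under stated assumptions (even spread on both axes, only one domain lost at a time, at least 2 domains per axis); `min_instances` and its `required` parameter are hypothetical names.

```python
import math


def min_instances(required: int, fault_domains: int, update_domains: int) -> int:
    """Smallest tier size that still has `required` instances after losing the
    largest fault domain or the largest update domain (one at a time).
    Assumes VMs are spread as evenly as possible on both axes."""
    if required < 1 or fault_domains < 2 or update_domains < 2:
        # With a single domain on either axis, one outage takes out everything.
        raise ValueError("need required >= 1 and at least 2 domains per axis")
    n = required
    while True:
        largest_loss = max(math.ceil(n / fault_domains), math.ceil(n / update_domains))
        if n - largest_loss >= required:
            return n
        n += 1


# Hypothetical tier that needs 4 running instances, in a set with 3 FDs and 5 UDs.
print(min_instances(required=4, fault_domains=3, update_domains=5))  # 6
```

With 6 VMs, a fault-domain outage removes 2 and leaves 4, which meets the requirement; 5 VMs would leave only 3.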


    If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.

    hth

    Marcin

