Availability Sets - Choosing number of fault and update domains

Question

Chris Hocking

Hello,

I have come across availability sets many times during my studies and have a good understanding of them. However, there is one thing I haven't been able to understand:
Is there a reason not to set the number of fault domains and update domains to their maximums?
Or, put another way, why would you choose to have a higher proportion of your VMs in the same fault domain or in the same update domain?

Thanks

2 answers

Answer 1

Hello @Chris Hocking

Here is how Azure assigns update domains and fault domains within an availability set (from the Azure documentation):

Each virtual machine in an availability set is assigned an update domain and a fault domain by the underlying Azure platform. For a given availability set, five non-user-configurable update domains are assigned by default (for Resource Manager deployments, this can be increased to up to 20 update domains) to indicate groups of virtual machines and underlying physical hardware that can be rebooted at the same time. When more than five virtual machines are configured within a single availability set, the sixth virtual machine is placed into the same update domain as the first virtual machine, the seventh in the same update domain as the second virtual machine, and so on. The order in which update domains are rebooted may not be sequential during planned maintenance, but only one update domain is rebooted at a time. A rebooted update domain is given 30 minutes to recover before maintenance is initiated on a different update domain.
https://learn.microsoft.com/en-us/azure/virtual-machines/availability-set-overview#how-do-availability-sets-work
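
The round-robin rule described above can be sketched in a few lines of Python. This is only a model of the documented behavior, assuming VMs are placed strictly in creation order; the function name `assign_update_domains` is hypothetical, and in practice the Azure platform, not your code, performs the assignment.

```python
def assign_update_domains(num_vms: int, num_update_domains: int = 5) -> list[int]:
    """Model of round-robin update domain placement (hypothetical helper).

    Returns the 0-based update domain index for each VM, in creation order.
    """
    # Azure allows up to 20 update domains for Resource Manager deployments.
    if not 1 <= num_update_domains <= 20:
        raise ValueError("update domain count must be between 1 and 20")
    # VM i lands in update domain i mod N, so with the default of 5
    # the sixth VM (index 5) wraps around to the first update domain.
    return [i % num_update_domains for i in range(num_vms)]


print(assign_update_domains(7))  # [0, 1, 2, 3, 4, 0, 1]
```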

Here are some best practices:

Put each application tier into a separate availability set. In an N-tier application, don't put VMs from different tiers into the same availability set. VMs in an availability set are placed across fault domains (FDs) and update domains (UDs). However, to get the redundancy benefit of FDs and UDs, every VM in the availability set must be able to handle the same client requests.
The availability set should have the number of fault domains set to 3 and the number of update domains set to 20.
Azure supports a maximum of 3 fault domains and 20 update domains. We recommend the maximum of 20 update domains, as that minimizes the number of nodes down at any one time.

https://github.com/DSPN/azure-deployment-guide/blob/master/bestpractices.md
https://learn.microsoft.com/en-us/azure/architecture/checklist/resiliency-per-service#virtual-machines
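
To see why the recommendation favors 20 update domains, here is a rough Python sketch of the arithmetic. It assumes VMs are spread evenly (round-robin) across update domains and that only one update domain is rebooted at a time, as the documentation quoted earlier states; the function names are hypothetical. The second function counts the 30-minute recovery windows between consecutive update domains, assuming only domains that contain a VM are rebooted and each window is used in full.

```python
import math

# Per-update-domain recovery window, taken from the Azure documentation.
RECOVERY_MINUTES = 30


def max_vms_down(num_vms: int, num_update_domains: int) -> int:
    # With an even spread, the fullest update domain holds ceil(n / N) VMs,
    # and only one update domain is rebooted at a time.
    return math.ceil(num_vms / num_update_domains)


def recovery_gap_minutes(num_vms: int, num_update_domains: int) -> int:
    # Assumption: only update domains that actually contain a VM are rebooted.
    used = min(num_vms, num_update_domains)
    # One recovery window between each consecutive pair of rebooted domains.
    return max(used - 1, 0) * RECOVERY_MINUTES


for uds in (5, 20):
    print(uds, max_vms_down(40, uds), recovery_gap_minutes(40, uds))
# 5 8 120
# 20 2 570
```

With 40 VMs, moving from 5 to 20 update domains reduces the VMs offline at once from 8 to 2, while the number of recovery windows in a full maintenance pass grows from 4 to 19.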

Chris Hocking

2021-03-16T12:46:36.94+00:00

Thanks for taking the time to respond. I'm looking to understand in which scenario one would choose to set a lower number of fault and update domains than the maximums. So far I haven't come up with such a scenario, and I'm left wondering why Microsoft doesn't set the maximums by default.

Answer 2

Hello! That's a good question. If increasing the number of fault domains and update domains increases reliability, why not spread your solution out over as many as possible if there's little or no impact on your latency? While it initially makes sense to max out fault domains and update domains, it may end up being more important to set a minimum threshold instead.

Enterprise solutions at scale
When you are dealing with just a handful of VMs, you have a lot of flexibility when it comes to picking the location of your VMs. When you are working with a much larger solution, it may be more important to secure a large number of VMs quickly than to avoid having some of them share a fault domain or update domain, as long as you can guarantee a minimum number of fault domains and update domains to keep your solution running. In this case, a minimum threshold matters more than a maximum.
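
One way to express such a minimum threshold is to ask how many fault domains you need so that losing any single one still leaves enough VMs running. The Python sketch below is a hypothetical illustration, not an Azure API: it assumes VMs are spread evenly across fault domains and that at most 3 fault domains are available (some regions offer only 2).

```python
import math
from typing import Optional

# Assumed upper bound; some Azure regions only offer 2 fault domains.
MAX_FAULT_DOMAINS = 3


def surviving_vms(num_vms: int, num_fault_domains: int) -> int:
    # With an even spread, the largest fault domain holds ceil(n / F) VMs,
    # so losing it leaves the remaining VMs running.
    return num_vms - math.ceil(num_vms / num_fault_domains)


def min_fault_domains_needed(num_vms: int, required_running: int,
                             max_fault_domains: int = MAX_FAULT_DOMAINS) -> Optional[int]:
    """Smallest fault domain count that keeps `required_running` VMs up after
    losing any single fault domain, or None if no allowed count is enough."""
    for fds in range(1, max_fault_domains + 1):
        if surviving_vms(num_vms, fds) >= required_running:
            return fds
    return None


print(min_fault_domains_needed(10, 5))  # 2
print(min_fault_domains_needed(10, 6))  # 3
print(min_fault_domains_needed(10, 7))  # None
```

The point of the sketch is that the requirement is a floor: once the smallest sufficient fault domain count is met, additional spread is a bonus rather than a necessity.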

Azure Stack, hybrid, and modular data centers
Azure Stack is used on cruise ships, and modular data centers bring the cloud to remote areas. In these cases, you don't have the luxury of a large datacenter, and setting a minimum number of fault domains and update domains may be more practical than targeting the maximum.

I hope that answers your question. If you are interested, there's more reading on the subject here:
https://learn.microsoft.com/en-us/archive/msdn-magazine/2015/september/microsoft-azure-fault-tolerance-pitfalls-and-resolutions-in-the-cloud#how-many-fault-domains
