Availability Sets: Why only 1 update domain in 1 fault domain?

Question

Rafael Chang 20 Reputation points

I am thinking of a use case where the administrator isn't concerned about rack failure (hence 1 fault domain) but wants to separate their VMs in multiple update (>1) domains to stagger updates and avoid downtime.

When configuring Availability Sets, if I set 1 fault domain, why can I only set 1 update domain?

vipullag-MSFT 26,487 Reputation points Moderator

2023-07-19T03:07:44.01+00:00

Hello Rafael Chang

Any update on the issue?

Just checking in to see if you got a chance to see the previous response.

If the suggested response helped you resolve your issue, please 'Accept as answer', so that it can help others in the community looking for help on similar topics.

Answer 1

shiva patpi 13,366 Microsoft Employee Moderator

@Rafael Chang

Please take a look at a similar question that was answered in the post below:

https://learn.microsoft.com/en-us/answers/questions/1036263/fault-domain-vs-update-domain

See the architecture of the fault domain concept:

https://learn.microsoft.com/en-us/azure/virtual-machines/availability-set-overview?WT.mc_id=AZ-MVP-5000120

Basically, an update domain is nothing but a logical grouping across different fault domains. So to have multiple update domains, there should be a minimum of 2 fault domains.

Think of a fault domain as a column and an update domain as a row.
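The row-and-column picture can be sketched in a few lines of Python. This is only an illustration: the round-robin placement rule and the names `place_vms` and `as_grid` are assumptions made for the example, not Azure's actual placement algorithm or API.

```python
def place_vms(vm_count, fault_domains, update_domains):
    """Assign each VM a (fault domain, update domain) pair in round-robin order.

    Assumption for illustration only: VM i lands in fault domain
    i % fault_domains and update domain i % update_domains.
    """
    return [(i % fault_domains, i % update_domains) for i in range(vm_count)]


def as_grid(placements, fault_domains, update_domains):
    """Lay the placements out with update domains as rows and fault domains as columns."""
    grid = [[[] for _ in range(fault_domains)] for _ in range(update_domains)]
    for vm, (fd, ud) in enumerate(placements):
        grid[ud][fd].append(f"VM{vm}")
    return grid


placements = place_vms(6, fault_domains=2, update_domains=3)
grid = as_grid(placements, fault_domains=2, update_domains=3)

# One printed line per update domain (row); each inner list is a fault domain (column).
for ud, row in enumerate(grid):
    print(f"UD{ud}:", row)
```

In this layout every row (update domain) crosses both columns (fault domains), and every column holds VMs from several rows.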

Regards,

Shiva.

Rafael Chang 20 Reputation points

2023-07-19T10:30:35.3333333+00:00

I have seen those 2 resources you provided in addition to your answer and I don't really follow.

My understanding is that when you define the number of fault domains, you are defining the number of racks which you want to spread your VMs over. That way you can protect against rack failure (either because of power or network failure).

On the other hand, my understanding of update domains is that when you define the number of update domains, you define the number of physical server blades you spread your VMs over. This way, when Microsoft performs upgrades on a server blade (a hypervisor upgrade, for example) and it needs to be rebooted, you have VMs running on other blades that will provide service availability. Only when the upgrade is finished on one update domain will Microsoft continue with the next update domain.

So with my understanding, it sounds perfectly reasonable to simply have one fault domain but have multiple update domains. This would mean that my VMs would be limited to one rack (one fault domain) but would be spread out over many server blades (many update domains) within that rack.
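To make the reasoning concrete, here is a minimal Python sketch of the staggered maintenance described above. It is a toy model, not Azure behavior: it assumes exactly one update domain is rebooted at a time, and the function name `rolling_maintenance` and the VM names are hypothetical.

```python
def rolling_maintenance(vm_update_domains):
    """Reboot one update domain at a time and report which VMs stay up.

    vm_update_domains maps a VM name to its update domain index.
    Returns a list of (update_domain, vms_still_running) tuples,
    one entry per maintenance step.
    """
    report = []
    for ud in sorted(set(vm_update_domains.values())):
        running = sorted(vm for vm, d in vm_update_domains.items() if d != ud)
        report.append((ud, running))
    return report


# Three VMs in three update domains: two VMs keep running at every step.
spread = {"vm1": 0, "vm2": 1, "vm3": 2}
for ud, running in rolling_maintenance(spread):
    print(f"rebooting UD{ud}: still running {running}")

# Three VMs in a single update domain: nothing is running during the reboot.
packed = {"vm1": 0, "vm2": 0, "vm3": 0}
for ud, running in rolling_maintenance(packed):
    print(f"rebooting UD{ud}: still running {running}")
```

The model says nothing about fault domains, which is the point of the question: under this understanding, staggering updates depends only on how VMs are split across update domains.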

What part of my understanding is incorrect?

Answer 2

Hello Rafael Chang

Welcome to Microsoft Q&A Platform, thanks for posting your query here.

Your understanding of fault domains and update domains is correct. Fault domains are used to protect against rack failure, while update domains are used to ensure that not all VMs in an availability set are updated at the same time during planned maintenance.

If you have only one fault domain, it means that all the VMs in that availability set are located in the same rack, which means that they share the same power source and network switch. This can be a single point of failure, which means that if the rack goes down, all the VMs in that rack will be affected.

On the other hand, if you have multiple update domains, it means that the VMs in your availability set are spread across multiple physical server blades. This helps to ensure that if one server blade needs to be rebooted for maintenance, the other server blades can continue to provide service availability.

Now, where the confusion might be is that the number of update domains you can set for an Availability Set is constrained by the number of fault domains you choose.

When you create an Availability Set, Azure ensures that VM instances in that set are spread across different fault domains for high availability. However, VM instances within the same Availability Set cannot share the same update domain. This constraint is in place to provide better resiliency. If multiple VM instances sharing the same update domain are deployed on the same physical hardware (server blades) and that hardware experiences an issue during an update, all VMs within that update domain would be affected simultaneously, reducing the benefits of update domain isolation.

So, while it might seem reasonable to have one fault domain and multiple update domains to spread VMs across different server blades within the same rack, Azure currently enforces a one-to-one relationship between fault domains and update domains in an Availability Set. If you want to have more update domains, you would also need to have a corresponding increase in fault domains to achieve the desired level of fault and update isolation. This ensures better resiliency and availability of VM instances during planned maintenance events.

So, in your use case, if you want to have multiple update domains to stagger updates and avoid downtime, you would need to choose more than 1 fault domain during the configuration of the Availability Set. This will enable you to spread your VMs across different fault domains and have the flexibility to set multiple update domains to achieve your desired update management strategy.
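The constraint discussed in this thread can be written down as a small check. This is a sketch that models only the behavior reported in the question (with 1 fault domain, only 1 update domain is accepted); `check_domain_counts` is a hypothetical helper, not part of any Azure SDK, and it does not model any other limits the service may apply.

```python
def check_domain_counts(fault_domains, update_domains):
    """Validate a (fault domain, update domain) count pair.

    Models only the rule reported in this thread: a single fault domain
    cannot be combined with more than one update domain.
    """
    if fault_domains < 1 or update_domains < 1:
        raise ValueError("domain counts must be at least 1")
    if fault_domains == 1 and update_domains > 1:
        raise ValueError(
            "with 1 fault domain only 1 update domain is accepted; "
            "choose at least 2 fault domains to use multiple update domains"
        )
    return fault_domains, update_domains


print(check_domain_counts(2, 5))   # accepted under this model

try:
    check_domain_counts(1, 3)      # the configuration from the question
except ValueError as err:
    print("rejected:", err)
```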

Rafael Chang 20 Reputation points

2023-07-20T15:57:34.27+00:00

Hello vipullag and thank you for your clear and detailed answer. Good to know my base understanding is correct.

You mention the fact that if multiple VM instances share the same update domain, they will experience downtime if that update domain is being worked on. However, the hypothetical scenario I presented was, for example, 3 VMs that are all in 1 fault domain but each in a different update domain (3 update domains in total).

I still don't see where the fallacy in my scenario is. Is this not possible due to some misunderstanding on my part of how fault and update domains work, or are you saying that the goal I am trying to achieve and the reasoning behind it make sense but Microsoft has decided not to allow it for some reason unknown to us?

Also, when you say that fault domains and update domains have a 1-to-1 relationship, I don't think that's correct. From the diagram (https://learn.microsoft.com/en-us/azure/virtual-machines/availability-set-overview?WT.mc_id=AZ-MVP-5000120) it seems that one fault domain can have many update domains, and the same update domain can be present in multiple fault domains. Isn't it a many-to-many relationship?

Lastly, it seems Azure has, in many ways, been designed to prevent us from making poor choices, and Availability Sets are meant to provide HA and resiliency. So why would we be allowed to create a set with 1 fault domain and 1 update domain? Is there a use case for that? If not, perhaps a better limitation would be to require a minimum of 2 fault domains and 2 update domains...

Thanks for your answers and your time.

vipullag-MSFT 26,487 Reputation points Moderator

2023-07-24T00:01:49.37+00:00

Hello Rafael Chang

Thanks for responding with more details.

I would recommend opening an Azure support case to get these points clarified by the internal team.

If you don't have a support plan enabled on your subscription, please send an email to AzCommunity@Microsoft.com with the subject "Attn:Vikas", referencing this thread and including your subscription ID. I can then enable one-time free technical support on your subscription.

Once the issue is resolved, please post the resolution here for the benefit of the community.
