Hello Rafael Chang,
Welcome to the Microsoft Q&A platform, and thanks for posting your query here.
Your understanding of fault domains and update domains is correct. Fault domains are used to protect against rack failure, while update domains are used to ensure that not all VMs in an availability set are updated at the same time during planned maintenance.
If you have only one fault domain, all the VMs in that availability set are located in the same rack, so they share the same power source and network switch. That rack is a single point of failure: if it goes down, every VM in the availability set is affected.
On the other hand, if you have multiple update domains, the VMs in your availability set are spread across multiple physical server blades. If one server blade needs to be rebooted for maintenance, the VMs on the other blades keep running and the service stays available.
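To make the difference between the two domain types concrete, here is a minimal sketch in Python. It assumes VMs are assigned to fault domains and update domains in simple round-robin order; that is a simplification for illustration only, and the function names `place_vms` and `survivors` are hypothetical, not part of any Azure SDK.

```python
def place_vms(vm_count, fault_domains, update_domains):
    """Assign each VM a (fault_domain, update_domain) pair in round-robin order.

    Simplified model for illustration; it is an assumption, not Azure's
    actual placement logic.
    """
    return [(i % fault_domains, i % update_domains) for i in range(vm_count)]


def survivors(placement, failed_fd=None, rebooting_ud=None):
    """Return the indices of VMs that stay up when one fault domain fails
    and/or one update domain is rebooted for maintenance."""
    return [
        i
        for i, (fd, ud) in enumerate(placement)
        if fd != failed_fd and ud != rebooting_ud
    ]


# One fault domain and one update domain: everything shares one rack.
single_rack = place_vms(vm_count=4, fault_domains=1, update_domains=1)
print(survivors(single_rack, failed_fd=0))     # [] (rack failure takes all VMs)

# Two fault domains and four update domains: failures are contained.
spread = place_vms(vm_count=4, fault_domains=2, update_domains=4)
print(survivors(spread, failed_fd=0))          # [1, 3]
print(survivors(spread, rebooting_ud=0))       # [1, 2, 3]
```

In this toy model, a rack failure with a single fault domain leaves no VM running, while spreading across two fault domains keeps half of them up, and a maintenance reboot of one update domain only touches the VMs in that domain.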
The confusion likely comes from the fact that the number of update domains you can set for an Availability Set is constrained by the number of fault domains you choose.
When you create an Availability Set, Azure spreads the VM instances in that set across the configured fault domains for high availability. Each VM is also assigned an update domain, and during planned maintenance only one update domain is rebooted at a time. The constraint on update domains is in place to provide better resiliency. If the VMs in an update domain all sit on the same physical hardware (server blades) and that hardware has a problem during an update, every VM in that update domain is affected at once, which reduces the benefit of update domain isolation.
So, while it might seem reasonable to have one fault domain and multiple update domains to spread VMs across different server blades within the same rack, Azure currently does not allow this: with a single fault domain, the Availability Set is limited to a single update domain. If you want more update domains, you also need to raise the fault domain count above 1. With two or three fault domains you can configure up to 20 update domains (the default is 5). This gives the VM instances both fault and update isolation during planned maintenance events.
So, in your use case, if you want multiple update domains to stagger updates and avoid downtime, you need to choose more than 1 fault domain when you configure the Availability Set. Your VMs are then spread across different fault domains, and you can set the number of update domains that fits your update management strategy.
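As a rough illustration of that rule, the sketch below checks a pair of domain counts before building the arguments for `az vm availability-set create`. The helper names are hypothetical, and the upper bounds (3 fault domains, 20 update domains) are assumptions; the fault domain maximum in particular varies by region, so please verify it for yours. The example name `my-avset` and resource group `my-rg` are placeholders.

```python
MAX_FAULT_DOMAINS = 3    # assumed upper bound; the real limit depends on the region
MAX_UPDATE_DOMAINS = 20  # assumed upper bound


def check_domain_counts(fault_domains, update_domains):
    """Raise ValueError if the combination is not expected to be accepted."""
    if not 1 <= fault_domains <= MAX_FAULT_DOMAINS:
        raise ValueError(f"fault domain count must be 1..{MAX_FAULT_DOMAINS}")
    if not 1 <= update_domains <= MAX_UPDATE_DOMAINS:
        raise ValueError(f"update domain count must be 1..{MAX_UPDATE_DOMAINS}")
    if fault_domains == 1 and update_domains != 1:
        # The rule described in this answer: one fault domain allows
        # only one update domain.
        raise ValueError(
            "with 1 fault domain only 1 update domain is allowed; "
            "choose 2 or more fault domains to use multiple update domains"
        )


def availability_set_cli_args(name, resource_group, fault_domains, update_domains):
    """Build (but do not run) the Azure CLI command as an argument list."""
    check_domain_counts(fault_domains, update_domains)
    return [
        "az", "vm", "availability-set", "create",
        "--name", name,
        "--resource-group", resource_group,
        "--platform-fault-domain-count", str(fault_domains),
        "--platform-update-domain-count", str(update_domains),
    ]


# Two fault domains with five update domains passes the check.
print(" ".join(availability_set_cli_args("my-avset", "my-rg", 2, 5)))

# One fault domain with five update domains is rejected.
try:
    availability_set_cli_args("my-avset", "my-rg", 1, 5)
except ValueError as err:
    print(f"rejected: {err}")
```

The function only assembles the command; you could pass the resulting list to `subprocess.run` or copy the printed line into a shell once you have confirmed the limits for your region.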