S2D / Azure Stack HCI Pool Quorum

Lanky Doodle 226 Reputation points
2020-10-26T11:38:36.647+00:00

Hi,

I understand the whole majority thing in a Windows Cluster; 16 nodes with node-only majority means 7 nodes can go offline.

I'm not quite getting my head around Pool Quorum though, when it comes to larger cluster sizes and linear failure tolerance. I totally get this example: https://learn.microsoft.com/en-us/windows-server/storage/storage-spaces/understand-quorum#how-pool-quorum-works

In the doc's 4-node S2D example, 3 things cannot be offline at once. But scaling that example up to the same 16-node cluster:

16 nodes, 4 disks in each = 64 disk votes + pool resource owner = 65 votes:

Scenario: 8 servers or all disks in 8 servers go down (as long as pool resource owner isn't one of the servers) = 32 votes
Outcome: 8 servers and all disks in 8 servers remain up + pool resource owner up = 33 votes. Pool stays online
Notes: If the pool resource owner was one of the 8 failed servers, or I lose another surviving server or even a single disk in any surviving server, then the majority of votes is lost. Pool goes offline
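
Here's a quick sketch of how I'm modelling the vote math (assuming one vote per disk plus one for the pool resource owner, and that the pool needs a strict majority of votes to stay online, per the doc):

```python
# Pool quorum vote math for the 16-node example above (my understanding).
nodes = 16
disks_per_node = 4
total_votes = nodes * disks_per_node + 1   # 64 disk votes + pool resource owner = 65

def pool_online(failed_nodes, owner_failed):
    surviving = (nodes - failed_nodes) * disks_per_node + (0 if owner_failed else 1)
    return surviving > total_votes // 2    # strict majority of 65 means >= 33 votes

print(pool_online(8, owner_failed=False))  # True  -> 32 + 1 = 33 votes, pool stays up
print(pool_online(8, owner_failed=True))   # False -> 32 votes, majority lost
```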

However: "Three-way mirroring can safely tolerate at least two hardware problems (drive or server) at a time. For example, if you're rebooting one server when suddenly another drive or server fails, all data remains safe and continuously accessible."

That suggests to me that even if you (could) build a 512-node S2D cluster, only 2 fault domains can be offline at a time. Is that the case regardless of how fault-tolerant you make it in terms of node, chassis, rack, or site awareness?

Thanks!


Accepted answer
  1. Steven Ekren 166 Reputation points
    2020-10-30T16:30:31.957+00:00

    @Lanky Doodle

    There are a few other nuances, which are described in the "delimit-volume-allocation" document that is linked below.

    The gist of it is that every volume is broken up into extents of either 256 MB or 1 GB, and each of those extents is handled with the specified resiliency. For instance, with 3-way mirror there will be 3 copies placed on 3 separate nodes. The next extent will also have 3 copies, but likely placed on a different set of nodes (unless it's a 3-node system, in which case there is only one possible set of 3 nodes).

    So, if any 3 nodes go down, it's highly probable that they are the 3 nodes holding all copies of at least 1 extent of each volume, which means each volume goes offline. The article discusses ways to scope (delimit) which nodes a volume is on to manage this.
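
    To put rough numbers on "highly probable", here's a back-of-the-envelope sketch (my simplification, assuming each extent's 3 copies land uniformly at random on distinct nodes and extents are independent, which the real allocator doesn't strictly do):

    ```python
    from math import comb

    nodes = 12       # cluster size from the example in the other answer
    copies = 3       # three-way mirror
    extents = 1000   # e.g. a ~1 TB volume at 1 GB per extent

    # Chance one extent's 3 copies sit exactly on the 3 failed nodes:
    p_single = 1 / comb(nodes, copies)              # 1/220 for 12 nodes

    # Chance at least one extent of the volume is fully lost:
    p_offline = 1 - (1 - p_single) ** extents
    print(f"P(volume offline after 3 node failures) ~ {p_offline:.2%}")  # ~99%
    ```

    So even though any individual extent is unlikely to be hit, a volume with enough extents almost certainly loses at least one.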

    https://learn.microsoft.com/en-us/windows-server/storage/storage-spaces/storage-spaces-fault-tolerance

    https://learn.microsoft.com/en-us/windows-server/storage/storage-spaces/delimit-volume-allocation

    Stevenek


1 additional answer

Sort by: Most helpful
  1. Steven Ekren 166 Reputation points
    2020-10-28T19:44:48.3+00:00

    Hi @Lanky Doodle ,

    If you have a volume that is 3-way mirror, there are 3 copies of each piece of the volume, distributed across fault domains (the default fault domain is "storage scale unit", which is the same as a node or server). If your system is 12 nodes, each piece will be on 3 of them. But a volume has lots of pieces (extents), so every node will hold copies of some pieces, since we distribute them for performance and efficiency.

    What this means is that if one node goes down, each volume will still have copies of all its data. If a 2nd node goes down, some pieces of the volume will have 2 of their 3 copies offline. If a 3rd node goes down, some piece (extent) of the volume will statistically have its 3 copies on the 3 down nodes, meaning the volume (virtual disk) has to go offline until it can access at least one copy of each piece.
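
    Here's a tiny simulation of that progression (a sketch only; I'm assuming each extent's 3 copies land on 3 random distinct nodes, which ignores the allocator's capacity balancing):

    ```python
    import random

    nodes, copies, extents = 12, 3, 1000
    random.seed(1)  # reproducible run

    # Place each extent's 3 copies on 3 distinct, randomly chosen nodes.
    placement = [set(random.sample(range(nodes), copies)) for _ in range(extents)]

    down = set()
    for node in random.sample(range(nodes), 3):  # fail 3 nodes, one at a time
        down.add(node)
        lost = sum(1 for copy_nodes in placement if copy_nodes <= down)
        print(f"{len(down)} node(s) down: {lost} extent(s) with all 3 copies down")
    ```

    With 1 or 2 nodes down, no extent can lose all 3 copies; once a 3rd goes down, a handful of extents (and therefore the volume) statistically will.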

    In this case, the pool quorum is not the issue.

    I hope this helps,
    Steven Ekren
    StevenEk@Microsoft.com
