SQL Server Failover Cluster

David Prodata 80 Reputation points
2026-04-21T03:28:43.4966667+00:00

Hello,

I would like to ask about the failover cluster requirements for a geo-redundant environment, in full detail.

Thank you.

Windows for business | Windows Server | Storage high availability | Clustering and high availability

Answer accepted by question author

Jason Nguyen Tran 18,720 Reputation points Independent Advisor
2026-04-26T01:02:11.0533333+00:00

Hi David Prodata,

I’m following up to check whether the issue has been resolved. Feel free to reply if you need further information. If the information provided was helpful, please click "Accept Answer" to help others in the community. Thank you!


Answer accepted by question author

Jason Nguyen Tran 18,720 Reputation points Independent Advisor
2026-04-21T08:11:20.8366667+00:00

Hi David Prodata,

To clarify, there are two main approaches supported in Windows Server Failover Clustering (WSFC) for SQL Server when you need geo‑redundancy:

1. Stretched Cluster (Multi‑Subnet FCI): This involves a single WSFC that spans multiple sites. Each site hosts nodes in different subnets, and SQL Server Failover Cluster Instances (FCIs) can fail over between them. Requirements include:

  • Reliable, low‑latency network connectivity between sites.
  • Shared storage that supports replication across sites.
  • DNS and quorum configuration for multi‑subnet awareness.

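2. Multi‑Cluster Topology with Replication: In this model, you deploy independent clusters in each site and use application‑level replication to synchronize data. Because a standard Always On availability group requires all of its replicas to be nodes of the same WSFC, replication between separate clusters is typically done with a distributed availability group (or with log shipping). Requirements include:

  • Separate WSFC clusters per site.
  • Database replication configured between clusters.
  • A documented and tested procedure for cross‑site failover, which between independent clusters is a manual step.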

For both approaches, you’ll need:

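  • Windows Server with the Failover Clustering feature. Both Standard and Datacenter editions include it; Datacenter is needed for features such as Storage Spaces Direct.
  • A SQL Server edition that matches the design. Standard edition supports two‑node FCIs and basic availability groups; Enterprise edition is needed for FCIs with more nodes and for full‑featured Always On Availability Groups.
  • Quorum configuration that accounts for site failures (often using a cloud witness or file share witness), ideally with the witness located outside both sites.
  • Network planning to ensure client redirection works correctly during failover.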
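
On the client redirection point, much of the behavior depends on the connection string. As a rough sketch (not a complete client), the Python helper below assembles an ODBC connection string with the `MultiSubnetFailover` keyword enabled, which lets the driver try the registered IP addresses in parallel after a failover. The server name `sqlfci.example.com`, the database name, and the helper itself are hypothetical; the driver name assumes ODBC Driver 18 for SQL Server is installed.

```python
def build_connection_string(server, database, multi_subnet=True):
    """Assemble an ODBC connection string for a multi-subnet FCI or listener.

    `server` should be the clustered network name (not a node name), so the
    same string keeps working whichever node currently owns the instance.
    """
    parts = {
        "Driver": "{ODBC Driver 18 for SQL Server}",  # assumed to be installed
        "Server": server,
        "Database": database,
        "Trusted_Connection": "yes",  # Windows authentication
        "MultiSubnetFailover": "Yes" if multi_subnet else "No",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(build_connection_string("sqlfci.example.com", "Sales"))
```

The resulting string can be passed to an ODBC client library of your choice. Applications that cannot set this keyword generally need DNS TTL and registration settings tuned on the cluster side instead.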

Best practice is to pilot the design in a test environment before production rollout, validate failover scenarios, and ensure monitoring tools are in place. Geo‑redundancy adds complexity, so careful planning of storage replication, DNS, and quorum is essential.

In short, you can choose between a stretched multi‑subnet cluster or multiple independent clusters with replication, depending on your latency tolerance and infrastructure.

I hope this helps. If you find this answer helpful, please consider clicking Accept Answer so others can benefit too.

Jason.


1 additional answer

  1. AI answer

    2026-04-21T03:29:00.8366667+00:00

    For geo-redundancy with SQL Server using failover clustering, there are two main patterns:

    1. Single WSFC with a multi-subnet SQL Server FCI (stretched cluster)
    2. Multiple independent clusters with application-level replication (multi-cluster topology)

    Below are the requirements and characteristics for each.

    1. Single Windows Server Failover Cluster (WSFC) with multi-subnet FCI

    Purpose: High availability within a cluster plus disaster recovery across sites/subnets.

    Key requirements:

    • WSFC infrastructure
      • A Windows Server Failover Cluster (WSFC) configured across nodes that may reside in different subnets/sites.
      • All nodes must be joined to the same WSFC and meet clustering requirements (hardware, networking, quorum, etc.).
      • WSFC provides resource management, health monitoring, and failover coordination for the SQL Server FCI.
      • Quorum must be designed to tolerate site/subnet failures.
    • SQL Server Failover Cluster Instance (FCI)
      • SQL Server Always On failover cluster instance installed across the WSFC nodes.
      • Identical SQL Server version, patch level, components, and instance name on all FCI nodes.
      • SQL Server binaries are installed locally on each node; WSFC controls service startup and failover.
    • Multi-subnet configuration
      • Each node (or set of nodes) is in a different subnet or site (stretch cluster).
      • The FCI uses a WSFC resource group that includes a virtual network name and IP addresses in each subnet.
      • Clients connect to the FCI as if it were a single SQL Server instance; failover between subnets is largely transparent, provided clients reconnect after the failover and their connection strings enable multi-subnet failover.
    • Storage
      • Traditional FCI: requires shared storage visible to all nodes (cluster disks, SAN, Storage Spaces Direct, or SMB file shares).
      • For multi-subnet FCI where shared storage is not possible across sites, data must be replicated between storage in each subnet; there is more than one copy of the data.
      • Storage must be designed so that when failover occurs to another subnet, the SQL instance has access to a consistent copy of the data.
    • Benefits for geo-redundancy
      • Instance-level protection with automatic failover between nodes (including across subnets in a multi-subnet FCI).
      • Disaster recovery via multi-subnet FCI or by combining FCI with Always On availability groups for remote DR.
      • No client reconfiguration during failover when using the FCI virtual name and IPs.
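
To make the "quorum must tolerate site failures" requirement concrete, here is a small Python sketch of the vote arithmetic. It assumes a simplified static majority model (one vote per node plus an optional witness vote) and deliberately ignores WSFC dynamic quorum, so treat it as a planning aid rather than a model of actual cluster behavior. The function name and the `"A"`/`"B"`/`"third"` labels are made up for this example.

```python
def quorum_after_site_loss(nodes_a, nodes_b, witness="third"):
    """Return {lost_site: cluster_keeps_quorum} under static majority voting.

    witness: "A" or "B" if the witness lives in that site, "third" if it sits
    outside both sites (for example a cloud witness), or None for no witness.
    """
    total_votes = nodes_a + nodes_b + (1 if witness else 0)
    outcome = {}
    for lost, surviving_nodes, surviving_site in (
        ("A", nodes_b, "B"),
        ("B", nodes_a, "A"),
    ):
        votes = surviving_nodes
        # The witness vote only counts if the witness itself is still reachable.
        if witness in ("third", surviving_site):
            votes += 1
        outcome[lost] = votes > total_votes / 2
    return outcome


if __name__ == "__main__":
    print(quorum_after_site_loss(2, 2, witness="third"))
    print(quorum_after_site_loss(2, 2, witness="A"))
    print(quorum_after_site_loss(2, 2, witness=None))
```

With two nodes per site, only the third-site witness keeps a majority whichever site is lost, which is why a cloud witness or a file share witness in a separate location is the usual recommendation for stretched clusters.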
    2. Multi-cluster topology (independent clusters per site) with application-level replication

    Purpose: Strong geo-redundancy and isolation by using separate clusters in each site and replicating at the application/database layer.

    Key requirements:

    • Independent WSFC clusters
      • Two or more separate WSFC clusters, each at its own site or subnet.
      • Each cluster has its own quorum, resources, and SQL Server instances/FCIs.
      • A failure in one cluster does not affect quorum or operation of the other clusters.
    • Application-level replication
      • Data is replicated between clusters using:
        • SQL Server Always On Availability Groups for database workloads, or
        • Other file-based or custom replication tools for non-database data.
      • Replication can be synchronous (for HA within a region) or asynchronous (for long-distance DR), depending on requirements and support.
    • Use cases
      • Large-scale deployments needing isolation between clusters.
      • Different administrative or security boundaries between sites.
      • Active-active scenarios where load is distributed across sites.
      • Maximum separation between sites for disaster recovery.
    • Operational considerations
      • More complex operations than a single stretched cluster; requires orchestration for cross-cluster failover.
      • Monitoring must cover all clusters and replication health.
      • Cluster versions and configurations can differ, as long as the application-level replication supports it.
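
The synchronous versus asynchronous choice mentioned above usually comes down to how much extra commit latency the workload can absorb, since a synchronous commit waits on the remote replica and therefore adds roughly at least one inter-site round trip. The sketch below is a toy decision helper under that assumption; the function name and the latency budget are hypothetical, and the returned strings simply mirror the T-SQL `AVAILABILITY_MODE` values.

```python
def choose_availability_mode(rtt_ms, latency_budget_ms):
    """Pick a commit mode from measured round-trip time and a latency budget.

    rtt_ms: measured network round trip between the two sites, in ms.
    latency_budget_ms: extra commit latency the application can tolerate
        (an assumption you have to supply; there is no universal value).
    """
    if rtt_ms < 0 or latency_budget_ms < 0:
        raise ValueError("latencies must be non-negative")
    if rtt_ms <= latency_budget_ms:
        return "SYNCHRONOUS_COMMIT"   # no data loss on failover, slower commits
    return "ASYNCHRONOUS_COMMIT"      # faster commits, possible data loss


if __name__ == "__main__":
    print(choose_availability_mode(rtt_ms=2, latency_budget_ms=5))
    print(choose_availability_mode(rtt_ms=40, latency_budget_ms=5))
```

Real designs should also weigh the recovery point objective and measured log throughput, not round-trip time alone.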
    3. General WSFC/FCI requirements relevant to geo-redundancy
    • Cluster functionality
      • WSFC provides distributed metadata, resource management, health monitoring (heartbeats and resource checks), and failover coordination.
      • A health-based failover policy controls automatic transfer of SQL resources between nodes.
    • SQL Server FCI specifics
      • FCI appears as a single SQL instance to clients, even though it spans multiple nodes (and possibly subnets).
      • Only one node owns the FCI resource group at a time.
      • FCI can be combined with Always On availability groups to extend disaster recovery beyond the local cluster/site.
    • Storage and networking
      • For traditional FCI: shared storage accessible from all nodes, or replicated storage for multi-subnet scenarios.
      • Reliable, low-latency network within each site; sufficient bandwidth and latency characteristics between sites for replication.
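
For the inter-site bandwidth point, a back-of-the-envelope check can help during planning. The Python sketch below converts a peak change rate (for example, transaction log generation in MB/s) into the link capacity needed to ship it, with some headroom. The 30% headroom default and the function names are assumptions for illustration, not vendor guidance.

```python
def required_link_mbps(change_mb_per_s, headroom=0.3):
    """Megabits per second needed to replicate `change_mb_per_s` megabytes
    per second, padded by a fractional headroom (0.3 means 30% extra)."""
    if change_mb_per_s < 0 or headroom < 0:
        raise ValueError("inputs must be non-negative")
    return change_mb_per_s * 8 * (1 + headroom)


def link_is_sufficient(change_mb_per_s, link_mbps, headroom=0.3):
    """True if the inter-site link can carry the replication traffic."""
    return link_mbps >= required_link_mbps(change_mb_per_s, headroom)


if __name__ == "__main__":
    # Hypothetical workload: 10 MB/s of changes at peak.
    print(round(required_link_mbps(10), 1))
    print(link_is_sufficient(10, link_mbps=100))
    print(link_is_sufficient(10, link_mbps=200))
```

A sustained peak of 10 MB/s needs a little over 100 Mbps with this headroom, so a 100 Mbps link would fall behind at peak while a 200 Mbps link would not. Measure the actual change rate before relying on numbers like these.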

    In practice, for geo-redundancy with SQL Server and failover clustering, environments typically choose either:

    • A multi-subnet FCI (single WSFC stretched across sites) for instance-level HA/DR, possibly combined with availability groups for additional DR, or
    • Multiple independent WSFC clusters per site with Always On availability groups replicating databases between them for DR and/or active-active usage.

    AI-generated content may be incorrect. Read our transparency notes for more information.

