Failover clustering is a powerful strategy to ensure high availability and uninterrupted operations in critical environments. It involves a configuration of independent computers, known as nodes, which work together to enhance the availability and scalability of applications and services, now referred to as clustered roles. These nodes are interconnected through both physical cabling and software.
If a failure occurs in one or more nodes, the remaining nodes automatically take over the workload, a process called failover, minimizing disruptions. Additionally, the health of clustered roles is continuously monitored. If any issues are detected, the roles are either restarted or migrated to another node to maintain seamless operation. This proactive approach ensures that services remain consistently available, even if hardware or software failures occur.
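The failover step described above can be sketched in a few lines of Python. This is an illustrative model only, not how any clustering product implements it: the `Node` class, the `fail_over` function, and the "least-loaded survivor" placement rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node that hosts a list of clustered roles."""
    name: str
    healthy: bool = True
    roles: list = field(default_factory=list)


def fail_over(nodes, failed_name):
    """Mark one node as failed and move its roles to the surviving nodes.

    Placement rule (an assumption for this sketch): each role goes to the
    healthy node that currently hosts the fewest roles.
    Returns a dict mapping each moved role to its new node name.
    """
    failed = next(n for n in nodes if n.name == failed_name)
    failed.healthy = False

    survivors = [n for n in nodes if n.healthy]
    if not survivors:
        raise RuntimeError("no healthy node left to host the roles")

    moved = {}
    while failed.roles:
        role = failed.roles.pop(0)
        target = min(survivors, key=lambda n: len(n.roles))
        target.roles.append(role)
        moved[role] = target.name
    return moved
```

For example, if `n1` hosts two roles and fails, `fail_over(nodes, "n1")` spreads those roles over the remaining healthy nodes and leaves `n1` empty.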
Networking plays a crucial role in failover clusters by enabling reliable communication and efficient data exchange among cluster nodes and with external clients. Clusters often use dedicated private networks for internal functions such as heartbeat signals and cluster management, while separate public networks handle client access and application data. This separation improves performance and security by isolating critical cluster traffic from external disruptions. It also increases fault tolerance, so that internal cluster operations continue uninterrupted and client connections stay available during failover events.
The cluster's health is continuously monitored through heartbeat signals, which help detect any issues. If a problem arises, the system can automatically initiate a failover to maintain service continuity. To protect sensitive data and meet organizational standards, failover clusters incorporate robust security measures such as encryption to secure data both in transit and at rest. They also use granular access control to manage permissions and access rights effectively.
To learn more about failover clustering in Azure Local, see Understanding cluster and pool quorum.
Active and passive failover configuration
Failover clusters can be set up in two primary configurations: active-active and active-passive. Each has its own trade-offs. Active-active favors performance and resource efficiency, while active-passive favors simplicity and predictable failover behavior. The choice depends on specific organizational needs and the criticality of the applications being clustered.
Configuration | Operation |
---|---|
Active-active | In an active-active failover cluster, all nodes are active and work simultaneously to balance the workload across the cluster. This configuration distributes tasks, processing power, or services among all available nodes, making efficient use of resources. |
Active-passive | In an active-passive failover cluster, some nodes are designated as active while others remain on standby, ready to take over if an active node fails. |
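
To make the difference between the two configurations concrete, here is a minimal sketch of how roles might be placed in each mode. The function name `place_roles`, the `active_count` parameter, and the round-robin placement are hypothetical choices for illustration; real clusters use their own placement policies.

```python
def place_roles(roles, nodes, mode, active_count=1):
    """Assign roles to nodes for a given configuration.

    mode="active-active":  every node hosts roles (round-robin).
    mode="active-passive": only the first `active_count` nodes host roles;
                           the remaining nodes stay on standby.
    Returns a dict mapping each node name to the list of roles it hosts.
    """
    if mode == "active-active":
        hosts = list(nodes)
    elif mode == "active-passive":
        hosts = list(nodes[:active_count])
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    if not hosts:
        raise ValueError("at least one active node is required")

    placement = {node: [] for node in nodes}
    for index, role in enumerate(roles):
        placement[hosts[index % len(hosts)]].append(role)
    return placement
```

With four roles and two nodes, active-active puts two roles on each node, while active-passive puts all four on the first node and leaves the second idle until a failover.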
Failover Clustering functionalities
Failover clustering provides a comprehensive set of functionalities designed to maximize uptime, ensure data integrity, and streamline management of critical workloads. These features enable organizations to maintain service continuity, efficiently manage resources, and quickly recover from hardware or software failures. Some functionalities offered by failover clustering include:
Cluster nodes and quorum:
Cluster nodes collaborate to maintain what is known as a quorum, which is the minimum number of votes from cluster members required for the cluster to keep running. This mechanism prevents split-brain scenarios, in which partitioned portions of a cluster each try to operate independently and potentially cause inconsistencies. Quorum models, such as Node Majority, Node and Disk Majority, Node and File Share Majority, and No Majority (Disk Only), determine how votes are assigned and counted. For instance, Node Majority assigns each node one vote, Node and Disk Majority adds a vote from a disk witness, and Node and File Share Majority adds a vote from a file share witness.
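The vote counting behind quorum can be illustrated with a simple strict-majority check. This is a simplified sketch: the `has_quorum` function is a hypothetical helper, and it ignores product-specific refinements such as adjusting votes at runtime.

```python
def has_quorum(votes_present: int, total_votes: int) -> bool:
    """Return True if the reachable votes form a strict majority.

    total_votes counts every voting member: one vote per node, plus one
    for a witness (disk or file share) when the quorum model includes it.
    """
    if not 0 <= votes_present <= total_votes:
        raise ValueError("votes_present must be between 0 and total_votes")
    return votes_present > total_votes // 2
```

Consider four nodes plus a witness, for five votes in total. If a network partition splits the nodes two and two, the side that can reach the witness holds three votes and keeps quorum, while the other side holds two and stops. With four votes and no witness, a two-and-two split leaves neither side with a majority, which is why an extra witness vote is useful for an even number of nodes.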
Storage configuration:
A notable feature of failover clusters is the Cluster Shared Volume (CSV), which allows multiple nodes to read from and write to the same storage concurrently. Because every node can reach the volume, a clustered role can move to another node without the underlying disk having to change ownership first, which simplifies disk management and coordination across the cluster.
Proactive monitoring and management:
Failover clusters employ heartbeat signals as a means of monitoring the health of nodes and their roles. These signals help detect issues such as node failures or service disruptions. When such issues are detected, the system can automatically initiate failover procedures, ensuring continuity and minimizing downtime.
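Heartbeat-based failure detection can be sketched as a timeout check over the last heartbeat received from each node. The function name `find_failed_nodes` and the `interval` and `missed_threshold` parameters are assumptions for this example and do not correspond to any specific product setting.

```python
def find_failed_nodes(last_heartbeat, now, interval=1.0, missed_threshold=5):
    """Return the sorted names of nodes considered failed.

    last_heartbeat:   dict mapping node name to the timestamp (in seconds)
                      of the most recent heartbeat received from that node.
    now:              current timestamp, passed in so the check is testable.
    interval:         expected seconds between heartbeats.
    missed_threshold: number of consecutive missed heartbeats tolerated
                      before a node is declared failed.
    """
    timeout = interval * missed_threshold
    return sorted(
        name
        for name, seen_at in last_heartbeat.items()
        if now - seen_at > timeout
    )
```

A monitoring loop would call this check periodically and start the failover procedure for each node it returns. A higher threshold tolerates brief network hiccups at the cost of slower detection.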
Security and compliance:
Security is a vital aspect of failover clusters, incorporating features like encryption and access control to protect data and cluster operations. Clusters help organizations meet compliance requirements for critical applications by ensuring secure data handling and reliable system performance. This makes them suitable for environments requiring stringent data protection and regulatory adherence.
Use cases:
Failover clustering has several practical applications, including disaster recovery, load balancing, and high-performance computing. It supports critical applications by providing high availability, enabling businesses to maintain operations even in adverse conditions. For instance, in disaster recovery scenarios, clusters can quickly restore services by transferring operations to unaffected nodes.
Failover clustering ensures high availability or continuous availability for critical applications and services (clustered roles) running on physical servers or virtual machines. If a failure occurs, these roles can be quickly moved or restarted on another node, minimizing downtime and maintaining consistent performance and redundancy.
Applications such as Microsoft SQL Server and Hyper-V virtual machines benefit from failover clustering by experiencing minimal service interruptions, even during hardware or software failures.
Failover Clustering resources
This curated table of resources is designed to help you understand, plan, deploy, and manage failover clustering effectively.