Ensuring control plane resiliency with Operator Nexus Service
Artikel
The Nexus service is engineered to uphold control plane resiliency across various compute rack configurations.
Instances with three or more compute racks
Operator Nexus ensures the availability of three active Kubernetes control plane (KCP) nodes in instances with three or more compute racks. For configurations exceeding two compute racks, an extra spare node is also maintained. These nodes are strategically distributed across different racks to guarantee control plane resiliency, when possible.
Tip
The Kubernetes control plane is a set of components that manage the state of a Kubernetes cluster, schedule workloads, and respond to cluster events. It includes the API server, etcd storage, scheduler, and controller managers.
The remaining management nodes contain various operators which run the platform software as well as other components performing support capabilities for monitoring, storage and networking.
During runtime upgrades, Operator Nexus implements a sequential upgrade of the control plane nodes, thereby preserving resiliency throughout the upgrade process.
Three compute racks:
Rack 1
Rack 2
Rack 3
KCP
KCP
KCP
KCP-spare
MGMT
MGMT
Four or more compute racks:
Rack 1
Rack 2
Rack 3
Rack 4
KCP
KCP
KCP
KCP-spare
MGMT
MGMT
MGMT
MGMT
Instances with less than three compute racks
Operator Nexus maintains an active control plane node and, if available, a spare control plane instance. For instance, a two-rack configuration has one active Kubernetes Control Plane (KCP) node and one spare node.
Two compute racks:
Rack 1
Rack 2
KCP
KCP-spare
MGMT
MGMT
Single compute rack:
Operator Nexus supports control plane resiliency in single rack configurations by having three management nodes within the rack. For example, a single rack configuration with three management servers will provide an equivalent number of active control planes to ensure resiliency within a rack.
Rack 1
KCP
KCP
KCP
Resiliency implications of lost quorum
In disaster situations when the control plane loses quorum, there are impacts to the Kubernetes API across the instance. This scenario can affect a workload's ability to read and write Custom Resources (CRs) and talk across racks.
I dette modul får du mere at vide om kernekomponenterne i Kubernetes-infrastrukturen, herunder kontrolplannoder, nodepuljer og arbejdsbelastningsressourcer, f.eks. pods, udrulninger og sæt.