Kubernetes failover scenarios on a clustered Azure Stack Edge device
Kubernetes is a popular open-source platform to orchestrate containerized applications. This article describes how Kubernetes works on your 2-node Azure Stack Edge device, including the failure modes and the corresponding device responses.
On your Azure Stack Edge device, you can create a Kubernetes cluster by configuring the compute. When the compute role is configured, the Kubernetes cluster, including the master and worker nodes, is deployed and configured for you. This cluster is then used for workload deployment via kubectl, IoT Edge, or Azure Arc.
The Azure Stack Edge device is available as a 1-node configuration or a 2-node configuration that constitutes the infrastructure cluster. The Kubernetes cluster is separate from the infrastructure cluster and is deployed on top of the infrastructure cluster. The infrastructure cluster provides the persistent storage for your Azure Stack Edge device while the Kubernetes cluster is responsible solely for application orchestration.
The Kubernetes cluster comprises a master node and worker nodes. The Kubernetes nodes in a cluster are virtual machines that run your applications and cloud workflows.
- The Kubernetes master node is responsible for maintaining the desired state for your cluster. The master node also controls the worker nodes.
- The worker nodes run the containerized applications.
The Kubernetes cluster on the 2-node device has one master node and two worker nodes. The 2-node device is highly available, and if one of the nodes fails, both the device and the Kubernetes cluster keep running. For more information on the Kubernetes cluster architecture, go to Kubernetes core concepts.
On a 2-node Azure Stack Edge device, the Kubernetes master VM and one Kubernetes worker VM run on node A of your device. A single Kubernetes worker VM runs on node B.
Each worker VM in the Kubernetes cluster is a pinned Hyper-V VM. A pinned VM is tied to the specific node it is running on. If node A on the device fails, the master VM fails over to node B. However, the worker VM on node A, which is a pinned VM, does not fail over to node B. Instead, the pods from the worker VM on node A are rebalanced onto the worker VM on node B. The same applies in the other direction: if node B fails, its worker VM stays down and its pods are rebalanced onto node A.
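The failover behavior described above can be illustrated with a small simulation. This is only a sketch: the `Cluster` and `WorkerVM` classes, the pod names, and the `fail_node` method are hypothetical and are not part of any Azure Stack Edge or Kubernetes API. The sketch shows the master VM moving to the surviving node while the pinned worker VM stays where it is and only its pods move.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerVM:
    """A worker VM pinned to one device node; the VM itself never moves."""
    pinned_to: str
    pods: list = field(default_factory=list)


@dataclass
class Cluster:
    """Hypothetical model of the Kubernetes layout on a 2-node device."""
    master_on: str = "A"
    workers: dict = field(default_factory=lambda: {
        "A": WorkerVM("A", ["pod-1", "pod-2"]),  # illustrative pod names
        "B": WorkerVM("B", ["pod-3"]),
    })

    def fail_node(self, failed: str) -> None:
        survivor = "B" if failed == "A" else "A"
        # The master VM is not pinned, so it fails over to the surviving node.
        if self.master_on == failed:
            self.master_on = survivor
        # The worker VM is pinned and stays down with its node;
        # only its pods are rebalanced onto the surviving worker VM.
        lost = self.workers[failed]
        self.workers[survivor].pods.extend(lost.pods)
        lost.pods.clear()


cluster = Cluster()
cluster.fail_node("A")
print(cluster.master_on)               # B
print(cluster.workers["B"].pods)       # ['pod-3', 'pod-1', 'pod-2']
print(cluster.workers["A"].pinned_to)  # A (the worker VM did not move)
```

The model ignores whether node B actually has room for the incoming pods, which is the subject of the capacity rule that follows.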
So that the rebalanced pods have enough capacity to run on device node B, the system enforces that no more than 50% of each Azure Stack Edge (ASE) node’s capacity is used during regular 2-node cluster operations. This limit is applied on a best-effort basis, and there are circumstances (for example, workloads that require GPU resources that are unavailable when they are rebalanced to ASE node B) in which rebalanced pods may not have sufficient resources to run.
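To see why a 50% ceiling per node generally leaves room for failover, and why a GPU can still be the exception, consider the following arithmetic sketch. The capacity figures, the resource names, and the `shortfalls` helper are all made up for illustration; they do not reflect real device specifications or how the system performs the check internally.

```python
def shortfalls(survivor_capacity, survivor_used, incoming):
    """Return the resources that would be oversubscribed on the surviving node."""
    short = []
    for resource, capacity in survivor_capacity.items():
        needed = survivor_used.get(resource, 0) + incoming.get(resource, 0)
        if needed > capacity:
            short.append(resource)
    return short


# Hypothetical per-node capacity; both nodes are assumed identical.
capacity = {"cpu": 16, "memory_gib": 64, "gpu": 1}

# Each node stays at or below 50% for CPU and memory, so those always fit.
node_b_used = {"cpu": 8, "memory_gib": 32, "gpu": 0}
from_node_a = {"cpu": 8, "memory_gib": 30, "gpu": 0}
print(shortfalls(capacity, node_b_used, from_node_a))  # []

# A GPU cannot be split in half: if both nodes use their single GPU,
# the pods rebalanced from node A find no free GPU on node B.
node_b_used["gpu"] = 1
from_node_a["gpu"] = 1
print(shortfalls(capacity, node_b_used, from_node_a))  # ['gpu']
```

With usage at or below half on each node, the combined demand never exceeds one node's capacity for divisible resources such as CPU and memory. Indivisible resources are where the best-effort guarantee can break down.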
These scenarios are covered in detail in the next section on Failure Modes and Behavior.
The Azure Stack Edge device nodes may fail under certain conditions. The various failure modes and the corresponding device responses are tabulated in this section.
Node | Failures | Responses |
---|---|---|
Node A has failures (Node B has no failures) | The following failures can occur: | The following responses are seen for each of these failures: |
Node A reboots (Node B has no failures) | Node reboots | After node A completes rebooting and the worker VM is available, the master VM rebalances the pods from node B. |
Node B has failures (Node A has no failures) | The following failures can occur: | The following responses are seen for each of these failures: |
Node B reboots (Node A has no failures) | Node reboots | After node B completes rebooting and the worker VM is available, the master VM rebalances the pods from node A. |

Update type | Responses |
---|---|
Device node update | Rolling updates are applied to device nodes and the nodes will reboot. |
Kubernetes service update | Kubernetes service update includes: |
- Learn more about Kubernetes storage on Azure Stack Edge device.
- Understand the Kubernetes networking model on Azure Stack Edge device.
- Deploy Azure Stack Edge in Azure portal.