Applies to: AKS on Windows Server
To help minimize service disruptions for clusters, AKS on Windows Server continuously monitors the health of worker nodes and automatically repairs nodes that become unhealthy. This article describes how AKS Arc checks for unhealthy nodes and automatically repairs both Windows and Linux nodes. It also shows how to manually check node health.
How AKS checks for unhealthy nodes
AKS Arc uses the following rules to determine if a node is unhealthy and needs repair:
- The node reports a NotReady status on consecutive checks.
- The node doesn't report any status within 20-30 minutes.
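The two rules above can be expressed as a small predicate. This is a minimal Python sketch, not AKS Arc's implementation. The `NodeHealth` type and the `is_unhealthy` function are hypothetical. The article doesn't say how many consecutive NotReady checks trigger a repair, so the threshold of 2 is an assumption. The 20-minute silence window is also an assumption: it is simply the low end of the documented 20-30 minute range.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed values: the article gives no consecutive-check count and states the
# silence window only as 20-30 minutes, so both numbers are illustrative.
NOT_READY_CHECKS_THRESHOLD = 2
NO_STATUS_TIMEOUT = timedelta(minutes=20)


@dataclass
class NodeHealth:
    """Hypothetical health record for one worker node."""
    name: str
    consecutive_not_ready: int  # NotReady results on back-to-back checks
    last_status_at: datetime    # when the node last reported any status


def is_unhealthy(node: NodeHealth, now: datetime) -> bool:
    """Return True if either documented rule flags the node for repair."""
    # Rule 1: the node reports NotReady on consecutive checks.
    if node.consecutive_not_ready >= NOT_READY_CHECKS_THRESHOLD:
        return True
    # Rule 2: the node hasn't reported any status within the timeout window.
    return now - node.last_status_at >= NO_STATUS_TIMEOUT


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    examples = [
        NodeHealth("node-a", 0, now - timedelta(minutes=1)),   # reporting Ready
        NodeHealth("node-b", 2, now - timedelta(minutes=1)),   # NotReady twice
        NodeHealth("node-c", 0, now - timedelta(minutes=25)),  # silent too long
    ]
    for node in examples:
        print(node.name, is_unhealthy(node, now))
```

Either rule alone is enough to flag a node. A single NotReady result on a node that is still reporting does not trigger a repair.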
You can manually check the health state of your nodes with kubectl, as follows:
kubectl get nodes
The status of the nodes should look similar to the following output:
NAME STATUS ROLES AGE VERSION
moc-l2tlqojhk2d Ready master 46h v1.19.7
moc-l8h8i6lxk1h Ready <none> 46h v1.19.7
moc-lqnjufwo2cy Ready master 46h v1.19.7
moc-ltyl8mqy47z Ready <none> 47h v1.19.7
moc-lwn5xnrapnj Ready master 47h v1.19.7
moc-wvt025q406z Ready <none> 47h v1.19.7
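If you want to check node health from a script instead of reading the table, you can parse the same `kubectl get nodes` output. The following Python sketch is illustrative and is not part of AKS Arc. The helper names `fetch_node_table`, `parse_node_statuses`, and `not_ready_nodes` are hypothetical. The sketch assumes the default column layout shown above, with STATUS in the second column.

```python
import subprocess


def fetch_node_table() -> str:
    """Run `kubectl get nodes` (requires kubectl and cluster access) and return its output."""
    result = subprocess.run(
        ["kubectl", "get", "nodes"], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_node_statuses(table: str) -> dict[str, str]:
    """Map each node NAME to its STATUS column in the default table layout."""
    rows = [line.split() for line in table.splitlines() if line.strip()]
    # Skip the header row (NAME STATUS ROLES AGE VERSION).
    return {row[0]: row[1] for row in rows[1:]}


def not_ready_nodes(table: str) -> list[str]:
    """Return nodes whose STATUS, up to the first comma, is not Ready.

    A cordoned node reports "Ready,SchedulingDisabled"; only the part before
    the first comma is compared, so such a node still counts as Ready here.
    """
    return [
        name
        for name, status in parse_node_statuses(table).items()
        if status.split(",")[0] != "Ready"
    ]
```

For a cluster like the one in the sample output, where every node is Ready, `not_ready_nodes(fetch_node_table())` returns an empty list. Column positions in the human-readable table are not a stable interface. For anything beyond a quick check, `kubectl get nodes -o json` provides structured output that is safer to parse.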
How automatic repair works
If AKS Arc identifies a node that remains unhealthy for more than 20-30 minutes, it creates and reimages a new node.
Repair usually takes 20 to 30 minutes per node. If AKS Arc finds multiple unhealthy nodes during a health check, it repairs them one at a time and finishes each repair before starting the next.
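Repairs run one at a time, so the total repair time grows with the number of unhealthy nodes found in a single check. The Python sketch below models that ordering and gives a rough estimate of the total time. The names `repair_serially`, `estimated_repair_window`, and the `repair` callback are hypothetical. The 20 to 30 minute figure is the typical per-node duration stated above, not a guarantee.

```python
from typing import Callable, Iterable

# Typical per-node repair duration stated in the article, in minutes.
REPAIR_MINUTES_LOW = 20
REPAIR_MINUTES_HIGH = 30


def repair_serially(unhealthy: Iterable[str], repair: Callable[[str], None]) -> list[str]:
    """Repair nodes one at a time, in the order given.

    `repair` is a hypothetical blocking callback: it returns only after the
    node is repaired, so the next repair never starts before the previous ends.
    """
    repaired = []
    for name in unhealthy:
        repair(name)
        repaired.append(name)
    return repaired


def estimated_repair_window(node_count: int) -> tuple[int, int]:
    """Rough (low, high) total minutes when repairs don't overlap."""
    return node_count * REPAIR_MINUTES_LOW, node_count * REPAIR_MINUTES_HIGH
```

For example, if one check finds three unhealthy nodes, repairing them one after another would take roughly 60 to 90 minutes in total.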