Basic troubleshooting of Node Not Ready failures
This article provides troubleshooting steps to recover Microsoft Azure Kubernetes Service (AKS) cluster nodes after a failure. This article specifically addresses the most common error messages that are generated when a Node Not Ready failure occurs, and explains how node repair functionality can be done for both Windows and Linux nodes.
Before you begin
Read the official guide for troubleshooting Kubernetes clusters. Also, read the Microsoft engineer's guide to Kubernetes troubleshooting. This guide contains commands for troubleshooting pods, nodes, clusters, and other features.
Prerequisites
- Azure CLI, version 2.31 or a later version. If Azure CLI is already installed, you can find the version number by running
az --version
.
Basic troubleshooting
AKS continuously monitors the health state of worker nodes, and automatically repairs the nodes if they become unhealthy. The Azure Virtual Machine (VM) platform maintains VMs that experience issues. AKS and Azure VMs work together to reduce service disruptions for clusters.
For nodes, there are two forms of heartbeats:
Updates to the .status of a
Node
object.Lease objects within the kube-node-lease namespace. Each
Node
has an associatedLease
object.
Compared to updates to the .status of a Node
, a Lease
is a lightweight resource. Using Lease
objects for heartbeats reduces the performance impact of these updates for large clusters.
The kubelet is responsible for creating and updating the .status for Node
objects. It's also responsible for updating the Lease
objects that are related to the Node
objects.
- The kubelet updates the node
.status
when there's a change in status or if there has been no update for a configured interval. The default interval for.status
updates to nodes is five minutes, which is much longer than the 40-second default time-out for unreachable nodes. - The kubelet creates and then updates its
Lease
object every 10 seconds (the default update interval).Lease
updates occur independently from updates to the node.status
. If theLease
update fails, the kubelet retries, using an exponential backoff that starts at 200 milliseconds and is capped at seven seconds.
You can't schedule a pod on a node that has a status of NotReady
or Unknown
. You can schedule a pod only on nodes that are in the Ready
state.
If your node is in the MemoryPressure
, DiskPressure
, or PIDPressure
state, you must manage your resources in order to schedule extra pods on the node. If your node is in NetworkUnavailable
mode, you must configure the network on the node correctly.
AKS manages the lifecycle and operations of agent nodes for you. Modifying the IaaS resources associated with the agent nodes isn't supported. For example, customizing a node through SSH connections, updating packages, or changing the network configuration on a node isn't supported. For more information, see AKS support coverage for agent nodes.
Make sure that the following conditions are met:
Your cluster is in Succeeded (Running) state. To check the cluster status on the Azure portal, search for and select Kubernetes services, and select the name of your AKS cluster. Then, on the cluster's Overview page, look in Essentials to find the Status. Or, enter the az aks show command in Azure CLI.
Your node pool has a Provisioning state of Succeeded and a Power state of Running. To check the node pool status on the Azure portal, return to your AKS cluster's page, and then select Node pools. Alternatively, enter the az aks nodepool show command in Azure CLI.
The required egress ports are open in your network security groups (NSGs) and firewall so that the API server's IP address can be reached. For more information, see Required outbound network rules and FQDNs for AKS clusters.
Your nodes have deployed the latest node images.
Your nodes are in the
Running
state instead ofStopped
orDeallocated
.Your cluster is running an AKS-supported version of Kubernetes.
More information
To troubleshoot the Not Ready
status of a node, see Troubleshoot a change in a healthy node to Not Ready status.
Third-party contact disclaimer
Microsoft provides third-party contact information to help you find additional information about this topic. This contact information may change without notice. Microsoft does not guarantee the accuracy of third-party contact information.