
Troubleshoot a KubernetesCluster with a node in NotReady state

Follow this troubleshooting guide if a node in a KubernetesCluster is in the NotReady state.

Prerequisites

  • Ability to run kubectl commands against the KubernetesCluster
  • Familiarity with the capabilities referenced in this article; review the Baremetalmachine actions.

Cause

  • After a Baremetalmachine restart or a Cluster runtime upgrade, a node may enter the NotReady state.
  • Tainting, cordoning, or powering off a Baremetalmachine causes the nodes running on that Baremetalmachine to become NotReady. If possible, remove the taint, uncordon the Baremetalmachine, or power it on. If that isn't possible, the procedure below may allow the node to be rescheduled to a different Baremetalmachine.

Procedure

Delete the node by following the steps below. Deleting the node allows the Cluster to attempt to reschedule and restart it.

  1. List the nodes with kubectl, using the wide output format (-owide). Note the node in the NotReady state.

    $ kubectl get nodes -owide
    NAME                                                 STATUS     ROLES           AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE            KERNEL-VERSION     CONTAINER-RUNTIME
    mytest-naks1-3b466a17-agentpool1-md-6bg5h-7qt2b   Ready      <none>          6d3h   v1.27.3   10.4.74.30    <none>        CBL-Mariner/Linux   5.15.153.1-2.cm2   containerd://1.6.26
    mytest-naks1-3b466a17-agentpool1-md-6bg5h-dqmzw   Ready      <none>          6d3h   v1.27.3   10.4.74.31    <none>        CBL-Mariner/Linux   5.15.153.1-2.cm2   containerd://1.6.26
    mytest-naks1-3b466a17-agentpool1-md-6bg5h-lkhhq   NotReady   <none>          6d3h   v1.27.3   10.4.74.29    <none>        CBL-Mariner/Linux   5.15.153.1-2.cm2   containerd://1.6.26
    mytest-naks1-3b466a17-control-plane-6q7ns         Ready      control-plane   6d3h   v1.27.3   10.4.74.14    <none>        CBL-Mariner/Linux   5.15.153.1-2.cm2   containerd://1.6.26
    mytest-naks1-3b466a17-control-plane-8qqvz         Ready      control-plane   6d3h   v1.27.3   10.4.74.28    <none>        CBL-Mariner/Linux   5.15.153.1-2.cm2   containerd://1.6.26
    mytest-naks1-3b466a17-control-plane-g42mh         Ready      control-plane   6d3h   v1.27.3   10.4.74.32    <none>        CBL-Mariner/Linux   5.15.153.1-2.cm2   containerd://1.6.26
    
  2. Delete the NotReady node with kubectl.

    $ kubectl delete node mytest-naks1-3b466a17-agentpool1-md-6bg5h-lkhhq
    node "mytest-naks1-3b466a17-agentpool1-md-6bg5h-lkhhq" deleted
    
  3. List the nodes again and confirm that the deleted node no longer appears.

    $ kubectl get nodes -owide
    NAME                                                 STATUS   ROLES           AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE            KERNEL-VERSION     CONTAINER-RUNTIME
    mytest-naks1-3b466a17-agentpool1-md-6bg5h-7qt2b   Ready    <none>          6d3h   v1.27.3   10.4.74.30    <none>        CBL-Mariner/Linux   5.15.153.1-2.cm2   containerd://1.6.26
    mytest-naks1-3b466a17-agentpool1-md-6bg5h-dqmzw   Ready    <none>          6d3h   v1.27.3   10.4.74.31    <none>        CBL-Mariner/Linux   5.15.153.1-2.cm2   containerd://1.6.26
    mytest-naks1-3b466a17-control-plane-6q7ns         Ready    control-plane   6d3h   v1.27.3   10.4.74.14    <none>        CBL-Mariner/Linux   5.15.153.1-2.cm2   containerd://1.6.26
    mytest-naks1-3b466a17-control-plane-8qqvz         Ready    control-plane   6d3h   v1.27.3   10.4.74.28    <none>        CBL-Mariner/Linux   5.15.153.1-2.cm2   containerd://1.6.26
    mytest-naks1-3b466a17-control-plane-g42mh         Ready    control-plane   6d3h   v1.27.3   10.4.74.32    <none>        CBL-Mariner/Linux   5.15.153.1-2.cm2   containerd://1.6.26
    
  4. Wait 5 to 15 minutes for the node to be replaced. The replacement node appears with a new name and shows NotReady while it comes up.

    $ kubectl get nodes -owide
    NAME                                                 STATUS     ROLES           AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE            KERNEL-VERSION     CONTAINER-RUNTIME
    mytest-naks1-3b466a17-agentpool1-md-6bg5h-7qt2b   Ready      <none>          6d3h   v1.27.3   10.4.74.30    <none>        CBL-Mariner/Linux   5.15.153.1-2.cm2   containerd://1.6.26
    mytest-naks1-3b466a17-agentpool1-md-6bg5h-dqmzw   Ready      <none>          6d3h   v1.27.3   10.4.74.31    <none>        CBL-Mariner/Linux   5.15.153.1-2.cm2   containerd://1.6.26
    mytest-naks1-3b466a17-agentpool1-md-6bg5h-nxkks   NotReady   <none>          42s    v1.27.3   10.4.74.12    <none>        CBL-Mariner/Linux   5.15.153.1-2.cm2   containerd://1.6.26
    mytest-naks1-3b466a17-control-plane-6q7ns         Ready      control-plane   6d3h   v1.27.3   10.4.74.14    <none>        CBL-Mariner/Linux   5.15.153.1-2.cm2   containerd://1.6.26
    mytest-naks1-3b466a17-control-plane-8qqvz         Ready      control-plane   6d3h   v1.27.3   10.4.74.28    <none>        CBL-Mariner/Linux   5.15.153.1-2.cm2   containerd://1.6.26
    mytest-naks1-3b466a17-control-plane-g42mh         Ready      control-plane   6d3h   v1.27.3   10.4.74.32    <none>        CBL-Mariner/Linux   5.15.153.1-2.cm2   containerd://1.6.26
    
  5. Wait a little longer until the new node changes from NotReady to Ready. In this example, the node is Ready by the time its AGE is 97s.

    $ kubectl get nodes -owide
    NAME                                                 STATUS   ROLES           AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE            KERNEL-VERSION     CONTAINER-RUNTIME
    mytest-naks1-3b466a17-agentpool1-md-6bg5h-7qt2b   Ready    <none>          6d3h   v1.27.3   10.4.74.30    <none>        CBL-Mariner/Linux   5.15.153.1-2.cm2   containerd://1.6.26
    mytest-naks1-3b466a17-agentpool1-md-6bg5h-dqmzw   Ready    <none>          6d3h   v1.27.3   10.4.74.31    <none>        CBL-Mariner/Linux   5.15.153.1-2.cm2   containerd://1.6.26 
    mytest-naks1-3b466a17-agentpool1-md-6bg5h-nxkks   Ready    <none>          97s    v1.27.3   10.4.74.12    <none>        CBL-Mariner/Linux   5.15.153.1-2.cm2   containerd://1.6.26
    mytest-naks1-3b466a17-control-plane-6q7ns         Ready    control-plane   6d3h   v1.27.3   10.4.74.14    <none>        CBL-Mariner/Linux   5.15.153.1-2.cm2   containerd://1.6.26
    mytest-naks1-3b466a17-control-plane-8qqvz         Ready    control-plane   6d3h   v1.27.3   10.4.74.28    <none>        CBL-Mariner/Linux   5.15.153.1-2.cm2   containerd://1.6.26
    mytest-naks1-3b466a17-control-plane-g42mh         Ready    control-plane   6d3h   v1.27.3   10.4.74.32    <none>        CBL-Mariner/Linux   5.15.153.1-2.cm2   containerd://1.6.26
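
Steps 1 and 2 can also be scripted. The following bash functions are a minimal sketch and are not part of the documented procedure. The names notready_nodes and delete_if_notready are hypothetical, and the sketch assumes kubectl is on the PATH and configured for the KubernetesCluster. It matches STATUS values that start with NotReady, because a cordoned node can report NotReady,SchedulingDisabled. It also re-checks the node immediately before deleting it, so a node that recovered in the meantime isn't removed.

```shell
# Hypothetical helper: print the names of nodes whose STATUS starts with
# "NotReady". Reads `kubectl get nodes` output, including its header row, on stdin.
notready_nodes() {
  # Skip the header (NR > 1); the prefix match also catches "NotReady,SchedulingDisabled".
  awk 'NR > 1 && $2 ~ /^NotReady/ { print $1 }'
}

# Hypothetical helper: delete a node only if it is still NotReady, so a node
# that recovered since it was listed is not removed by mistake.
delete_if_notready() {
  local node="$1"
  # Re-read this node's current status and require an exact name match.
  if kubectl get node "$node" | notready_nodes | grep -qx -- "$node"; then
    kubectl delete node "$node"
  else
    echo "skipping $node: not in NotReady state" >&2
    return 1
  fi
}
```

Usage example: kubectl get nodes | notready_nodes prints the NotReady node names. Running delete_if_notready mytest-naks1-3b466a17-agentpool1-md-6bg5h-lkhhq then deletes the node from step 1 only if it is still NotReady. Review the listed names before deleting anything.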
    
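Steps 4 and 5 amount to polling until the replacement node is Ready. The bash function below is a minimal sketch of that loop. The name wait_for_ready_nodes, its arguments, the default 900-second timeout, and the 30-second interval are all assumptions, not documented values. The timeout covers only the upper end of the 5 to 15 minute estimate in step 4, so lengthen it if replacements take longer in your environment. The function counts only nodes whose STATUS is exactly Ready. A node that is still NotReady, or Ready but cordoned (Ready,SchedulingDisabled), isn't counted.

```shell
# Hypothetical helper: poll `kubectl get nodes` until at least EXPECTED nodes
# report exactly "Ready", or give up after TIMEOUT seconds.
# Usage: wait_for_ready_nodes EXPECTED [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
wait_for_ready_nodes() {
  local expected="$1" timeout="${2:-900}" interval="${3:-30}"
  local waited=0 ready
  while :; do
    # Count data rows (header skipped) whose STATUS column is exactly "Ready".
    ready="$(kubectl get nodes | awk 'NR > 1 && $2 == "Ready" { n++ } END { print n + 0 }')"
    if [ "$ready" -ge "$expected" ]; then
      echo "$ready/$expected nodes Ready"
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${waited}s: $ready/$expected nodes Ready" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

The example cluster has six nodes. For that cluster, wait_for_ready_nodes 6 succeeds once all six nodes, including the replacement, report Ready. It returns a nonzero status if that doesn't happen before the timeout.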

If you still have questions, contact support. For more information about support plans, see Azure Support plans.