
Self-hosted Karpenter in AKS - Nodes are in NotReady state due to CNI issues

Deivasigamani Duraisamy 45 Reputation points
2026-04-14T04:35:31.87+00:00

What karpenter features are relevant?

I use Azure CNI in AKS v1.32.x.

NAP

The karpenter-provider-azure works fine when I enable it via NAP (https://learn.microsoft.com/en-us/azure/aks/node-auto-provisioning). The nodes reach the Ready state and can run pods.

Self-hosted

But when I self-host it (https://github.com/Azure/karpenter-provider-azure?tab=readme-ov-file#installation-self-hosted), the nodes go into the NotReady state.

# k get no -o yaml aks-node1-xxxxxx

  - lastHeartbeatTime: "2026-04-13T15:49:14Z"
    lastTransitionTime: "2026-04-13T15:23:14Z"
    message: 'container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady
      message:Network plugin returns error: cni plugin not initialized'
    reason: KubeletNotReady
    status: "False"
    type: Ready
# k describe no aks-node1-xxxxxx

Ready                         False   Mon, 13 Apr 2026 21:19:14 +0530   Mon, 13 Apr 2026 20:53:14 +0530   KubeletNotReady                 container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized

Events:
  Type    Reason                   Age                From                                                          Message
  ----    ------                   ----               ----                                                          -------
  Normal  Starting                 27m                kubelet                                                       Starting kubelet.
  Normal  NodeHasSufficientMemory  27m (x2 over 27m)  kubelet                                                       Node aks-node1-xxxxxx status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure    27m (x2 over 27m)  kubelet                                                       Node aks-node1-xxxxxx status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID     27m (x2 over 27m)  kubelet                                                       Node aks-node1-xxxxxx status is now: NodeHasSufficientPID
  Normal  NodeAllocatableEnforced  27m                kubelet                                                       Updated Node Allocatable limit across pods
  Normal  RegisteredNode           27m                node-controller                                               Node aks-node1-xxxxxx event: Registered Node aks-node1-xxxxxx in Controller
  Normal  NoVMEventScheduled       14m                custom-scheduledevents-consolidated-condition-plugin-monitor  Node condition VMEventScheduled is now: Unknown, reason: NoVMEventScheduled, message: "IMDS query failed, exit code: 28\nConnection timed out after 24 seconds."
  Normal  VMEventScheduled         14m                karpenter                                                     Status condition transitioned, Type: VMEventScheduled, Status: False -> Unknown, Reason: NoVMEventScheduled, Message: IMDS query failed, exit code: 28
Connection timed out after 24 seconds.
  Normal   NoVMEventScheduled  14m                custom-scheduledevents-consolidated-condition-plugin-monitor  Node condition VMEventScheduled is now: False, reason: NoVMEventScheduled, message: "VM has no scheduled event"
  Warning  PreemptScheduled    14m (x2 over 14m)  custom-scheduledevents-consolidated-preempt-plugin-monitor    IMDS query failed, exit code: 28
Connection timed out after 24 seconds.
  Normal   VMEventScheduled    14m                  karpenter            Status condition transitioned, Type: VMEventScheduled, Status: Unknown -> False, Reason: NoVMEventScheduled, Message: VM has no scheduled event
  Warning  CoreDNSUnreachable  4m49s (x3 over 14m)  dns-problem-monitor  dns test to coredns:10.2.72.10 over udp failed after 2 attempts
  Normal   DisruptionBlocked   72s (x14 over 27m)   karpenter            Node isn't initialized

karpenter-values.yaml

    - name: NETWORK_PLUGIN
      value: azure
    - name: NETWORK_PLUGIN_MODE
      value: ""
    - name: NETWORK_POLICY
      value: azure

AKS Cluster Spec

        "networkProfile": {
            "networkPlugin": "azure",
            "networkPolicy": "azure",
            "networkDataplane": "azure",

Question

As I mentioned, I use Azure CNI. Is it a problem that I haven't enabled 1) overlay mode and 2) the Cilium dataplane, i.e. these flags:

  --network-plugin azure --network-plugin-mode overlay --network-dataplane cilium \

Why does it work with NAP on the same Azure CNI setup? https://github.com/Azure/karpenter-provider-azure/issues/1639

Azure Kubernetes Service


Answer accepted by question author

  1. Jilakara Hemalatha 13,330 Reputation points Microsoft External Staff Moderator
    2026-04-14T06:21:55.23+00:00

    Hello

    Thank you for sharing the detailed logs and configuration — that really helped narrow this down.

    From what you’ve described, the issue is happening because of a mismatch between your AKS cluster networking and what the self-hosted setup of karpenter-provider-azure expects during node provisioning.

    Your cluster is currently configured with Azure CNI in standard (non-overlay) mode along with Azure network policy and Azure dataplane. However, as per the official karpenter-provider-azure documentation, the self-hosted installation expects the cluster to be created using Azure CNI in overlay mode along with the Cilium dataplane enabled.

    This requirement is part of the documented cluster setup for self-hosted deployments in the installation guide:

    https://github.com/Azure/karpenter-provider-azure/blob/main/README.md

    Because your cluster is not configured with overlay mode or Cilium dataplane, the nodes provisioned by self-hosted Karpenter are unable to initialize the CNI plugin correctly. This results in the error “NetworkPluginNotReady: cni plugin not initialized,” which indicates that the Kubernetes node failed to initialize the container networking stack.
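    If several Karpenter-provisioned nodes are affected, the same condition can be picked out of `kubectl get node -o json` output programmatically. The Python below is a minimal sketch, assuming the standard Node `status.conditions` layout; the function names are hypothetical, and the match is simply the message substring from the node status in the question.

```python
from typing import Optional

# Substring of the kubelet message seen in the question's node status.
CNI_NOT_INITIALIZED = "cni plugin not initialized"


def ready_condition(node: dict) -> Optional[dict]:
    """Return the Ready condition of one node object from `kubectl get node -o json`."""
    for cond in node.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready":
            return cond
    return None


def is_cni_not_ready(node: dict) -> bool:
    """True when the node is not Ready and the kubelet blames an uninitialized CNI plugin."""
    cond = ready_condition(node)
    if cond is None or cond.get("status") == "True":
        return False
    return CNI_NOT_INITIALIZED in cond.get("message", "")


# Trimmed-down node object reproducing the Ready condition from the question.
sample_node = {
    "metadata": {"name": "aks-node1-xxxxxx"},
    "status": {
        "conditions": [
            {
                "type": "Ready",
                "status": "False",
                "reason": "KubeletNotReady",
                "message": (
                    "container runtime network not ready: NetworkReady=false "
                    "reason:NetworkPluginNotReady message:Network plugin returns "
                    "error: cni plugin not initialized"
                ),
            }
        ]
    },
}

print(is_cni_not_ready(sample_node))  # True
```

    Against a live cluster, load the output of `kubectl get node aks-node1-xxxxxx -o json` with `json.load` instead of using the sample; for `kubectl get nodes -o json`, iterate over the `items` list.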

    This also explains why things work when you enable Node Auto Provisioning (NAP) in Azure Kubernetes Service. In that case, Azure handles the node provisioning and networking configuration internally, so everything is aligned automatically and the nodes come up in a Ready state.

    To answer your question directly — yes, even though you are using Azure CNI, not enabling overlay mode and Cilium dataplane is very likely the reason for this behavior in the self-hosted scenario.

    At this point, you have two practical options. The simplest is to continue using NAP since it is already working correctly in your environment. If you specifically want to use self-hosted Karpenter, then the cluster networking needs to be aligned to Azure CNI overlay with Cilium so that it matches the expected configuration.
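    Either way, the Karpenter settings and the cluster have to describe the same network stack. One way to compare them is to dump the cluster profile with `az aks show --resource-group <rg> --name <cluster> --query networkProfile -o json` and check it against the env values in karpenter-values.yaml. The Python below is a minimal sketch: the env-var-to-field mapping covers only the three variables shown in the question, the function names are hypothetical, and because the question's networkProfile snippet is truncated, the sample assumes networkPluginMode is unset, matching the empty NETWORK_PLUGIN_MODE.

```python
from typing import Dict, List, Optional

# Karpenter env var (from the question's karpenter-values.yaml) -> AKS networkProfile field.
ENV_TO_PROFILE = {
    "NETWORK_PLUGIN": "networkPlugin",
    "NETWORK_PLUGIN_MODE": "networkPluginMode",
    "NETWORK_POLICY": "networkPolicy",
}


def _norm(value: Optional[str]) -> str:
    # Treat a missing or null field the same as an empty string.
    return (value or "").strip().lower()


def env_mismatches(profile: Dict, env: Dict[str, str]) -> List[str]:
    """List every Karpenter env value that disagrees with the cluster's networkProfile."""
    problems = []
    for env_name, field in ENV_TO_PROFILE.items():
        cluster_value, karpenter_value = _norm(profile.get(field)), _norm(env.get(env_name))
        if cluster_value != karpenter_value:
            problems.append(f"{env_name}={karpenter_value!r} but cluster {field}={cluster_value!r}")
    return problems


def is_overlay_cilium(profile: Dict) -> bool:
    """True if the cluster uses Azure CNI in overlay mode with the Cilium dataplane."""
    return (
        _norm(profile.get("networkPlugin")) == "azure"
        and _norm(profile.get("networkPluginMode")) == "overlay"
        and _norm(profile.get("networkDataplane")) == "cilium"
    )


# Values copied from the question (networkPluginMode assumed unset).
cluster_profile = {"networkPlugin": "azure", "networkPolicy": "azure", "networkDataplane": "azure"}
karpenter_env = {"NETWORK_PLUGIN": "azure", "NETWORK_PLUGIN_MODE": "", "NETWORK_POLICY": "azure"}

print(env_mismatches(cluster_profile, karpenter_env))  # []
print(is_overlay_cilium(cluster_profile))  # False
```

    With the question's values it prints an empty list and False: the Karpenter env values agree with the cluster, but the cluster is not the overlay + Cilium setup described in this answer.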

    You can refer to the following documentation for more details:

    Overview of networking configurations for node auto-provisioning (NAP) in Azure Kubernetes Service (AKS)

    Overview of Azure CNI Overlay networking in Azure Kubernetes Service (AKS)

    Configure Azure CNI Powered by Cilium in Azure Kubernetes Service (AKS)

    Troubleshoot node auto-provisioning (NAP) in Azure Kubernetes Service (AKS)

    Hope this helps! Please let me know if you have any queries.


Answer accepted by question author

  1. Alex Burlachenko 20,825 Reputation points MVP Volunteer Moderator
    2026-04-14T06:00:23.44+00:00

    Hi Deivasigamani Duraisamy, and thanks for joining us on the Q&A portal.

    Yeah, this is a CNI mismatch plus a bootstrap issue, not really a Karpenter bug. NAP works because AKS wires everything up automatically, but with self-hosted you must match the networking exactly. Right now your nodes come up, but the CNI never initializes, so they stay NotReady. The errors already say it: "cni plugin not initialized" plus the IMDS timeout, and that combination means the node can't bootstrap its networking.

    The main issue is that you are mixing modes. Classic Azure CNI is not the same as overlay or Cilium, and your config hints conflict (the env sets an empty plugin mode, but the Dockerfile shows overlay + Cilium), so the node expects a different stack than the cluster provides. The fix is to align exactly with the cluster: if the cluster is Azure CNI + azure dataplane, then use --network-plugin azure --network-plugin-mode "" --network-dataplane azure, with no overlay and no Cilium.

    The IMDS timeout is also critical. The node must be able to reach 169.254.169.254; if it can't, the NSG, route, or subnet configuration is wrong, bootstrap fails, and the CNI never starts. That's why NAP works (AKS handles the wiring) and self-hosted breaks.

    tl;dr: mismatched CNI config + blocked IMDS.
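    The IMDS point can be checked from a shell on the affected node. The Python below is a minimal sketch, not an AKS or Karpenter tool: it sends the `Metadata: true` header that IMDS requires and bypasses any HTTP proxy, since IMDS requests must not go through one. The `api-version=2021-02-01` value and the function names are assumptions chosen for illustration.

```python
import http.client
import urllib.request

# Azure Instance Metadata Service instance endpoint (api-version is an assumed example).
IMDS_URL = "http://169.254.169.254/metadata/instance?api-version=2021-02-01"


def build_imds_request(url: str = IMDS_URL) -> urllib.request.Request:
    # IMDS rejects requests that lack the "Metadata: true" header.
    return urllib.request.Request(url, headers={"Metadata": "true"})


def imds_reachable(timeout: float = 5.0) -> bool:
    """Return True if IMDS answers with HTTP 200 within `timeout` seconds."""
    # An empty ProxyHandler ignores proxy environment variables, so the request goes direct.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(build_imds_request(), timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # OSError covers URLError/HTTPError, timeouts, and unroutable or refused connections.
        # A timeout here corresponds to the curl "exit code: 28" seen in the node events.
        return False
```

    On a healthy Azure node, `imds_reachable()` should return True; False on the affected node would point at the NSG, route, or subnet problem described above.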

    rgds,

    Alex

