Node not in Nodepool after scaling up

Markus 21 Reputation points
2022-04-05T08:52:58.343+00:00

I have a K8s cluster (Azure Kubernetes Service) with a node pool as follows in the Azure Portal:
Provisioning state: Succeeded
Power state: Running (3/3 nodes ready)
Node count: 3 nodes
(more details below)

I deployed new applications into my cluster and got "Insufficient CPU", so I decided to scale up to 4 nodes via the Azure Portal.

The following changed:
Node count: 4 nodes

But there are still only 3/3 nodes ready, and the node list still shows 3 nodes. And I still have the "Insufficient CPU" problem :(

Running "az aks nodepool list" shows:
[
  {
    "availabilityZones": null,
    "count": 4,
    "creationData": null,
    "enableAutoScaling": false,
    "enableEncryptionAtHost": false,
    "enableFips": false,
    "enableNodePublicIp": false,
    "enableUltraSsd": false,
    "gpuInstanceProfile": null,
    "id": XXX,
    "kubeletConfig": null,
    "kubeletDiskType": "OS",
    "linuxOsConfig": null,
    "maxCount": null,
    "maxPods": 110,
    "minCount": null,
    "mode": "System",
    "name": "default",
    "nodeImageVersion": "AKSUbuntu-1804containerd-2021.10.19",
    "nodeLabels": null,
    "nodePublicIpPrefixId": null,
    "nodeTaints": null,
    "orchestratorVersion": "1.20.9",
    "osDiskSizeGb": 128,
    "osDiskType": "Managed",
    "osSku": "Ubuntu",
    "osType": "Linux",
    "podSubnetId": null,
    "powerState": {
      "code": "Running"
    },
    "provisioningState": "Succeeded",
    "proximityPlacementGroupId": null,
    "resourceGroup": "sgkb-web-develop",
    "scaleDownMode": null,
    "scaleSetEvictionPolicy": null,
    "scaleSetPriority": null,
    "spotMaxPrice": null,
    "tags": null,
    "type": "Microsoft.ContainerService/managedClusters/agentPools",
    "typePropertiesType": "VirtualMachineScaleSets",
    "upgradeSettings": {
      "maxSurge": null
    },
    "vmSize": "Standard_D4_v3",
    "vnetSubnetId": null,
    "workloadRuntime": null
  }
]

Any ideas how to solve this?

Many thanks in advance
Markus


Accepted answer
  1. deherman-MSFT 34,441 Reputation points Microsoft Employee
    2022-04-05T18:24:27.883+00:00

@Markus I understand you added a node but are not seeing it as part of the node pool.

    The issue is likely that the kubelet on the new node cannot join the control plane.
    Below are a few pointers you can check to help find the issue:

    • Is the node VM (or VMSS instance) in the Succeeded and Running state? If not, you may first have to troubleshoot the Linux VM (or VMSS) itself.
    • Otherwise, SSH into the node and check whether the kubelet and docker services are running, using sudo systemctl status kubelet and sudo systemctl status docker respectively.
      If they are not running, start them using sudo systemctl start <serviceName>
    • Is the scale-up operation returning a success response, or is it failing? Required endpoints may be blocked. Please allow the AKS egress endpoints documented for outbound network rules; otherwise you may see AgentPoolExtensionProvisioning errors.
    • If you are using a custom DNS server, the node may be unable to resolve the required endpoints. [Leads to similar errors as above]
    • Also, please try running AKS Diagnostics and review the results
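    The SSH checks in the second pointer can be sketched as the commands below. This is a rough sketch, not an exact recipe: the container runtime service name varies by node image (docker on older images, containerd on newer containerd-based images such as the one in this cluster).

    ```shell
    # Run on the affected node after SSH'ing in.
    # Check whether kubelet and the container runtime are healthy:
    sudo systemctl status kubelet
    sudo systemctl status docker       # on containerd-based images: sudo systemctl status containerd

    # If a service is stopped, start it:
    sudo systemctl start kubelet

    # Then, from your workstation, confirm the node has registered
    # with the control plane and reports Ready:
    kubectl get nodes -o wide
    ```

    If the node still does not appear in kubectl get nodes after the services are running, the kubelet logs (sudo journalctl -u kubelet) usually show why the join to the control plane fails.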

    Hope this helps. Let me know if you are still having issues or need further assistance.

    -------------------------------

    Please don’t forget to "Accept the answer" and “up-vote” wherever the information provided helps you, this can be beneficial to other community members.


1 additional answer

  1. Markus 21 Reputation points
    2022-04-06T11:37:15.94+00:00

    Dear @deherman-MSFT

    Thank you for your comprehensive answer and yes, your understanding is correct.

    Out of desperation we kept trying to fix our cluster, and we realized that there were indeed 4 instances "running" under "Virtual machine scale sets". We could identify one of them (by computer name) as not being a member of the node pool. We stopped this VM. Afterwards the numbers matched up (3/3 Ready -> 3 nodes) and we could successfully scale up to 4 nodes.
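    For anyone hitting the same problem, the cleanup we did can be sketched with the Azure CLI roughly as follows. The cluster name, scale-set name, and instance ID below are placeholders, not our real values:

    ```shell
    # Find the node resource group that holds the node pool's scale set
    # (CLUSTER_RG / CLUSTER_NAME are placeholders for your own values).
    NODE_RG=$(az aks show --resource-group CLUSTER_RG --name CLUSTER_NAME \
        --query nodeResourceGroup -o tsv)

    # List the scale sets, then the instances of the node pool's scale set,
    # showing each instance's computer name:
    az vmss list --resource-group "$NODE_RG" --query "[].name" -o tsv
    az vmss list-instances --resource-group "$NODE_RG" --name aks-default-12345678-vmss \
        --query "[].{id:instanceId, name:osProfile.computerName}" -o table

    # Compare the computer names against the registered nodes:
    kubectl get nodes

    # Stop (or delete) the instance that is NOT a registered node:
    az vmss stop --resource-group "$NODE_RG" --name aks-default-12345678-vmss \
        --instance-ids 3
    ```

    Comparing the osProfile.computerName values against kubectl get nodes is how we identified the orphaned instance by hand in the Portal.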

    So I guess following your answer would have led us to the solution as well.

    Best regards
    Markus
