Node not in Nodepool after scaling up

Markus 21 Reputation points
2022-04-05T08:52:58.343+00:00

I have a K8s cluster (Azure Kubernetes Service) with a node pool as follows in the Azure Portal:
Provisioning state: Succeeded
Power state: Running (3/3 nodes ready)
Node count: 3 nodes
(more details below)

I deployed new applications into my cluster and got "Insufficient CPU", so I decided to scale up to 4 nodes via the Azure Portal.

The following changed:
Node count: 4 nodes

But there are still only 3/3 nodes ready, and the node list still shows 3 nodes. And I still have the "Insufficient CPU" problem :(

Running "az aks nodepool list" shows:
[
  {
    "availabilityZones": null,
    "count": 4,
    "creationData": null,
    "enableAutoScaling": false,
    "enableEncryptionAtHost": false,
    "enableFips": false,
    "enableNodePublicIp": false,
    "enableUltraSsd": false,
    "gpuInstanceProfile": null,
    "id": XXX,
    "kubeletConfig": null,
    "kubeletDiskType": "OS",
    "linuxOsConfig": null,
    "maxCount": null,
    "maxPods": 110,
    "minCount": null,
    "mode": "System",
    "name": "default",
    "nodeImageVersion": "AKSUbuntu-1804containerd-2021.10.19",
    "nodeLabels": null,
    "nodePublicIpPrefixId": null,
    "nodeTaints": null,
    "orchestratorVersion": "1.20.9",
    "osDiskSizeGb": 128,
    "osDiskType": "Managed",
    "osSku": "Ubuntu",
    "osType": "Linux",
    "podSubnetId": null,
    "powerState": {
      "code": "Running"
    },
    "provisioningState": "Succeeded",
    "proximityPlacementGroupId": null,
    "resourceGroup": "sgkb-web-develop",
    "scaleDownMode": null,
    "scaleSetEvictionPolicy": null,
    "scaleSetPriority": null,
    "spotMaxPrice": null,
    "tags": null,
    "type": "Microsoft.ContainerService/managedClusters/agentPools",
    "typePropertiesType": "VirtualMachineScaleSets",
    "upgradeSettings": {
      "maxSurge": null
    },
    "vmSize": "Standard_D4_v3",
    "vnetSubnetId": null,
    "workloadRuntime": null
  }
]

Any ideas how to solve this?

Many thanks in advance
Markus


Accepted answer
  1. deherman-MSFT 34,441 Reputation points Microsoft Employee
    2022-04-05T18:24:27.883+00:00

@Markus I understand you added a node but are not seeing it as part of the node pool.

    The issue is likely that the kubelet on the new node cannot join the control plane.
    Below are a few pointers you can check to help find the issue:

    • Is the node VM (or VMSS instance) in the Succeeded and Running state? If not, you may first have to troubleshoot the Linux VM (or VMSS) itself.
    • Otherwise, SSH into the node and check whether the kubelet and docker services are running, using sudo systemctl status kubelet and sudo systemctl status docker respectively.
      If they are not running, start them using sudo systemctl start <serviceName>
    • Is the scale-up operation returning a success response, or is it failing? Required endpoints may be blocked. Please allow the AKS egress endpoints documented for outbound network rules; otherwise you may see AgentPoolExtensionProvisioning errors.
    • If you are using a custom DNS server, the node may be unable to resolve the required endpoints. [Leads to similar errors as above]
    • Also, please try running AKS Diagnostics and review the results
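    The SSH checks in the second pointer can be sketched as the commands below. This is a rough sketch, not an exact recipe: the container runtime service name varies by node image (docker on older images, containerd on newer containerd-based images such as the one in this cluster).

    ```shell
    # Run on the affected node after SSH'ing in.
    # Check whether kubelet and the container runtime are healthy:
    sudo systemctl status kubelet
    sudo systemctl status docker       # on containerd-based images: sudo systemctl status containerd

    # If a service is stopped, start it:
    sudo systemctl start kubelet

    # Then, from your workstation, confirm the node has registered
    # with the control plane and reports Ready:
    kubectl get nodes -o wide
    ```

    If the node still does not appear in kubectl get nodes after the services are running, the kubelet logs (sudo journalctl -u kubelet) usually show why the join to the control plane fails.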

    Hope this helps. Let me know if you are still having issues or need further assistance.

    -------------------------------

    Please don’t forget to "Accept the answer" and “up-vote” wherever the information provided helps you, this can be beneficial to other community members.


1 additional answer

  1. Markus 21 Reputation points
    2022-04-06T11:37:15.94+00:00

    Dear @deherman-MSFT

    Thank you for your comprehensive answer and yes, your understanding is correct.

    Out of desperation we kept trying to fix our cluster, and we realized that there were indeed 4 instances "running" under "Virtual machine scale sets". We could identify one of them (by computer name) as not being a member of the node pool. We stopped this VM. Afterwards the numbers matched up (3/3 Ready -> 3 nodes) and we could successfully scale up to 4 nodes.
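    For anyone hitting the same problem, the cleanup we did can be sketched with the Azure CLI roughly as follows. The cluster name, scale-set name, and instance ID below are placeholders, not our real values:

    ```shell
    # Find the node resource group that holds the node pool's scale set
    # (CLUSTER_RG / CLUSTER_NAME are placeholders for your own values).
    NODE_RG=$(az aks show --resource-group CLUSTER_RG --name CLUSTER_NAME \
        --query nodeResourceGroup -o tsv)

    # List the scale sets, then the instances of the node pool's scale set,
    # showing each instance's computer name:
    az vmss list --resource-group "$NODE_RG" --query "[].name" -o tsv
    az vmss list-instances --resource-group "$NODE_RG" --name aks-default-12345678-vmss \
        --query "[].{id:instanceId, name:osProfile.computerName}" -o table

    # Compare the computer names against the registered nodes:
    kubectl get nodes

    # Stop (or delete) the instance that is NOT a registered node:
    az vmss stop --resource-group "$NODE_RG" --name aks-default-12345678-vmss \
        --instance-ids 3
    ```

    Comparing the osProfile.computerName values against kubectl get nodes is how we identified the orphaned instance by hand in the Portal.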

    So I guess following your answer would have led us to the solution as well.

    Best regards
    Markus
