Cannot schedule Pods on AKS GPU node: remediator.aks.microsoft.com/unschedulable

David Giron 41 Reputation points
2022-03-29T12:54:55.717+00:00

I have a node pool with Spot GPU nodes (NC4as_T4_v3) and cluster autoscaling configured for 0-1 nodes.

After scheduling a Pod with an nvidia.com/gpu request, the node spawns, but it carries this taint:
remediator.aks.microsoft.com/unschedulable

The node pool does not launch any other new nodes, and my Pod stays in the Pending state.

What is this taint, and how can I prevent it from happening or fix the problem?
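
For reference, the Pod spec looks roughly like this (a minimal sketch; the pod name and CUDA image are placeholders, and the toleration is the one AKS sets on Spot node pools):

```
# Hypothetical test pod: requests one GPU and tolerates the AKS Spot taint.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test                                 # placeholder name
spec:
  tolerations:
  - key: kubernetes.azure.com/scalesetpriority   # taint AKS puts on Spot pools
    operator: Equal
    value: spot
    effect: NoSchedule
  containers:
  - name: cuda
    image: nvidia/cuda:11.4.3-base-ubuntu20.04   # placeholder image
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1                        # should trigger a GPU node scale-up
EOF
```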

Azure Kubernetes Service (AKS)

Accepted answer
  1. srbhatta-MSFT 8,546 Reputation points Microsoft Employee
    2022-03-31T06:23:47.153+00:00

    Hi @David Giron ,
    Thanks for reaching out to Microsoft QnA.
    It looks like the AKS remediator added this taint. Without a look at the logs, it is difficult for us to confirm or state a reason.
    Have you checked the kubelet or control plane logs, and were you able to find anything there? You can refer to the links below.
    https://learn.microsoft.com/en-us/azure/aks/kubelet-logs
    https://learn.microsoft.com/en-us/azure/aks/monitor-aks#collect-resource-logs
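
    As a hedged sketch of pulling the kubelet logs per the first link (assuming the node is reachable; <node-name> is a placeholder):

    ```
    # Start an interactive debug pod on the affected node.
    kubectl debug node/<node-name> -it --image=mcr.microsoft.com/cbl-mariner/busybox:2.0
    # Inside the debug pod, switch to the host filesystem and read the kubelet journal.
    chroot /host
    journalctl -u kubelet -o cat
    ```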

    Edit:
    I researched a bit more and checked internally. This is most likely the auto-drain behavior for Spot nodes; see Automatically repairing Azure Kubernetes Service (AKS) nodes - Azure Kubernetes Service | Microsoft Learn (https://learn.microsoft.com/en-us/azure/aks/node-auto-repair).
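
    A quick way to check whether the remediator taint is present on any node (a sketch using standard kubectl; <node-name> is a placeholder):

    ```
    # List every node with its taint keys.
    kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
    # Or inspect a single node's taints and conditions.
    kubectl describe node <node-name> | grep -i -A3 taint
    ```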

    Hope this helps!

    -----------------------------------

    Please don't forget to Accept Answer and Upvote if you think the information provided was useful, so that it can help others in the community looking for help on similar issues.


4 additional answers

  1. David Giron 41 Reputation points
    2022-04-04T09:08:06.987+00:00

    I couldn't access the kubelet logs since the node was in the NotReady state.
    Your explanation about auto-drain for Spot nodes makes sense, so I assume that's the cause (Spot capacity dropped and the node was removed/preempted).
    Still, I expected the node to return to Ready once capacity was available again, but instead I had to manually scale the node pool to get the node back.
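
    A sketch of such a manual scale (resource names are placeholders; if the cluster autoscaler is enabled on the pool, it has to be disabled before scaling manually):

    ```
    # Temporarily take the pool out of autoscaler control.
    az aks nodepool update --resource-group <rg> --cluster-name <cluster> \
      --name <gpu-pool> --disable-cluster-autoscaler
    # Force a node to come back.
    az aks nodepool scale --resource-group <rg> --cluster-name <cluster> \
      --name <gpu-pool> --node-count 1
    # Re-enable autoscaling with the original 0-1 range.
    az aks nodepool update --resource-group <rg> --cluster-name <cluster> \
      --name <gpu-pool> --enable-cluster-autoscaler --min-count 0 --max-count 1
    ```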


  2. Niels Claeys 1 Reputation point
    2022-04-11T07:29:00.017+00:00

    I had exactly the same issue twice with my cluster. No new nodes were provisioned for about a week because the autoscaler thought that the node pool was tainted.
    In our case it was fixed by:

    • manually scaling the node pool, as David suggests
    • the autoscaler being restarted due to a cluster upgrade

    It seems that when the last node in the node pool gets drained and the remediator taint is added, the node pool's state in the autoscaler never gets updated/refreshed, so the autoscaler keeps thinking the node pool cannot be used. Can anyone confirm this?
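
    If the stale taint is actually still present on a node object, one untested idea is to remove it manually so the scheduler (and possibly the autoscaler) can recover; a trailing '-' in kubectl taint deletes the taint, and <node-name> is a placeholder:

    ```
    # Remove the remediator taint from the node, if it is still set.
    kubectl taint nodes <node-name> remediator.aks.microsoft.com/unschedulable-
    ```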


  3. Michael Taron 1 Reputation point
    2022-04-19T15:59:20.623+00:00

    We are also hitting this issue - since about noon yesterday, no new GPU nodes were provisioned in a specific node pool. Upgrading the node pool images didn't help (we are already on the latest non-preview Kubernetes version, so we didn't want to upgrade the cluster), but manually scaling the node pool as Niels suggested seems to have done the trick.
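
    For reference, a node-image-only upgrade of the pool can be run like this (a sketch; resource names are placeholders):

    ```
    # Upgrade only the node images in the pool, leaving the Kubernetes version unchanged.
    az aks nodepool upgrade --resource-group <rg> --cluster-name <cluster> \
      --name <gpu-pool> --node-image-only
    ```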


  4. Patrick du Boucher-Ryan 1 Reputation point
    2022-05-11T21:13:25.053+00:00

    Hit this today - though strangely, we could see the scheduler wouldn't place a pod because of this taint, yet there were no nodes or node pools carrying it:

    1 node(s) had taint {remediator.aks.microsoft.com/unschedulable: }

    The taint couldn't be found with kubectl, az aks, or in the portal.
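
    A sketch of the places one can look for the taint (resource names are placeholders); in our case none of these showed it:

    ```
    # Check the node objects directly.
    kubectl get nodes -o json | grep -i remediator
    # Check the taints configured on the node pool itself.
    az aks nodepool show --resource-group <rg> --cluster-name <cluster> \
      --name <gpu-pool> --query nodeTaints
    ```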
