
4 Votes
DavidGiron-0877 asked PatrickduBoucherRyan-1771 answered

Cannot schedule Pods on AKS GPU node: remediator.aks.microsoft.com/unschedulable

I have a node pool with Spot GPU nodes (NC4as_T4_v3) and cluster autoscaling set to 0-1.

After scheduling a Pod that requests nvidia.com/gpu, a node spawns, but it carries this taint:
remediator.aks.microsoft.com/unschedulable

The node pool does not launch any new nodes and my Pod stays in the Pending state.

What is this taint, and how can I prevent this from happening or fix the problem?
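For anyone hitting the same symptom: while the node still exists, the taint can be inspected directly with kubectl. A minimal sketch; the node name below is a placeholder and will differ in your cluster:

```shell
# Inspect the taints on the suspect node while it is still present;
# "aks-gpupool-12345678-vmss000000" is a placeholder node name
kubectl describe node aks-gpupool-12345678-vmss000000 | grep -i taints
```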

azure-kubernetes-service

Hi @DavidGiron ,
Just checking if you have any update on the below answer?

0 Votes
srbhatta-msft answered srbhatta-msft edited

Hi @DavidGiron-0877 ,
Thanks for reaching out to Microsoft Q&A.
It seems the AKS remediator has added this taint. Without looking at the logs, it is difficult for us to confirm or state a reason.
Have you checked the kubelet or control plane logs, and were you able to find anything there? You can refer to the links below.
https://docs.microsoft.com/en-us/azure/aks/kubelet-logs
https://docs.microsoft.com/en-us/azure/aks/monitor-aks#collect-resource-logs

Edit:
I researched a bit more and checked internally. It seems this is most likely the node autodrain feature for Spot node pools: Automatically repairing Azure Kubernetes Service (AKS) nodes - Azure Kubernetes Service | Microsoft Docs.

Hope this helps!


Please don't forget to accept the answer and upvote if you think the information provided was useful, so that it can help others in the community looking for help on similar issues.


0 Votes
DavidGiron-0877 answered

I couldn't access the kubelet logs since the node was in the NotReady state.
Your explanation regarding autodrain for Spot nodes makes sense, so I assume that's the cause (Spot capacity dropped and the node was preempted/removed).
Still, I expected the node to come back to Ready once capacity was available again, but instead I had to manually scale the node pool to get the node back.
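The manual-scaling workaround can be sketched as below. The resource group, cluster, and pool names are placeholders; note also that `az aks nodepool scale` refuses to act on a pool with the cluster autoscaler enabled, so the autoscaler has to be disabled first and re-enabled afterwards:

```shell
# Placeholders: myRG, myCluster, gpupool - adjust to your environment.
# Temporarily disable the cluster autoscaler on the pool, since
# "az aks nodepool scale" rejects autoscaler-enabled pools.
az aks nodepool update --resource-group myRG --cluster-name myCluster \
    --name gpupool --disable-cluster-autoscaler

# Force a node to be provisioned
az aks nodepool scale --resource-group myRG --cluster-name myCluster \
    --name gpupool --node-count 1

# Re-enable autoscaling with the original 0-1 bounds
az aks nodepool update --resource-group myRG --cluster-name myCluster \
    --name gpupool --enable-cluster-autoscaler --min-count 0 --max-count 1
```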


0 Votes
NielsClaeys-0699 answered

I had exactly the same issue twice with my cluster. No new nodes were provisioned for about a week because the autoscaler thought the node pool was tainted.
In our case it was fixed by:
- manually scaling the node pool, as David suggests
- the autoscaler being restarted during a cluster upgrade

It seems that when the last node in the node pool gets drained and the remediator taint is added, the autoscaler's state for the node pool never gets refreshed, so it keeps assuming the node pool cannot be used. Can anyone confirm this?
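If the tainted node still exists (rather than having already been deleted by the autoscaler), another workaround worth trying is removing the taint by key. A sketch with a placeholder node name:

```shell
# Remove all effects of the remediator taint from the node;
# the trailing "-" on the taint key means "delete this taint".
# The node name is a placeholder.
kubectl taint nodes aks-gpupool-12345678-vmss000000 \
    remediator.aks.microsoft.com/unschedulable-
```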

0 Votes
MichaelTaron-0047 answered

We are also hitting this issue: since about noon yesterday, no new GPU nodes were provisioned in a specific node pool. Upgrading the node pool images didn't help (we're already on the latest non-preview Kubernetes version, so we didn't want to upgrade the cluster), but manually scaling the node pool as Niels suggested seems to have done the trick.

0 Votes
PatrickduBoucherRyan-1771 answered

Hit this today. Strangely, we could see that a pod wouldn't be scheduled because of this taint, yet there were no nodes or node pools with this taint:

1 node(s) had taint {remediator.aks.microsoft.com/unschedulable: }

The taint couldn't be found with kubectl, az aks, or in the portal.
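One way to double-check that no node actually carries the taint is to dump every node's taint keys in a single pass. A sketch:

```shell
# Print each node's name followed by its taint keys
# (the second column is blank for untainted nodes)
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints[*].key}{"\n"}{end}'
```

If nothing shows the remediator taint but the scheduler event still mentions it, that would be consistent with Niels's theory about stale autoscaler state rather than a real taint on a live node.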
