Hello.
Given: an AKS cluster with a GPU node pool, configured for autoscaling with a minimum of 1 node and a maximum of 10.
Deployment: 1 pod, strategy: RollingUpdate, scheduling requirement: a GPU node (a sketch of the manifest follows).
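For reference, the deployment looks roughly like this (the name, image, node pool label and taint values are illustrative, not the exact manifest):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-app                     # illustrative name
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: gpu-app
  template:
    metadata:
      labels:
        app: gpu-app
    spec:
      nodeSelector:
        agentpool: gpunp            # pins the pod to the GPU node pool (assumed label value)
      tolerations:
        - key: "sku"
          operator: "Equal"
          value: "gpu"
          effect: "NoSchedule"      # matches a taint commonly set on GPU pools (assumed)
      containers:
        - name: app
          image: myregistry.azurecr.io/gpu-app:latest   # placeholder image
          resources:
            limits:
              nvidia.com/gpu: 1     # requires a GPU, so the pod can only land on a GPU node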
What happens: when a rollout starts, the pod from the new ReplicaSet stays in Pending state, and this continues for about 10 minutes.
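While the new pod is Pending, its events can be inspected roughly like this (the pod name is a placeholder):

kubectl describe pod gpu-app-xxxxxxxxxx-xxxxx
# The Events section at the bottom shows why scheduling fails and, once the
# cluster autoscaler reacts, typically a TriggeredScaleUp message referencing the VMSS.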
kubectl get configmap -n kube-system cluster-autoscaler-status -o yaml shows:
apiVersion: v1
data:
  status: |+
    Cluster-autoscaler status at 2022-11-07 15:00:07.602507854 +0000 UTC:
    Cluster-wide:
      Health:    Healthy (ready=6 unready=0 notStarted=0 longNotStarted=0 registered=6 longUnregistered=0)
                 LastProbeTime:      2022-11-07 15:00:07.591651854 +0000 UTC m=+96513.429961395
                 LastTransitionTime: 2022-11-06 12:11:44.655026033 +0000 UTC m=+10.493335474
      ScaleUp:   InProgress (ready=6 registered=6)
                 LastProbeTime:      2022-11-07 15:00:07.591651854 +0000 UTC m=+96513.429961395
                 LastTransitionTime: 2022-11-07 14:53:43.908385909 +0000 UTC m=+96129.746695450
      ScaleDown: NoCandidates (candidates=0)
                 LastProbeTime:      2022-11-07 15:00:07.591651854 +0000 UTC m=+96513.429961395
                 LastTransitionTime: 2022-11-06 12:11:44.655026033 +0000 UTC m=+10.493335474

    NodeGroups:
      Name:      aks-gpunp-14929123-vmss
      Health:    Healthy (ready=1 unready=0 notStarted=0 longNotStarted=0 registered=1 longUnregistered=0 cloudProviderTarget=2 (minSize=1, maxSize=3))
                 LastProbeTime:      2022-11-07 15:00:07.591651854 +0000 UTC m=+96513.429961395
                 LastTransitionTime: 2022-11-06 12:11:44.655026033 +0000 UTC m=+10.493335474
      ScaleUp:   InProgress (ready=1 cloudProviderTarget=2)
                 LastProbeTime:      2022-11-07 15:00:07.591651854 +0000 UTC m=+96513.429961395
                 LastTransitionTime: 2022-11-07 14:53:43.908385909 +0000 UTC m=+96129.746695450
      ScaleDown: NoCandidates (candidates=0)
                 LastProbeTime:      2022-11-07 15:00:07.591651854 +0000 UTC m=+96513.429961395
                 LastTransitionTime: 2022-11-06 12:11:44.655026033 +0000 UTC m=+10.493335474
A new GPU node becomes available only after 10-11 minutes.
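For reference, those 10-11 minutes were measured by watching the GPU pool's nodes join and turn Ready, along these lines (the agentpool label value is an assumption based on the VMSS name above):

kubectl get nodes -l agentpool=gpunp -w   # watch for the new node to appear and reach Ready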
How can this be improved?
Thanks.