Node not able to schedule

Question

Manshu 0

Hi Team, we are facing an issue scheduling nodes in node pools in our AKS cluster.

Our goal is to create node pools of NCads H100 v5 series nodes inside our AKS cluster.

Subscription ID: ad**********************02
Region: US West 2

Thanks,
Manshu Sharma


Manshu 0 Reputation points

2025-09-05T04:23:01.7866667+00:00

Hi Himanshu,

Here are the details you asked for.

Have you raised a quota request for Standard_NCads_H100_v5 in your subscription, specifically in US West 2? Also, which VM size (SKU) are you using?

(base) manshusharma@Manshus-MacBook-Air ARCHIVE % az vm list-usage --location westus2 -o table | grep NCadsH100v5

Standard NCadsH100v5 Family vCPUs         40              160

(base) manshusharma@Manshus-MacBook-Air ARCHIVE % az vm list-skus \
  --location westus2 \
  --size Standard_NC40ads_H100_v5 \
  -o table

ResourceType     Locations    Name                      Zones    Restrictions
---------------  -----------  ------------------------  -------  --------------
virtualMachines  westus2      Standard_NC40ads_H100_v5  1,3      None

Is it possible to try alternative regions (e.g., East US, South Central US, or West Europe) to test availability?

No, because we only get this SKU in this region.

Is the AKS cluster configured for Virtual Machine Scale Sets (VMSS), not Availability Sets?

Yes, all node pools are set to VirtualMachineScaleSets.

(base) manshusharma@Manshus-MacBook-Air ARCHIVE % az aks show \
  --resource-group voicing-production \
  --name voicing-aks \
  --query "agentPoolProfiles[].type" -o table
Result
-----------------------
VirtualMachineScaleSets
VirtualMachineScaleSets
VirtualMachineScaleSets

Note: this is the node pool creation command I am using:

az aks nodepool add \
  --resource-group voicing-production \
  --cluster-name voicing-aks \
  --name h100pool \
  --node-count 1 \
  --node-vm-size Standard_NC40ads_H100_v5 \
  --os-sku Ubuntu \
  --os-type Linux \
  --node-taints sku=gpu:NoSchedule

Hemalatha 14,525 Reputation points Microsoft External Staff Moderator

2025-09-05T07:06:03.38+00:00

Hi Manshu,

Could you please share a screenshot of the error you see when running the command? It will help us investigate the issue further.

Hemalatha 14,525 Reputation points Microsoft External Staff Moderator

2025-10-07T00:37:41.4366667+00:00

Hi Manshu,

If the query has been resolved, please take a moment to accept the answer and upvote it 👍 so that it is helpful to the community.

Thank you for helping to improve Microsoft Q&A!

Answer 1

Hi Manshu,

Based on the information you provided and known challenges related to this issue, the most likely causes are the following:

1. Even though your subscription has sufficient vCPU quota and the SKU is listed as available, the physical capacity in that region may be exhausted at the time of your deployment attempt.

2. The output you shared shows that H100s are available in zones 1 and 3 in westus2. By default, AKS may try to use all available zones for a VMSS-backed node pool. To increase the chances of a successful deployment, explicitly target an availability zone in your command.

Try running the command with a zone that offers the SKU (1 or 3 in westus2), for example zone 1:

az aks nodepool add \
  --resource-group voicing-production \
  --cluster-name voicing-aks \
  --name h100pool \
  --node-count 1 \
  --node-vm-size Standard_NC40ads_H100_v5 \
  --os-sku Ubuntu \
  --os-type Linux \
  --node-taints sku=gpu:NoSchedule \
  --zones 1

Finally, the issue might be related to the taint (sku=gpu:NoSchedule) you applied to the node pool. The taint ensures that only pods with a matching toleration can be scheduled on these nodes.

So, verify that the tolerations are correctly configured in the pod specification file.
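
As a minimal sketch of what a matching toleration could look like, the snippet below writes a pod manifest whose toleration matches the sku=gpu:NoSchedule taint from the node pool command. The pod name gpu-toleration-check, the file name, and the busybox image are placeholders chosen for illustration and do not come from this thread; a real GPU workload would typically also request GPU resources.

```shell
# Write a hypothetical pod manifest that tolerates the sku=gpu:NoSchedule taint.
cat > gpu-toleration-check.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-toleration-check   # placeholder name
spec:
  containers:
    - name: main
      image: busybox:1.36      # placeholder image; substitute your GPU workload
      command: ["sleep", "3600"]
  tolerations:
    - key: "sku"               # must match the taint key
      operator: "Equal"
      value: "gpu"             # must match the taint value
      effect: "NoSchedule"     # must match the taint effect
EOF

# With cluster access, apply it and check which node the pod landed on:
#   kubectl apply -f gpu-toleration-check.yaml
#   kubectl get pod gpu-toleration-check -o wide
```

Note that a toleration only permits scheduling onto the tainted nodes; it does not force the pod there. Pinning a pod to the GPU pool would additionally need a nodeSelector or node affinity on a label carried by those nodes.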

Please see the related official documentation:

https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/

https://learn.microsoft.com/en-us/azure/aks/operator-best-practices-advanced-scheduler#use-taints-and-tolerations

https://learn.microsoft.com/en-us/azure/aks/quotas-skus-regions

Hope this helps! Please let me know if you have any queries.

Hemalatha 14,525 Reputation points Microsoft External Staff Moderator

2025-09-08T09:42:29.2466667+00:00

Hi Manshu,

Just checking whether the response provided was helpful to you. Please let me know if you have any other queries.

Manshu 0 Reputation points

2025-09-09T04:28:28.11+00:00

Hi Jilakara,

It works. I was able to set up MIG as well.
Thanks!

Hemalatha 14,525 Reputation points Microsoft External Staff Moderator

2025-09-09T06:14:23.4133333+00:00

Thanks for the confirmation, Manshu.

Answer 2

Hello Manshu

Welcome to the Microsoft Q&A platform. Thank you for reaching out; I hope you are doing well.

Please let us know a few details so we can investigate further:

1. Have you raised a quota request for Standard_NCads_H100_v5 in your subscription, specifically in US West 2? Also, which VM size (SKU) are you using?
2. Is it possible to try alternative regions (e.g., East US, South Central US, or West Europe) to test availability?
3. Is the AKS cluster configured for Virtual Machine Scale Sets (VMSS), not Availability Sets?

Although the vCPU quota for your subscription is sufficient and the requested SKU appears as available, physical capacity for that VM size might be exhausted in the selected region at the time of deployment.

This can result in allocation failures despite quota availability.

Please see Microsoft’s documentation on VM allocation failures:

https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/windows/allocation-failure

The output you provided indicates that H100s are available only in zones 1 and 3 within the westus2 region.

By default, AKS attempts to use all available zones for a VMSS-backed node pool, which may lead to failures if the required SKU is not present in every zone.

To improve the chance of a successful deployment, specify a supported availability zone directly in the command.
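
One way to pick the zone value is to read it from the Zones column of the az vm list-skus table output shown in the question. The sketch below parses the row copied from that output instead of calling Azure, so it runs offline; the variable names are illustrative, and it assumes the table keeps the same five-column layout.

```shell
# Row copied from the `az vm list-skus ... -o table` output in the question.
row='virtualMachines  westus2      Standard_NC40ads_H100_v5  1,3      None'

# Column 4 is Zones; turn "1,3" into the space-separated form "1 3".
zones=$(printf '%s\n' "$row" | awk '{print $4}' | tr ',' ' ')

# Keep only the first listed zone to pin the node pool to a single zone.
first_zone=${zones%% *}

echo "zones offering the SKU: $zones"
echo "example flag: --zones $first_zone"
```

The resulting value would then be passed to az aks nodepool add through its --zones option, which accepts one or more space-separated zone numbers.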

Please see Microsoft’s documentation on configuring AKS node pools and availability zones for best practices:

https://learn.microsoft.com/en-us/azure/aks/reliability-availability-zones-configure

https://docs.azure.cn/en-us/aks/reliability-zone-resiliency-recommendations

Regards

Himanshu
