Azure Kubernetes - subscription node limit

AJG 436 Reputation points
2020-09-11T16:27:33.09+00:00

I have been spending a while now looking at options for doing HPC-style compute on Azure, without any real success.

I have now tried two approaches:

1. Azure Batch
Looks good, but most of the examples relate to trivial parallelization patterns. Even if you can figure out how to make it work (one of the suggested approaches doesn't seem to be supported, or at least, after quite some time, support have not come back with an answer when I pointed out that their proposed solution doesn't seem to be supported), interprocess communication is limited. So forget about real scalable HPC; it's just not up to the job.

2. AKS
I think this is a much better architecture: with some refactoring there is less need for traditional IPC, so let's try that. I set up an AKS cluster and got it working (there are some errors in the docs). Intended usage: create an AKS cluster of whatever the required size is (100+ nodes), run some jobs, then destroy the cluster. So how did that go? All good for a 2-node cluster, but then, when upscaling the cluster (using the portal), it produced the message:

  • "The maximum node count you can select is 4 due to the remaining quota in the selected subscription (4 cores)."

which surprised me, since I upgraded to a paid subscription precisely because I want 'compute as a (paid) service' on demand.

I can't do anything useful with 4 cores. How do I fix this?

The situation remains that I have almost 100x the compute power at home than I have so far managed to achieve on Azure.


Azure Kubernetes Service (AKS)
An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.

Accepted answer
  1. whoward-msft 2,766 Reputation points
    2020-09-16T23:40:11.193+00:00

Thanks for the follow-up @AJG ,

So since you're on a paid plan and your regional CPU limit is low, you can request a quota increase to raise your VM vCPU limit. All of this is subject to availability in each region. While the default for CPU cores is usually 20 per region per subscription, this does depend on availability. Therefore your best bet is to request a quota increase here; you can request one at no additional charge. Have a good one.
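Before filing the request, you can inspect your current per-region vCPU usage with the Azure CLI (the region name here is just an example; these commands need a logged-in Azure session):

```shell
# List vCPU quota usage for a region.
az vm list-usage --location eastus --output table

# Narrow it to the core/vCPU counters with a JMESPath query.
az vm list-usage --location eastus \
  --query "[?contains(name.value, 'cores')]" --output table
```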


2 additional answers

Sort by: Most helpful
  1. whoward-msft 2,766 Reputation points
    2020-09-11T18:46:54.727+00:00

    Hi AJG-8960,

Thanks for your question. Firstly, I just want to point out that an AKS node isn't the same as a CPU core: an AKS node is an Azure VM that runs the Kubernetes node components and container runtime (Moby). So you can increase the Azure VM size for your nodes to get more CPUs, memory, or storage accordingly.

If you still want to scale out your AKS cluster, the nodes-per-cluster limit is 100 if you're using a Basic Load Balancer, and you can go up to 1,000 nodes with a Standard Load Balancer. That's just per cluster; you can have 100 clusters per subscription. Did you purchase your subscription through a CSP? How many other VMs are you running on the same subscription? Other VMs (even if they aren't being used with AKS) could be using up your core quota. For more on quotas, see here.
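For reference, the load balancer SKU is chosen at cluster creation time. A minimal sketch (the resource group, cluster name, and VM size are placeholders, not taken from this thread):

```shell
az aks create \
  --resource-group my-hpc-rg \
  --name my-hpc-cluster \
  --node-count 100 \
  --node-vm-size Standard_DS3_v2 \
  --load-balancer-sku standard \
  --generate-ssh-keys
```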

    I look forward to hearing your reply and getting you up and running on AKS.


  2. AJG 436 Reputation points
    2020-09-15T16:04:44.82+00:00

Hi AJG-8960,
Thanks for your question. Firstly, I just want to point out that an AKS node isn't the same as a CPU core: an AKS node is an Azure VM that runs the Kubernetes node components and container runtime (Moby). So you can increase the Azure VM size for your nodes to get more CPUs, memory, or storage accordingly.

On the node vs. VM issue, that is a fair point. However, whether or not this would help depends on the devil in the detail, which I would need to understand a bit better than I currently do.

For example, let's suppose that I have a calculation that can easily be parallelized, something like an SPMD/MPI workflow, but perhaps using a storage queue or message bus for coordinating the various pieces. Then I create a container that does a piece of the calculation.

Then, for example, if I provision 10 nodes, but each runs on a very powerful VM with many vCPUs, can I arrange for multiple instances of that containerized component that does the calculation to run on each AKS node?

i.e., in AKS terms each 'node' is a powerful multi-CPU VM, but in terms of the parallelization model each 'compute node' is an instance of the container which 'does the calculation', and which needs to be repeated n-fold on each AKS node according to its underlying compute power.

I'm guessing the answer must be 'yes'; I just need to find out how. For the moment I will keep things simple and have one compute instance per AKS node.
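For what it's worth, the usual Kubernetes pattern for this is a Deployment whose replica count exceeds the node count, with a per-pod CPU request so the scheduler packs several pods onto each VM. A rough sketch with made-up names and numbers (the image and the 10-nodes-of-8-vCPUs sizing are purely illustrative):

```shell
# One worker container image, many replicas: 10 nodes x 8 vCPUs => 80 pods.
kubectl create deployment compute-worker \
  --image=myregistry.azurecr.io/compute-worker:latest
kubectl scale deployment compute-worker --replicas=80

# Request 1 CPU per pod so the scheduler distributes pods by node capacity.
kubectl set resources deployment compute-worker --requests=cpu=1
```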

If you still want to scale out your AKS cluster, the nodes-per-cluster limit is 100 if you're using a Basic Load Balancer, and you can go up to 1,000 nodes with a Standard Load Balancer. That's just per cluster; you can have 100 clusters per subscription. Did you purchase your subscription through a CSP? How many other VMs are you running on the same subscription? Other VMs (even if they aren't being used with AKS) could be using up your core quota. For more on quotas, see here.

If the nodes-per-cluster limit is 100, that is fine. I saw somewhere that Kubernetes itself goes up to 5,000 nodes, but I could happily live with 100, especially given your clarification re AKS nodes and vCPUs. What I want to do is dynamically create clusters, run analytics, and then de-provision the cluster.
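That create/run/destroy lifecycle can be scripted with the Azure CLI; a hedged sketch with placeholder names (again assuming a logged-in session):

```shell
az aks create --resource-group my-hpc-rg --name my-hpc-cluster \
  --node-count 100 --load-balancer-sku standard --generate-ssh-keys
az aks get-credentials --resource-group my-hpc-rg --name my-hpc-cluster

# ... run the analytics jobs with kubectl here ...

# Tear the whole cluster down when the run is finished.
az aks delete --resource-group my-hpc-rg --name my-hpc-cluster --yes --no-wait
```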

I submitted a support request on the upscale issue and was directed towards the Get-AzVMUsage cmdlet, which was quite illuminating. According to that, the (top) entries were:

    Name                               Current Value  Limit  Unit
    Availability Sets                  0              2500   Count
    Total Regional vCPUs               6              10     Count
    Virtual Machines                   3              25000  Count
    Virtual Machine Scale Sets         1              2500   Count
    Dedicated vCPUs                    0              3000   Count
    Total Regional Low-priority vCPUs  0              10     Count
    Standard DSv3 Family vCPUs         2              10     Count
    Standard DSv2 Family vCPUs         4              10     Count
    Basic A Family vCPUs               0              10     Count
    Standard A0-A7 Family vCPUs        0              10     Count

I do not entirely understand the detailed definitions of the terms used here, but one thing that does strike me is that it is quite difficult to run 25,000 virtual machines if you only have 10 vCPUs available. So the VM limit is irrelevant; I would need a (large) increase in the 'Total Regional vCPUs' limit, and in the limit for whatever VM family is provisioned, unless I have misunderstood something?
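To make the arithmetic concrete: the headroom is the quota limit minus current usage, and the achievable node count divides that by the vCPUs per node. A back-of-the-envelope sketch using the Get-AzVMUsage figures above (the 2-vCPU node size is a hypothetical example, not something from this thread):

```shell
# Figures from the Get-AzVMUsage output.
REGIONAL_LIMIT=10   # Total Regional vCPUs limit
REGIONAL_USED=6     # Total Regional vCPUs current value

REMAINING=$(( REGIONAL_LIMIT - REGIONAL_USED ))
echo "remaining regional vCPUs: $REMAINING"

# With hypothetical 2-vCPU nodes, the extra nodes that quota allows:
VCPUS_PER_NODE=2
EXTRA_NODES=$(( REMAINING / VCPUS_PER_NODE ))
echo "additional nodes possible: $EXTRA_NODES"
```

The remaining figure of 4 vCPUs matches the "4 cores" in the portal's upscale message.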

I have some more low-level technical questions about preserving affinity between AKS nodes, containers, vCPUs and hardware CPUs, but I will put those off until later.

    I look forward to hearing your reply and getting you up and running on AKS.

See above!

