Azure Kubernetes - subscription node limit

AJG 436 Reputation points
2020-09-11T16:27:33.09+00:00

I have been spending a while now looking at options for doing HPC-style compute on Azure, without any real success.

I have now tried two approaches:

1. Azure Batch
Looks good, but most of the examples relate to trivial parallelization patterns. Even if you can figure out how to make it work (one of the suggested approaches doesn't seem to be supported, or at least, after quite some time, support have not come back with an answer when I pointed out that their proposed solution doesn't seem to be supported), interprocess communication is limited. So forget about real scalable HPC; it's just not up to the job.

2. AKS
I think this is a much better architecture: with some refactoring there is less need for traditional IPC, so let's try that. I set up an AKS cluster and got it working (there are some errors in the docs). Intended usage: create an AKS cluster of whatever the required size is (100+ nodes), run some jobs, then destroy the cluster. So how did that go? All good for a 2-node cluster, but then, when upscaling the cluster (using the portal), it produced the message:

  • "The maximum node count you can select is 4 due to the remaining quota in the selected subscription (4 cores)."

which surprised me, since I upgraded to a paid subscription precisely because I want 'compute as a (paid) service' on demand.

I can't do anything useful with 4 cores. How do I fix this?

The situation remains that I have almost 100x the compute power at home than I have so far managed to achieve on Azure.


Azure Kubernetes Service (AKS)
An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.

Accepted answer
  1. whoward-msft 2,766 Reputation points
    2020-09-16T23:40:11.193+00:00

Thanks for the follow-up @AJG ,

So since you're on a paid plan and your regional CPU limit is low, you can request a quota increase to raise your VM vCPU limit. All of this is subject to availability in each region. While the default for CPU cores is usually 20 per region per subscription, this does depend on availability. Therefore your best bet is to request a quota increase here; you can request one at no additional charge. Have a good one.
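Before filing the request, you can inspect your current per-region vCPU usage with the Azure CLI (the region name here is just an example; these commands need a logged-in Azure session):

```shell
# List vCPU quota usage for a region.
az vm list-usage --location eastus --output table

# Narrow it to the core/vCPU counters with a JMESPath query.
az vm list-usage --location eastus \
  --query "[?contains(name.value, 'cores')]" --output table
```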


2 additional answers

Sort by: Most helpful
  1. whoward-msft 2,766 Reputation points
    2020-09-11T18:46:54.727+00:00

    Hi AJG-8960,

Thanks for your question. Firstly, I just want to point out that an AKS node isn't the same as a CPU core: an AKS node is an Azure VM that runs the Kubernetes node components and container runtime (Moby). So you can increase the Azure VM size for your nodes to get more CPUs, memory, or storage accordingly.

If you still want to scale out your AKS cluster, the nodes-per-cluster limit is 100 if you're using a Basic Load Balancer, and you can go up to 1,000 nodes with a Standard Load Balancer. That's just per cluster; you can have 100 clusters per subscription. Did you purchase your subscription through a CSP? How many other VMs are you running on the same subscription? Other VMs (even if they aren't being used with AKS) could be using up your core quota. For more on quotas, see here.
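For reference, the load balancer SKU is chosen at cluster creation time. A minimal sketch (the resource group, cluster name, and VM size are placeholders, not taken from this thread):

```shell
az aks create \
  --resource-group my-hpc-rg \
  --name my-hpc-cluster \
  --node-count 100 \
  --node-vm-size Standard_DS3_v2 \
  --load-balancer-sku standard \
  --generate-ssh-keys
```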

    I look forward to hearing your reply and getting you up and running on AKS.


  2. AJG 436 Reputation points
    2020-09-15T16:04:44.82+00:00

Hi AJG-8960,
Thanks for your question. Firstly, I just want to point out that an AKS node isn't the same as a CPU core: an AKS node is an Azure VM that runs the Kubernetes node components and container runtime (Moby). So you can increase the Azure VM size for your nodes to get more CPUs, memory, or storage accordingly.

On the node vs. VM issue, that is a fair point. However, whether or not this would help depends on the devil in the detail, which I would need to understand a bit better than I currently do.

For example, let's suppose that I have a calculation that can easily be parallelized, something like an SPMD/MPI workflow, but perhaps using a storage queue or message bus for coordinating the various pieces. Then I create a container that does a piece of the calculation.

Then, for example, if I provision 10 nodes, but each runs on a very powerful VM with many vCPUs, can I arrange for multiple instances of that containerized component that does the calculation to run on each AKS node?

i.e., in AKS terms each 'node' is a powerful multi-CPU VM, but in terms of the parallelization model each 'compute node' is an instance of the container which 'does the calculation', and which needs to be repeated n-fold on each AKS node according to its underlying compute power.

I'm guessing the answer must be 'yes'; I just need to find out how. For the moment I will keep things simple and have one compute instance per AKS node.
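For what it's worth, the usual Kubernetes pattern for this is a Deployment whose replica count exceeds the node count, with a per-pod CPU request so the scheduler packs several pods onto each VM. A rough sketch with made-up names and numbers (the image and the 10-nodes-of-8-vCPUs sizing are purely illustrative):

```shell
# One worker container image, many replicas: 10 nodes x 8 vCPUs => 80 pods.
kubectl create deployment compute-worker \
  --image=myregistry.azurecr.io/compute-worker:latest
kubectl scale deployment compute-worker --replicas=80

# Request 1 CPU per pod so the scheduler distributes pods by node capacity.
kubectl set resources deployment compute-worker --requests=cpu=1
```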

If you still want to scale out your AKS cluster, the nodes-per-cluster limit is 100 if you're using a Basic Load Balancer, and you can go up to 1,000 nodes with a Standard Load Balancer. That's just per cluster; you can have 100 clusters per subscription. Did you purchase your subscription through a CSP? How many other VMs are you running on the same subscription? Other VMs (even if they aren't being used with AKS) could be using up your core quota. For more on quotas, see here.

If the nodes-per-cluster limit is 100, that is fine. I saw somewhere that Kubernetes itself goes up to 5,000 nodes, but I could happily live with 100, especially given your clarification re AKS nodes and vCPUs. What I want to do is dynamically create clusters, run analytics, and then de-provision the cluster.
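That create/run/destroy lifecycle can be scripted with the Azure CLI; a hedged sketch with placeholder names (again assuming a logged-in session):

```shell
az aks create --resource-group my-hpc-rg --name my-hpc-cluster \
  --node-count 100 --load-balancer-sku standard --generate-ssh-keys
az aks get-credentials --resource-group my-hpc-rg --name my-hpc-cluster

# ... run the analytics jobs with kubectl here ...

# Tear the whole cluster down when the run is finished.
az aks delete --resource-group my-hpc-rg --name my-hpc-cluster --yes --no-wait
```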

I submitted a support request on the upscale issue and was directed towards the Get-AzVMUsage cmdlet, which was quite illuminating. According to that, the (top) entries were:

    Name                               Current Value  Limit  Unit
    Availability Sets                  0              2500   Count
    Total Regional vCPUs               6              10     Count
    Virtual Machines                   3              25000  Count
    Virtual Machine Scale Sets         1              2500   Count
    Dedicated vCPUs                    0              3000   Count
    Total Regional Low-priority vCPUs  0              10     Count
    Standard DSv3 Family vCPUs         2              10     Count
    Standard DSv2 Family vCPUs         4              10     Count
    Basic A Family vCPUs               0              10     Count
    Standard A0-A7 Family vCPUs        0              10     Count

I do not entirely understand the detailed definitions of the terms used here, but one thing that does strike me is that it is quite difficult to run 25,000 virtual machines if you only have 10 vCPUs available. So the VM limit is irrelevant; I would need a (large) increase in the 'Total Regional vCPUs' limit, and in the limit for whatever VM family is provisioned, unless I have misunderstood something?
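To make the arithmetic concrete: the headroom is the quota limit minus current usage, and the achievable node count divides that by the vCPUs per node. A back-of-the-envelope sketch using the Get-AzVMUsage figures above (the 2-vCPU node size is a hypothetical example, not something from this thread):

```shell
# Figures from the Get-AzVMUsage output.
REGIONAL_LIMIT=10   # Total Regional vCPUs limit
REGIONAL_USED=6     # Total Regional vCPUs current value

REMAINING=$(( REGIONAL_LIMIT - REGIONAL_USED ))
echo "remaining regional vCPUs: $REMAINING"

# With hypothetical 2-vCPU nodes, the extra nodes that quota allows:
VCPUS_PER_NODE=2
EXTRA_NODES=$(( REMAINING / VCPUS_PER_NODE ))
echo "additional nodes possible: $EXTRA_NODES"
```

The remaining figure of 4 vCPUs matches the "4 cores" in the portal's upscale message.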

I have some more low-level technical questions about preserving affinity between AKS nodes, containers, vCPUs and hardware CPUs, but I will put those off until later.

    I look forward to hearing your reply and getting you up and running on AKS.

See above!

