Hi AJG-8960,
Thanks for your question. First, I just want to point out that an AKS node isn't the same as a CPU core: an AKS node is an Azure VM that runs the Kubernetes node components and the container runtime (Moby). So you can increase the VM size for your nodes to get more vCPUs, memory, or storage accordingly.
On the node vs. VM issue, that is a fair point. However, whether or not this would help depends on the devil in the detail, which I would need to understand a bit better than I currently do.
For example, let's suppose I have a calculation that can easily be parallelized, something like an SPMD/MPI workflow, but perhaps using a storage queue or message bus to coordinate the various pieces. Then I create a container that does one piece of the calculation.
Then, for example, if I provision 10 nodes, but each runs on a very powerful VM with many vCPUs, can I arrange for multiple instances of that containerized calculation component to run on each AKS node?
i.e., in AKS terms each 'node' is a powerful multi-vCPU VM, but in terms of the parallelization model each 'compute node' is an instance of the container which 'does the calculation', and that instance needs to be replicated n-fold on each AKS node, according to its underlying compute power.
I'm guessing the answer must be 'yes'; I just need to find out how. For the moment I'll keep things simple and run one compute instance per AKS node.
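To make the fan-out model above concrete, here is a minimal local sketch of the worker pattern: one container image whose entrypoint is a loop that pulls work items from a queue, replicated n-fold per node. A `multiprocessing.Queue` stands in for the Azure Storage Queue or message bus mentioned above, and the names `do_piece`, `worker`, and `run` are illustrative, not any real Azure or Kubernetes API.

```python
# Sketch of the queue-coordinated SPMD pattern: n worker processes
# (stand-ins for n container instances on one node) drain a shared
# work queue until a sentinel tells each one to stop.
import multiprocessing as mp

def do_piece(item):
    # Placeholder for one piece of the parallel calculation.
    return item * item

def worker(tasks, results):
    # Each container instance would run this loop.
    while True:
        item = tasks.get()
        if item is None:      # sentinel: no more work
            break
        results.put(do_piece(item))

def run(n_workers, items):
    items = list(items)
    tasks, results = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(tasks, results))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    for it in items:
        tasks.put(it)
    for _ in procs:
        tasks.put(None)       # one sentinel per worker
    for p in procs:
        p.join()
    return sorted(results.get() for _ in items)

if __name__ == "__main__":
    print(run(4, range(10)))
```

On AKS the replication itself would come from the scheduler rather than `multiprocessing`: you set the replica count on the workload and give each container a CPU request, and Kubernetes packs as many instances onto each node as its vCPUs allow.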
If you still want to scale up your AKS cluster, the nodes-per-cluster limit is 100 if you're using a Basic Load Balancer, and you can go up to 1,000 nodes with a Standard Load Balancer. That's just per cluster; you can have 100 clusters per subscription. Did you purchase your subscription through a CSP? How many other VMs are you running on the same subscription? Other VMs (even if they aren't being used with AKS) could be using up your core quota. For more on quotas see here
If the nodes-per-cluster limit is 100, that is fine. I saw somewhere that Kubernetes itself scales up to 5,000 nodes, but I could happily live with 100, especially given your clarification regarding AKS nodes and vCPUs. What I want to do is dynamically create clusters, run analytics, and then de-provision the cluster.
I submitted a support request on the upscale issue, and was directed towards the Get-AzVMUsage cmdlet, which was quite illuminating. According to that, the (top) entries were:
Name                               Current Value  Limit  Unit
Availability Sets                              0   2500  Count
Total Regional vCPUs                           6     10  Count
Virtual Machines                               3  25000  Count
Virtual Machine Scale Sets                     1   2500  Count
Dedicated vCPUs                                0   3000  Count
Total Regional Low-priority vCPUs              0     10  Count
Standard DSv3 Family vCPUs                     2     10  Count
Standard DSv2 Family vCPUs                     4     10  Count
Basic A Family vCPUs                           0     10  Count
Standard A0-A7 Family vCPUs                    0     10  Count
I do not entirely understand the detailed definitions of the terms used here, but one thing that does strike me is that it would be quite difficult to run 25,000 virtual machines if you only have 10 vCPUs available. So the VM limit is irrelevant; I would instead need a (large) increase in the 'Total Regional vCPUs' limit, and in the limit for whichever VM family gets provisioned, unless I have misunderstood something?
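A quick back-of-envelope check of the reasoning above, using the numbers from the Get-AzVMUsage output: it is the regional vCPU quota, not the 25,000-VM limit, that caps the cluster. The 2-vCPU node size below is an assumption for illustration (e.g. a small DSv3-class VM); the actual figure depends on the chosen VM size.

```python
# Quota arithmetic from the usage table: how many more nodes fit
# under the 'Total Regional vCPUs' limit before a quota increase
# is needed.
total_regional_vcpu_limit = 10   # 'Limit' column
vcpus_in_use = 6                 # 'Current Value' column
vcpus_per_node = 2               # assumed small 2-vCPU node size

headroom = total_regional_vcpu_limit - vcpus_in_use
max_extra_nodes = headroom // vcpus_per_node
print(max_extra_nodes)           # only 2 more 2-vCPU nodes fit
```

So with the quotas as shown, only a handful of small nodes can be added, which supports the conclusion that a 'Total Regional vCPUs' (and per-family vCPU) quota increase is the relevant request.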
I have some more low-level technical questions about preserving affinity between AKS nodes, containers, vCPUs, and hardware CPUs, but I'll put those off until later.
I look forward to hearing your reply and getting you up and running on AKS.
See above!