AKS - when CPU usage is at the maximum value, requests are dropped

Jens Voorpyl 15 Reputation points
2023-05-03T14:30:10.11+00:00

In Azure Kubernetes Service (AKS) I deploy pods containing a Flask API. Each pod is assigned a limit of 500 millicores of CPU.

When I send many requests at once, all pods hit the 500m CPU limit. The autoscaler scales up the pods, but this takes a while. While no pod has spare capacity, requests are ignored and left pending.
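For context, a deployment like the one described might declare its CPU limit and autoscaling along these lines (a sketch with hypothetical names and thresholds, not the poster's actual manifests):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-api            # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: flask-api
  template:
    metadata:
      labels:
        app: flask-api
    spec:
      containers:
      - name: flask-api
        image: myregistry.azurecr.io/flask-api:latest   # placeholder image
        resources:
          requests:
            cpu: 250m        # what the scheduler reserves
          limits:
            cpu: 500m        # hard throttle point described in the question
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: flask-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: flask-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale before pods are fully saturated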

I started solving this by putting a production WSGI server such as Waitress in front of Flask to keep requests in a queue. But now I am wondering whether this is the way to go, because the WSGI server lives in the same container that runs Flask. Since that container is already at its CPU limit, would adding a request queue change anything if the server itself is unreachable?

I also thought about creating the queue by hand. By this I mean: storing the requests Flask receives and executing them when CPU becomes free. Again, though, I run into the problem that the requests would never even reach this handmade queue.
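The handmade-queue idea can be sketched in plain Python with a bounded queue and a fixed worker pool (all names and sizes here are illustrative, not from the post). Note the limitation the question raises: because this queue lives inside the same CPU-throttled container, accepting and enqueuing requests still costs CPU, so it smooths bursts rather than adding capacity.

```python
import queue
import threading

REQUEST_QUEUE = queue.Queue(maxsize=100)  # bounded: excess load is rejected fast
results = []
results_lock = threading.Lock()

def handle_request(payload):
    # Stand-in for the CPU-bound Flask view logic.
    return payload * 2

def worker():
    while True:
        payload = REQUEST_QUEUE.get()
        if payload is None:               # sentinel: shut this worker down
            REQUEST_QUEUE.task_done()
            break
        result = handle_request(payload)
        with results_lock:
            results.append(result)
        REQUEST_QUEUE.task_done()

def submit(payload):
    """Enqueue a request; fail fast instead of hanging when the queue is full."""
    try:
        REQUEST_QUEUE.put(payload, timeout=0.1)
        return True
    except queue.Full:
        return False                      # the Flask view would return HTTP 503 here

workers = [threading.Thread(target=worker) for _ in range(4)]
for t in workers:
    t.start()

accepted = [submit(i) for i in range(10)]
REQUEST_QUEUE.join()                      # wait for queued work to drain
for _ in workers:
    REQUEST_QUEUE.put(None)               # one sentinel per worker
for t in workers:
    t.join()

print(sorted(results))                    # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(all(accepted))                      # True
```

Rejecting with `queue.Full` instead of blocking is the key design choice: clients get a fast 503 they can retry, rather than a connection that hangs until it times out.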

It is important to know that I ran the same container locally in Docker Desktop and it was able to handle all the requests I sent it. It took a while to process them all, but none returned an error and none were ignored.

Azure Kubernetes Service
An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.

1 answer

Sort by: Most helpful
  1. kobulloc-MSFT 26,801 Reputation points Microsoft Employee Moderator
    2023-05-03T18:20:05.4066667+00:00

    Hello, @Jens Voorpyl !

    How can I scale up my AKS resource faster than using the cluster autoscaler?

    There are several scaling options available in AKS, including manual scaling, the horizontal pod autoscaler, the cluster autoscaler, and integration with Azure Container Instances (ACI). With the cluster autoscaler, it may take a few minutes for nodes to provision and for the Kubernetes scheduler to run pods on them.

    A rapid scaling option would be to integrate with Azure Container Instances (ACI):

    https://learn.microsoft.com/en-us/azure/aks/concepts-scale#burst-to-azure-container-instances-aci

    ACI lets you quickly deploy container instances without additional infrastructure overhead. When you connect with AKS, ACI becomes a secured, logical extension of your AKS cluster. The virtual nodes component, which is based on virtual Kubelet, is installed in your AKS cluster and presents ACI as a virtual Kubernetes node. Kubernetes can then schedule pods that run as ACI instances through virtual nodes, not as pods on VM nodes directly in your AKS cluster.

    Your application requires no modifications to use virtual nodes, and your deployments can scale across AKS and ACI with no delay while the cluster autoscaler deploys new nodes in your AKS cluster.

    Virtual nodes are deployed to an additional subnet in the same virtual network as your AKS cluster. This virtual network configuration secures the traffic between ACI and AKS. Like an AKS cluster, an ACI instance is a secure, logical compute resource isolated from other users.

    [Figure: Kubernetes burst scaling to ACI]
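    To let a pod burst onto the ACI-backed virtual node, the AKS virtual nodes documentation shows a pod spec pattern along these lines (a sketch; verify the exact selector and toleration keys against the linked docs):

    ```yaml
    # Pod spec fragment allowing scheduling onto the ACI virtual node.
    spec:
      nodeSelector:
        kubernetes.io/role: agent
        type: virtual-kubelet
      tolerations:
      - key: virtual-kubelet.io/provider
        operator: Exists
      - key: azure.com/aci
        effect: NoSchedule
    ```

    Without the tolerations, the virtual node's taints keep ordinary workloads off it, so only pods that explicitly opt in will burst to ACI.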

    Hopefully this is what you are looking for! If you have additional questions, please let us know in the comments.

    If this has been helpful, please take a moment to accept answers as this helps increase visibility of this question for other members of the Microsoft Q&A community. Thank you for helping to improve Microsoft Q&A!


