AKS - when CPU usage is at the maximum value, requests are dropped

Jens Voorpyl 15 Reputation points
2023-05-03T14:30:10.11+00:00

In Azure Kubernetes Service (AKS) I deploy pods containing a Flask API. Each pod is assigned a limit of 500 millicores of CPU.

When I send many requests at once, all pods hit the 500m CPU limit. The autoscaler scales up the pods, but this takes a while. While no pod has spare capacity, requests are ignored and left pending.
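For context, a deployment like the one described might declare its CPU limit and autoscaling along these lines (a sketch with hypothetical names and thresholds, not the poster's actual manifests):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-api            # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: flask-api
  template:
    metadata:
      labels:
        app: flask-api
    spec:
      containers:
      - name: flask-api
        image: myregistry.azurecr.io/flask-api:latest   # placeholder image
        resources:
          requests:
            cpu: 250m        # what the scheduler reserves
          limits:
            cpu: 500m        # hard throttle point described in the question
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: flask-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: flask-api
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale before pods are fully saturated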

I started solving this by putting a production WSGI server such as Waitress in front of Flask to keep requests in a queue. But now I am wondering whether this is the way to go, because the WSGI server lives in the same container that runs Flask. Since that container is already at its CPU limit, would adding a request queue change anything if the server itself is unreachable?

I also thought about creating the queue by hand. By this I mean: storing the requests Flask receives and executing them when CPU becomes free. Again, though, I run into the problem that the requests would never even reach this handmade queue.
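The handmade-queue idea can be sketched in plain Python with a bounded queue and a fixed worker pool (all names and sizes here are illustrative, not from the post). Note the limitation the question raises: because this queue lives inside the same CPU-throttled container, accepting and enqueuing requests still costs CPU, so it smooths bursts rather than adding capacity.

```python
import queue
import threading

REQUEST_QUEUE = queue.Queue(maxsize=100)  # bounded: excess load is rejected fast
results = []
results_lock = threading.Lock()

def handle_request(payload):
    # Stand-in for the CPU-bound Flask view logic.
    return payload * 2

def worker():
    while True:
        payload = REQUEST_QUEUE.get()
        if payload is None:               # sentinel: shut this worker down
            REQUEST_QUEUE.task_done()
            break
        result = handle_request(payload)
        with results_lock:
            results.append(result)
        REQUEST_QUEUE.task_done()

def submit(payload):
    """Enqueue a request; fail fast instead of hanging when the queue is full."""
    try:
        REQUEST_QUEUE.put(payload, timeout=0.1)
        return True
    except queue.Full:
        return False                      # the Flask view would return HTTP 503 here

workers = [threading.Thread(target=worker) for _ in range(4)]
for t in workers:
    t.start()

accepted = [submit(i) for i in range(10)]
REQUEST_QUEUE.join()                      # wait for queued work to drain
for _ in workers:
    REQUEST_QUEUE.put(None)               # one sentinel per worker
for t in workers:
    t.join()

print(sorted(results))                    # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(all(accepted))                      # True
```

Rejecting with `queue.Full` instead of blocking is the key design choice: clients get a fast 503 they can retry, rather than a connection that hangs until it times out.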

It is important to know that I ran the same container locally in Docker Desktop and it was able to handle all the requests I sent it. It took a while to process them all, but none returned an error and none were ignored.

Azure Kubernetes Service
An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.

1 answer

Sort by: Most helpful
  1. kobulloc-MSFT 26,801 Reputation points Microsoft Employee Moderator
    2023-05-03T18:20:05.4066667+00:00

    Hello, @Jens Voorpyl !

    How can I scale up my AKS resource faster than using the cluster autoscaler?

    There are several scaling options available in AKS, including manual scaling, the horizontal pod autoscaler, the cluster autoscaler, and integration with Azure Container Instances (ACI). With the cluster autoscaler, it may take a few minutes for nodes to provision and for the Kubernetes scheduler to run pods on them.

    A rapid scaling option would be to integrate with Azure Container Instances (ACI):

    https://learn.microsoft.com/en-us/azure/aks/concepts-scale#burst-to-azure-container-instances-aci

    ACI lets you quickly deploy container instances without additional infrastructure overhead. When you connect with AKS, ACI becomes a secured, logical extension of your AKS cluster. The virtual nodes component, which is based on virtual Kubelet, is installed in your AKS cluster and presents ACI as a virtual Kubernetes node. Kubernetes can then schedule pods that run as ACI instances through virtual nodes, not as pods on VM nodes directly in your AKS cluster.

    Your application requires no modifications to use virtual nodes, and your deployments can scale across AKS and ACI with no delay while the cluster autoscaler deploys new nodes in your AKS cluster.

    Virtual nodes are deployed to an additional subnet in the same virtual network as your AKS cluster. This virtual network configuration secures the traffic between ACI and AKS. Like an AKS cluster, an ACI instance is a secure, logical compute resource isolated from other users.

    [Figure: Kubernetes burst scaling to ACI]
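    To let a pod burst onto the ACI-backed virtual node, the AKS virtual nodes documentation shows a pod spec pattern along these lines (a sketch; verify the exact selector and toleration keys against the linked docs):

    ```yaml
    # Pod spec fragment allowing scheduling onto the ACI virtual node.
    spec:
      nodeSelector:
        kubernetes.io/role: agent
        type: virtual-kubelet
      tolerations:
      - key: virtual-kubelet.io/provider
        operator: Exists
      - key: azure.com/aci
        effect: NoSchedule
    ```

    Without the tolerations, the virtual node's taints keep ordinary workloads off it, so only pods that explicitly opt in will burst to ACI.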

    Hopefully this is what you are looking for! If you have additional questions, please let us know in the comments.

    If this has been helpful, please take a moment to accept answers as this helps increase visibility of this question for other members of the Microsoft Q&A community. Thank you for helping to improve Microsoft Q&A!


