Non-Restarting RabbitMQ Pod in AKS

Question

I am currently deploying RabbitMQ Server on Azure Kubernetes Service (AKS), specifically utilizing Availability Zone 1.

I want to ensure that the running pods, especially the RabbitMQ Server, do not restart unnecessarily unless the cause is an issue in RabbitMQ itself. A few days ago, I changed the Availability Zone setting from "None" to "Zone1" for AKS. I assumed this would be enough to fix the issue, but it didn't work: I noticed that all pods were restarted.

Could you please provide guidance on best practices or configurations within AKS to achieve this goal?

Answer

Hello Koprucu, Mert (ADV D EU TR AP&I TIA 1)

Welcome to the Microsoft Q&A platform, and thanks for posting your query here.

To keep your running pods, especially the RabbitMQ Server, from restarting unnecessarily, consider using Pod Disruption Budgets (PDBs). A PDB defines how many replicas of a workload can be taken down at once during voluntary disruptions such as a cluster update or node upgrade. It does not protect against involuntary disruptions such as a node crash.
For example, if you have five replicas in your deployment, you can define a minimum available count of four so that only one replica can be deleted or rescheduled at a time. As with pod resource limits, the best practice is to define pod disruption budgets for applications that require a minimum number of replicas to always be present.
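
As a rough sketch of the five-replica example above, a PDB manifest could look like the following. The name `rabbitmq-pdb` and the label `app: rabbitmq` are assumptions for illustration; use whatever labels your RabbitMQ pods actually carry.

```yaml
# Hypothetical PDB: keep at least 4 of the 5 replicas running
# during voluntary disruptions (node drain, cluster upgrade).
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: rabbitmq-pdb          # assumed name
spec:
  minAvailable: 4             # at most one of five pods may be evicted at a time
  selector:
    matchLabels:
      app: rabbitmq           # assumed label; must match your pod labels
```

Note that with a single replica, a budget that allows no evictions would block node drains, so a PDB is mainly useful when RabbitMQ runs with more than one replica.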

Additionally, you mentioned that you changed the Availability Zone from "None" to "Zone1" for your AKS cluster, but it did not fix the issue. It's important to note that simply enabling Availability Zones does not guarantee high availability or prevent pod restarts. You also need to ensure that your RabbitMQ Server deployment is configured to take advantage of Availability Zones.

To achieve high availability at the node level, you can use Kubernetes Pod Topology Spread Constraints or Pod Anti-Affinity to schedule your RabbitMQ Server pods on nodes spread across Availability Zones. This can help prevent disruptions due to data center or node failures.
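
A minimal sketch of a topology spread constraint is shown below, assuming a three-replica StatefulSet whose pods are labeled `app: rabbitmq`. The names, the headless service, and the image tag are illustrative assumptions, and a real RabbitMQ cluster needs more configuration (storage, peer discovery) than is shown here.

```yaml
# Hypothetical StatefulSet showing only the zone-spreading part.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: rabbitmq                        # assumed name
spec:
  serviceName: rabbitmq-headless        # assumed headless Service
  replicas: 3
  selector:
    matchLabels:
      app: rabbitmq
  template:
    metadata:
      labels:
        app: rabbitmq
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                                  # zones may differ by at most one pod
          topologyKey: topology.kubernetes.io/zone    # spread across availability zones
          whenUnsatisfiable: DoNotSchedule            # leave the pod pending rather than violate the spread
          labelSelector:
            matchLabels:
              app: rabbitmq
      containers:
        - name: rabbitmq
          image: rabbitmq:3-management  # assumed image tag
```

This only has an effect if the node pool actually has nodes in more than one zone. With all nodes in Zone 1, every pod still lands in the same zone.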

Hope this helps.
