Disaster Recovery Solution for Azure Kubernetes Service (AKS)

thanakrit rungchatkamol 21 Reputation points
2024-02-11T15:40:41.6566667+00:00

Hi All, How i find solution disaster recovery solution for Azure Kubernetes Service (AKS) in case i use on-premise is DR site for AKS? who you recommend possible solution? or use case scenario. BR Thanakrit

Azure Kubernetes Service
Azure Kubernetes Service
An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.
2,457 questions
{count} votes

1 answer

Sort by: Most helpful
  1. kobulloc-MSFT 26,811 Reputation points Microsoft Employee Moderator
    2024-02-11T18:44:15.7033333+00:00

    Hello, @thanakrit.r !

    What guidance is there for disaster recovery (DR) for on-premises deployments of AKS?

    Azure Kubernetes Service (AKS) on Azure Stack HCI and Windows Server is an on-premises Kubernetes implementation of AKS. This means that the management cluster is deployed as a single standalone virtual machine (VM) per deployment.

    Guidance for restoring the state of AKS on new hardware and recovering from management cluster corruption can be found in our AKS disaster recovery documentation:

    https://learn.microsoft.com/en-us/azure/aks/hybrid/restore-aks-cluster

    In AKS on Azure Stack HCI or Windows Server, the management cluster is deployed as a single standalone virtual machine (VM) per deployment, making it a single point of failure. It is important to note that a management cluster outage has no impact on applications running in the workload clusters. When the management cluster VM fails, the workload clusters (and workloads) continue running, but you won't be able to perform day-2 operations. For example, you cannot create new workload clusters, create or scale a node pool, or upgrade Kubernetes versions, until the VM is restored.

    The management cluster is a VM that's tracked in Windows failover clustering. It is also resilient to host-level disruptions. In other words, during a host machine failure, Windows failover clustering restarts the VM on a healthy host machine. This article provides guidance on how to perform the following tasks:

    • Restore the state of AKS on new hardware (could be a new site).
    • Recovery from corruption of the management cluster.

    In either of these scenarios, the management cluster and all the workload clusters must be recreated.


    I hope this has been helpful! Your feedback is important so please take a moment to accept answers. If you still have questions, please let us know what is needed in the comments so the question can be answered. Thank you for helping to improve Microsoft Q&A!

    User's image

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.