AKS cluster in strange state after subscription has been disabled/re-enabled

Sarah Gibson 1 Reputation point
2020-09-02T19:40:14.063+00:00

Hi there!

I run a production AKS cluster at the Alan Turing Institute, and our internal process of allocating credits to subscriptions often means that subs get disabled and then re-enabled after some time.

Upon re-enabling the sub, I find the AKS cluster is in a super weird state. In the kube-system namespace, a lot of the pods are duplicated and stuck either in the Pending or Terminating state and this means that helm then can't find a ready tiller pod and I'm kinda stuck. So my questions are as follows:

  • How can I get my cluster operational again without deleting it and redeploying from scratch?
  • How can I prevent this happening again?

I'd be hugely grateful for any support you can provide!

Pods output:

$ kubectl -n kube-system get pods
NAME                                         READY   STATUS        RESTARTS   AGE
azure-cni-networkmonitor-8wl2j               1/1     Running       0          214d
azure-cni-networkmonitor-b7wkg               1/1     Running       0          128d
azure-cni-networkmonitor-q8dzp               1/1     Running       0          183d
azure-cni-networkmonitor-tdsqd               1/1     Running       0          72d
azure-ip-masq-agent-859zn                    1/1     Terminating   0          183d
azure-ip-masq-agent-c29hr                    1/1     Terminating   0          214d
azure-ip-masq-agent-fbch4                    1/1     Terminating   0          72d
azure-ip-masq-agent-k2zqj                    1/1     Terminating   0          128d
azure-npm-g9ttc                              1/1     Terminating   0          65d
azure-npm-jm5ld                              1/1     Terminating   0          65d
azure-npm-l97zz                              1/1     Terminating   0          65d
azure-npm-mxf4b                              1/1     Terminating   0          65d
coredns-6c66fc4fcb-9w5pg                     1/1     Running       0          62d
coredns-6c66fc4fcb-s4lbp                     1/1     Terminating   0          55d
coredns-869cb84759-4fh5l                     0/1     Pending       0          33d
coredns-869cb84759-zkpmz                     0/1     Pending       0          33d
coredns-autoscaler-5b867494f-8ffr2           0/1     Pending       0          33d
coredns-autoscaler-78959b4578-hxbht          1/1     Terminating   0          54d
dashboard-metrics-scraper-566c858889-zb67b   0/1     Pending       0          33d
dashboard-metrics-scraper-5f44bbb8b5-72mzn   1/1     Terminating   0          62d
dashboard-metrics-scraper-5f44bbb8b5-xfmb4   0/1     Pending       0          32m
kube-proxy-2hvgb                             1/1     Running       0          62d
kube-proxy-sjmvp                             1/1     Running       0          62d
kube-proxy-w8n74                             1/1     Running       0          62d
kube-proxy-wl2g4                             1/1     Running       0          62d
kubernetes-dashboard-785654f667-f8f6z        1/1     Terminating   0          62d
kubernetes-dashboard-7f7d6bbd7f-cschs        0/1     Pending       0          33d
metrics-server-6cd7558856-7gb58              0/1     Pending       0          33d
metrics-server-85c57978c6-dnmml              1/1     Terminating   0          55d
omsagent-k7bs6                               1/1     Terminating   0          61d
omsagent-m57lm                               1/1     Terminating   0          61d
omsagent-r565v                               1/1     Terminating   0          61d
omsagent-rs-669fd8467-78kl7                  1/1     Terminating   0          55d
omsagent-rs-6b8f79bd9b-czpwl                 0/1     Pending       0          42d
omsagent-rs-7d6b44bbdd-jhn5b                 0/1     Pending       0          33d
omsagent-vrg86                               1/1     Terminating   0          61d
tiller-deploy-77d5bddbc9-tcm2b               0/1     Pending       0          30m
tiller-deploy-77d5bddbc9-tvl6h               1/1     Terminating   0          62d
tunnelfront-5c54945cc5-gn4j6                 0/2     Pending       0          33d
tunnelfront-976977d47-b9v54                  2/2     Terminating   0          53d
Azure Kubernetes Service (AKS)

1 answer

  1. vipullag-MSFT 26,391 Reputation points
    2020-09-03T04:36:59.293+00:00

    @Sarah Gibson

    Because the subscription was disabled, all the nodes may have been deleted. From the details shared, it looks like there are no nodes available to the AKS cluster.

    Please check the availability of nodes:

    $ kubectl get nodes

    If no nodes are healthy, I would recommend scaling up the cluster so that you get new nodes and the pods get scheduled onto the healthy nodes.

    To avoid running into this again, ensure the subscription is not disabled.

    Hope this information is helpful.

    Please 'Accept as answer' if it helped, so that it can help others in the community looking for help on similar topics.
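
The check-and-scale step suggested in the answer can be sketched as a small shell script. This is a minimal sketch, not a tested recovery procedure: the resource group, cluster name, and node count are hypothetical placeholders that do not come from the thread, and the helper names count_ready_nodes and check_and_scale are made up for illustration. It assumes kubectl and the Azure CLI are installed and logged in. If the cluster has more than one node pool, az aks scale also needs --nodepool-name. Whether scaling is enough to recover depends on the state the cluster was left in.

```shell
#!/usr/bin/env bash
# Placeholders (assumptions, not from the thread): replace with your own values.
RESOURCE_GROUP="my-resource-group"
CLUSTER_NAME="my-aks-cluster"
NODE_COUNT=3

# Count nodes whose STATUS column is exactly "Ready".
# Expects `kubectl get nodes --no-headers` output on stdin.
# Cordoned nodes ("Ready,SchedulingDisabled") are deliberately not counted.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Scale the cluster only if kubectl succeeds and reports zero Ready nodes.
check_and_scale() {
  nodes=$(kubectl get nodes --no-headers) || {
    echo "kubectl get nodes failed; not scaling" >&2
    return 1
  }
  ready=$(printf '%s\n' "$nodes" | count_ready_nodes)
  echo "Ready nodes: ${ready}"
  if [ "${ready}" -eq 0 ]; then
    az aks scale \
      --resource-group "${RESOURCE_GROUP}" \
      --name "${CLUSTER_NAME}" \
      --node-count "${NODE_COUNT}"
  fi
}
```

To use it, source the script and run check_and_scale. The scale command is skipped when kubectl itself fails, so that an unreachable API server is not mistaken for a cluster with no nodes.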

