We worked with the customer offline and the issue has been mitigated. The root cause analysis (RCA) is below.
Issue Summary:
The customer's AKS cluster showed the Istio add-on as "enabled" in the Azure Portal UI, but no Istio-related namespaces (e.g., aks-istio-system) had been created, which led to control-plane instability.
Root Cause:
We checked the issue from the backend and found that the Istio add-on could not be enabled because some parts of a previous Istio installation were not fully removed from the cluster.
There were leftover webhooks from the earlier Istio setup. These webhooks had different configuration details than what the AKS Istio add-on expects. Because of this mismatch, the API server was not able to communicate with them properly, and as a result, Istio could not be configured successfully.
Mitigation:
-> We identified the Istio-related webhooks by running kubectl get validatingwebhookconfiguration and kubectl get mutatingwebhookconfiguration and filtering the output for Istio entries. We then deleted the leftover webhooks.
-> Next, we disabled the Istio add-on and then enabled it again. This time, it was deployed correctly with the proper settings, which helped stabilize the cluster.
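The two mitigation steps above can be sketched roughly as the shell helpers below. This is only an illustration, not the exact commands run on the customer's cluster: the function names are hypothetical, the resource group and cluster name are placeholders you supply, and matching on the string "istio" is an assumption that may also catch webhooks you want to keep, so review the list before deleting anything.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; function names are illustrative, not an official tool.

# Print webhook configurations of the given kind whose name mentions Istio.
list_istio_webhooks() {
  local kind="$1"
  kubectl get "$kind" -o name | grep -i istio || true
}

# Delete every resource named on stdin (one "kind/name" entry per line).
delete_listed() {
  local item
  while read -r item; do
    if [ -n "$item" ]; then
      kubectl delete "$item"
    fi
  done
}

# Disable and then re-enable the Istio add-on. Both arguments are
# placeholders for your own resource group and cluster name.
reenable_istio_addon() {
  local rg="$1" cluster="$2"
  az aks mesh disable --resource-group "$rg" --name "$cluster" &&
    az aks mesh enable --resource-group "$rg" --name "$cluster"
}
```

One possible sequence: run list_istio_webhooks for validatingwebhookconfiguration and mutatingwebhookconfiguration, review the output, pipe only the confirmed leftovers into delete_listed, and then call reenable_istio_addon with your resource group and cluster name.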
Best Practices:
- Before enabling Istio, always check if there are any existing Istio resources in the cluster. Look for validating or mutating webhooks, CRDs, and related namespaces, and remove them if they are leftovers from a previous installation.
- Avoid making manual changes in the kube-system namespace, as this can affect core cluster components and cause unexpected issues.
- After uninstalling Istio, make sure everything is completely removed, including webhooks, CRDs, roles, and any related resources. This helps prevent conflicts when reinstalling or enabling the add-on again.
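As a rough sketch of the pre-check described in these best practices, the helper below lists resources whose names mention Istio across the resource kinds named above. The function names are hypothetical, and the name-based match is an assumption: it will also report resources owned by the AKS add-on itself (such as aks-istio-system), so it is most useful while the add-on is disabled.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check; the function names are illustrative only.

# Print every namespace, CRD, webhook configuration, and cluster-level RBAC
# object whose name mentions Istio. Prints nothing if none are found.
find_istio_leftovers() {
  local kind
  for kind in namespace customresourcedefinition \
              validatingwebhookconfiguration mutatingwebhookconfiguration \
              clusterrole clusterrolebinding; do
    kubectl get "$kind" -o name 2>/dev/null | grep -i istio || true
  done
}

# Return non-zero (and report on stderr) if anything Istio-related remains.
istio_precheck() {
  local leftovers
  leftovers="$(find_istio_leftovers)"
  if [ -n "$leftovers" ]; then
    printf 'Possible Istio leftovers found:\n%s\n' "$leftovers" >&2
    return 1
  fi
}
```

Running istio_precheck before enabling the add-on gives a quick yes/no signal. Anything it reports should still be inspected by hand before removal, since a name match alone does not prove a resource is a leftover.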