Hello @Tina Yan,
It looks like you're running into issues with pod creation in your AKS cluster after enabling Workload Identity. The error messages you're getting, especially the webhook failure and the 504 Gateway Timeout, suggest network connectivity issues between your Kubernetes components.
Here are some steps you can take to troubleshoot this:
Check Network Connectivity:
Ensure that the subnet configuration of your AKS cluster allows for proper communication between your nodes and services. Issues with your Virtual Network (VNet) or Network Security Group (NSG) settings could lead to connectivity problems and timeouts.
Inspect Logs:
Look at the logs of the konnectivity-agent pods. These logs can give you insights into connection issues with the backend services.
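As a rough sketch of the log check above, the helper below prints recent log lines from every konnectivity-agent pod. The function name `konnectivity_logs` is hypothetical, and the label selector `app=konnectivity-agent` is an assumption about how the pods are labeled in your cluster; confirm it with `kubectl get pods -n kube-system --show-labels` first.

```shell
# Hypothetical helper: print recent logs from the konnectivity-agent pods.
# Assumption: the pods run in kube-system and carry the label app=konnectivity-agent.
konnectivity_logs() {
  tail_lines="${1:-50}"   # number of lines per pod, defaults to 50
  # --prefix tags each line with the pod and container it came from
  kubectl logs -n kube-system -l app=konnectivity-agent --tail="$tail_lines" --prefix
}
```

For example, `konnectivity_logs 100` would show the last 100 lines per pod. Repeated connection or timeout errors there would point toward the tunnel between the control plane and your nodes.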
Use kubectl Commands:
Run the following commands after connecting to your AKS cluster to check the status of your nodes and pods:
az aks get-credentials --resource-group MyResourceGroup --name MyManagedCluster
kubectl get nodes
kubectl get pods -n kube-system
This will help identify if your nodes are operational and if the required system pods are running.
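If the pod list is long, it may help to narrow it to pods that are not in the Running phase. The sketch below uses a field selector for that; the function name `not_running_pods` is hypothetical, and completed (Succeeded) pods will also show up in the output.

```shell
# Hypothetical helper: list pods in a namespace whose phase is not Running.
# Note: pods in the Succeeded phase (finished jobs) are listed too.
not_running_pods() {
  namespace="${1:-kube-system}"   # defaults to kube-system
  kubectl get pods -n "$namespace" '--field-selector=status.phase!=Running' -o wide
}
```

Calling `not_running_pods` with no argument checks kube-system; pass another namespace name to check where your own workloads run.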
Describe the Webhook:
- Use kubectl describe on the problematic pod to view detailed events and status messages that could provide clues about what's failing:
kubectl describe pod <pod-name> -n kube-system
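Since the errors mention the webhook, it may also be worth looking at the webhook itself. The sketch below is one way to do that, not an official procedure: the function name `check_wi_webhook` is hypothetical, and the label `azure-workload-identity.io/system=true` is an assumption about how the Workload Identity webhook pods are labeled, so verify it against your cluster.

```shell
# Hypothetical helper: inspect the Workload Identity webhook and recent events.
# Assumption: the webhook pods run in kube-system with the label
# azure-workload-identity.io/system=true.
check_wi_webhook() {
  namespace="${1:-default}"   # namespace where pod creation is failing
  # Are the webhook pods running, and on which nodes?
  kubectl get pods -n kube-system -l azure-workload-identity.io/system=true -o wide
  # Which mutating webhooks are registered with the API server?
  kubectl get mutatingwebhookconfigurations
  # Recent events in the affected namespace, oldest first
  kubectl get events -n "$namespace" --sort-by=.lastTimestamp
}
```

Run it as `check_wi_webhook <namespace>`. If the webhook pods are not Running, or the events show timeouts calling the webhook, that narrows the problem to the path between the API server and those pods.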
Review Resource Group:
- In the Azure portal, check the resource group associated with your AKS cluster. Look at the VM scale sets; any failed statuses may indicate deeper provisioning issues.
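The same check can be done from the Azure CLI instead of the portal. This is a minimal sketch, assuming the scale sets live in the cluster's node resource group; the function name `vmss_states` is hypothetical, and MyResourceGroup and MyManagedCluster are the placeholder names used earlier.

```shell
# Hypothetical helper: show the provisioning state of each VM scale set
# in the node resource group of an AKS cluster.
vmss_states() {
  if [ "$#" -ne 2 ]; then
    echo "usage: vmss_states <resource-group> <cluster-name>" >&2
    return 2
  fi
  # Look up the node resource group that AKS manages for this cluster
  node_rg="$(az aks show --resource-group "$1" --name "$2" \
    --query nodeResourceGroup -o tsv)" || return 1
  # List each scale set with its provisioning state
  az vmss list --resource-group "$node_rg" \
    --query "[].{name:name, state:provisioningState}" -o table
}
```

For example, `vmss_states MyResourceGroup MyManagedCluster`. Any state other than Succeeded is worth a closer look in the portal.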
Refer to Documentation:
- For further guidance, check out these links:
If you're still having issues after trying these steps, could you provide more details on the following:
- Have there been any recent changes to your network configuration?
- Are other pods or deployments functioning properly in your AKS cluster?
- Are there any specific configurations or policies in your AKS cluster that might affect the Workload Identity feature?
Hope this helps you troubleshoot the problem! If you have further questions, feel free to ask!