
AKS pod creation failed

Tina Yan 0 Reputation points Microsoft Employee
2026-02-11T23:55:58.0366667+00:00

We recently enabled Workload Identity on our pods, and pod creation now fails with the following:

 

KubeEvents
//| where ObjectKind == "ReplicaSet"
//| where Reason == "FailedCreate"
| where Name startswith "odata-"

 

Error creating: Internal error occurred: failed calling webhook "mutation.azure-workload-identity.io": failed to call webhook: Post "https://azure-wi-webhook-webhook-service.kube-system.svc:443/mutate-v1-pod?timeout=10s": code 504: 504 Gateway Timeout

 

Error message from konnectivity-agent pods:

Azure Kubernetes Service



2 answers

  1. Nikhil Duserla 9,685 Reputation points Microsoft External Staff Moderator
    2026-02-12T05:07:30.46+00:00

    Hello @Tina Yan,

    It looks like pod creation in your AKS cluster started failing after you turned on Workload Identity. The error, in particular the 504 Gateway Timeout on the call to the mutation.azure-workload-identity.io webhook, suggests a network connectivity problem between the API server and the webhook service running on your nodes.

    Here are some steps you can take to troubleshoot this:

    Check Network Connectivity:

    Ensure that the subnet configuration of your AKS cluster allows for proper communication between your nodes and services. Issues with your Virtual Network (VNet) or Network Security Group (NSG) settings could lead to connectivity problems and timeouts.

    Inspect Logs:

    Look at the logs of the konnectivity-agent pods. These logs can give you insights into connection issues with the backend services.
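A minimal sketch of pulling those logs, assuming the agents run in kube-system with the label app=konnectivity-agent (an assumption; confirm with `kubectl get pods -n kube-system --show-labels`). The function name is hypothetical:

```shell
# Hypothetical helper: show recent log lines from the konnectivity-agent pods.
# Assumes the pods are labeled app=konnectivity-agent in kube-system.
konnectivity_logs() {
  lines="${1:-100}"   # number of trailing log lines per pod
  kubectl logs -n kube-system -l app=konnectivity-agent \
    --tail="$lines" --prefix --timestamps
}
```

Calling `konnectivity_logs 200` would print the last 200 lines per agent pod, each prefixed with the pod name, which makes it easier to line up timeouts with the time of the failed pod creation.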

    Use kubectl Commands:

    Run the following commands after connecting to your AKS cluster to check the status of your nodes and pods:

    az aks get-credentials --resource-group MyResourceGroup --name MyManagedCluster
    kubectl get nodes
    kubectl get pods -n kube-system

    This will help identify if your nodes are operational and if the required system pods are running.

    Describe the Webhook Pods:

    • Use kubectl describe on the Workload Identity webhook pods in kube-system to view detailed events and status messages that could provide clues about what’s failing:
    kubectl describe pod <pod-name> -n kube-system
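To find the pod names and to see whether the service from the error message has anything behind it, something like the following may help. This is a sketch: the Service name comes from the error in the question, while the pod label azure-workload-identity.io/system=true and the function name are assumptions to verify on your cluster.

```shell
# Hypothetical helper: check that the Workload Identity webhook is running
# and that its Service has ready endpoints behind it.
wi_webhook_status() {
  # Webhook pods; the label is an assumption, verify it with --show-labels.
  kubectl get pods -n kube-system -l azure-workload-identity.io/system=true -o wide
  # Service name taken from the error message in the question.
  kubectl get endpoints azure-wi-webhook-webhook-service -n kube-system
}
```

If the ENDPOINTS column comes back empty, no ready webhook pod backs the service, and calls to it cannot succeed regardless of the network path.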
    

    Review Resource Group:

    • In the Azure portal, check the node resource group associated with your AKS cluster (by default named MC_<resource-group>_<cluster-name>_<region>). Look at the virtual machine scale sets; a failed provisioning state may indicate deeper provisioning issues.
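The same check can be sketched with the Azure CLI instead of the portal. The function name is hypothetical, and the two arguments are the resource group and cluster name you would pass to az aks get-credentials:

```shell
# Hypothetical helper: list the provisioning state of the cluster's VM scale sets.
# $1 = resource group of the AKS cluster, $2 = cluster name.
aks_vmss_state() {
  # Look up the node resource group that holds the scale sets.
  node_rg="$(az aks show --resource-group "$1" --name "$2" \
    --query nodeResourceGroup --output tsv)" || return 1
  az vmss list --resource-group "$node_rg" \
    --query "[].{name:name, state:provisioningState}" --output table
}
```

For example, `aks_vmss_state MyResourceGroup MyManagedCluster` would list each scale set with its provisioning state; anything other than Succeeded is worth a closer look.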


    If you're still having issues after trying these steps, could you provide more details on the following:

    • Have there been any recent changes to your network configuration?
    • Are other pods or deployments functioning properly in your AKS cluster?
    • Are there any specific configurations or policies in your AKS cluster that might affect the Workload Identity feature?

    Hope this helps you troubleshoot the problem! If you have further questions, feel free to ask!


  2. Q&A Assist
    2026-02-11T23:56:11.2866667+00:00
    AI-generated content. This question contains an answer created with AI from Q&A Assist.

    The error you see when creating pods with Workload Identity enabled points to a connectivity problem with the webhook service that mutates pods at admission. The call to https://azure-wi-webhook-webhook-service.kube-system.svc:443/mutate-v1-pod did not complete within the 10-second timeout (timeout=10s) and returned 504 Gateway Timeout. This could be caused by several factors:

    1. Network Connectivity Issues: Ensure that network policies allow traffic to reach the webhook pods. The caller here is the API server, not your application pods, and in AKS that traffic typically reaches the nodes through the konnectivity-agent pods, so check for misconfigured rules that might block this path.
    2. Webhook Service Availability: Verify that the Azure Workload Identity webhook service is running and healthy. You can check the logs of the webhook service to see if there are any errors or issues reported.
    3. Resource Constraints: Sometimes, resource constraints on the nodes can lead to timeouts. Ensure that your nodes have sufficient resources (CPU, memory) to handle the workloads and the webhook service.
    4. Firewall Rules: If you have firewall rules in place, ensure that they allow traffic to and from the necessary ports used by the webhook service.
    5. Timeout Settings: The request in the error carries timeout=10s. If the webhook is taking longer than that to respond, check the timeoutSeconds value on the webhook configuration and look into why responses are slow before adjusting it.

    To troubleshoot further, use kubectl logs to check the logs of the webhook pods and kubectl describe to get more details about the pod creation failure (the FailedCreate events are recorded on the ReplicaSet). Also review the network policies and make sure the necessary ports are open between the API server and the webhook service.
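As a rough sketch of those checks: the first function prints the timeout and failure policy of every mutating webhook registered in the cluster, and the second pulls recent webhook logs. Both function names are hypothetical, and the pod label azure-workload-identity.io/system=true is an assumption to verify on your cluster.

```shell
# Hypothetical helper: print name, timeoutSeconds and failurePolicy
# for every webhook in every MutatingWebhookConfiguration.
webhook_timeouts() {
  kubectl get mutatingwebhookconfigurations \
    -o jsonpath='{range .items[*].webhooks[*]}{.name}{"\t"}{.timeoutSeconds}{"\t"}{.failurePolicy}{"\n"}{end}'
}

# Hypothetical helper: recent log lines from the webhook pods.
# The label selector is an assumption, not confirmed by the thread.
wi_webhook_logs() {
  kubectl logs -n kube-system -l azure-workload-identity.io/system=true \
    --tail=100 --prefix
}
```

The row for mutation.azure-workload-identity.io should show the same 10 seconds that appears as timeout=10s in the error. If the webhook logs show no incoming requests at the time of the failure, the call is likely being lost before it reaches the pods, which points back at the network path rather than the webhook itself.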


