CoreDNS Intermittent Failure on AKS

RamKumar 0 Reputation points
2025-04-18T04:56:49.1966667+00:00

We're experiencing intermittent issues with CoreDNS in our Azure Kubernetes Service (AKS) cluster. At times, DNS resolution works correctly, but it frequently becomes unresponsive without any clear pattern.

Interestingly, the same services function normally in our local environment, and external name resolution using Google DNS (8.8.8.8) is consistently successful. This indicates that the issue is specific to CoreDNS within AKS and not related to the services themselves or external DNS resolution.

We suspect the problem might be related to CoreDNS configuration, resource limitations, or internal networking within the AKS cluster. Further investigation and potential debugging (e.g., logs, restarts, monitoring pod metrics) are needed to determine the root cause.

Error:
{

    "success": false,

    "errorMessage": "getaddrinfo EAI_AGAIN api.nbq.ae"

}

 

Azure Kubernetes Service (AKS)
Azure Kubernetes Service (AKS)
An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.
2,409 questions
{count} votes

1 answer

Sort by: Most helpful
  1. UJTyagi-MSFT 1,010 Reputation points Microsoft Employee
    2025-04-18T07:49:48.4733333+00:00

    Hello @Ramkumar

    Welcome to the Microsoft Q&A Platform. Thank you for reaching out & I hope you are doing well.

    The error you are seeing getaddrinfo EAI_AGAIN shows it could be DNS resolution or timeout issues.

    Let's try to follow the below steps to troubleshoot the issue:

    • Check the logs of the Core DNS pods
    kubectl -n kube-system logs -l k8s-app=kube-dns
    

    Check if you see the timeouts, restarts type of errors.

    • Check Core DNS pods Resource Utilization
    kubectl top pod -n kube-system -l k8s-app=kube-dns
    
    • Check id the pods are overutilized Consider increasing resource limits.
    • Try to Restart CoreDNS pods as see if the issue persists
    kubectl rollout restart deployment coredns -n kube-system
    
    • Run DNS Test pods

    Deploy a test pod to run the DNS lookup

    kubectl run -it --rm dnsutils --image=tutum/dnsutils -- bash
    

    then test DNS

    dig api.nbq.ae
    dig kubernetes.default
    
    • Login into Node and check if DNS resolution on Nodes are working fine
    nslookup api.nbq.ae
    
    • Sometimes pods on the AKS cluster might switch up to public resolutions too despite having private resolution setting up fine. Those case can be verified by running the packet captures on the cluster Nodes.

    Kindy check and revert if have any further queries.

    Regards

    Ujjawal Tyagi

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.