Cannot communicate with AKS cluster DNS name

Darren Slaughter 21 Reputation points
2021-06-09T09:15:00.343+00:00

Hi. I'm hoping you can assist with an AKS communication issue. I am new to Kubernetes. I can successfully deploy an AKS private cluster using Terraform from a self-hosted Azure DevOps agent, but when Terraform attempts to add Kubernetes namespaces, it fails to connect to the cluster DNS name on port 443. It can, however, reach the private IP address of the cluster on port 443.

The Terraform works 100% when run locally, but fails when run from the ADO agent:
Error: Post "https://<MYCLUSTERNAME>.privatelink.northeurope.azmk8s.io:443/api/v1/namespaces": dial tcp: lookup <MYCLUSTERNAME>.privatelink.northeurope.azmk8s.io: no such host

Test-NetConnection to the FQDN on port 443 fails, whereas Test-NetConnection to the private IP address on port 443 passes.
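The two Test-NetConnection results above point at name resolution rather than a blocked port. A minimal sketch of the same check in Python, under stated assumptions: the function name `diagnose` and its return labels are hypothetical, and the injectable `resolve`/`connect` parameters exist only so the logic can be exercised without network access.

```python
import socket
from typing import Callable


def diagnose(host: str, ip: str, port: int = 443,
             resolve: Callable = socket.getaddrinfo,
             connect: Callable = socket.create_connection,
             timeout: float = 5.0) -> str:
    """Classify a connectivity problem as 'network', 'dns', or 'ok'.

    Hypothetical helper: it mirrors the two manual tests from the post
    (connect by IP, then look up the name) and is not an Azure tool.
    """
    # Step 1: try a TCP connection straight to the known private IP.
    try:
        conn = connect((ip, port), timeout)
        conn.close()
    except OSError:
        # The port is unreachable even by IP: routing or firewall issue.
        return "network"

    # Step 2: the IP is reachable, so check whether the name resolves.
    try:
        resolve(host, port)
    except socket.gaierror:
        # Same failure class as "no such host" in the Terraform error.
        return "dns"

    return "ok"
```

Called as `diagnose("<MYCLUSTERNAME>.privatelink.northeurope.azmk8s.io", "<PRIVATE-IP>")` from the agent, a result of `"dns"` would match the symptoms described here, where `<PRIVATE-IP>` stands for the cluster's private address.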

Is there a specific Azure IP range or subnet for which I need to open port 443 from the ADO agent in order to reach the AKS cluster?

I have tried some manual steps to test connectivity:
az login
-- I can log in successfully via the CLI
az aks get-credentials --name <MYCLUSTERNAME> --resource-group <CLUSTERRESOURCEGROUP>
-- Credentials successfully loaded into .kube/config
kubectl get nodes
-- I log in with the Microsoft device code login, but then receive an error: Unable to connect to the server: dial tcp: lookup <MYCLUSTERNAME>.privatelink.northeurope.azmk8s.io: no such host

Any advice will be appreciated.

Thanks
Darren

Azure Kubernetes Service (AKS)

3 answers

  1. SRIJIT-BOSE-MSFT 4,331 Reputation points Microsoft Employee
    2021-06-09T10:04:33.687+00:00

    @Darren Slaughter , Thank you for the question.

    A simple solution in your situation would be to use the AKS run command feature (preview).

    AKS run command allows you to remotely invoke commands in an AKS cluster through the AKS API. This feature provides an API that allows you to, for example, execute just-in-time commands from a remote laptop for a private cluster. This can greatly assist with quick just-in-time access to a private cluster when the client machine is not on the cluster private network while still retaining and enforcing the same RBAC controls and private API server.

    Please find the instructions to register the RunCommandPreview Feature here.

    Here are a few examples of how to use the feature.
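As an illustration of the run command feature described above, here is a minimal Python sketch that assembles an `az aks command invoke` call. The helper name `build_run_command` and the namespace `demo` are hypothetical, the angle-bracket values are the placeholders used elsewhere in this thread, and the flags reflect the Azure CLI as generally documented, so check them against your installed CLI version.

```python
import shlex
from typing import List


def build_run_command(resource_group: str, cluster: str, command: str) -> List[str]:
    """Build the argument list for `az aks command invoke`.

    Hypothetical helper: it only assembles the arguments and does not
    call Azure itself.
    """
    return [
        "az", "aks", "command", "invoke",
        "--resource-group", resource_group,
        "--name", cluster,
        "--command", command,
    ]


# Placeholder values; "demo" is an assumed namespace name for illustration.
argv = build_run_command(
    "<CLUSTERRESOURCEGROUP>",
    "<MYCLUSTERNAME>",
    "kubectl create namespace demo",
)

# Print a shell-quoted form that can be reviewed before running it.
print(shlex.join(argv))
```

To execute it from a machine where `az login` has already succeeded, the list could be passed to `subprocess.run(argv, check=True)`. Because the command goes through the AKS API rather than the private API server endpoint, it does not depend on resolving the privatelink name from the client.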

    Hope this helps!

    Please "Accept as Answer" if it helped, so that it can help others in the community looking for help on similar topics.

    1 person found this answer helpful.

  2. Darren Slaughter 21 Reputation points
    2021-06-24T14:03:25.337+00:00

    Hi @SRIJIT-BOSE-MSFT

    I have a fresh subscription and am using the same Terraform code to deploy an AKS cluster, and I am hitting the same problem: the cluster DNS name <MYCLUSTERNAME>-SHORTUUID.LONGUUID.privatelink.northeurope.azmk8s.io cannot be resolved during terraform apply. The rest of the Terraform run then fails, because it tries to create namespaces on the cluster and cannot resolve the name through the newly created DNS zone.

    I followed your instructions to enable the AKS run command feature (as per the MS documentation: https://learn.microsoft.com/en-us/azure/aks/private-clusters#aks-run-command-preview). I was able to register the feature successfully:
    109015-image.png

    However, the simple example command from the MS documentation fails with:
    109093-image.png

    I'm hoping you can advise how I can create the namespaces during the Terraform creation of the cluster.

    Thank you

    Full error from Terraform here:

    Error: waiting for creation of Managed Kubernetes Cluster "MYCLUSTERNAME" (Resource Group "MYCLUSTERRESOURCEGROUP"): Code="CreateVMSSAgentPoolFailed" Message="Agents are unable to resolve Kubernetes API server name. It's likely custom DNS server is not correctly configured, please see https://aka.ms/aks/private-cluster#hub-and-spoke-with-custom-dns for more information. Details: Code=\"VMExtensionProvisioningError\" Message=\"VM has reported a failure when processing extension 'vmssCSE'. Error message: \\\"Enable failed: failed to execute command: command terminated with exit status=52\\n[stdout]\\n{ \\\"ExitCode\\\": \\\"52\\\", \\\"Output\\\": \\\"Thu Jun 24 13:18:08 UTC 2021,aks-default-11336002-vmss000000\\\\nConnection to mcr.microsoft.com 443 port [tcp/https] succeeded!\\\\n? kubelet.service - Kubelet\\\\n Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)\\\\n Active: active (running) since Thu 2021-06-24 13:18:29 UTC; 3min 22s ago\\\\n Main PID: 3076 (kubelet)\\\\n Tasks: 13 (limit: 4915)\\\\n CGroup: /system.slice/kubelet.service\\\\n +-3076 /usr/local/bin/kubelet --enable-server --node-labels=kubernetes.azure.com/role=agent,agentpool=default,storageprofile=managed,storagetier=Standard_LRS,kubernetes.azure.com/os-sku=Ubuntu,kubernetes.azure.com/cluster=MYCLUSTERNAME-NODES-RG,kubernetes.azure.com/mode=system,kubernetes.azure.com/node-image-version=AKSUbuntu-1804gen2containerd-2021.06.02 --v=2 --container-runtime=remote --runtime-request-timeout=15m --container-runtime-endpoint=unix:///run/containerd/containerd.sock --volume-plugin-dir=/etc/kubernetes/volumeplugins --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --azure-container-registry-config=/etc/kubernetes/azure.json --cgroups-per-qos=true --client-ca-file=/etc/kubernetes/certs/ca.crt --cloud-config=/etc/kubernetes/azure.json --cloud-provider=azure --cluster-dns=100.1.0.10 --cluster-domain=cluster.local 
--dynamic-config-dir=/var/lib/kubelet --enforce-node-allocatable=pods --event-qps=0 --eviction-hard=memory.available<750Mi,nodefs.available<10%!,(MISSING)nodefs.inodesFree<5%!f(MISSING)eature-gates=RotateKubeletServerCertificate=true --image-gc-high-threshold=85 --image-gc-low-threshold=80 --image-pull-progress-deadline=30m --keep-terminated-pod-volumes=false --kube-reserved=cpu=100m,memory=1843Mi --kubeconfig=/var/lib/kubelet/kubeconfig --max-pods=110 --network-plugin=cni --node-status-update-frequency=10s --non-masquerade-cidr=100.0.0.0/16 --pod-infra-container-image=mcr.microsoft.com/oss/kubernetes/pause:3.5 --pod-manifest-path=/etc/kubernetes/manifests --pod-max-pids=-1 --protect-kernel-defaults=true --read-only-port=0 --resolv-conf=/run/systemd/resolve/resolv.conf --rotate-certificates=false --streaming-connection-idle-timeout=4h --tls-cert-file=/etc/kubernetes/certs/kubeletserver.crt --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256 --tls-private-key-file=/etc/kubernetes/certs/kubeletserver.key\\\\n\\\\nJun 24 13:21:51 aks-default-11336002-vmss000000 kubelet[3076]: E0624 13:21:51.273190 3076 kubelet.go:2209] node \\\\\\\"aks-default-11336002-vmss000000\\\\\\\" not found\\\\nJun 24 13:21:51 aks-default-11336002-vmss000000 kubelet[3076]: E0624 13:21:51.373229 3076 kubelet.go:2209] node \\\\\\\"aks-default-11336002-vmss000000\\\\\\\" not found\\\\nJun 24 13:21:51 aks-default-11336002-vmss000000 kubelet[3076]: E0624 13:21:51.473380 \\\", \\\"Error\\\": \\\"\\\", \\\"ExecDuration\\\": \\\"224\\\" }\\n\\n[stderr]\\n\\\"\\r\\n\\r\\nMore information on troubleshooting is available at https://aka.ms/VMExtensionCSELinuxTroubleshoot \""  
    

  3. Leonardo Bispo 1 Reputation point
    2021-10-07T17:19:32.873+00:00

    @Darren Slaughter

    Are you running Terraform on your local machine?

    If so, you must run it from inside the private network (using a VPN or a VM). Terraform is trying to resolve the cluster FQDN from your local machine, and that name only resolves from inside the VNet.

    I hope this answer helps you solve your problem.

    I spent half a day figuring out that this was my problem.