Incorrect Hostname of pods in same stateful set in an AKS cluster

Brian Lui
2023-11-24T05:23:06.2466667+00:00

Hello all

I am at my wits' end here. I have been trying, without success, to solve an issue that suddenly started happening with my recent deployments to AKS. I have gone through a lot of documentation, as well as various Stack Overflow questions and answers. I'm not a Kubernetes expert, but I'm trying.

My main issue is:

I have a JBoss application that I am deploying to AKS. The application is deployed as a StatefulSet (replicas=2) into the default namespace. The deployment creates the following services (all in the default namespace):

  • demo-app-hs (headless service)
    • Has no ClusterIP, and shows 2 pods (demo-app-depl-0 and 1) when I drill in.
  • demo-app-service (non-headless service)
    • Has a ClusterIP and an ExternalIP, and shows 2 pods (demo-app-depl-0 and 1) when I drill in.
  • demo-app-service-lb (default lb using the Azure LoadBalancer)
    • Has a ClusterIP and an ExternalIP, and shows 2 pods (demo-app-depl-0 and 1) when I drill in.

The first node comes up as 'demo-app-depl-0' and works perfectly fine: I can access it with no errors. The second node comes up as 'demo-app-depl-1', and in its logs I see the following error, which leads me to believe that this pod cannot connect to the master pod in the cluster:

[exec] 2023-11-24 04:45:13.378+0000 ERROR [org.apache.activemq.artemis.core.client:877] {} (Thread-28 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@21918ea4)) AMQ214016: Failed to create netty connection: java.net.UnknownHostException: demo-app-depl-0

When I hop into the pod (demo-app-depl-1) and check the /etc/resolv.conf file, I see the following:

search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.0.0.10
options ndots:5
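
To make the lookup behavior concrete, here is a minimal sketch of how a short name such as the one in the error would be expanded against the search list above. It assumes glibc-style resolver behavior, where a name with fewer dots than ndots is tried against each search domain before being tried as-is; the helper name expand_search is hypothetical and other resolver implementations may differ in detail.

```shell
#!/bin/sh
# Print the candidate names a resolver would try for a short name,
# using the search list from the /etc/resolv.conf shown above.
# Assumption: glibc-style behavior (search domains first, bare name last)
# because "demo-app-depl-0" has fewer dots than ndots:5.
# expand_search is a hypothetical helper, not part of any tool.
expand_search() {
    name=$1
    shift
    for domain in "$@"; do
        printf '%s.%s\n' "$name" "$domain"
    done
    printf '%s\n' "$name"
}

expand_search demo-app-depl-0 \
    default.svc.cluster.local svc.cluster.local cluster.local
```

None of the candidates printed by this sketch contains the headless service name, whereas the DNS record Kubernetes publishes for a StatefulSet pod normally has the form pod.service.namespace.svc.cluster.local.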

When I run 'kubectl exec -i -t demo-app-depl-1 -- nslookup default.svc.cluster.local', I get:

Server: 10.0.0.10
Address: 10.0.0.10#53

*** Can't find default.svc.cluster.local: No answer

When I run 'kubectl exec -i -t demo-app-depl-1 -- nslookup demo-app-hs.default.svc.cluster.local', the name resolves fine:

Server: 10.0.0.10
Address: 10.0.0.10#53

Name: demo-app-hs.default.svc.cluster.local
Address: 10.244.2.6
Name: demo-app-hs.default.svc.cluster.local
Address: 10.244.2.7
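
A further check that may be useful here is resolving the per-pod name rather than the service name. The sketch below builds that name in the form Kubernetes uses for StatefulSet pods behind a headless service. It assumes the StatefulSet's spec.serviceName is demo-app-hs (the post does not show this) and that the cluster domain is cluster.local, as the resolv.conf search list suggests; pod_fqdn is a hypothetical helper.

```shell
#!/bin/sh
# Build the per-pod DNS name for a StatefulSet pod behind a headless
# service: <pod>.<service>.<namespace>.svc.<cluster-domain>.
# Assumptions: spec.serviceName of the StatefulSet is demo-app-hs (not
# confirmed in the post) and the cluster domain is cluster.local.
# pod_fqdn is a hypothetical helper, not part of kubectl.
pod_fqdn() {
    printf '%s.%s.%s.svc.%s\n' "$1" "$2" "$3" "${4:-cluster.local}"
}

target=$(pod_fqdn demo-app-depl-0 demo-app-hs default)
echo "$target"

# With cluster access, the lookup and the serviceName check could be run as:
#   kubectl exec -i -t demo-app-depl-1 -- nslookup "$target"
#   kubectl get statefulset demo-app-depl -o jsonpath='{.spec.serviceName}'
```

If the per-pod name resolves while the bare 'demo-app-depl-0' does not, that would suggest the application is being handed the short pod hostname instead of the name published through the headless service.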

I haven't changed my deployment method, which uses Helm, over the last year; however, I only recently started running into this issue. I'm not sure what to do at this point.

Any help would be appreciated, thank you.

Azure Kubernetes Service (AKS)