Incorrect hostname for pods in the same StatefulSet in an AKS cluster
Hello all
I am at my wits' end here. My recent deployments to AKS suddenly started failing with the issue described below, and I have not been able to solve it. I have gone through a lot of documentation, as well as various Stack Overflow questions and answers. I'm not a Kubernetes expert, but I'm trying.
My main issue is:
I have a JBoss application that I am deploying to AKS. The application is deployed as a StatefulSet (replicas=2) into the default namespace. The deployment creates the following services (all in the default namespace):
- demo-app-hs (headless service)
  - Has no ClusterIP, and shows 2 pods (demo-app-depl-0 and 1) when I drill in.
- demo-app-service (non-headless service)
  - Has a ClusterIP and an ExternalIP, and shows 2 pods (demo-app-depl-0 and 1) when I drill in.
- demo-app-service-lb (default lb using the Azure LoadBalancer)
  - Has a ClusterIP and an ExternalIP, and shows 2 pods (demo-app-depl-0 and 1) when I drill in.
The first pod comes up as 'demo-app-depl-0' and works perfectly fine. I can access it with no errors. The second pod comes up as 'demo-app-depl-1', and in its logs I see the following error, which leads me to believe that this pod cannot connect to the master pod in the cluster:
[exec] 2023-11-24 04:45:13.378+0000 ERROR [org.apache.activemq.artemis.core.client:877] {} (Thread-28 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@21918ea4)) AMQ214016: Failed to create netty connection: java.net.UnknownHostException: demo-app-depl-0
When I hop into the pod (demo-app-depl-1) and check the /etc/resolv.conf file, I see the following:
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.0.0.10
options ndots:5
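As far as I understand it, the short name from the error (demo-app-depl-0) is expanded through this search list before it is queried. Below is a rough Python sketch of that expansion, assuming glibc-style resolver behaviour; the function name candidate_names is just something I made up for illustration, and other resolver implementations may differ.

```python
def candidate_names(name, search, ndots):
    """Return the names a glibc-style resolver would try, in order (assumed behaviour)."""
    if name.endswith("."):
        # Already fully qualified: the search list is skipped.
        return [name]
    expanded = [f"{name}.{domain}" for domain in search]
    # With at least `ndots` dots the literal name is tried first, otherwise last.
    if name.count(".") >= ndots:
        return [name + "."] + expanded
    return expanded + [name + "."]


# Values copied from the /etc/resolv.conf shown above.
search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

for candidate in candidate_names("demo-app-depl-0", search, ndots=5):
    print(candidate)
```

If that sketch is right, none of the names tried for the bare hostname include the headless service name demo-app-hs.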
When I run 'kubectl exec -i -t demo-app-depl-1 -- nslookup default.svc.cluster.local', I get:
Server: 10.0.0.10
Address: 10.0.0.10#53
*** Can't find default.svc.cluster.local: No answer
When I run 'kubectl exec -i -t demo-app-depl-1 -- nslookup demo-app-hs.default.svc.cluster.local', the name resolves fine:
Server: 10.0.0.10
Address: 10.0.0.10#53
Name: demo-app-hs.default.svc.cluster.local
Address: 10.244.2.6
Name: demo-app-hs.default.svc.cluster.local
Address: 10.244.2.7
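The lookup above returns both pod IPs behind the headless service, but it does not test the per-pod names. If I read the StatefulSet documentation correctly, each pod should also get a record of the form <pod>.<serviceName>.<namespace>.svc.<cluster domain>, provided the StatefulSet's spec.serviceName points at the headless service. Here is a small Python sketch of the names I would expect, assuming serviceName is demo-app-hs and the cluster domain is cluster.local (both are assumptions; the helper name statefulset_pod_fqdns is made up):

```python
def statefulset_pod_fqdns(statefulset, replicas, service, namespace,
                          cluster_domain="cluster.local"):
    """Build the expected per-pod DNS names for a StatefulSet behind a headless service."""
    # Pod names are <statefulset>-<ordinal>, with ordinals starting at 0.
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


# Assumption: spec.serviceName of the StatefulSet is "demo-app-hs".
for fqdn in statefulset_pod_fqdns("demo-app-depl", 2, "demo-app-hs", "default"):
    print(fqdn)
```

These are the names I would try next with nslookup from inside demo-app-depl-1, in the same way as the commands above.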
I haven't changed my method of deploying, which uses Helm, over the last year; however, I only recently started running into this issue. I'm not sure what to do at this point.
Any help would be appreciated, thank you.