Application Gateway backend pool is unhealthy and the backend app can't be reached through the ingress controller
I am setting up a new environment in Azure for services running in an AKS cluster. To start with and for testing, I am using just one sample application - azure-vote-front.
The configuration is the following:
- two VNets have been set up:
- for the application gateway (GW) - vnet-gw with address range 10.21.46.0-10.21.47.255 (10.21.46.0/23)
- for the AKS cluster - vnet-aksondes with address range 10.21.40.0-10.21.40.255 (10.21.40.0/24)
- the VNets are peered (a CLI sketch follows the list)
- the app GW was set up with a public IP (its private IP was assigned dynamically as 10.21.46.70)
- the AKS cluster was created, and an ingress controller was installed in it at IP address 10.21.40.254
- the azure-vote-front app is deployed in AKS and exposed through an ingress - the AKS node runs at IP 10.21.40.4, and in the Azure portal the app also shows 10.21.40.254 as its External IP
- the ingress has a rule in its YAML that defines the "internal" host name used to determine which service to invoke (sketched after this list)
- the company DNS zone (e.g. company.com) was already set up, with records pointing to existing services already in use
- an A record was additionally added for the sample app - azure-vote-front.company.com, pointing to the public IP of the GW
- this DNS zone is in a different Azure subscription than all the other resources created for this environment (although that doesn't seem relevant here)
- a new DNS subzone was added for my environment - int.dev.company.com
- an A record was added to point all traffic for this domain to the ingress controller - 10.21.40.254 (sketched after this list)
- the app GW has a listener for azure-vote-front.company.com with a rule that targets a single backend pool pointing to the FQDN azure-vote-front.int.dev.company.com (sketched after this list)
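For reference, the peering was created along these lines (the resource group, subscription ID, and peering names below are illustrative placeholders, not the actual values):

```bash
# Peering vnet-gw -> vnet-aksondes; rg-env and <sub-id> are placeholders
az network vnet peering create \
  --name gw-to-aks \
  --resource-group rg-env \
  --vnet-name vnet-gw \
  --remote-vnet "/subscriptions/<sub-id>/resourceGroups/rg-env/providers/Microsoft.Network/virtualNetworks/vnet-aksondes" \
  --allow-vnet-access

# Peering has to exist in both directions
az network vnet peering create \
  --name aks-to-gw \
  --resource-group rg-env \
  --vnet-name vnet-aksondes \
  --remote-vnet "/subscriptions/<sub-id>/resourceGroups/rg-env/providers/Microsoft.Network/virtualNetworks/vnet-gw" \
  --allow-vnet-access
```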
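The ingress for the sample app looks roughly like this (assuming an NGINX ingress controller; the service name and port come from my deployment and may differ elsewhere):

```bash
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: azure-vote-front
spec:
  ingressClassName: nginx   # assumption: NGINX ingress controller
  rules:
  - host: azure-vote-front.int.dev.company.com   # the "internal" host name
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: azure-vote-front   # the app's ClusterIP service
            port:
              number: 80
EOF
```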
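The DNS records were added roughly like this (the DNS resource group name is a placeholder; the wildcard record and the subzone being its own delegated Azure DNS zone are my reading of "all traffic for this domain"):

```bash
# In the public zone: sample app -> public IP of the GW
az network dns record-set a add-record \
  --resource-group rg-dns \
  --zone-name company.com \
  --record-set-name azure-vote-front \
  --ipv4-address <gw-public-ip>

# Subzone for the environment, with everything pointed at the ingress controller
az network dns zone create --resource-group rg-dns --name int.dev.company.com
az network dns record-set a add-record \
  --resource-group rg-dns \
  --zone-name int.dev.company.com \
  --record-set-name "*" \
  --ipv4-address 10.21.40.254
```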
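And the GW side, roughly (gateway, pool, and rule names are illustrative; the frontend port and HTTP settings already existed):

```bash
# Backend pool pointing at the internal FQDN
az network application-gateway address-pool create \
  --gateway-name agw-env \
  --resource-group rg-env \
  --name pool-azure-vote-front \
  --servers azure-vote-front.int.dev.company.com

# Listener for the public host name (frontend port name assumed to be the default)
az network application-gateway http-listener create \
  --gateway-name agw-env \
  --resource-group rg-env \
  --name listener-azure-vote-front \
  --frontend-port appGatewayFrontendPort \
  --host-name azure-vote-front.company.com

# Rule tying listener and pool together (--priority is required on v2 SKUs)
az network application-gateway rule create \
  --gateway-name agw-env \
  --resource-group rg-env \
  --name rule-azure-vote-front \
  --http-listener listener-azure-vote-front \
  --address-pool pool-azure-vote-front \
  --rule-type Basic \
  --priority 100
```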
In theory, the idea of the setup is that when a request arrives at azure-vote-front.company.com, the GW routes it to the ingress controller through the VNet peering, and the ingress controller then routes the traffic to the azure-vote-front app in the cluster based on the "internal" FQDN (azure-vote-front.int.dev.company.com). The described setup is also shown in the diagram below.
Issues/errors:
- the Overview blade of the GW in the Azure portal shows an error that the backend pools are unhealthy, which may result in 502 Bad Gateway responses
- the Backend health blade of the GW in the Azure portal also shows the backend pool as unhealthy
- Connection troubleshoot on the GW in the Azure portal with destination azure-vote-front.int.dev.company.com cannot establish a connection beyond the IP 10.21.40.254 - the next hop (10.21.40.4) is shown, but never reached
What has been checked:
- the app itself seems to be running - curl to localhost from within the azure-vote-front pod's shell returns HTTP 200 (the commands are sketched after this list)
- curl from the ingress controller pod's shell to the internal IP of the azure-vote-front pod also returns HTTP 200
- ping from a test virtual machine inside vnet-gw to the AKS node (10.21.40.4) gets a response, but ping to the ingress controller (10.21.40.254) does not - the ingress controller may simply not answer ICMP, but the first ping at least confirms that the VNets can reach each other
- DNS resolves to the expected IPs, even in Azure diagnostic tests - azure-vote-front.company.com to the GW's public IP, and azure-vote-front.int.dev.company.com to the ingress controller's IP (10.21.40.254)
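For completeness, these are roughly the checks performed (pod names, namespaces, and the app pod IP are placeholders), plus a curl that approximates what the GW health probe attempts against the backend FQDN:

```bash
# HTTP check from inside the app pod
kubectl exec -it <azure-vote-front-pod> -- \
  curl -s -o /dev/null -w '%{http_code}\n' http://localhost/

# HTTP check from the ingress controller pod to the app pod's IP
kubectl exec -it <ingress-controller-pod> -n <ingress-namespace> -- \
  curl -s -o /dev/null -w '%{http_code}\n' http://<app-pod-ip>/

# DNS checks, run from a test VM in vnet-gw
nslookup azure-vote-front.company.com
nslookup azure-vote-front.int.dev.company.com

# Approximation of the GW probe path, from inside vnet-gw
curl -v http://azure-vote-front.int.dev.company.com/
```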
However, the errors described above remain, and the application is not available at azure-vote-front.company.com. Instead, as one of the portal errors suggests, 502 Bad Gateway is returned.
Any ideas what else might be wrong or missing in the setup? Any help is appreciated.