Application Gateway backend pool is unhealthy and the backend app can't be reached through the ingress controller
I am setting up a new environment in Azure for services running in an AKS cluster. To start with and for testing, I am using just one sample application - azure-vote-front.
The configuration is the following:
- two VNets have been set up:
- for the application gateway (GW) - vnet-gw with address range 10.21.46.0-10.21.47.255 (10.21.46.0/23)
- for the AKS cluster - vnet-aksondes with address range 10.21.40.0-10.21.40.255 (10.21.40.0/24)
- the VNets are peered (a CLI sketch follows the list)
- the app GW was set up with a public IP (its private IP was assigned dynamically as 10.21.46.70)
- the AKS cluster was created, and an ingress controller was installed in it at IP address 10.21.40.254
- the azure-vote-front app is deployed in AKS and exposed through an ingress - the AKS node runs at IP 10.21.40.4, and in the Azure portal the app also shows 10.21.40.254 as its External IP
- the ingress has a rule in its YAML that defines the "internal" host name used to determine which service to invoke (sketched after this list)
- the company DNS zone (e.g. company.com) was already set up, with records pointing to existing services already in use
- an A record was additionally added for the sample app - azure-vote-front.company.com, pointing to the public IP of the GW
- this DNS zone is in a different Azure subscription than all the other resources created for this environment (although that doesn't seem relevant here)
- a new DNS subzone was added for my environment - int.dev.company.com
- an A record was added to point all traffic for this domain to the ingress controller - 10.21.40.254 (sketched after this list)
- the app GW has a listener for azure-vote-front.company.com with a rule that targets a single backend pool pointing to the FQDN azure-vote-front.int.dev.company.com (sketched after this list)
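For reference, the peering was created along these lines (the resource group, subscription ID, and peering names below are illustrative placeholders, not the actual values):

```bash
# Peering vnet-gw -> vnet-aksondes; rg-env and <sub-id> are placeholders
az network vnet peering create \
  --name gw-to-aks \
  --resource-group rg-env \
  --vnet-name vnet-gw \
  --remote-vnet "/subscriptions/<sub-id>/resourceGroups/rg-env/providers/Microsoft.Network/virtualNetworks/vnet-aksondes" \
  --allow-vnet-access

# Peering has to exist in both directions
az network vnet peering create \
  --name aks-to-gw \
  --resource-group rg-env \
  --vnet-name vnet-aksondes \
  --remote-vnet "/subscriptions/<sub-id>/resourceGroups/rg-env/providers/Microsoft.Network/virtualNetworks/vnet-gw" \
  --allow-vnet-access
```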
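The ingress for the sample app looks roughly like this (assuming an NGINX ingress controller; the service name and port come from my deployment and may differ elsewhere):

```bash
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: azure-vote-front
spec:
  ingressClassName: nginx   # assumption: NGINX ingress controller
  rules:
  - host: azure-vote-front.int.dev.company.com   # the "internal" host name
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: azure-vote-front   # the app's ClusterIP service
            port:
              number: 80
EOF
```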
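The DNS records were added roughly like this (the DNS resource group name is a placeholder; the wildcard record and the subzone being its own delegated Azure DNS zone are my reading of "all traffic for this domain"):

```bash
# In the public zone: sample app -> public IP of the GW
az network dns record-set a add-record \
  --resource-group rg-dns \
  --zone-name company.com \
  --record-set-name azure-vote-front \
  --ipv4-address <gw-public-ip>

# Subzone for the environment, with everything pointed at the ingress controller
az network dns zone create --resource-group rg-dns --name int.dev.company.com
az network dns record-set a add-record \
  --resource-group rg-dns \
  --zone-name int.dev.company.com \
  --record-set-name "*" \
  --ipv4-address 10.21.40.254
```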
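And the GW side, roughly (gateway, pool, and rule names are illustrative; the frontend port and HTTP settings already existed):

```bash
# Backend pool pointing at the internal FQDN
az network application-gateway address-pool create \
  --gateway-name agw-env \
  --resource-group rg-env \
  --name pool-azure-vote-front \
  --servers azure-vote-front.int.dev.company.com

# Listener for the public host name (frontend port name assumed to be the default)
az network application-gateway http-listener create \
  --gateway-name agw-env \
  --resource-group rg-env \
  --name listener-azure-vote-front \
  --frontend-port appGatewayFrontendPort \
  --host-name azure-vote-front.company.com

# Rule tying listener and pool together (--priority is required on v2 SKUs)
az network application-gateway rule create \
  --gateway-name agw-env \
  --resource-group rg-env \
  --name rule-azure-vote-front \
  --http-listener listener-azure-vote-front \
  --address-pool pool-azure-vote-front \
  --rule-type Basic \
  --priority 100
```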
In theory, the idea of the setup is that when a request arrives at azure-vote-front.company.com, the GW routes it to the ingress controller through the VNet peering, and the ingress controller then routes the traffic to the azure-vote-front app in the cluster based on the "internal" FQDN (azure-vote-front.int.dev.company.com). The described setup is also shown in the diagram below.
Issues/errors:
- the Overview blade of the GW in the Azure portal shows an error that the backend pools are unhealthy, which may result in 502 Bad Gateway responses
- the Backend health blade of the GW in the Azure portal also shows the backend pool as unhealthy
- Connection troubleshoot on the GW in the Azure portal with destination azure-vote-front.int.dev.company.com cannot establish a connection beyond the IP 10.21.40.254 - the next hop (10.21.40.4) is shown, but never reached
What has been checked:
- the app itself seems to be running - curl to localhost from within the azure-vote-front pod's shell returns HTTP 200 (the commands are sketched after this list)
- curl from the ingress controller pod's shell to the internal IP of the azure-vote-front pod also returns HTTP 200
- ping from a test virtual machine inside vnet-gw to the AKS node (10.21.40.4) gets a response, but ping to the ingress controller (10.21.40.254) does not - the ingress controller may simply not answer ICMP, but the first ping at least confirms that the VNets can reach each other
- DNS resolves to the expected IPs, even in Azure diagnostic tests - azure-vote-front.company.com to the GW's public IP, and azure-vote-front.int.dev.company.com to the ingress controller's IP (10.21.40.254)
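For completeness, these are roughly the checks performed (pod names, namespaces, and the app pod IP are placeholders), plus a curl that approximates what the GW health probe attempts against the backend FQDN:

```bash
# HTTP check from inside the app pod
kubectl exec -it <azure-vote-front-pod> -- \
  curl -s -o /dev/null -w '%{http_code}\n' http://localhost/

# HTTP check from the ingress controller pod to the app pod's IP
kubectl exec -it <ingress-controller-pod> -n <ingress-namespace> -- \
  curl -s -o /dev/null -w '%{http_code}\n' http://<app-pod-ip>/

# DNS checks, run from a test VM in vnet-gw
nslookup azure-vote-front.company.com
nslookup azure-vote-front.int.dev.company.com

# Approximation of the GW probe path, from inside vnet-gw
curl -v http://azure-vote-front.int.dev.company.com/
```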
However, the errors described above remain, and the application is not available at azure-vote-front.company.com. Instead, as one of the portal errors suggests, 502 Bad Gateway is returned.
Any ideas what else might be wrong or missing in the setup? Any help is appreciated.