Hi @Socotec Admin Xian Zhang,
Welcome to Microsoft Q&A Platform.
The behavior you describe can occur when Azure Application Gateway uses FQDN-based backends and retains cached DNS resolution or existing backend connections. In that state, live traffic can fail even while health probes continue reporting the backend as healthy.
Can Application Gateway retain stale DNS or connection state for ACA FQDN backends while probes still show healthy?
Yes. Application Gateway caches DNS resolution for backend FQDNs based on the DNS TTL and also maintains pooled TCP/TLS connections to backend IPs.
If the Azure Container Apps (ACA) environment changes backend IPs during scaling events, infrastructure updates, ingress recycling, or platform maintenance, live traffic may keep trying to use stale backend connections or cached IPs until Application Gateway refreshes them.
In some situations, lightweight health probes still succeed while live client requests fail with intermittent 502 responses.
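To check whether the FQDN's resolved addresses change around the time of the 502s, a small watcher like the one below can be run from a VM in the same virtual network. This is a minimal sketch, not an official tool: the function names and the example host name are hypothetical, and it only shows what your DNS resolver returns, not what Application Gateway currently has cached.

```python
import socket
import time


def resolve_ips(fqdn, port=443):
    """Return the set of IP addresses the local resolver currently gives for fqdn."""
    infos = socket.getaddrinfo(fqdn, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def diff_resolution(previous, current):
    """Return (added, removed) IPs between two resolution snapshots."""
    return current - previous, previous - current


def watch(fqdn, interval_seconds=30, rounds=10, resolver=resolve_ips):
    """Poll DNS and print a timestamped line whenever the resolved IP set changes."""
    previous = resolver(fqdn)
    print(time.strftime("%H:%M:%S"), "initial", sorted(previous))
    for _ in range(rounds):
        time.sleep(interval_seconds)
        current = resolver(fqdn)
        added, removed = diff_resolution(previous, current)
        if added or removed:
            print(time.strftime("%H:%M:%S"),
                  "added", sorted(added), "removed", sorted(removed))
        previous = current


# Example (hypothetical host name, replace with your ACA FQDN):
# watch("myapp.example.azurecontainerapps.io")
```

If a change in the resolved IPs lines up with the start of the 502s in the access log, that points toward stale DNS or stale connections rather than an application fault.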
Does removing and re-adding the backend force a refresh?
Yes. Removing and re-adding the backend target forces Application Gateway to refresh DNS resolution, rebuild its backend connection pools, and establish new backend sessions.
This aligns with your observation that traffic recovered immediately after re-adding the backend.
Known cases where probes pass but live traffic never reaches ACA?
Common scenarios are:
- Backend IP changed (scale-up/down, platform upgrade) during the cached DNS window.
- TLS handshake errors because the SNI or Host header used for live traffic doesn't match the host name the ACA certificate is issued for.
- HTTP/2 connection reuse glitches on v2 gateways.
Recommended HTTP settings for Azure Container Apps FQDN backends
- Pick host name from backend target: Yes. This ensures the Host header equals your ..eastus2.azurecontainerapps.io domain.
- SNI: Enable. Container Apps uses a TLS certificate that’s valid for the generated FQDN, so you need SNI so the correct cert is presented.
- Probe host behavior: Use a custom health probe that also picks its host name from the backend settings (the "Pick host name from backend settings" option on the probe). Point it at a lightweight endpoint (e.g. /health or /). This makes the probe's Host header match your real traffic.
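The recommendations above can be turned into a quick sanity check on an exported gateway configuration. The sketch below is illustrative only: `check_aca_backend_config` is a hypothetical helper, and it assumes you pass in two plain dictionaries shaped like the `properties` of a backend HTTP setting and a probe in the ARM template, using the ARM property names `pickHostNameFromBackendAddress` and `pickHostNameFromBackendHttpSettings`.

```python
def check_aca_backend_config(backend_settings, probe):
    """Return warnings for settings that differ from the recommendations above.

    Both arguments are assumed to be dicts mirroring the ARM 'properties'
    objects of a backend HTTP setting and a custom health probe.
    """
    warnings = []

    # Host header (and therefore SNI) should come from the backend FQDN.
    if not backend_settings.get("pickHostNameFromBackendAddress", False):
        warnings.append("Backend setting does not pick the host name from the backend target.")

    # The probe should inherit that same host name.
    if not probe.get("pickHostNameFromBackendHttpSettings", False):
        warnings.append("Probe does not pick the host name from the backend settings.")

    # The probe should use the same protocol as live traffic.
    settings_protocol = str(backend_settings.get("protocol", "")).lower()
    probe_protocol = str(probe.get("protocol", "")).lower()
    if settings_protocol != probe_protocol:
        warnings.append("Probe protocol differs from the backend setting protocol.")

    # Probe paths must be absolute, for example /health or /.
    if not str(probe.get("path", "")).startswith("/"):
        warnings.append("Probe path should start with '/'.")

    return warnings


# Example with made-up values:
settings = {"protocol": "Https", "pickHostNameFromBackendAddress": True}
probe = {"protocol": "Https", "pickHostNameFromBackendHttpSettings": True, "path": "/health"}
print(check_aca_backend_config(settings, probe))
```

An empty list means the two objects match the recommendations listed here; it does not validate anything else about the gateway.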
What logs/metrics to collect to differentiate DNS/AGW data-plane issues from ACA ingress issues?
- On the Application Gateway side:
  • Access logs (ensure you're logging the backend status code, request time, host and port).
  • Enable the "502 error origin" diagnostic in the portal (AppGw502StatusCodeAzurePortalInsight) to see if the 502 is truly coming from AGW vs. the backend.
  • Metrics: FailedRequests (500–599), UnhealthyHostCount, ConnectionErrors.
- On the Container Apps side:
  • Ingress logs (Envoy): see if any request reached the mesh.
  • App logs / Container stdout.
  • Azure Monitor metrics for HTTP 4xx/5xx and any throttling.
- DNS angle: if you use custom/private DNS, check your DNS server’s query logs or enable Azure DNS analytics to see if the FQDN is resolving to the expected IP at the time of failure.
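As a rough way to separate gateway-side 502s from backend-side ones, the access log entries can be bucketed by whether the backend returned a status at all. The sketch below is a heuristic under stated assumptions: it assumes the logs were exported as one JSON object per line, that each entry carries the v2 access log fields `httpStatus` and `serverStatus` under `properties`, and that an empty or "-" `serverStatus` means no backend response was received. The function names and sample entries are made up.

```python
import json
from collections import Counter


def classify_502(entry):
    """Bucket one access log entry by where a 502 most likely originated."""
    props = entry.get("properties", {})
    if str(props.get("httpStatus")) != "502":
        return "not-502"
    raw = props.get("serverStatus")
    server_status = "" if raw is None else str(raw).strip()
    # Assumption: no backend status means the gateway produced the 502 itself
    # (for example a failed connect or handshake to a stale IP).
    if server_status in ("", "-"):
        return "gateway-generated"
    return "backend-responded"


def summarize(lines):
    """Count classifications over an iterable of JSON-lines log records."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[classify_502(json.loads(line))] += 1
    return counts


# Example with made-up entries:
sample = [
    '{"properties": {"httpStatus": 502, "serverStatus": "-"}}',
    '{"properties": {"httpStatus": "502", "serverStatus": "502"}}',
    '{"properties": {"httpStatus": 200, "serverStatus": "200"}}',
]
print(summarize(sample))
```

A spike in the "gateway-generated" bucket with no matching requests in the ACA ingress logs would be consistent with the stale DNS or stale connection scenario; "backend-responded" entries suggest looking at the Container Apps side first.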
Reference links for more troubleshooting:
Troubleshoot bad gateway (502) errors in Application Gateway https://docs.microsoft.com/azure/application-gateway/application-gateway-troubleshooting-502
Backend health troubleshooting in Application Gateway https://docs.microsoft.com/azure/application-gateway/application-gateway-backend-health-troubleshooting
Application Gateway access log reference https://docs.microsoft.com/azure/application-gateway/monitor-application-gateway-reference#access-log-category
High-traffic scaling guide for Application Gateway https://docs.microsoft.com/azure/application-gateway/high-traffic-support
DNS/probe caching behavior notes https://learn.microsoft.com/azure/application-gateway/application-gateway-probe-overview (see DNS TTL section)
Please accept the answer and up-vote wherever the information provided helps you; this can be beneficial to other community members.