Troubleshooting in Application Gateway for Containers
This article provides some guidance to help you troubleshoot common problems in Application Gateway for Containers.
Find the version of ALB Controller
Before you start troubleshooting, determine which version of ALB Controller is deployed by using the following kubectl command (substitute your namespace if you aren't using the default namespace of azure-alb-system):
kubectl get deployment -n azure-alb-system -o wide
Example output:
NAME | READY | UP-TO-DATE | AVAILABLE | AGE | CONTAINERS | IMAGES | SELECTOR |
---|---|---|---|---|---|---|---|
alb-controller | 2/2 | 2 | 2 | 18d | alb-controller | mcr.microsoft.com/application-lb/images/alb-controller:1.3.7 | app=alb-controller |
alb-controller-bootstrap | 1/1 | 1 | 1 | 18d | alb-controller-bootstrap | mcr.microsoft.com/application-lb/images/alb-controller-bootstrap:1.3.7 | app=alb-controller-bootstrap |
In this example, the ALB Controller version is 1.3.7.
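If you only need the version string, the image tag can be read directly instead of scanning the wide output. The following is a minimal sketch: the helper name image_tag is hypothetical, and it assumes the image reference ends in a plain tag (no digest), as in the example output above.

```shell
# Print the tag (the text after the last ':') of a container image reference.
# Assumption: the reference ends in ":<tag>" and carries no "@sha256:..." digest.
image_tag() {
  printf '%s\n' "${1##*:}"
}

# Usage against a live cluster (assumes the default namespace and the
# deployment name shown in the example output):
#   image="$(kubectl get deployment alb-controller -n azure-alb-system \
#     -o jsonpath='{.spec.template.spec.containers[0].image}')"
#   image_tag "$image"
```

The parsing is kept in a separate function so it can be checked without cluster access.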
The ALB Controller version can be upgraded by running the helm upgrade alb-controller command. For more information, see Install the ALB Controller.
Tip
The latest ALB Controller version can be found in the ALB Controller release notes.
Collect ALB Controller logs
Logs can be collected from the ALB Controller by using the kubectl logs command referencing the ALB Controller pod.
Get the running ALB Controller pod name
Run the following kubectl command. Ensure you substitute your namespace if not using the default namespace of azure-alb-system:
kubectl get pods -n azure-alb-system
You should see output similar to the following example. Pod names might differ slightly.
NAME | READY | STATUS | RESTARTS | AGE |
---|---|---|---|---|
alb-controller-6648c5d5c-sdd9t | 1/1 | Running | 0 | 4d6h |
alb-controller-6648c5d5c-au234 | 1/1 | Running | 0 | 4d6h |
alb-controller-bootstrap-6648c5d5c-hrmpc | 1/1 | Running | 0 | 4d6h |
ALB Controller uses leader election provided by the controller-runtime manager to designate an active and a standby pod for high availability.
Copy the name of each alb-controller pod (not the bootstrap pod; in this case, alb-controller-6648c5d5c-sdd9t and alb-controller-6648c5d5c-au234) and run the following command to determine the active pod.
kubectl logs alb-controller-6648c5d5c-sdd9t -n azure-alb-system -c alb-controller | grep "successfully acquired lease"
You should see the following if the pod is primary:
successfully acquired lease azure-alb-system/alb-controller-leader-election
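Checking each pod by hand can be wrapped in a small loop. The sketch below is illustrative only: the function names holds_lease and find_active_pod are hypothetical, the label selector app=alb-controller is taken from the SELECTOR column of the deployment output, and it assumes the lease message is still present in the pod's retained logs.

```shell
# Succeed (exit status 0) when the log text on stdin contains the lease message.
holds_lease() {
  grep -q "successfully acquired lease"
}

# Print every alb-controller pod whose logs show that it acquired the lease.
# Assumes kubectl is already configured for the target cluster.
find_active_pod() {
  namespace="${1:-azure-alb-system}"
  for pod in $(kubectl get pods -n "$namespace" -l app=alb-controller -o name); do
    if kubectl logs "$pod" -n "$namespace" -c alb-controller | holds_lease; then
      printf '%s\n' "$pod"
    fi
  done
}

# Usage: find_active_pod                 (default namespace)
#        find_active_pod my-namespace    (custom namespace)
```

If no pod is printed, the lease message may have rotated out of the logs, so the absence of output isn't proof that no pod is active.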
Collect the logs
Logs from ALB Controller are returned in JSON format.
Execute the following kubectl command, replacing the name with the pod name returned in step 1:
kubectl logs -n azure-alb-system alb-controller-6648c5d5c-sdd9t
To save the logs, redirect the output of the same command to a file by appending the greater-than sign (>) and the name of the file to write to:
kubectl logs -n azure-alb-system alb-controller-6648c5d5c-sdd9t > alb-controller-logs.json
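Because each log entry is a JSON object, error entries can be separated from informational ones with a plain text filter. This is a rough sketch, not an official tool: the function name error_entries is hypothetical, and it assumes one compact JSON object per line with a "level" field, as in the sample log entries shown later in this article.

```shell
# Keep only log lines whose "level" field is "error".
# Assumption: one compact JSON object per line, with no spaces around the colon.
error_entries() {
  grep -F '"level":"error"'
}

# Usage:
#   kubectl logs -n azure-alb-system <pod-name> | error_entries
#   error_entries < alb-controller-logs.json
```

A JSON-aware tool would be more robust if the log formatting ever changes; the fixed-string match is used here only to avoid extra dependencies.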
Configuration errors
Application Gateway for Containers returns 500 status code
Application Gateway for Containers returns a 500 status code in the following scenarios:
Invalid backend entries: A backend is defined as invalid in the following scenarios:
- It refers to an unknown or unsupported kind of resource. In this case, the HTTPRoute's status has a condition with reason set to InvalidKind and the message explains which kind of resource is unknown or unsupported.
- It refers to a resource that doesn't exist. In this case, the HTTPRoute's status has a condition with reason set to BackendNotFound and the message explains that the resource doesn't exist.
- It refers to a resource in another namespace when the reference isn't explicitly allowed by a ReferenceGrant (or equivalent concept). In this case, the HTTPRoute's status has a condition with reason set to RefNotPermitted and the message explains which cross-namespace reference isn't allowed.
For instance, if an HTTPRoute has two backends specified with equal weights, and one is invalid, 50 percent of the traffic must receive a 500.
No endpoints found for all backends: When none of the backends referenced in an HTTPRoute has any endpoints, a 500 error code is returned.
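To check whether an HTTPRoute reports one of the invalid-backend reasons listed above, its status can be scanned for those reason strings. The following is a minimal sketch: the function name invalid_backend_reasons is hypothetical, the route name and namespace are placeholders, and the match is purely textual, so a reason string quoted inside a message would also be reported.

```shell
# Print each invalid-backend reason found in the text on stdin, one per line.
invalid_backend_reasons() {
  grep -E -o 'InvalidKind|BackendNotFound|RefNotPermitted' | sort -u
}

# Usage (placeholders in angle brackets):
#   kubectl get httproute <route-name> -n <namespace> -o yaml | invalid_backend_reasons
```

No output means none of the three reasons appears in the status; in that case, check whether the referenced backends have endpoints.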
Application Load Balancer custom resource doesn't reflect Ready status
Symptoms
ApplicationLoadBalancer custom resource status message continually says "Application Gateway for Containers resource Application Gateway for Containers-name is undergoing an update."
The following logs are repeated by the primary alb-controller pod.
{"level":"info","version":"x.x.x","Timestamp":"2024-02-26T20:31:53.760150719Z","message":"Stream opened for config updates"}
{"level":"info","version":"x.x.x","operationID":"aaaa0000-bb11-2222-33cc-444444dddddd","Timestamp":"2024-02-26T20:31:53.760313623Z","message":"Successfully sent config update request"}
{"level":"error","version":"x.x.x","error":"rpc error: code = PermissionDenied desc = ALB Controller with object id 'aaaa0000-bb11-2222-33cc-444444dddddd' does not have authorization to perform action on Application Gateway for Containers resource.Please check RBAC delegations to the Application Gateway for Containers resource.","Timestamp":"2024-02-26T20:31:53.769444995Z","message":"Unable to capture config update response"}
{"level":"info","version":"x.x.x","Timestamp":"2024-02-26T20:31:53.769504489Z","message":"Retrying to open config update stream"}
{"level":"info","version":"x.x.x","Timestamp":"2024-02-26T20:31:54.461487406Z","message":"Stream opened up for endpoint updates"}
{"level":"info","version":"x.x.x","operationID":"808825c2-b0a8-476b-b83a-8e7357c55750","Timestamp":"2024-02-26T20:31:54.462070039Z","message":"Successfully sent endpoint update request"}
{"level":"error","version":"x.x.x","error":"rpc error: code = PermissionDenied desc = ALB Controller with object id 'aaaa0000-bb11-2222-33cc-444444dddddd' does not have authorization to perform action on Application Gateway for Containers resource.Please check RBAC delegations to the Application Gateway for Containers resource.","Timestamp":"2024-02-26T20:31:54.470728646Z","message":"Unable to capture endpoint update response"}
{"level":"info","version":"x.x.x","Timestamp":"2024-02-26T20:31:54.47077373Z","message":"Retrying to open up endpoint update stream"}
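To confirm this symptom quickly, the PermissionDenied entries can be pulled out of the controller logs together with the object ID they name. This is an illustrative sketch only: the function name denied_object_ids is hypothetical, and it assumes the error text keeps the wording shown in the log entries above.

```shell
# Print the distinct object IDs named in PermissionDenied errors read from stdin.
# Assumption: the error text contains "object id '<id>'" as in the sample logs.
denied_object_ids() {
  grep -F 'code = PermissionDenied' |
    sed -n "s/.*object id '\([^']*\)'.*/\1/p" |
    sort -u
}

# Usage:
#   kubectl logs -n azure-alb-system <active-pod-name> | denied_object_ids
```

The printed ID identifies the principal that the error message refers to when it asks you to check RBAC delegations to the Application Gateway for Containers resource.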
Kubernetes Gateway resource fails to get token from credential chain
Symptoms
No changes to HTTPRoute resources are being applied to Application Gateway for Containers.
The following error message is returned in the status of the Kubernetes Gateway resource:
status:
  conditions:
  - lastTransitionTime: "2023-04-28T22:08:34Z"
    message: The Gateway is not scheduled
    observedGeneration: 2
    reason: Scheduled
    status: "False"
    type: Scheduled
  - lastTransitionTime: "2023-04-28T22:08:34Z"
    message: "No addresses have been assigned to the Gateway : failed to get token
      from credential chain: [FromAssertion(): http call(https://login.microsoftonline.com/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/oauth2/v2.0/token)(POST)
      error: reply status code was 401:\n{\"error\":\"unauthorized_client\",\"error_description\":\"AADSTS70021:
      No matching federated identity record found for presented assertion. Assertion
      Issuer: 'https://azureregion.oic.prod-aks.azure.com/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/'.
      Assertion Subject: 'system:serviceaccount:azure-application-lb-system:gateway-controller-sa'.
      Assertion Audience: 'api://AzureADTokenExchange'. https://docs.microsoft.com/en-us/azure/active-directory/develop/workload-identity-federation\\r\\nTrace
      ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx\\r\\nCorrelation ID: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx\\r\\nTimestamp:
      2023-04-28 22:08:46Z\",\"error_codes\":[70021],\"timestamp\":\"2023-04-28 22:08:46Z\",\"trace_id\":\"aaaa0000-bb11-2222-33cc-444444dddddd\",\"correlation_id\":\"aaaa0000-bb11-2222-33cc-444444dddddd\",\"error_uri\":\"https://login.microsoftonline.com/error?code=70021\"}
      DefaultAzureCredential: failed to acquire a token.\nAttempted credentials:\n\tEnvironmentCredential:
      incomplete environment variable configuration. Only AZURE_TENANT_ID and AZURE_CLIENT_ID
      are set\n\tManagedIdentityCredential: IMDS token request timed out\n\tAzureCLICredential:
      fork/exec /bin/sh: no such file or directory]"
    observedGeneration: 2
    reason: AddressNotAssigned
    status: "False"
    type: Ready
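The Assertion Subject quoted in the message is the value that a federated credential on the managed identity is expected to match, so it can help to extract it from the Gateway status. The sketch below is an assumption-laden example: the function name assertion_subject is hypothetical, the resource names in the usage comments are placeholders, and it assumes the message keeps the "Assertion Subject: '...'" wording shown above.

```shell
# Print the Assertion Subject quoted in an AADSTS70021 message read from stdin.
# Whitespace is collapsed first so that folded YAML lines are handled.
assertion_subject() {
  tr -s '[:space:]' ' ' |
    sed -n "s/.*Assertion Subject: '\([^']*\)'.*/\1/p"
}

# Usage (placeholders in angle brackets):
#   kubectl get gateway <gateway-name> -n <namespace> -o yaml | assertion_subject
#
# The result can then be compared with the subjects configured on the identity:
#   az identity federated-credential list --identity-name <identity-name> \
#     --resource-group <resource-group> --query '[].subject' -o tsv
```

If the extracted subject doesn't appear in the list of configured subjects, that mismatch is consistent with the "No matching federated identity record found" error.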
Solution
Ensure that federated credentials are configured in Azure for the managed identity that the ALB Controller pod uses to make changes to Application Gateway for Containers. Instructions on how to configure federated credentials can be found in the quickstart guides: