ImagePullBackOff persists on AKS node despite confirmed AcrPull role assignment to kubelet identity
Azure Support Ticket: ImagePullBackOff Issue on AKS
Issue Summary
Title:
ImagePullBackOff persists on AKS node despite confirmed AcrPull role assignment to kubelet identity
Description:
We are experiencing an `ImagePullBackOff` error on one of our AKS pods pulling from our private Azure Container Registry (ACR), even though the AKS kubelet identity has been granted the `AcrPull` role. Other pods on different nodes pull the same image (`frontend:latest`) successfully. The affected node consistently fails with a 401 Unauthorized error when attempting to pull the image.
Environment Details
- Cluster Name: `apip-dev-aks-uaenorth`
- Resource Group: `apip-dev-rg-uaenorth`
- ACR Name: `apipdevacr`
- Region: `uaenorth`
- Image: `apipdevacr.azurecr.io/frontend:latest`
- Kubelet Object ID: `747ad783-9416-48a7-bcab-a1bc64898b45`
- ACR Scope: Correctly scoped to the ACR registry resource
- Assignment Timestamp: `2025-04-16T05:49:16Z`
Troubleshooting Performed
- Confirmed kubelet identity:
  - Retrieved the object ID via `az aks show --query identityProfile.kubeletidentity.objectId` (full command sketched after this list)
- Verified ACR role assignment:
  - Used `az role assignment list` to confirm the `AcrPull` role is correctly assigned to the kubelet identity for the ACR scope (sketch below)
- Image verified in ACR:
  - Pulled `frontend:latest` manually using Docker after `az acr login` (sketch below)
  - The same image pulls successfully on other AKS nodes
- Pod consistently fails on node `aks-agentpool-22692403-vmss000000`:
  - Output of `kubectl describe pod` confirms: `failed to fetch anonymous token: unexpected status from GET ... 401 Unauthorized` (sketch below)
- Deleted and recreated the pod:
  - The pod reschedules but fails again on the same node
- Confirmed the issue is isolated to a specific node:
  - Other pods using the same image run fine on different nodes
- Waited >30 minutes for potential role propagation:
  - The error persists well beyond the typical AAD propagation window
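For reference, the exact commands behind these checks are sketched below. Resource names come from the environment details above; anything in angle brackets is a placeholder we have not reproduced in this ticket. Kubelet identity lookup:

```bash
# Retrieve the object ID of the kubelet (agent pool) managed identity
az aks show \
  --resource-group apip-dev-rg-uaenorth \
  --name apip-dev-aks-uaenorth \
  --query identityProfile.kubeletidentity.objectId \
  --output tsv
# Returns: 747ad783-9416-48a7-bcab-a1bc64898b45
```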
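Role assignment check (the registry resource ID is built from the names above; the subscription ID is a placeholder):

```bash
# Confirm AcrPull is assigned to the kubelet identity at the registry scope
az role assignment list \
  --assignee 747ad783-9416-48a7-bcab-a1bc64898b45 \
  --scope "/subscriptions/<subscription-id>/resourceGroups/apip-dev-rg-uaenorth/providers/Microsoft.ContainerRegistry/registries/apipdevacr" \
  --output table
# Expected (and observed): a single AcrPull assignment scoped to apipdevacr
```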
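Manual image pull from a workstation, confirming the image exists and the registry accepts authenticated pulls:

```bash
# Log in to the registry with our own credentials and pull the tag directly
az acr login --name apipdevacr
docker pull apipdevacr.azurecr.io/frontend:latest
```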
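Pod inspection on the failing node (pod name and namespace are placeholders for our frontend deployment):

```bash
# Confirm which node the pod landed on and read the image pull events
kubectl get pod <frontend-pod-name> -n <namespace> -o wide
kubectl describe pod <frontend-pod-name> -n <namespace>
# Event message observed:
#   failed to fetch anonymous token: unexpected status from GET ... 401 Unauthorized
```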
Request
We request assistance from Azure support to:
- Investigate potential misconfiguration or delay in RBAC token propagation at the node or VMSS instance level
- Validate whether the kubelet on the specific node has successfully received the updated token permissions
- Suggest additional diagnostics or a workaround to refresh or reset the identity on the affected node (the node reset we are prepared to try, pending guidance, is sketched below)
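For context on the last point, the node-level reset we are prepared to try, pending your guidance, is sketched here: cordon and drain the node, then reimage the underlying VMSS instance so the kubelet re-bootstraps its registry credentials. The VMSS name and instance ID are inferred from the node name, and the node resource group follows the default MC_ naming convention, so both are assumptions.

```bash
# Move workloads off the affected node first
kubectl cordon aks-agentpool-22692403-vmss000000
kubectl drain aks-agentpool-22692403-vmss000000 --ignore-daemonsets --delete-emptydir-data

# Reimage the underlying VMSS instance (names/IDs inferred from the node name; MC_ resource group assumed)
az vmss reimage \
  --resource-group MC_apip-dev-rg-uaenorth_apip-dev-aks-uaenorth_uaenorth \
  --name aks-agentpool-22692403-vmss \
  --instance-ids 0

# Return the node to service once it reports Ready
kubectl uncordon aks-agentpool-22692403-vmss000000
```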