AKS kube-system readiness and liveness probes keep failing

Aniss Chohra 0 Reputation points
2025-05-28T17:09:16.82+00:00

Description of the Problem

Hello,

I am using the AKS-Construction helper (https://github.com/Azure/AKS-Construction) to deploy a public Azure Kubernetes Service (AKS) cluster. At the end, it generates the following command and parameters:

az deployment group create -g alfred-butler-dev-rg  --template-uri https://github.com/Azure/AKS-Construction/releases/download/0.10.7/main.json --parameters \
	resourceName=my-aks-dev \
	managedNodeResourceGroup=my-managed-aks-dev-rg \
	kubernetesVersion=1.32.4 \
	agentCount=5 \
	upgradeChannel=stable \
	agentCountMax=10 \
	osDiskType=Managed \
	osDiskSizeGB=32 \
	custom_vnet=true \
	enable_aad=true \
	AksDisableLocalAccounts=true \
	enableAzureRBAC=true \
	adminPrincipalId=$(az ad signed-in-user show --query id --out tsv) \
	registries_sku=Basic \
	acrPushRolePrincipalId=$(az ad signed-in-user show --query id --out tsv) \
	azureFirewalls=true \
	certManagerFW=true \
	enableNodePublicIP=true \
	omsagent=true \
	retentionInDays=90 \
	containerLogsV2BasicLogs=true \
	serviceMeshProfile=NaN \
	azurepolicy=audit \
	podCidr=10.240.100.0/24 \
	cniDynamicIpAllocation=true \
	dnsZoneId=/subscriptions/72467751-07c0-46cb-8104-517f3e9cfd63/resourceGroups/civilia-dev-dns-rg/providers/Microsoft.Network/dnszones/dev-civilia.ca \
	ingressApplicationGateway=true \
	appGWcount=8 \
	appGWsku=WAF_v2 \
	appGWmaxCount=20 \
	appgwKVIntegration=true \
	keyVaultKmsCreate=true \
	keyVaultKmsOfficerRolePrincipalId=$(az ad signed-in-user show --query id --out tsv) \
	keyVaultAksCSI=true \
	keyVaultCreate=true \
	keyVaultOfficerRolePrincipalId=$(az ad signed-in-user show --query id --out tsv) \
	oidcIssuer=true \
	workloadIdentity=true \
	enableSysLog=true

When I connect to the created cluster and check the pods in the kube-system namespace, I get the following:

kubectl get pods -o wide --namespace kube-system
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
aks-secrets-store-csi-driver-75kgs 2/3 CrashLoopBackOff 28 (3m42s ago) 54m 10.240.100.95 aks-userpool01-68104822-vmss000001 <none> <none>
aks-secrets-store-csi-driver-hx8h5 1/3 CrashLoopBackOff 29 (3m17s ago) 58m 10.240.100.19 aks-agentpool-30311657-vmss000000 <none> <none>
aks-secrets-store-csi-driver-qx4rg 2/3 CrashLoopBackOff 28 (4m25s ago) 54m 10.240.100.58 aks-userpool01-68104822-vmss000000 <none> <none>
aks-secrets-store-csi-driver-wpgn6 2/3 CrashLoopBackOff 28 (4m37s ago) 54m 10.240.100.84 aks-userpool01-68104822-vmss000003 <none> <none>
aks-secrets-store-csi-driver-x6dt8 1/3 CrashLoopBackOff 27 (3m44s ago) 53m 10.240.100.106 aks-userpool01-68104822-vmss000004 <none> <none>
aks-secrets-store-csi-driver-x8bq4 2/3 CrashLoopBackOff 28 (4m17s ago) 54m 10.240.100.51 aks-userpool01-68104822-vmss000002 <none> <none>
aks-secrets-store-provider-azure-c57hs 1/1 Running 0 54m 10.240.0.9 aks-userpool01-68104822-vmss000001 <none> <none>
aks-secrets-store-provider-azure-h2rl5 1/1 Running 0 54m 10.240.0.7 aks-userpool01-68104822-vmss000002 <none> <none>
aks-secrets-store-provider-azure-l5scf 1/1 Running 0 58m 10.240.0.4 aks-agentpool-30311657-vmss000000 <none> <none>
aks-secrets-store-provider-azure-txqkz 1/1 Running 0 54m 10.240.0.6 aks-userpool01-68104822-vmss000003 <none> <none>
aks-secrets-store-provider-azure-wzsmn 1/1 Running 0 53m 10.240.0.5 aks-userpool01-68104822-vmss000004 <none> <none>
aks-secrets-store-provider-azure-xbn44 1/1 Running 0 54m 10.240.0.8 aks-userpool01-68104822-vmss000000 <none> <none>
ama-logs-5r878 2/2 Running 0 54m 10.240.100.76 aks-userpool01-68104822-vmss000003 <none> <none>
ama-logs-c8j4w 2/2 Running 0 53m 10.240.100.109 aks-userpool01-68104822-vmss000004 <none> <none>
ama-logs-drpzj 2/2 Running 0 57m 10.240.100.30 aks-agentpool-30311657-vmss000000 <none> <none>
ama-logs-l66xn 2/2 Running 0 54m 10.240.100.94 aks-userpool01-68104822-vmss000001 <none> <none>
ama-logs-rs-78d658c9-tfkh7 1/1 Running 0 57m 10.240.100.13 aks-agentpool-30311657-vmss000000 <none> <none>
ama-logs-tpxdj 2/2 Running 0 54m 10.240.100.62 aks-userpool01-68104822-vmss000000 <none> <none>
ama-logs-zz7xd 2/2 Running 0 54m 10.240.100.39 aks-userpool01-68104822-vmss000002 <none> <none>
azure-cns-8z68p 1/1 Running 0 54m 10.240.0.7 aks-userpool01-68104822-vmss000002 <none> <none>
azure-cns-9ztdr 1/1 Running 0 54m 10.240.0.6 aks-userpool01-68104822-vmss000003 <none> <none>
azure-cns-fdmtr 1/1 Running 0 54m 10.240.0.8 aks-userpool01-68104822-vmss000000 <none> <none>
azure-cns-g6ssq 1/1 Running 0 58m 10.240.0.4 aks-agentpool-30311657-vmss000000 <none> <none>
azure-cns-jdgpt 1/1 Running 0 54m 10.240.0.9 aks-userpool01-68104822-vmss000001 <none> <none>
azure-cns-tlgf5 1/1 Running 0 53m 10.240.0.5 aks-userpool01-68104822-vmss000004 <none> <none>
azure-policy-86887785db-k885d 1/1 Running 0 60m 10.240.100.24 aks-agentpool-30311657-vmss000000 <none> <none>
azure-policy-webhook-6fbc657745-9jddk 0/1 CrashLoopBackOff 9 (4m21s ago) 27m 10.240.100.104 aks-userpool01-68104822-vmss000004 <none> <none>
azure-wi-webhook-controller-manager-58bf869886-6gdll 0/1 CrashLoopBackOff 15 (2m33s ago) 57m 10.240.100.25 aks-agentpool-30311657-vmss000000 <none> <none>
azure-wi-webhook-controller-manager-58bf869886-tjb4d 0/1 CrashLoopBackOff 15 (2m26s ago) 57m 10.240.100.22 aks-agentpool-30311657-vmss000000 <none> <none>
cloud-node-manager-d2lf8 1/1 Running 0 54m 10.240.0.6 aks-userpool01-68104822-vmss000003 <none> <none>
cloud-node-manager-pj9sq 1/1 Running 0 58m 10.240.0.4 aks-agentpool-30311657-vmss000000 <none> <none>
cloud-node-manager-s5j29 1/1 Running 0 54m 10.240.0.7 aks-userpool01-68104822-vmss000002 <none> <none>
cloud-node-manager-tsjvv 1/1 Running 0 54m 10.240.0.8 aks-userpool01-68104822-vmss000000 <none> <none>
cloud-node-manager-z2zpg 1/1 Running 0 54m 10.240.0.9 aks-userpool01-68104822-vmss000001 <none> <none>
cloud-node-manager-zc2mw 1/1 Running 0 53m 10.240.0.5 aks-userpool01-68104822-vmss000004 <none> <none>
coredns-77f74c584-2slsm 0/1 Running 0 60m 10.240.100.7 aks-agentpool-30311657-vmss000000 <none> <none>
coredns-autoscaler-79bcb4fd6b-l9k2q 1/1 Running 22 (5m28s ago) 60m 10.240.100.14 aks-agentpool-30311657-vmss000000 <none> <none>
csi-azuredisk-node-fc5b9 3/3 Running 0 54m 10.240.0.7 aks-userpool01-68104822-vmss000002 <none> <none>
csi-azuredisk-node-fw45w 3/3 Running 0 54m 10.240.0.8 aks-userpool01-68104822-vmss000000 <none> <none>
csi-azuredisk-node-jvtmx 3/3 Running 0 53m 10.240.0.5 aks-userpool01-68104822-vmss000004 <none> <none>
csi-azuredisk-node-nk7gk 3/3 Running 0 58m 10.240.0.4 aks-agentpool-30311657-vmss000000 <none> <none>
csi-azuredisk-node-ttpj8 3/3 Running 0 54m 10.240.0.9 aks-userpool01-68104822-vmss000001 <none> <none>
csi-azuredisk-node-zkzpv 3/3 Running 0 54m 10.240.0.6 aks-userpool01-68104822-vmss000003 <none> <none>
csi-azurefile-node-7fdzf 3/3 Running 0 54m 10.240.0.6 aks-userpool01-68104822-vmss000003 <none> <none>
csi-azurefile-node-8mhbt 3/3 Running 0 54m 10.240.0.7 aks-userpool01-68104822-vmss000002 <none> <none>
csi-azurefile-node-8w2pd 3/3 Running 0 58m 10.240.0.4 aks-agentpool-30311657-vmss000000 <none> <none>
csi-azurefile-node-kcrsv 3/3 Running 0 54m 10.240.0.9 aks-userpool01-68104822-vmss000001 <none> <none>
csi-azurefile-node-l5nkt 3/3 Running 0 54m 10.240.0.8 aks-userpool01-68104822-vmss000000 <none> <none>
csi-azurefile-node-w6mpb 3/3 Running 0 53m 10.240.0.5 aks-userpool01-68104822-vmss000004 <none> <none>
ingress-appgw-deployment-5c9c7bbdbc-jwgsm 0/1 CrashLoopBackOff 19 (4m37s ago) 60m 10.240.100.16 aks-agentpool-30311657-vmss000000 <none> <none>
konnectivity-agent-7ccd6c6945-vfl72 0/1 Running 2 (2m27s ago) 8m27s 10.240.100.54 aks-userpool01-68104822-vmss000000 <none> <none>
konnectivity-agent-autoscaler-844df78bbd-bl2w7 0/1 CrashLoopBackOff 4 (47s ago) 3m27s 10.240.100.33 aks-agentpool-30311657-vmss000000 <none> <none>
kube-proxy-chrrl 1/1 Running 0 58m 10.240.0.4 aks-agentpool-30311657-vmss000000 <none> <none>
kube-proxy-kslf5 1/1 Running 0 53m 10.240.0.5 aks-userpool01-68104822-vmss000004 <none> <none>
kube-proxy-lbf4z 1/1 Running 0 54m 10.240.0.9 aks-userpool01-68104822-vmss000001 <none> <none>
kube-proxy-lsrqm 1/1 Running 0 54m 10.240.0.8 aks-userpool01-68104822-vmss000000 <none> <none>
kube-proxy-mfkgf 1/1 Running 0 54m 10.240.0.7 aks-userpool01-68104822-vmss000002 <none> <none>
kube-proxy-txpsl 1/1 Running 0 54m 10.240.0.6 aks-userpool01-68104822-vmss000003 <none> <none>
metrics-server-6c47bdccfd-fwgjc 1/2 CrashLoopBackOff 15 (2m32s ago) 60m 10.240.100.8 aks-agentpool-30311657-vmss000000 <none> <none>
metrics-server-6c47bdccfd-kjdsz 1/2 CrashLoopBackOff 15 (2m13s ago) 60m 10.240.100.10 aks-agentpool-30311657-vmss000000 <none> <none>

It turns out that all the failing deployments and their corresponding pods are restarting because of failing readiness and liveness probes, as in the following events from the ingress-appgw deployment:

Events:
Type Reason Age From Message
---- ------ --- ---- -------
Warning FailedScheduling 59m (x17 over 62m) default-scheduler no nodes available to schedule pods
Warning FailedScheduling 59m default-scheduler 0/1 nodes are available: 1 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling.
Normal Scheduled 59m default-scheduler Successfully assigned kube-system/ingress-appgw-deployment-5c9c7bbdbc-jwgsm to aks-agentpool-30311657-vmss000000
Normal Pulling 59m kubelet Pulling image "mcr.microsoft.com/azure-application-gateway/kubernetes-ingress:1.8.1"
Normal Pulled 59m kubelet Successfully pulled image "mcr.microsoft.com/azure-application-gateway/kubernetes-ingress:1.8.1" in 1.586s (1.586s including waiting). Image size: 27686229 bytes.
Warning Unhealthy 56m (x8 over 59m) kubelet Liveness probe failed: Get "http://10.240.100.16:8123/health/alive": dial tcp 10.240.100.16:8123: connect: connection refused
Normal Created 54m (x6 over 59m) kubelet Created container: ingress-appgw-container
Normal Started 54m (x6 over 59m) kubelet Started container ingress-appgw-container
Normal Killing 14m (x17 over 58m) kubelet Container ingress-appgw-container failed liveness probe, will be restarted
Warning Unhealthy 14m (x114 over 59m) kubelet Readiness probe failed: Get "http://10.240.100.16:8123/health/ready": dial tcp 10.240.100.16:8123: connect: connection refused
Warning BackOff 4m15s (x166 over 53m) kubelet Back-off restarting failed container ingress-appgw-container in pod ingress-appgw-deployment-5c9c7bbdbc-jwgsm_kube-system(522153dc-3909-4fb8-af56-44e2e6d90c91)
Normal Pulled 70s (x20 over 58m) kubelet Container image "mcr.microsoft.com/azure-application-gateway/kubernetes-ingress:1.8.1" already present on machine
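A quick way to confirm which port and path the kubelet is probing is to dump the probe definitions from the deployment spec. This is only a sketch, using the deployment and container names visible in the output above:

```shell
# Print the liveness and readiness probe definitions of the failing deployment.
# The deployment name and namespace are taken from the pod listing above.
kubectl get deployment ingress-appgw-deployment -n kube-system \
  -o jsonpath='{.spec.template.spec.containers[0].livenessProbe}{"\n"}{.spec.template.spec.containers[0].readinessProbe}{"\n"}'
```

Note that "connection refused" on the probe port usually means nothing is listening there, i.e. the container's own process exited or never bound the port, rather than the probe itself being misconfigured.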

I also noticed that the following command fails for every pod (I have to describe the pod to get any information instead):

kubectl logs ingress-appgw-deployment-5c9c7bbdbc-jwgsm --namespace kube-system
Error from server: Get "https://10.240.0.4:10250/containerLogs/kube-system/ingress-appgw-deployment-5c9c7bbdbc-jwgsm/ingress-appgw-container": EOF
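This logs failure may itself be a symptom: on AKS, kubectl logs/exec traffic from the API server to the kubelet on port 10250 is carried over the konnectivity tunnel, and the konnectivity-agent pods in the listing above are also restarting. A minimal check (the app=konnectivity-agent label selector is an assumption about how these pods are labeled):

```shell
# List the konnectivity-agent pods; if the tunnel agents are down,
# kubectl logs/exec/port-forward all fail with EOF-style errors.
kubectl get pods -n kube-system -l app=konnectivity-agent -o wide

# While kubectl logs is unusable, pod events still come through the API server:
kubectl describe pods -n kube-system -l app=konnectivity-agent
```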

Has anyone run into these issues? I am stuck and unsure where to look next. Thanks.

Azure Kubernetes Service (AKS)
An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.

1 answer

Sort by: Most helpful
  1. Pramidha Yathipathi 1,135 Reputation points Microsoft External Staff Moderator
    2025-05-30T18:40:43.63+00:00

    Hi Aniss Chohra,

    Since the cluster was deployed with azureFirewalls=true, probes failing with "connection refused" and kubectl logs returning EOF are consistent with required traffic being blocked by the firewall. I would work through the following:

    Gather IP ranges. Collect the IP ranges for:

    - the AKS node subnet
    - the pod CIDR ranges (from your AKS cluster configuration)
    - the Application Gateway subnet
    - any other peered VNets/subnets

    Check peering configurations. Confirm that the peering allows forwarded traffic.

    Create or adjust firewall rules:

    - Create network rules to allow traffic between these IP ranges on the required ports.
    - Create application rules or network rules for outbound traffic to Azure services.
    - Use service tags where possible to simplify the rules.

    Use Azure Firewall diagnostic logs:

    - Enable diagnostics to see which traffic is blocked.
    - Analyze the logs to pinpoint which IPs or ports are being denied.
    - Adjust the rules accordingly.
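    As a concrete illustration of the network-rule step, the AKS egress documentation linked below uses rules like the following. This is a sketch only: the firewall name, resource group, collection names, and source range are placeholders to replace with your own values.

    ```shell
    # Allow the AKS nodes (placeholder source range) to reach the control plane.
    # apiudp: tunneled secure node-to-control-plane communication (UDP 1194).
    az network firewall network-rule create \
        --firewall-name my-firewall --resource-group my-fw-rg \
        --collection-name aks-egress --priority 200 --action Allow \
        --name apiudp --protocols UDP \
        --source-addresses "10.240.0.0/16" \
        --destination-addresses "AzureCloud" --destination-ports 1194

    # apitcp: tunneled communication (TCP 9000) and HTTPS to the API server (TCP 443).
    az network firewall network-rule create \
        --firewall-name my-firewall --resource-group my-fw-rg \
        --collection-name aks-egress \
        --name apitcp --protocols TCP \
        --source-addresses "10.240.0.0/16" \
        --destination-addresses "AzureCloud" --destination-ports 9000 443

    # Application rule covering AKS's required FQDNs via the service FQDN tag.
    az network firewall application-rule create \
        --firewall-name my-firewall --resource-group my-fw-rg \
        --collection-name aks-fqdn --priority 200 --action Allow \
        --name aks-service --source-addresses "10.240.0.0/16" \
        --protocols "http=80" "https=443" \
        --fqdn-tags "AzureKubernetesService"
    ```

    After updating the rules, enabling the firewall's network-rule and application-rule diagnostic log categories shows which flows are still being denied.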
    https://learn.microsoft.com/en-us/azure/aks/limit-egress-traffic?tabs=aks-with-system-assigned-identities

    https://learn.microsoft.com/en-us/azure/architecture/guide/aks/aks-firewall

    If you found this information helpful, please click "Upvote" on the post. If you have any further queries, feel free to reach out; we are happy to assist you.

    Thank You.

