Hi,
I am working on creating a Kubernetes cluster in Azure. The whole infrastructure must be defined in Terraform, which works fine. However, when I deploy the AKS cluster, the VMSS creation always fails with the following error in the activity log:
{
"status": "Failed",
"error": {
"code": "ResourceOperationFailure",
"message": "The resource operation completed with terminal provisioning state 'Failed'.",
"details": [
{
"code": "VMExtensionProvisioningError",
"target": "0", "message": "VM has reported a failure when processing extension 'vmssCSE' (publisher 'Microsoft.Azure.Extensions' and type 'CustomScript'). Error message: \"Enable failed: failed to execute command: command terminated with exit status=124\\\\n[stdout]\\\\n{ \"ExitCode\": \"124\", \"Output\": \"0i.privatelink.westeurope.azmk8s.io 443\\\\\\\\n+ '[' 62 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz clus-non-prod-cluster-cdyrgl0i.privatelink.westeurope.azmk8s.io 443\\\\\\\\n+ '[' 63 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz clus-non-prod-cluster-cdyrgl0i.privatelink.westeurope.azmk8s.io 443\\\\\\\\n+ '[' 64 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz clus-non-prod-cluster-cdyrgl0i.privatelink.westeurope.azmk8s.io 443\\\\\\\\n+ '[' 65 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz clus-non-prod-cluster-cdyrgl0i.privatelink.westeurope.azmk8s.io 443\\\\\\\\n+ '[' 66 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz clus-non-prod-cluster-cdyrgl0i.privatelink.westeurope.azmk8s.io 443\\\\\\\\n+ '[' 67 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz clus-non-prod-cluster-cdyrgl0i.privatelink.westeurope.azmk8s.io 443\\\\\\\\n+ '[' 68 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz clus-non-prod-cluster-cdyrgl0i.privatelink.westeurope.azmk8s.io 443\\\\\\\\n+ '[' 69 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz clus-non-prod-cluster-cdyrgl0i.privatelink.westeurope.azmk8s.io 443\\\\\\\\n+ '[' 70 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz clus-non-prod-cluster-cdyrgl0i.privatelink.westeurope.azmk8s.io 443\\\\\\\\n+ '[' 71 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz clus-non-prod-cluster-cdyrgl0i.privatelink.westeurope.azmk8s.io 443\\\\\\\\n+ '[' 72 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz clus-non-prod-cluster-cdyrgl0i.privatelink.westeurope.azmk8s.io 443\\\\\\\\n+ '[' 73 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz clus-non-prod-cluster-cdyrgl0i.privatelink.westeurope.azmk8s.io 443\\\\\\\\n+ '[' 74 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz clus-non-prod-cluster-cdyrgl0i.privatelink.westeurope.azmk8s.io 443\\\\\\\\n+ '[' 75 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz clus-non-prod-cluster-cdyrgl0i.privatelink.westeurope.azmk8s.io 443\\\\\\\\n+ '[' 76 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz clus-non-prod-cluster-cdyrgl0i.privatelink.westeurope.azmk8s.io 443\\\\\\\\n+ '[' 77 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz clus-non-prod-cluster-cdyrgl0i.privatelink.westeurope.azmk8s.io 443\\\\\\\\n+ '[' 78 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz clus-non-prod-cluster-cdyrgl0i.privatelink.westeurope.azmk8s.io 443\\\\\\\\n+ '[' 79 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz clus-non-prod-cluster-cdyrgl0i.privatelink.westeurope.azmk8s.io 443\\\\\\\\n+ '[' 80 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz clus-non-prod-cluster-cdyrgl0i.privatelink.westeurope.azmk8s.io 443\\\\\\\\n+ '[' 81 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz clus-non-prod-cluster-cdyrgl0i.privatelink.westeurope.azmk8s.io 443\", \"Error\": \"\", \"ExecDuration\": \"900\", \"KernelStartTime\": \"Wed 2023-10-25 16:32:41 UTC\", \"CloudInitLocalStartTime\": \"Wed 2023-10-25 16:32:45 UTC\", \"CloudInitStartTime\": \"Wed 2023-10-25 16:32:48 UTC\", \"CloudFinalStartTime\": \"Wed 2023-10-25 16:32:56 UTC\", \"NetworkdStartTime\": \"Wed 2023-10-25 16:32:46 UTC\", \"CSEStartTime\": \"Wed Oct 25 16:33:02 UTC 2023\", \"GuestAgentStartTime\": \"Wed 2023-10-25 16:32:55 UTC\", \"SystemdSummary\": \"Startup finished in 2.546s (kernel) + 1min 30.260s (userspace) = 1min 32.807s \\\\\\\\ngraphical.target reached after 12.209s in userspace\", \"BootDatapoints\": { \"KernelStartTime\": \"Wed 2023-10-25 16:32:41 UTC\", \"CSEStartTime\": \"Wed Oct 25 16:33:02 UTC 2023\", \"GuestAgentStartTime\": \"Wed 2023-10-25 16:32:55 UTC\", \"KubeletStartTime\": \"Wed 2023-10-25 16:33:05 UTC\" } }\\\\n\\\\n[stderr]\\\\n\". More information on troubleshooting is available at https://aka.ms/VMExtensionCSELinuxTroubleshoot. "
}
]
}
}
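Reading the trace, the vmssCSE script appears to loop on `timeout 10 nc -vz <API server FQDN> 443` with `sleep 1` between attempts (the counter is compared against 100), and the extension ends with exit status 124, which is the code GNU `timeout` returns when a command exceeds its limit. That seems consistent with the reported ExecDuration of 900 seconds. The node simply never gets a TCP connection to the private API server endpoint. Below is a rough Python equivalent of that check that could be run from the test VM in the same VNet. The helper name `wait_for_tcp` and its defaults are my own assumptions chosen to mirror the trace; this is not the actual AKS script.

```python
import socket
import time
from typing import Optional


def wait_for_tcp(host: str, port: int = 443, retries: int = 100,
                 timeout_s: float = 10.0, delay_s: float = 1.0) -> Optional[int]:
    """Try a TCP connect to host:port up to `retries` times.

    Hypothetical helper that roughly mirrors the
    `timeout 10 nc -vz host 443; sleep 1` loop seen in the vmssCSE output.
    Returns the 1-based attempt number that succeeded, or None if all failed.
    """
    for attempt in range(1, retries + 1):
        try:
            # create_connection resolves the name and opens a TCP socket,
            # similar to `nc -vz`; the context manager closes it right away.
            with socket.create_connection((host, port), timeout=timeout_s):
                return attempt
        except OSError as exc:
            # Covers DNS failures (gaierror), timeouts and refused connections.
            print(f"attempt {attempt}: {host}:{port} unreachable ({exc})")
            time.sleep(delay_s)
    return None
```

For example, `wait_for_tcp("clus-non-prod-cluster-cdyrgl0i.privatelink.westeurope.azmk8s.io", retries=5)` returning `None` from the test VM would suggest the problem is network reachability of the private endpoint (routing or firewall), not something specific to the node image.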
The same issue happens when I create the cluster manually, without Terraform.
Some details about the infrastructure:
- The Terraform deployment uses the userDefinedRouting outbound type, while the manual one uses a Load Balancer.
- The default outbound route goes through an Azure Firewall, which allows all traffic towards the internet. During the deployment, a lot of traffic from the AKS nodes is visible in the Azure Firewall logs.
- DNS name resolution works fine. I tested it from a Linux VM that I created manually for testing purposes in the same VNet where AKS is deployed.
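Because the API server name in the log is under privatelink.westeurope.azmk8s.io, a general "DNS works" test may not cover it. What seems to matter is whether this exact name resolves, from inside the VNet, to a private endpoint address. A minimal sketch of that check using only the Python standard library is below. The helper names are hypothetical, and "private" here means Python's `ipaddress` `is_private` classification, which includes the RFC 1918 ranges.

```python
import ipaddress
import socket
from typing import List


def resolve_addresses(host: str, port: int = 443) -> List[str]:
    """Return the unique IP addresses `host` resolves to (hypothetical helper)."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    addresses: List[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        # sockaddr[0] is the IP string for both IPv4 and IPv6 results.
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def all_private(addresses: List[str]) -> bool:
    """True only if there is at least one address and all are private."""
    return bool(addresses) and all(
        ipaddress.ip_address(a).is_private for a in addresses
    )
```

Usage from the test VM would be something like `all_private(resolve_addresses("clus-non-prod-cluster-cdyrgl0i.privatelink.westeurope.azmk8s.io"))`. I would expect `True` if the cluster's private DNS zone is linked to the VNet. A resolution error or a public address would point at the DNS setup rather than at the firewall.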
Does anyone have an idea?
Thanks!
G