Azure Kubernetes Service is not deploying, neither with Terraform nor manually

Gabor Varga


I am working on creating a Kubernetes cluster in Azure. The whole infrastructure must be coded in Terraform. This is fine. However, when I deploy the AKS cluster, the VMSS creation always fails with the following error in the activity log:

	"status": "Failed",
	"error": {
		"code": "ResourceOperationFailure",
		"message": "The resource operation completed with terminal provisioning state 'Failed'.",
		"details": [
				"code": "VMExtensionProvisioningError",
				"target": "0",		"message": "VM has reported a failure when processing extension 'vmssCSE' (publisher 'Microsoft.Azure.Extensions' and type 'CustomScript'). Error message: \"Enable failed: failed to execute command: command terminated with exit status=124\\\\n[stdout]\\\\n{ \"ExitCode\": \"124\", \"Output\": \" 443\\\\\\\\n+ '[' 62 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz 443\\\\\\\\n+ '[' 63 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz 443\\\\\\\\n+ '[' 64 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz 443\\\\\\\\n+ '[' 65 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz 443\\\\\\\\n+ '[' 66 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz 443\\\\\\\\n+ '[' 67 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz 443\\\\\\\\n+ '[' 68 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz 443\\\\\\\\n+ '[' 69 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz 443\\\\\\\\n+ '[' 70 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz 443\\\\\\\\n+ '[' 71 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz 443\\\\\\\\n+ '[' 72 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz 443\\\\\\\\n+ '[' 73 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz 443\\\\\\\\n+ '[' 74 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz 443\\\\\\\\n+ '[' 75 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz 443\\\\\\\\n+ '[' 76 -eq 100 ']'\\\\\\\\n+ sleep 
1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz 443\\\\\\\\n+ '[' 77 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz 443\\\\\\\\n+ '[' 78 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz 443\\\\\\\\n+ '[' 79 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz 443\\\\\\\\n+ '[' 80 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz 443\\\\\\\\n+ '[' 81 -eq 100 ']'\\\\\\\\n+ sleep 1\\\\\\\\n+ for i in $(seq 1 $retries)\\\\\\\\n+ timeout 10 nc -vz 443\", \"Error\": \"\", \"ExecDuration\": \"900\", \"KernelStartTime\": \"Wed 2023-10-25 16:32:41 UTC\", \"CloudInitLocalStartTime\": \"Wed 2023-10-25 16:32:45 UTC\", \"CloudInitStartTime\": \"Wed 2023-10-25 16:32:48 UTC\", \"CloudFinalStartTime\": \"Wed 2023-10-25 16:32:56 UTC\", \"NetworkdStartTime\": \"Wed 2023-10-25 16:32:46 UTC\", \"CSEStartTime\": \"Wed Oct 25 16:33:02 UTC 2023\", \"GuestAgentStartTime\": \"Wed 2023-10-25 16:32:55 UTC\", \"SystemdSummary\": \"Startup finished in 2.546s (kernel) + 1min 30.260s (userspace) = 1min 32.807s \\\\\\\\ reached after 12.209s in userspace\", \"BootDatapoints\": { \"KernelStartTime\": \"Wed 2023-10-25 16:32:41 UTC\", \"CSEStartTime\": \"Wed Oct 25 16:33:02 UTC 2023\", \"GuestAgentStartTime\": \"Wed 2023-10-25 16:32:55 UTC\", \"KubeletStartTime\": \"Wed 2023-10-25 16:33:05 UTC\" } }\\\\n\\\\n[stderr]\\\\n\". More information on troubleshooting is available at "

The same issue happens when I create the cluster manually, without Terraform code.

Some words about the infra:

  • Terraform uses the userDefinedRouting outbound network type, while the manual deployment uses Load Balancer.
  • The default outbound route goes through an Azure Firewall, where all traffic toward the internet is allowed. During deployment, a lot of traffic from the AKS nodes is visible in the Azure Firewall's log.
  • DNS name resolution works fine, tested from a Linux machine created manually for testing purposes in the same VNet where AKS is deployed.
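For reference, the kind of check the failing CSE script performs can be reproduced by hand from the test VM. The FQDNs below are assumptions for illustration; the actual endpoints AKS nodes must reach depend on region and configuration:

```shell
#!/usr/bin/env bash
# Illustrative outbound checks from a test VM in the AKS subnet.
# mcr.microsoft.com and management.azure.com are assumed example
# endpoints; consult the AKS egress documentation for the full list.
for host in mcr.microsoft.com management.azure.com; do
    # DNS resolution via the configured DNS server (here: the firewall)
    nslookup "$host" >/dev/null && echo "DNS OK: $host" || echo "DNS FAILED: $host"
    # TCP 443 reachability through the default route (here: the firewall)
    if timeout 10 nc -vz "$host" 443; then
        echo "REACHABLE: $host:443"
    else
        echo "NOT REACHABLE: $host:443"
    fi
done
```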

Does anyone have any idea?



Azure Kubernetes Service (AKS)
An Azure service that provides serverless Kubernetes, an integrated continuous integration and continuous delivery experience, and enterprise-grade security and governance.

2 answers

  1. AlaaBarqawi_MSFT (Microsoft Employee)

    Hi @Gabor Varga

    It seems there is an outbound connectivity issue from the worker nodes that prevents the VMs from being provisioned.

    Do you have a route table + firewall in the AKS node subnet?
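    To check that from the CLI, the route table associated with the AKS node subnet can be inspected; all resource names below are placeholders:

```shell
# Placeholder names: substitute your resource group, VNet and subnet.
RG="my-rg"; VNET="my-vnet"; SUBNET="aks-subnet"

# Which route table (if any) is attached to the node subnet?
az network vnet subnet show \
    --resource-group "$RG" \
    --vnet-name "$VNET" \
    --name "$SUBNET" \
    --query "routeTable.id" --output tsv

# List its routes (substitute the table name returned above).
az network route-table route list \
    --resource-group "$RG" \
    --route-table-name "my-route-table" \
    --output table
```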

    Can you run these commands from Cloud Shell?

    # Get the VMSS instance IDs.
    az vmss list-instances --resource-group <mc-resource-group-name> \
        --name <vmss-name> \
        --output table
    # Use an instance ID to test outbound connectivity.
    az vmss run-command invoke --resource-group <mc-resource-group-name> \
        --name <vmss-name> \
        --command-id RunShellScript \
        --instance-id <vmss-instance-id> \
        --output json \
        --scripts "nc -vz 443"

    And send the results.

    Refer to:


  2. Gabor Varga


    Meanwhile, the issue was found.

    Root cause: I added the following route in the subnet's route table:

    • Address prefix: vnet address space
    • Next hop type: Virtual Network
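    For context, a route like this, with hypothetical names and a hypothetical VNet address space, could have been created with the CLI like so (next hop type "Virtual Network" corresponds to `VnetLocal` in the CLI):

```shell
# Hypothetical names; 10.0.0.0/16 stands in for the VNet's address space.
az network route-table route create \
    --resource-group "my-rg" \
    --route-table-name "my-route-table" \
    --name "vnet-internal" \
    --address-prefix "10.0.0.0/16" \
    --next-hop-type VnetLocal
```

    Deleting the route again with `az network route-table route delete` lets VNet-internal traffic fall back to the default system routes.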

    This routing setup caused a problem: if this route is configured, the nodes cannot connect to the cluster API server.

    The default route points to the Azure Firewall directly.

    The DNS server is also the Azure Firewall.

    I added the route mentioned above so that all VNet-internal traffic bypasses the firewall. But it seems this causes issues with access to the API server.

    All other services work fine with this setup, except for the AKS API server.
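    One way to confirm that it is specifically API-server traffic that breaks is to reuse the run-command pattern from the first answer and probe the cluster FQDN from inside a node; all names below are placeholders:

```shell
# Placeholder names; the node VMSS lives in the AKS-managed MC_* resource group.
FQDN=$(az aks show --resource-group "my-rg" --name "my-aks" \
    --query "fqdn" --output tsv)

# Probe TCP 443 to the API server from inside node instance 0.
az vmss run-command invoke \
    --resource-group "MC_my-rg_my-aks_westeurope" \
    --name "aks-nodepool1-12345678-vmss" \
    --command-id RunShellScript \
    --instance-id 0 \
    --scripts "timeout 10 nc -vz $FQDN 443"
```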