Quickstart: Create an Azure Kubernetes Service (AKS) cluster using Terraform

Azure Kubernetes Service (AKS) is a managed Kubernetes service that lets you quickly deploy and manage clusters. In this quickstart, you:

  • Deploy an AKS cluster using Terraform.
  • Run a sample multi-container application with a group of microservices and web front ends simulating a retail scenario.

Note

This sample application is just for demo purposes and doesn't represent all the best practices for Kubernetes applications.

Screenshot of browsing to Azure Store sample application.

Before you begin

Note

The Azure Linux node pool is now generally available (GA). To learn about the benefits and deployment steps, see the Introduction to the Azure Linux Container Host for AKS.

Log in to your Azure account

Terraform and Azure authentication scenarios

Terraform only supports authenticating to Azure via the Azure CLI. Authenticating using Azure PowerShell isn't supported. Therefore, while you can use the Azure PowerShell module when doing your Terraform work, you first need to authenticate to Azure using the Azure CLI.

This article explains how to authenticate Terraform to Azure for the following scenarios. For more information about options to authenticate Terraform to Azure, see Authenticating using the Azure CLI.

Authenticate to Azure via a Microsoft account

A Microsoft account is a username (associated with an email and its credentials) that is used to sign in to Microsoft services - such as Azure. A Microsoft account can be associated with one or more Azure subscriptions, with one of those subscriptions being the default.

The following steps show you how to:

  • Sign in to Azure interactively using a Microsoft account
  • List the account's associated Azure subscriptions (including the default)
  • Set the current subscription
  1. Open a command line that has access to the Azure CLI.

  2. Run az login without any parameters and follow the instructions to sign in to Azure.

    az login
    

    Key points:

    • Upon successful sign-in, az login displays a list of the Azure subscriptions associated with the logged-in Microsoft account, including the default subscription.
  3. To confirm the current Azure subscription, run az account show.

    az account show
    
  4. To view all the Azure subscription names and IDs for a specific Microsoft account, run az account list.

    az account list --query "[?user.name=='<microsoft_account_email>'].{Name:name, ID:id, Default:isDefault}" --output table
    

    Key points:

    • Replace the <microsoft_account_email> placeholder with the Microsoft account email address whose Azure subscriptions you want to list.
    • With a Live account - such as a Hotmail or Outlook account - you might need to specify the fully qualified email address. For example, if your email address is admin@hotmail.com, you might need to replace the placeholder with live.com#admin@hotmail.com.
  5. To use a specific Azure subscription, run az account set.

    az account set --subscription "<subscription_id_or_subscription_name>"
    

    Key points:

    • Replace the <subscription_id_or_subscription_name> placeholder with the ID or name of the subscription you want to use.
    • Calling az account set doesn't display the results of switching to the specified Azure subscription. However, you can use az account show to confirm that the current Azure subscription has changed.
    • If you run the az account list command from the previous step, you see that the default Azure subscription has changed to the subscription you specified with az account set. (A combined example is sketched after this list.)
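
For example, a minimal sequence that switches subscriptions and then confirms the change might look like the following sketch. It reuses the placeholder from the step above and adds a --query projection to keep the confirmation output short; it's optional and not part of the original steps.

az account set --subscription "<subscription_id_or_subscription_name>"

# Confirm that the switch took effect; --query limits the output to the subscription name and ID.
az account show --query "{Name:name, ID:id}" --output table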

Create a service principal

Automated tools that deploy or use Azure services - such as Terraform - should always have restricted permissions. Instead of having applications sign in as a fully privileged user, Azure offers service principals.

The most common pattern is to interactively sign in to Azure, create a service principal, test the service principal, and then use that service principal for future authentication (either interactively or from your scripts).

  1. To create a service principal, sign in to Azure. After authenticating to Azure via a Microsoft account, return here.

  2. If you're creating a service principal from Git Bash, set the MSYS_NO_PATHCONV environment variable. (This step isn't necessary if you're using Cloud Shell.)

    export MSYS_NO_PATHCONV=1    
    

    Key points:

    • You can set the MSYS_NO_PATHCONV environment variable globally (for all terminal sessions) or locally (for just the current session). As creating a service principal isn't something you do often, the sample sets the value for the current session. To set this environment variable globally, add the setting to the ~/.bashrc file.
  3. To create a service principal, run az ad sp create-for-rbac.

    az ad sp create-for-rbac --name <service_principal_name> --role Contributor --scopes /subscriptions/<subscription_id>
    

    Key points:

    • You can replace the <service_principal_name> placeholder with a custom name for your environment or omit the parameter entirely. If you omit the parameter, the service principal name is generated based on the current date and time.
    • Upon successful completion, az ad sp create-for-rbac displays several values. The appId, password, and tenant values are used in the next step. (A sketch showing how to capture just these values appears after this list.)
    • The password can't be retrieved if lost. As such, you should store your password in a safe place. If you forget your password, you can reset the service principal credentials.
    • For this article, a service principal with a Contributor role is being used. For more information about Role-Based Access Control (RBAC) roles, see RBAC: Built-in roles.
    • The output from creating the service principal includes sensitive credentials. Be sure that you don't include these credentials in your code or check the credentials into your source control.
    • For more information about options when creating a service principal with the Azure CLI, see the article Create an Azure service principal with the Azure CLI.
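
If you want to capture only the three values called out above (appId, password, and tenant), the following sketch shows one way to do it with a --query projection. The variable name sp_credentials is illustrative and not part of the original steps; treat the captured value as a secret.

# Create the service principal and keep only the appId, password, and tenant values.
# The output contains credentials; don't write it to a file that's checked into source control.
sp_credentials=$(az ad sp create-for-rbac \
  --name <service_principal_name> \
  --role Contributor \
  --scopes /subscriptions/<subscription_id> \
  --query "{appId:appId, password:password, tenant:tenant}" \
  --output json)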

Specify service principal credentials in environment variables

Once you create a service principal, you can specify its credentials to Terraform via environment variables.

  1. Edit the ~/.bashrc file by adding the following environment variables.

    export ARM_SUBSCRIPTION_ID="<azure_subscription_id>"
    export ARM_TENANT_ID="<azure_subscription_tenant_id>"
    export ARM_CLIENT_ID="<service_principal_appid>"
    export ARM_CLIENT_SECRET="<service_principal_password>"
    
  2. To execute the ~/.bashrc script, run source ~/.bashrc (or its abbreviated equivalent . ~/.bashrc). You can also exit and reopen Cloud Shell for the script to run automatically.

    . ~/.bashrc
    
  3. Once the environment variables have been set, you can verify their values as follows:

    printenv | grep '^ARM'
    

Key points:

  • As with any environment variable, to access an Azure subscription value from within a Terraform script, use the following syntax: ${env.<environment_variable>}. For example, to access the ARM_SUBSCRIPTION_ID value, specify ${env.ARM_SUBSCRIPTION_ID}.
  • Creating and applying Terraform execution plans makes changes in the Azure subscription associated with the service principal. This can be confusing if you're signed in to one Azure subscription while the environment variables point to a second one. For example, say you have two Azure subscriptions: SubA and SubB. If the current Azure subscription is SubA (determined via az account show) while the environment variables point to SubB, any changes made by Terraform are made in SubB. Therefore, you would need to sign in to SubB to view your changes with Azure CLI or Azure PowerShell commands. (A quick check for this mismatch is sketched below.)
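
The following sketch is one way to check for the mismatch described above. The cli_subscription variable name is illustrative; the commands assume you already set the ARM_* environment variables as shown earlier.

# Subscription the Azure CLI is currently signed in to.
cli_subscription=$(az account show --query id --output tsv)

# Subscription Terraform uses via the ARM_* environment variables.
echo "Azure CLI subscription: $cli_subscription"
echo "Terraform subscription: $ARM_SUBSCRIPTION_ID"

if [ "$cli_subscription" != "$ARM_SUBSCRIPTION_ID" ]; then
  echo "Warning: Terraform will make its changes in a different subscription than the Azure CLI."
fi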

Specify service principal credentials in a Terraform provider block

The Azure provider block defines syntax that allows you to specify your Azure subscription's authentication information.

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~>3.0"
    }
  }
}

provider "azurerm" {
  features {}

  subscription_id   = "<azure_subscription_id>"
  tenant_id         = "<azure_subscription_tenant_id>"
  client_id         = "<service_principal_appid>"
  client_secret     = "<service_principal_password>"
}

# Your code goes here

Caution

The ability to specify your Azure subscription credentials in a Terraform configuration file can be convenient - especially when testing. However, it isn't advisable to store credentials in a clear-text file that can be viewed by untrusted individuals.

Implement the Terraform code

Note

The sample code for this article is located in the Azure Terraform GitHub repo. You can view the log file containing the test results from current and previous versions of Terraform.

See more articles and sample code showing how to use Terraform to manage Azure resources

  1. Create a directory you can use to test the sample Terraform code and make it your current directory.

  2. Create a file named providers.tf and insert the following code:

    terraform {
      required_version = ">=1.0"
    
      required_providers {
        azapi = {
          source  = "azure/azapi"
          version = "~>1.5"
        }
        azurerm = {
          source  = "hashicorp/azurerm"
          version = "~>3.0"
        }
        random = {
          source  = "hashicorp/random"
          version = "~>3.0"
        }
        time = {
          source  = "hashicorp/time"
          version = "0.9.1"
        }
      }
    }
    
    provider "azurerm" {
      features {}
    }
    
  3. Create a file named ssh.tf and insert the following code:

    resource "random_pet" "ssh_key_name" {
      prefix    = "ssh"
      separator = ""
    }
    
    resource "azapi_resource_action" "ssh_public_key_gen" {
      type        = "Microsoft.Compute/sshPublicKeys@2022-11-01"
      resource_id = azapi_resource.ssh_public_key.id
      action      = "generateKeyPair"
      method      = "POST"
    
      response_export_values = ["publicKey", "privateKey"]
    }
    
    resource "azapi_resource" "ssh_public_key" {
      type      = "Microsoft.Compute/sshPublicKeys@2022-11-01"
      name      = random_pet.ssh_key_name.id
      location  = azurerm_resource_group.rg.location
      parent_id = azurerm_resource_group.rg.id
    }
    
    output "key_data" {
      value = jsondecode(azapi_resource_action.ssh_public_key_gen.output).publicKey
    }
    
  4. Create a file named main.tf and insert the following code:

    # Generate random resource group name
    resource "random_pet" "rg_name" {
      prefix = var.resource_group_name_prefix
    }
    
    resource "azurerm_resource_group" "rg" {
      location = var.resource_group_location
      name     = random_pet.rg_name.id
    }
    
    resource "random_pet" "azurerm_kubernetes_cluster_name" {
      prefix = "cluster"
    }
    
    resource "random_pet" "azurerm_kubernetes_cluster_dns_prefix" {
      prefix = "dns"
    }
    
    resource "azurerm_kubernetes_cluster" "k8s" {
      location            = azurerm_resource_group.rg.location
      name                = random_pet.azurerm_kubernetes_cluster_name.id
      resource_group_name = azurerm_resource_group.rg.name
      dns_prefix          = random_pet.azurerm_kubernetes_cluster_dns_prefix.id
    
      identity {
        type = "SystemAssigned"
      }
    
      default_node_pool {
        name       = "agentpool"
        vm_size    = "Standard_D2_v2"
        node_count = var.node_count
      }
      linux_profile {
        admin_username = var.username
    
        ssh_key {
          key_data = jsondecode(azapi_resource_action.ssh_public_key_gen.output).publicKey
        }
      }
      network_profile {
        network_plugin    = "kubenet"
        load_balancer_sku = "standard"
      }
    }
    
  5. Create a file named variables.tf and insert the following code:

    variable "resource_group_location" {
      type        = string
      default     = "eastus"
      description = "Location of the resource group."
    }
    
    variable "resource_group_name_prefix" {
      type        = string
      default     = "rg"
      description = "Prefix of the resource group name that's combined with a random ID so name is unique in your Azure subscription."
    }
    
    variable "node_count" {
      type        = number
      description = "The initial quantity of nodes for the node pool."
      default     = 3
    }
    
    variable "msi_id" {
      type        = string
      description = "The Managed Service Identity ID. Set this value if you're running this example using Managed Identity as the authentication method."
      default     = null
    }
    
    variable "username" {
      type        = string
      description = "The admin username for the new cluster."
      default     = "azureadmin"
    }
    
  6. Create a file named outputs.tf and insert the following code:

    output "resource_group_name" {
      value = azurerm_resource_group.rg.name
    }
    
    output "kubernetes_cluster_name" {
      value = azurerm_kubernetes_cluster.k8s.name
    }
    
    output "client_certificate" {
      value     = azurerm_kubernetes_cluster.k8s.kube_config[0].client_certificate
      sensitive = true
    }
    
    output "client_key" {
      value     = azurerm_kubernetes_cluster.k8s.kube_config[0].client_key
      sensitive = true
    }
    
    output "cluster_ca_certificate" {
      value     = azurerm_kubernetes_cluster.k8s.kube_config[0].cluster_ca_certificate
      sensitive = true
    }
    
    output "cluster_password" {
      value     = azurerm_kubernetes_cluster.k8s.kube_config[0].password
      sensitive = true
    }
    
    output "cluster_username" {
      value     = azurerm_kubernetes_cluster.k8s.kube_config[0].username
      sensitive = true
    }
    
    output "host" {
      value     = azurerm_kubernetes_cluster.k8s.kube_config[0].host
      sensitive = true
    }
    
    output "kube_config" {
      value     = azurerm_kubernetes_cluster.k8s.kube_config_raw
      sensitive = true
    }
    

Initialize Terraform

Run terraform init to initialize the Terraform deployment. This command downloads the providers required to manage your Azure resources.

terraform init -upgrade

Key points:

  • The -upgrade parameter upgrades the necessary provider plugins to the newest version that complies with the configuration's version constraints. (A way to confirm the selected versions is sketched below.)
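
If you want to confirm which provider versions were selected, running terraform version in the initialized directory lists the installed provider plugins alongside the Terraform version; you can also inspect the .terraform.lock.hcl file that terraform init writes. This is an optional check, not part of the original steps.

# In an initialized directory, this also lists the provider plugin versions in use.
terraform version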

Create a Terraform execution plan

Run terraform plan to create an execution plan.

terraform plan -out main.tfplan

Key points:

  • The terraform plan command creates an execution plan, but doesn't execute it. Instead, it determines what actions are necessary to create the configuration specified in your configuration files. This pattern allows you to verify whether the execution plan matches your expectations before making any changes to actual resources.
  • The optional -out parameter allows you to specify an output file for the plan. Using the -out parameter ensures that the plan you reviewed is exactly what is applied. (A way to inspect the saved plan file is sketched after this list.)
  • To read more about persisting execution plans and security, see the security warning section.
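
To review the saved plan before applying it, you can render it in human-readable form with terraform show. This is an optional step, not part of the original quickstart.

# Render the saved plan file in human-readable form.
terraform show main.tfplan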

Apply a Terraform execution plan

Run terraform apply to apply the execution plan to your cloud infrastructure.

terraform apply main.tfplan

Key points:

  • The example terraform apply command assumes you previously ran terraform plan -out main.tfplan.
  • If you specified a different filename for the -out parameter, use that same filename in the call to terraform apply.
  • If you didn't use the -out parameter, call terraform apply without any parameters. (An optional check of the resources now tracked in the Terraform state is sketched below.)
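
After the apply completes, you can optionally list the resources Terraform now tracks in its state. This is a sketch of an extra check, not part of the original steps.

# List every resource recorded in the Terraform state for this configuration.
terraform state list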

Verify the results

  1. Get the Azure resource group name using the following command.

    resource_group_name=$(terraform output -raw resource_group_name)
    
  2. Display the name of your new Kubernetes cluster using the az aks list command.

    az aks list \
      --resource-group $resource_group_name \
      --query "[].{\"K8s cluster name\":name}" \
      --output table
    
  3. Get the Kubernetes configuration from the Terraform state and store it in a file that kubectl can read using the following command.

    echo "$(terraform output kube_config)" > ./azurek8s
    
  4. Verify the previous command didn't add an ASCII EOT character using the following command.

    cat ./azurek8s
    

    Key points:

    • If you see << EOT at the beginning and EOT at the end, remove these characters from the file. Otherwise, you may receive the following error message: error: error loading config file "./azurek8s": yaml: line 2: mapping values are not allowed in this context
  5. Set an environment variable so kubectl can pick up the correct config using the following command.

    export KUBECONFIG=./azurek8s
    
  6. Verify the health of the cluster using the kubectl get nodes command.

    kubectl get nodes
    

Key points:

  • You can monitor the health of the cluster nodes and pods with Azure Monitor, and the health metrics are available in the Azure portal. For more information on container health monitoring, see Monitor Azure Kubernetes Service health.
  • Several key values were defined as outputs when you applied the Terraform execution plan. For example, the host address, AKS cluster user name, and AKS cluster password are all output values. (A sketch showing how to read them with terraform output appears below.)
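
You can read those values with terraform output. The summary listing redacts outputs marked sensitive, while -raw prints a single output in raw form; for example, terraform output -raw kube_config prints the kubeconfig without the << EOT markers mentioned in step 4. This is an optional sketch, not part of the original steps.

# List all outputs; values marked sensitive are redacted in this view.
terraform output

# Print a single sensitive output in raw form.
terraform output -raw host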

Deploy the application

To deploy the application, you use a manifest file to create all the objects required to run the AKS Store application. A Kubernetes manifest file defines a cluster's desired state, such as which container images to run. The manifest includes the following Kubernetes deployments and services:

Screenshot of Azure Store sample architecture.

  • Store front: Web application for customers to view products and place orders.
  • Product service: Shows product information.
  • Order service: Places orders.
  • RabbitMQ: Message queue for an order queue.

Note

We don't recommend running stateful containers, such as RabbitMQ, without persistent storage in production. They're used here for simplicity, but we recommend using managed services, such as Azure Cosmos DB or Azure Service Bus.

  1. Create a file named aks-store-quickstart.yaml and copy in the following manifest:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: rabbitmq
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: rabbitmq
      template:
        metadata:
          labels:
            app: rabbitmq
        spec:
          nodeSelector:
            "kubernetes.io/os": linux
          containers:
          - name: rabbitmq
            image: mcr.microsoft.com/mirror/docker/library/rabbitmq:3.10-management-alpine
            ports:
            - containerPort: 5672
              name: rabbitmq-amqp
            - containerPort: 15672
              name: rabbitmq-http
            env:
            - name: RABBITMQ_DEFAULT_USER
              value: "username"
            - name: RABBITMQ_DEFAULT_PASS
              value: "password"
            resources:
              requests:
                cpu: 10m
                memory: 128Mi
              limits:
                cpu: 250m
                memory: 256Mi
            volumeMounts:
            - name: rabbitmq-enabled-plugins
              mountPath: /etc/rabbitmq/enabled_plugins
              subPath: enabled_plugins
          volumes:
          - name: rabbitmq-enabled-plugins
            configMap:
              name: rabbitmq-enabled-plugins
              items:
              - key: rabbitmq_enabled_plugins
                path: enabled_plugins
    ---
    apiVersion: v1
    data:
      rabbitmq_enabled_plugins: |
        [rabbitmq_management,rabbitmq_prometheus,rabbitmq_amqp1_0].
    kind: ConfigMap
    metadata:
      name: rabbitmq-enabled-plugins            
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: rabbitmq
    spec:
      selector:
        app: rabbitmq
      ports:
        - name: rabbitmq-amqp
          port: 5672
          targetPort: 5672
        - name: rabbitmq-http
          port: 15672
          targetPort: 15672
      type: ClusterIP
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: order-service
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: order-service
      template:
        metadata:
          labels:
            app: order-service
        spec:
          nodeSelector:
            "kubernetes.io/os": linux
          containers:
          - name: order-service
            image: ghcr.io/azure-samples/aks-store-demo/order-service:latest
            ports:
            - containerPort: 3000
            env:
            - name: ORDER_QUEUE_HOSTNAME
              value: "rabbitmq"
            - name: ORDER_QUEUE_PORT
              value: "5672"
            - name: ORDER_QUEUE_USERNAME
              value: "username"
            - name: ORDER_QUEUE_PASSWORD
              value: "password"
            - name: ORDER_QUEUE_NAME
              value: "orders"
            - name: FASTIFY_ADDRESS
              value: "0.0.0.0"
            resources:
              requests:
                cpu: 1m
                memory: 50Mi
              limits:
                cpu: 75m
                memory: 128Mi
          initContainers:
          - name: wait-for-rabbitmq
            image: busybox
            command: ['sh', '-c', 'until nc -zv rabbitmq 5672; do echo waiting for rabbitmq; sleep 2; done;']
            resources:
              requests:
                cpu: 1m
                memory: 50Mi
              limits:
                cpu: 75m
                memory: 128Mi    
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: order-service
    spec:
      type: ClusterIP
      ports:
      - name: http
        port: 3000
        targetPort: 3000
      selector:
        app: order-service
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: product-service
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: product-service
      template:
        metadata:
          labels:
            app: product-service
        spec:
          nodeSelector:
            "kubernetes.io/os": linux
          containers:
          - name: product-service
            image: ghcr.io/azure-samples/aks-store-demo/product-service:latest
            ports:
            - containerPort: 3002
            resources:
              requests:
                cpu: 1m
                memory: 1Mi
              limits:
                cpu: 1m
                memory: 7Mi
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: product-service
    spec:
      type: ClusterIP
      ports:
      - name: http
        port: 3002
        targetPort: 3002
      selector:
        app: product-service
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: store-front
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: store-front
      template:
        metadata:
          labels:
            app: store-front
        spec:
          nodeSelector:
            "kubernetes.io/os": linux
          containers:
          - name: store-front
            image: ghcr.io/azure-samples/aks-store-demo/store-front:latest
            ports:
            - containerPort: 8080
              name: store-front
            env: 
            - name: VUE_APP_ORDER_SERVICE_URL
              value: "http://order-service:3000/"
            - name: VUE_APP_PRODUCT_SERVICE_URL
              value: "http://product-service:3002/"
            resources:
              requests:
                cpu: 1m
                memory: 200Mi
              limits:
                cpu: 1000m
                memory: 512Mi
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: store-front
    spec:
      ports:
      - port: 80
        targetPort: 8080
      selector:
        app: store-front
      type: LoadBalancer
    

    For a breakdown of YAML manifest files, see Deployments and YAML manifests.

  2. Deploy the application using the kubectl apply command and specify the name of your YAML manifest. (An optional rollout check is sketched after these steps.)

    kubectl apply -f aks-store-quickstart.yaml
    

    The following example output shows the deployments and services:

    deployment.apps/rabbitmq created
    service/rabbitmq created
    deployment.apps/order-service created
    service/order-service created
    deployment.apps/product-service created
    service/product-service created
    deployment.apps/store-front created
    service/store-front created
    
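Optionally, you can wait for each deployment to finish rolling out before testing the application. This sketch uses the deployment names from the manifest and isn't part of the original steps.

# Block until each deployment reports a successful rollout.
kubectl rollout status deployment/rabbitmq
kubectl rollout status deployment/order-service
kubectl rollout status deployment/product-service
kubectl rollout status deployment/store-front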

Test the application

When the application runs, a Kubernetes service exposes the application front end to the internet. This process can take a few minutes to complete.

  1. Check the status of the deployed pods using the kubectl get pods command. Make sure all pods are Running before proceeding.

  2. Check for a public IP address for the store-front application. Monitor progress using the kubectl get service command with the --watch argument.

    kubectl get service store-front --watch
    

    The EXTERNAL-IP output for the store-front service initially shows as pending:

    NAME          TYPE           CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE
    store-front   LoadBalancer   10.0.100.10   <pending>     80:30025/TCP   4h4m
    
  3. Once the EXTERNAL-IP address changes from pending to an actual public IP address, use CTRL-C to stop the kubectl watch process.

    The following example output shows a valid public IP address assigned to the service:

    NAME          TYPE           CLUSTER-IP    EXTERNAL-IP    PORT(S)        AGE
    store-front   LoadBalancer   10.0.100.10   20.62.159.19   80:30025/TCP   4h5m
    
  4. Open a web browser to the external IP address of your service to see the Azure Store app in action. (A command-line alternative is sketched after these steps.)

    Screenshot of AKS Store sample application.
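
If you prefer the command line, you can capture the external IP address and issue a test request with curl. The external_ip variable name is illustrative; the sketch assumes the service already has an external IP assigned.

# Capture the external IP address assigned to the store-front service.
external_ip=$(kubectl get service store-front \
  --output jsonpath='{.status.loadBalancer.ingress[0].ip}')

# Request the store-front home page over HTTP (port 80, per the service definition).
curl "http://$external_ip"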

Clean up resources

Delete AKS resources

When you no longer need the resources created via Terraform, perform the following steps:

  1. Run terraform plan and specify the destroy flag.

    terraform plan -destroy -out main.destroy.tfplan
    

    Key points:

    • The terraform plan command creates an execution plan, but doesn't execute it. Instead, it determines what actions are necessary to create the configuration specified in your configuration files. This pattern allows you to verify whether the execution plan matches your expectations before making any changes to actual resources.
    • The optional -out parameter allows you to specify an output file for the plan. Using the -out parameter ensures that the plan you reviewed is exactly what is applied.
    • To read more about persisting execution plans and security, see the security warning section.
  2. Run terraform apply to apply the execution plan. (An optional check that the resource group was removed is sketched after these steps.)

    terraform apply main.destroy.tfplan
    
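To confirm that the resource group was removed, you can check for it with az group exists. This optional sketch assumes the resource_group_name shell variable is still set from the Verify the results section.

# Returns "false" once the resource group no longer exists.
az group exists --name $resource_group_name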

Delete service principal

  1. Get the service principal's appId. The Terraform configuration in this article doesn't define an sp output, so use the appId returned by az ad sp create-for-rbac - the same value you stored in the ARM_CLIENT_ID environment variable.

    sp=$ARM_CLIENT_ID
    
  2. Delete the service principal using the az ad sp delete command.

    az ad sp delete --id $sp
    

Troubleshoot Terraform on Azure

Troubleshoot common problems when using Terraform on Azure.

Next steps