Create a chaos experiment that uses a Chaos Mesh fault to kill AKS pods with the Azure portal
You can use a chaos experiment to verify that your application is resilient to failures by causing those failures in a controlled environment. In this guide, you will cause periodic Azure Kubernetes Service pod failures on a namespace using a chaos experiment and Azure Chaos Studio. Running this experiment can help you defend against service unavailability when there are sporadic failures.
Azure Chaos Studio uses Chaos Mesh, a free, open-source chaos engineering platform for Kubernetes, to inject faults into an AKS cluster. Chaos Mesh faults are service-direct faults that require Chaos Mesh to be installed on the AKS cluster. You can use these same steps to set up and run an experiment for any AKS Chaos Mesh fault.
Prerequisites
- An Azure subscription. If you don't have an Azure subscription, create an Azure free account before you begin.
- An AKS cluster with a Linux node pool. If you do not have an AKS cluster, see the AKS quickstart using the Azure CLI, using Azure PowerShell, or using the Azure portal.
AKS Chaos Mesh faults are only supported on Linux node pools.
- At present Chaos Mesh faults don’t work with private clusters.
Set up Chaos Mesh on your AKS cluster
Before you can run Chaos Mesh faults in Chaos Studio, you need to install Chaos Mesh on your AKS cluster.
- Run the following commands in an Azure Cloud Shell window where the active subscription is set to the subscription in which your AKS cluster is deployed. Replace $RESOURCE_GROUP and $CLUSTER_NAME with the resource group and name of your cluster resource.
az aks get-credentials -g $RESOURCE_GROUP -n $CLUSTER_NAME
helm repo add chaos-mesh https://charts.chaos-mesh.org
helm repo update
kubectl create ns chaos-testing
helm install chaos-mesh chaos-mesh/chaos-mesh --namespace=chaos-testing --set chaosDaemon.runtime=containerd --set chaosDaemon.socketPath=/run/containerd/containerd.sock
- Verify that the Chaos Mesh pods are installed by running the following command:
kubectl get po -n chaos-testing
You should see output similar to the following (a chaos-controller-manager and one or more chaos-daemons):
NAME                                        READY   STATUS    RESTARTS   AGE
chaos-controller-manager-69fd5c46c8-xlqpc   1/1     Running   0          2d5h
chaos-daemon-jb8xh                          1/1     Running   0          2d5h
chaos-dashboard-98c4c5f97-tx5ds             1/1     Running   0          2d5h
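If you want to automate the check above rather than read the table by eye, the following is a minimal sketch. It assumes you have captured the text output of `kubectl get po -n chaos-testing` into a string; the helper name `pods_not_running` is hypothetical and is not part of kubectl or Chaos Mesh.

```python
from typing import List


def pods_not_running(kubectl_output: str) -> List[str]:
    """Return the names of pods whose STATUS column is not 'Running'.

    Hypothetical helper: it parses the plain-text table printed by
    `kubectl get po`, locating columns by their header names.
    """
    lines = [line for line in kubectl_output.strip().splitlines() if line.strip()]
    header, rows = lines[0].split(), lines[1:]
    name_idx = header.index("NAME")
    status_idx = header.index("STATUS")
    bad = []
    for row in rows:
        cols = row.split()
        if cols[status_idx] != "Running":
            bad.append(cols[name_idx])
    return bad


# Sample output copied from the verification step above.
sample = """\
NAME                                        READY   STATUS    RESTARTS   AGE
chaos-controller-manager-69fd5c46c8-xlqpc   1/1     Running   0          2d5h
chaos-daemon-jb8xh                          1/1     Running   0          2d5h
chaos-dashboard-98c4c5f97-tx5ds             1/1     Running   0          2d5h
"""

print(pods_not_running(sample))  # prints []
```

An empty list means every listed pod reports Running. Any names returned are pods to inspect before you continue.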
Enable Chaos Studio on your AKS cluster
Chaos Studio cannot inject faults against a resource unless that resource has been onboarded to Chaos Studio first. You onboard a resource to Chaos Studio by creating a target and capabilities on the resource. AKS clusters only have one target type (service-direct), but other resources may have up to two target types - one for service-direct faults and one for agent-based faults. Each type of Chaos Mesh fault is represented as a capability (PodChaos, NetworkChaos, IOChaos, etc.).
- Open the Azure portal.
- Search for Chaos Studio (preview) in the search bar.
- Click on Targets and navigate to your AKS cluster.
- Check the box next to your AKS cluster and click Enable targets then Enable service-direct targets from the dropdown menu.
- A notification will appear indicating that the resource(s) selected were successfully enabled.
You have now successfully onboarded your AKS cluster to Chaos Studio. In the Targets view you can also manage the capabilities enabled on this resource. Clicking the Manage actions link next to a resource will display the capabilities enabled for that resource.
Create an experiment
With your AKS cluster now onboarded, you can create your experiment. A chaos experiment defines the actions you want to take against target resources, organized into steps, which run sequentially, and branches, which run in parallel.
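To make the step and branch semantics concrete, here is a small illustrative model. It is not a Chaos Studio API: the `Step` and `Branch` classes and the `duration_minutes` field are assumptions made for this sketch, and it ignores any setup or teardown overhead. It estimates how long an experiment runs when steps execute one after another and the branches within a step execute at the same time.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Branch:
    name: str
    duration_minutes: int  # hypothetical: total time of the faults in this branch


@dataclass
class Step:
    name: str
    branches: List[Branch] = field(default_factory=list)


def experiment_minutes(steps: List[Step]) -> int:
    # Branches in a step run in parallel, so a step lasts as long as its
    # longest branch. Steps run sequentially, so step durations add up.
    return sum(
        max((b.duration_minutes for b in s.branches), default=0) for s in steps
    )


steps = [
    Step("Step 1", [Branch("Branch 1", 10), Branch("Branch 2", 5)]),
    Step("Step 2", [Branch("Branch 1", 3)]),
]

print(experiment_minutes(steps))  # 10 + 3 = 13
```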
Click on the Experiments tab in the Chaos Studio navigation. In this view, you can see and manage all of your chaos experiments. Click on Add an experiment.
Fill in the Subscription, Resource Group, and Location where you want to deploy the chaos experiment. Give your experiment a Name. Click Next : Experiment designer >
You are now in the Chaos Studio experiment designer. The experiment designer allows you to build your experiment by adding steps, branches, and faults. Give a friendly name to your Step and Branch, then click Add fault.
Select AKS Chaos Mesh Pod Chaos from the dropdown, then fill in the Duration with the number of minutes you want the failure to last and jsonSpec with the information below:
To formulate your Chaos Mesh jsonSpec:
Visit the Chaos Mesh documentation for a fault type, for example, the PodChaos type.
Formulate the YAML configuration for that fault type using the Chaos Mesh documentation.
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-failure-example
  namespace: chaos-testing
spec:
  action: pod-failure
  mode: all
  duration: '600s'
  selector:
    namespaces:
      - default
Remove any YAML outside of the spec (including the spec property name), and remove the indentation of the spec details.
action: pod-failure
mode: all
duration: '600s'
selector:
  namespaces:
    - default
Use a YAML-to-JSON converter to convert the Chaos Mesh YAML to JSON, then minify it.
Paste the minified JSON into the jsonSpec field in the portal.
Click Next: Target resources >
Select your AKS cluster, and click Next.
Verify that your experiment looks correct, then click Review + create, then Create.
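As an alternative to an online converter, the jsonSpec value described in the steps above can be produced with a few lines of Python. This is a minimal sketch using only the standard library: the `spec` dictionary is a hand-written mirror of the trimmed PodChaos YAML (the contents of spec, without the spec key), so nothing is parsed from a YAML file here.

```python
import json

# Hand-written equivalent of the trimmed PodChaos YAML shown earlier.
spec = {
    "action": "pod-failure",
    "mode": "all",
    "duration": "600s",
    "selector": {"namespaces": ["default"]},
}

# Compact separators drop the spaces after ',' and ':' to minify the JSON.
json_spec = json.dumps(spec, separators=(",", ":"))

print(json_spec)
# {"action":"pod-failure","mode":"all","duration":"600s","selector":{"namespaces":["default"]}}
```

The printed string is what you would paste into the jsonSpec field. For a different fault type, change the dictionary to match that fault's spec in the Chaos Mesh documentation.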
Give experiment permission to your AKS cluster
When you create a chaos experiment, Chaos Studio creates a system-assigned managed identity that executes faults against your target resources. This identity must be given appropriate permissions to the target resource for the experiment to run successfully.
- Navigate to your AKS cluster and click on Access control (IAM).
- Click Add then click Add role assignment.
- Search for Azure Kubernetes Service Cluster Admin Role and select the role. Click Next.
- Click Select members and search for your experiment name. Select your experiment and click Select. If there are multiple experiments in the same tenant with the same name, your experiment name will be truncated with random characters added.
- Click Review + assign then Review + assign.
Run your experiment
You are now ready to run your experiment. To see the impact, we recommend opening your AKS cluster overview and going to Insights in a separate browser tab. Live data for the Active Pod Count will show the impact of running your experiment.
- In the Experiments view, click on your experiment, and click Start, then click OK.
- When the Status changes to Running, click Details for the latest run under History to see details for the running experiment.
Now that you have run an AKS Chaos Mesh service-direct experiment, you are ready to: