Capture a Windows container dump file from a Windows node in an AKS cluster

If a Windows container fails on a Microsoft Azure Kubernetes Service (AKS) cluster, you might have to examine the Windows container dump file to investigate the root cause. This article provides steps to capture a Windows container dump file from a Windows node in an AKS cluster. It also includes instructions to download the dump file to your local computer for further analysis.

Prerequisites

  • An AKS cluster. If you don't have an AKS cluster, create one by using the Azure CLI or the Azure portal.

  • Windows agent pools that were created after March 13, 2024, or a node image that was upgraded to AKS Windows image version 20240316 or a later version. Alternatively, verify that the WindowsCSEScriptsPackage version is v0.0.39 or later. You can find this version in C:\AzureData\CustomDataSetupScript.log on the Windows nodes.
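If you want to script the WindowsCSEScriptsPackage check, the following minimal sketch compares a version string that you've copied from C:\AzureData\CustomDataSetupScript.log against the v0.0.39 minimum. The function name version_at_least is hypothetical, and the sketch assumes a sort command that supports -V (version sort), such as GNU coreutils sort.

```bash
# Hypothetical helper: succeed if version $1 is the same as or newer than version $2.
# A leading "v" is stripped, so v0.0.41 and 0.0.41 compare the same way.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_at_least() {
  local have="${1#v}" need="${2#v}"
  # After a version sort, the first line is the older of the two versions.
  # If the older one is the required version, "have" meets the minimum.
  [ "$(printf '%s\n%s\n' "$have" "$need" | sort -V | head -n 1)" = "$need" ]
}
```

For example, version_at_least v0.0.41 v0.0.39 && echo supported prints supported. Here v0.0.41 stands in for whatever value your log shows.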

Step 1: Add annotations metadata to your deployment

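Mount a host folder in the container, and add annotations that request that the Windows container store the dump file in a designated folder. The annotations belong in the pod's metadata. In a Deployment, that's the pod template (spec.template.metadata), because annotations on the Deployment's own metadata aren't copied to its pods: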

metadata:
  ...
  annotations:
    "io.microsoft.container.processdumplocation": "C:\\CrashDumps\\{container_id}"
    "io.microsoft.wcow.processdumptype": "mini"
    "io.microsoft.wcow.processdumpcount": "10"
spec:
  ...
  containers:
  - name: containername
    image: ...
    ...
    volumeMounts:
      - mountPath: C:\CrashDumps
        name: local-dumps
  volumes:
  - name: local-dumps
    hostPath:
      path: C:\k\containerdumps
      type: DirectoryOrCreate
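For reference, here's how the same fields might look in a complete Deployment applied through kubectl. This is a sketch under stated assumptions: the name dump-demo, the app: dump-demo label, and the container image are hypothetical placeholders, and the nodeSelector only keeps the pod on Windows nodes. The annotations sit under spec.template.metadata, which is the pod's metadata.

```bash
# Hypothetical helper that applies a complete Deployment using the Step 1 fields.
# "dump-demo", its label, and the image are placeholders: replace them with your
# own workload. The quoted 'EOF' delimiter stops the shell from expanding
# anything, so the backslashes and the {container_id} token reach kubectl as written.
apply_dump_deployment() {
  kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dump-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dump-demo
  template:
    metadata:
      labels:
        app: dump-demo
      # Pod-level annotations: this is where the dump settings must go.
      annotations:
        "io.microsoft.container.processdumplocation": "C:\\CrashDumps\\{container_id}"
        "io.microsoft.wcow.processdumptype": "mini"
        "io.microsoft.wcow.processdumpcount": "10"
    spec:
      nodeSelector:
        kubernetes.io/os: windows
      containers:
      - name: containername
        # Placeholder image: use your application image instead.
        image: mcr.microsoft.com/windows/servercore:ltsc2022
        volumeMounts:
        - mountPath: C:\CrashDumps
          name: local-dumps
      volumes:
      - name: local-dumps
        hostPath:
          path: C:\k\containerdumps
          type: DirectoryOrCreate
EOF
}
```

Run apply_dump_deployment after you replace the placeholders.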

Step 2: Reproduce the issue

Redeploy your deployment, and wait for the Windows container to fail. To find the AKS Windows node that hosts the pod, run kubectl describe pod -n [POD-NAMESPACE] [POD-NAME] and check the Node field in the output.
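You can also query the node directly instead of scanning the describe output. The following sketch (hypothetical function node_for_pod) prints the node that hosts a pod and that node's internal IP address, which is the kind of address the scp command in Step 4 connects to.

```bash
# Hypothetical helper: print "<node-name> <internal-ip>" for the node hosting a pod.
# Usage: node_for_pod <namespace> <pod-name>
node_for_pod() {
  local ns="$1" pod="$2" node ip
  # spec.nodeName is the node that the scheduler assigned the pod to.
  node="$(kubectl get pod -n "$ns" "$pod" -o jsonpath='{.spec.nodeName}')" || return 1
  # Pick the InternalIP entry from the node's address list.
  ip="$(kubectl get node "$node" -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}')" || return 1
  printf '%s %s\n' "$node" "$ip"
}
```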

Step 3: Connect to the Windows node

Establish a connection to the AKS cluster node. You authenticate either by using a Secure Shell (SSH) key or by using the Windows admin password in a Remote Desktop Protocol (RDP) connection. Both methods require an intermediate connection, because you can't currently connect directly to an AKS Windows node. Whether you connect through SSH or RDP, you have to specify the user name for the AKS nodes. By default, this user name is azureuser.

If you have an SSH key, create an SSH connection to the Windows node. The SSH key doesn't persist on your AKS nodes: it reverts to the key that was initially installed on the cluster after any of the following actions:

  • Restart
  • Version upgrade
  • Node image upgrade
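The intermediate SSH connection can be sketched as three commands: start a helper pod with kubectl debug, forward a local port to it, and then SSH through that port to the Windows node. The function names are hypothetical, and the node name and IP address are placeholders. The sketch assumes that the helper pod runs on a Linux node in the cluster and shares that node's network (which is how kubectl debug node normally runs it), so port 22 on the pod reaches the Linux node's SSH server. The busybox image is only an example; the pod just has to stay running.

```bash
# Hypothetical helpers for the intermediate connection. Port 2022 and the
# azureuser account match the scp command in Step 4.

# 1) Start a helper pod on a Linux node. It's named node-debugger-<node>-<suffix>.
start_helper_pod() {   # $1 = Linux node name, e.g. aks-nodepool1-38878740-vmss000000
  kubectl debug "node/$1" -it --image=mcr.microsoft.com/cbl-mariner/busybox:2.0
}

# 2) In a second console, forward local port 2022 to port 22 on the helper pod.
forward_ssh_port() {   # $1 = helper pod name (node-debugger-...)
  kubectl port-forward "$1" 2022:22
}

# 3) SSH to the Windows node, hopping through the forwarded port.
ssh_to_windows_node() {   # $1 = Windows node internal IP, e.g. 10.240.0.97
  ssh -o 'ProxyCommand ssh -p 2022 -W %h:%p azureuser@127.0.0.1' "azureuser@$1"
}
```

Run start_helper_pod in one console and leave it open, run forward_ssh_port with the helper pod's name in a second console, and then run ssh_to_windows_node with the Windows node's internal IP address.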

Step 4: Transfer the dump file locally

After the container fails, identify the helper pod so that you can copy the dump file locally. Open a second console, and then get a list of pods by running the kubectl get pods command, as follows:

kubectl get pods
NAME                                                    READY   STATUS    RESTARTS   AGE
azure-vote-back-6c4dd64bdf-m4nk7                        1/1     Running   2          3d21h
azure-vote-front-85b4df594d-jhpzw                       1/1     Running   2          3d21h
node-debugger-aks-nodepool1-38878740-vmss000000-6ztp6   1/1     Running   0          3m58s
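To pick out the helper pod without reading the table by eye, you can filter the NAME column for the node-debugger- prefix. A minimal sketch, where find_helper_pods is a hypothetical name:

```bash
# Hypothetical filter: read `kubectl get pods` table output on stdin and print
# the names of helper pods created by `kubectl debug node/...`.
find_helper_pods() {
  # NR > 1 skips the header row; $1 is the NAME column.
  awk 'NR > 1 && $1 ~ /^node-debugger-/ { print $1 }'
}
```

For the listing above, kubectl get pods | find_helper_pods prints node-debugger-aks-nodepool1-38878740-vmss000000-6ztp6.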

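The helper pod is the one whose name starts with node-debugger-aks, as shown in the third row. With the connection through the helper pod still open, run the following Secure Copy (scp) command to retrieve the dump files (.dmp) that are saved when the container fails. Replace 10.240.0.97 with the internal IP address of your Windows node, and replace {container_id} and {application} with the container ID and the dump file name: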

scp -o 'ProxyCommand ssh -p 2022 -W %h:%p azureuser@127.0.0.1' azureuser@10.240.0.97:/C:/k/containerdumps/{container_id}/{application}.dmp .
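If you download several dumps, a small wrapper keeps the ProxyCommand in one place. This sketch assumes that the port forward to the helper pod is still running on local port 2022; fetch_dump and its argument order are hypothetical.

```bash
# Hypothetical wrapper around the scp command above.
# Usage: fetch_dump <windows-node-ip> <container-id> <dump-file-name> [local-dir]
# Assumes local port 2022 is already forwarded to port 22 on the helper pod.
fetch_dump() {
  local ip="$1" cid="$2" file="$3" dest="${4:-.}"
  scp -o 'ProxyCommand ssh -p 2022 -W %h:%p azureuser@127.0.0.1' \
    "azureuser@${ip}:/C:/k/containerdumps/${cid}/${file}" "$dest"
}
```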

After you connect to the Windows node, list the contents of the C:\k\containerdumps folder to find the full path of each dump file.
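To list the dump files without opening an interactive session, you can run a remote command over the same proxied SSH connection. This sketch (hypothetical list_dumps) assumes that port 2022 is still forwarded, and it calls powershell explicitly so that it shouldn't matter whether the node's default SSH shell is cmd.exe or PowerShell.

```bash
# Hypothetical helper: list the .dmp files under C:\k\containerdumps on a Windows node.
# Usage: list_dumps <windows-node-ip>
# Assumes local port 2022 is forwarded to the helper pod. The single-quoted
# remote command keeps the backslashes intact on the local side.
list_dumps() {
  ssh -o 'ProxyCommand ssh -p 2022 -W %h:%p azureuser@127.0.0.1' "azureuser@$1" \
    'powershell -NoProfile -Command "Get-ChildItem -Path C:\k\containerdumps -Recurse -Filter *.dmp | Select-Object -ExpandProperty FullName"'
}
```

The output uses Windows paths with backslashes. For scp, rewrite each path with forward slashes and a leading slash, in the same /C:/k/containerdumps/ form that the scp command above uses.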

Contact us for help

If you have questions or need help, create a support request or ask Azure community support. You can also submit product feedback to the Azure feedback community.