Capture a Windows container dump file from a Windows node in an AKS cluster
If a Windows container fails on a Microsoft Azure Kubernetes Service (AKS) cluster, you might have to examine the Windows container dump file to investigate the root cause. This article provides steps to capture a Windows container dump file from a Windows node in an AKS cluster. It also includes instructions to download the dump file to your local computer for further analysis.
Prerequisites
An AKS cluster. If you don't have an AKS cluster, create one by using Azure CLI or through the Azure portal.
Windows agent pools that were created after 3/13/2024, or a node image that was upgraded to AKS Windows image version 20240316 or a later version. Alternatively, verify that the WindowsCSEScriptsPackage version is v0.0.39 or newer. The version is recorded in C:\AzureData\CustomDataSetupScript.log on the Windows nodes.
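One way to check the node image requirement from a workstation is to query the agent pool through Azure CLI. The following is a minimal sketch, not a definitive procedure. The function name get_node_image_version is hypothetical, and the resource group, cluster, and pool names in the example are placeholders. It assumes that Azure CLI is installed and signed in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the node image version of one agent pool.
# Arguments: resource group, cluster name, node pool name (all placeholders).
get_node_image_version() {
  az aks nodepool show \
    --resource-group "$1" \
    --cluster-name "$2" \
    --name "$3" \
    --query nodeImageVersion \
    --output tsv
}

# Example call (placeholder names, not values from this article):
# get_node_image_version myResourceGroup myAKSCluster npwin
```

Compare the value that is returned against the image version that is listed in the prerequisite.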
Step 1: Add annotations metadata to your deployment
Mount a host folder in the container, and add the annotations metadata in order to request that the Windows container store the dump file in a designated folder:
metadata:
  ...
  annotations:
    "io.microsoft.container.processdumplocation": "C:\\CrashDumps\\{container_id}"
    "io.microsoft.wcow.processdumptype": "mini"
    "io.microsoft.wcow.processdumpcount": "10"
spec:
  ...
  containers:
  - name: containername
    image: ...
    ...
    volumeMounts:
    - mountPath: C:\CrashDumps
      name: local-dumps
  volumes:
  - name: local-dumps
    hostPath:
      path: C:\k\containerdumps
      type: DirectoryOrCreate
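After the manifest is applied, it can be useful to confirm that the annotations actually reached the running pod. The following sketch checks the pod itself rather than the deployment object. The function names show_pod_annotations and has_dump_annotation are hypothetical, and the namespace and pod name are values that you supply.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every annotation on a pod as JSON text.
# Arguments: namespace, pod name.
show_pod_annotations() {
  kubectl get pod -n "$1" "$2" -o jsonpath='{.metadata.annotations}'
}

# Hypothetical helper: succeed only if the dump location annotation is present.
# grep -F treats the key as a fixed string, so the dots are not wildcards.
has_dump_annotation() {
  show_pod_annotations "$1" "$2" |
    grep -qF 'io.microsoft.container.processdumplocation'
}

# Example call (placeholder names):
# has_dump_annotation default my-windows-pod && echo "dump annotation found"
```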
Step 2: Reproduce the issue
Redeploy your deployment, and wait for the Windows container to fail. You can use kubectl describe pod -n [POD-NAMESPACE] [POD-NAME] to learn which AKS Windows node is hosting the pod.
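As an alternative to reading the full describe output, the node name and the last termination details can be pulled out directly by using a jsonpath query. This is a sketch under the assumption that the container has already terminated at least once. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of the node that hosts a pod.
# Arguments: namespace, pod name.
get_pod_node() {
  kubectl get pod -n "$1" "$2" -o jsonpath='{.spec.nodeName}'
}

# Hypothetical helper: for each container, print its name, the reason for
# its last termination, and the exit code, separated by tabs.
get_last_termination() {
  kubectl get pod -n "$1" "$2" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}

# Example calls (placeholder names):
# get_pod_node default my-windows-pod
# get_last_termination default my-windows-pod
```

The termination fields are empty if the container has not failed yet.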
Step 3: Connect to the Windows node
Establish a connection to the AKS cluster node. You authenticate by using either a Secure Shell (SSH) key or the Windows admin password in a Remote Desktop Protocol (RDP) connection. Both methods require an intermediate connection because you can't currently connect directly to an AKS Windows node. Whether you connect to a node through SSH or RDP, you have to specify the user name for the AKS nodes. By default, this user name is azureuser.
If you have an SSH key, create an SSH connection to the Windows node. The SSH key doesn't persist on your AKS nodes. The SSH key reverts to what was initially installed on the cluster during any of the following actions:
- Restart
- Version upgrade
- Node image upgrade
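The intermediate connection is typically a helper pod on a Linux node in the same cluster. The following sketch shows how such a pod might be started and how the internal IP address of the Windows node can be looked up. The function names are hypothetical, the helper image is an assumption that you pass in yourself, and the node names are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the internal IP address of a node.
# Argument: node name (for example, the Windows node that hosts the pod).
get_node_internal_ip() {
  kubectl get node "$1" \
    -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}'
}

# Hypothetical helper: start an interactive helper pod on a Linux node.
# Arguments: Linux node name, container image for the helper pod.
# The image is an assumption; use one that your cluster can pull.
start_helper_pod() {
  kubectl debug "node/$1" -it --image="$2"
}

# Example calls (placeholder names):
# get_node_internal_ip akswin000000
# start_helper_pod aks-nodepool1-38878740-vmss000000 "$HELPER_IMAGE"
```

Keep the console that runs the helper pod open. The pod name that kubectl prints begins with node-debugger.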
Step 4: Transfer the dump file locally
After the container fails, identify the helper pod so that you can copy the dump file locally. Open a second console, and then get a list of pods by running the kubectl get pods command, as follows:
kubectl get pods
NAME                                                    READY   STATUS    RESTARTS   AGE
azure-vote-back-6c4dd64bdf-m4nk7                        1/1     Running   2          3d21h
azure-vote-front-85b4df594d-jhpzw                       1/1     Running   2          3d21h
node-debugger-aks-nodepool1-38878740-vmss000000-6ztp6   1/1     Running   0          3m58s
The helper pod has a prefix of node-debugger-aks, as shown in the third row. Substitute your own values (helper pod name, node IP address, container ID, and dump file name), and then run the following Secure Copy (scp) command to retrieve the dump files (.dmp) that are saved when the container fails:
scp -o 'ProxyCommand ssh -p 2022 -W %h:%p azureuser@127.0.0.1' azureuser@10.240.0.97:/C:/k/containerdumps/{container_id}/{application}.dmp .
After you connect to the Windows node, you can list the C:\k\containerdumps folder to find the full path of the dump files.
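The scp command connects through port 2022 on 127.0.0.1, which presumes that this local port is forwarded to port 22 of the helper pod. The following sketch puts the pieces together: forward the port, list the dump folder, and copy one file. The function names are hypothetical, the IP address and file names are placeholders, and the listing step assumes that PowerShell is available on the Windows node.

```shell
#!/usr/bin/env bash
# SSH option shared by ssh and scp: hop through the forwarded local port.
PROXY_OPT='ProxyCommand ssh -p 2022 -W %h:%p azureuser@127.0.0.1'

# Hypothetical helper: forward local port 2022 to port 22 of the helper pod.
# Argument: helper pod name. This call blocks, so run it in its own console.
forward_ssh_port() {
  kubectl port-forward "$1" 2022:22
}

# Hypothetical helper: list the dump folder on the Windows node recursively.
# Argument: internal IP address of the Windows node.
list_dumps() {
  ssh -o "$PROXY_OPT" "azureuser@$1" \
    'powershell -Command "Get-ChildItem -Recurse C:\k\containerdumps"'
}

# Hypothetical helper: copy one dump file to a local directory.
# Arguments: node IP address, container ID, dump file name, local target.
copy_dump() {
  scp -o "$PROXY_OPT" \
    "azureuser@$1:/C:/k/containerdumps/$2/$3" "$4"
}

# Example calls (placeholder values):
# forward_ssh_port node-debugger-aks-nodepool1-38878740-vmss000000-6ztp6
# list_dumps 10.240.0.97
# copy_dump 10.240.0.97 "$CONTAINER_ID" "$DUMP_FILE" .
```

Wrapping the proxy option in one variable keeps the ssh and scp calls consistent if the forwarded port changes.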
Contact us for help
If you have questions or need help, create a support request, or ask Azure community support. You can also submit product feedback to Azure feedback community.