Symptoms
The akshci-telemetry pod in an AKS Arc cluster can, over time, consume a large amount of CPU and memory. If metrics are enabled, you can check the pod's CPU and memory usage with the following kubectl command:
kubectl -n kube-system top pod -l app=akshci-telemetry
You might see an output similar to this:
NAME                              CPU(cores)   MEMORY(bytes)
akshci-telemetry-5df56fd5-rjqk4   996m         152Mi
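If you'd rather script the check than eyeball the numbers, a small sketch like the following can flag the pod once it crosses a memory threshold. The threshold value is illustrative; the sketch assumes metrics are enabled, kubectl already targets the cluster, and a single pod matches the label:

```shell
# Sketch: flag the telemetry pod when it exceeds a memory threshold.
# Assumes metrics are enabled, kubectl targets the cluster, and one pod matches.
threshold_mi=150   # illustrative threshold in Mi
mem=$(kubectl -n kube-system top pod -l app=akshci-telemetry --no-headers |
      awk '{print $3}' | tr -d 'Mi')
if [ "$mem" -gt "$threshold_mi" ]; then
  echo "akshci-telemetry is using ${mem}Mi (threshold: ${threshold_mi}Mi)"
fi
```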
Mitigation
To resolve this issue, set default resource limits for the pods in the kube-system namespace.
Important notes
- Verify whether any pods in the kube-system namespace might require more memory than the default limit. If so, adjust the values accordingly.
- The LimitRange is applied to the namespace; in this case, the kube-system namespace. The default resource limits apply to new pods that don't specify their own limits.
- Existing pods, including those that already have resource limits, aren't affected.
- New pods that don't specify their own resource limits are constrained by the limits set in the next section.
- After you set the resource limits and delete the telemetry pod, the new pod might eventually hit the memory limit and generate OOM (Out-Of-Memory) errors. This is a temporary mitigation.
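To help with the first point above, you can list which kube-system containers currently declare no memory limit; these are the containers that would pick up the LimitRange defaults. A sketch, assuming jq is installed and kubectl already targets the cluster:

```shell
# List kube-system containers that declare no memory limit (sketch; assumes
# jq is installed and kubectl is configured for the cluster).
kubectl -n kube-system get pods -o json |
jq -r '.items[]
  | .metadata.name as $pod
  | .spec.containers[]
  | select(.resources.limits.memory == null)
  | "\($pod)/\(.name)"'
```

Any container printed here might need its own resource limits set before you rely on the defaults.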
To proceed with setting the resource limits, run the following script. While the script uses az aksarc get-credentials, you can also use az connectedk8s proxy to get the proxy kubeconfig and access the Kubernetes cluster.
Define the LimitRange YAML to set default CPU and memory limits
# Set the $cluster_name and $resource_group of the aksarc cluster
$cluster_name = ""
$resource_group = ""
# Connect to the aksarc cluster
az aksarc get-credentials -n $cluster_name -g $resource_group --admin -f "./kubeconfig-$cluster_name"
$limitRangeYaml = @'
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-mem-resource-constraint
  namespace: kube-system
spec:
  limits:
  - default: # this section defines default limits for containers that haven't specified any limits
      cpu: 250m
      memory: 250Mi
    defaultRequest: # this section defines default requests for containers that haven't specified any requests
      cpu: 10m
      memory: 20Mi
    type: Container
'@
$limitRangeYaml | kubectl apply --kubeconfig "./kubeconfig-$cluster_name" -f -
kubectl get pods -l app=akshci-telemetry -n kube-system --kubeconfig "./kubeconfig-$cluster_name"
kubectl delete pods -l app=akshci-telemetry -n kube-system --kubeconfig "./kubeconfig-$cluster_name"
Start-Sleep -Seconds 5
kubectl get pods -l app=akshci-telemetry -n kube-system --kubeconfig "./kubeconfig-$cluster_name"
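Before checking the pod itself, you can confirm that the LimitRange object was created with the expected defaults. A sketch, assuming jq is installed and kubectl targets the cluster:

```shell
# Confirm the LimitRange exists and inspect its defaults (assumes jq is
# installed and kubectl targets the cluster).
kubectl -n kube-system get limitrange cpu-mem-resource-constraint -o json |
jq '.spec.limits[0] | {default, defaultRequest}'
```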
Validate that the resource limits were applied correctly
Check the resource limits in the pod's YAML configuration:
kubectl get pods -l app=akshci-telemetry -n kube-system --kubeconfig "./kubeconfig-$cluster_name" -o yaml
In the output, verify that the resources section includes the limits:

resources:
  limits:
    cpu: 250m
    memory: 250Mi
  requests:
    cpu: 10m
    memory: 20Mi
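Rather than scanning the full YAML, you can extract just the resources section of each telemetry container. A sketch, assuming jq is installed and kubectl targets the cluster:

```shell
# Print only the resources section of each telemetry container (assumes jq
# is installed and kubectl targets the cluster).
kubectl -n kube-system get pods -l app=akshci-telemetry -o json |
jq '.items[].spec.containers[].resources'
```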