Hello @Jake K!
This detail:
Reason: OOMKilled
points to a memory problem.
This typically occurs when the container exceeds the memory limit allocated to it and the kernel's out-of-memory (OOM) killer terminates the process to free up resources.
Start by increasing the memory allocated to the container. If that is not sufficient, consider scaling up the resources of your Azure Arc cluster, either by adding more nodes to the cluster or by using nodes with higher memory capacity.
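Before changing anything, it can help to compare the container's configured memory limit with what the nodes can actually offer. Here is a minimal sketch of how you might check that. The function names and the optional namespace argument are placeholders of mine, not something from your output, and I am assuming the container is named inferenceserver as in your error message:

```shell
# Hypothetical helpers: pass the pod name and, optionally, its namespace
# (falls back to "default" if omitted, which is only an assumption).

# Print the requests and limits configured on the inferenceserver container.
show_container_resources() {
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{.spec.containers[?(@.name=="inferenceserver")].resources}'
}

# List each node with its allocatable memory, to see whether a higher limit can fit.
show_node_memory() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,ALLOCATABLE_MEMORY:.status.allocatable.memory'
}
```

For example, run show_container_resources <inferenceserver-pod-name> <namespace> and then show_node_memory. If the limit is already close to a node's allocatable memory, nodes with more memory are probably the better route than raising the limit further.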
Also:
The error message you provided indicates that the failure occurs within the inferenceserver container. You can access that container's logs with the Kubernetes command-line tool (kubectl) to gather more information about the error. Run the following command to view the logs of the inferenceserver container:
kubectl logs <inferenceserver-pod-name> -c inferenceserver
Replace <inferenceserver-pod-name> with the actual name of the pod running the inferenceserver container.
The container logs may provide more detailed error messages or stack traces that can help pinpoint the cause of the failure.
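One caveat: an OOMKilled container is usually restarted, so a plain kubectl logs call may only show the output of the new instance. A sketch of how you might look at the instance that was killed, using kubectl's --previous flag. As before, the function names and the optional namespace argument are placeholders I made up, and the 200-line tail is an arbitrary choice:

```shell
# Hypothetical helpers: pass the pod name and, optionally, its namespace
# (falls back to "default" if omitted, which is only an assumption).

# Show the last 200 log lines of the previous (terminated) container instance.
previous_inference_logs() {
  kubectl logs "$1" -n "${2:-default}" -c inferenceserver --previous --tail=200
}

# Print the reason recorded for the container's last termination.
last_termination_reason() {
  kubectl get pod "$1" -n "${2:-default}" \
    -o jsonpath='{.status.containerStatuses[?(@.name=="inferenceserver")].lastState.terminated.reason}'
}
```

If last_termination_reason <inferenceserver-pod-name> <namespace> prints OOMKilled, the tail of previous_inference_logs should show what the server was doing right before it ran out of memory (for example, loading the model or handling a large request).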
I hope this helps!
Kindly mark the answer as Accepted and Upvote in case it helped!
Regards