My WSL2 instance disappears within seconds of launching a GPU-based LLM program in Python
After more than a year of problem-free GPU training in WSL2, my WSL instance now dies within a few seconds of launching my GPU-based Python scripts, and it seems to leave no events or logs behind when it dies.
I have installed a brand-new Ubuntu instance with only the minimal packages required to facilitate my GPU-based LLM use case, and this instance dies in exactly the same way as my original instance did.
Unless I can resolve this, I will need to switch to running Linux on bare metal rather than Windows/WSL2. I contacted MSFT support and demonstrated the issue; they were unable to help and recommended that I ask a question here.
I am running a slightly modified version of FastChat (https://github.com/lm-sys/FastChat) with a fine-tuned model based on Facebook's Llama 2 7B, and I am simply running the model with test prompts. This reliably kills the WSL2 instance every time.