Program on VM automatically crashes after a long idle period

Le Nguyen Minh Huy 1 Reputation point
2021-03-16T05:38:09.03+00:00

Hi,

I am training a machine learning model on an Azure VM with an NC6 Promo GPU. Everything was fine at the beginning, but when I went back to check after a while, I found that my training program had stopped. I also got this message: "client_loop: send disconnect: Broken pipe". Is there a solution for this problem? It has cost me a lot of time and money.

Azure Machine Learning

1 answer

  1. Ramr-msft 17,741 Reputation points
    2021-03-16T11:31:53.397+00:00

    @Le Nguyen Minh Huy Thanks for the question. We have forwarded this issue to the product team to check. In the meantime, you can try the following SSH keepalive settings.

    The sshd (server) settings in /etc/ssh/sshd_config:
    TCPKeepAlive yes
    ClientAliveInterval 60
    ClientAliveCountMax 40000

    AND

    The ssh (client) setting in ~/.ssh/config:
    ServerAliveInterval 60
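
To give a rough sense of what the server values above mean in practice, here is a small Python sketch of the arithmetic. It assumes, based on my reading of the OpenSSH documentation, that sshd sends one keepalive every ClientAliveInterval seconds and drops the session once ClientAliveCountMax of them in a row go unanswered. The function name is only illustrative and is not part of any SSH tool.

```python
# Values copied from the sshd_config settings listed above.
CLIENT_ALIVE_INTERVAL_S = 60
CLIENT_ALIVE_COUNT_MAX = 40000


def unresponsive_timeout_seconds(interval_s: int, count_max: int) -> int:
    """Approximate time sshd keeps a silent client before disconnecting.

    Assumption: one keepalive per interval, disconnect after count_max
    consecutive keepalives receive no reply.
    """
    return interval_s * count_max


timeout_s = unresponsive_timeout_seconds(
    CLIENT_ALIVE_INTERVAL_S, CLIENT_ALIVE_COUNT_MAX
)
timeout_days = timeout_s / 86400  # 86400 seconds in a day

print(f"{timeout_s} s (about {timeout_days:.1f} days)")
```

With these values the server should tolerate an unresponsive client for roughly four weeks, so the keepalive traffic mainly serves to stop the connection from looking idle to anything in between.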

    We recommend raising an Azure support ticket from the Help + support blade in the Azure portal for your service resource. This lets you share the details securely and work with an engineer who can provide more insight into the issue and check whether it can be replicated.
