Long-running Python code on an Azure Linux VM

Alfons Nonell-Canals 0 Reputation points
2024-12-05T06:56:33.8866667+00:00

We have to run our own Python code, which requires a very long execution time (days). We are trying to run it on an Azure Linux VM (Ubuntu LTS based).

When we run short tests, it works perfectly, but it fails when we set it up in production (the long run).

We connect via SSH and execute the code using nohup (to keep it running when the SSH connection is lost). After some time, the execution is killed without any error; it simply stops running. We also tried the screen command, but it is killed as well.

It seems that all running jobs are killed a few minutes after the SSH session disconnects (or times out).

We have read that there is some kind of timeout that kills jobs a few minutes after the SSH connection is closed, but we have not been able to change it.

We have been running this kind of job on other machines for several years, and usually nohup or screen is enough.

Do you have any suggestions? We know jobs can be submitted from the Azure web portal or client, but we would prefer to avoid that.

Thank you!

Azure Virtual Machines
An Azure service that is used to provision Windows and Linux virtual machines.

1 answer

  1. Prrudram-MSFT 28,201 Reputation points Moderator
    2024-12-05T10:30:06.9733333+00:00

    Hello @Alfons Nonell-Canals

    Thank you for reaching out with your query on Microsoft Learn Q&A. Happy to help.

    First, you can use a terminal multiplexer such as tmux or screen; both are designed to keep a session running after you disconnect from SSH. Since you mentioned that screen didn't work, you might want to try tmux as an alternative. Make sure you detach from the session (Ctrl+b then d in tmux, Ctrl+a then d in screen) before disconnecting.
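    If you start jobs from a script, a minimal sketch of the tmux approach is below, assuming tmux is installed and that the job is launched as python3 your_script.py. The function names and the session name longjob are hypothetical. It uses tmux new-session -d, which creates the session already detached, so there is no attached terminal to forget to detach from; you can reattach later with tmux attach -t longjob.

```python
from __future__ import annotations

import shlex
import shutil
import subprocess


def tmux_start_command(session: str, script: str) -> list[str]:
    """Build a tmux command that runs `python3 script` in a new, already-detached session."""
    # -d: create the session without attaching to it; -s: give the session a name.
    # A single command string is run by tmux through a shell, so the script path is quoted.
    return ["tmux", "new-session", "-d", "-s", session, f"python3 {shlex.quote(script)}"]


def start_in_tmux(session: str, script: str) -> None:
    """Start the job in tmux; reattach later with `tmux attach -t <session>`."""
    if shutil.which("tmux") is None:
        raise RuntimeError("tmux is not installed (on Ubuntu: sudo apt install tmux)")
    subprocess.run(tmux_start_command(session, script), check=True)


if __name__ == "__main__":
    # Hypothetical session and script names; print the command instead of running it.
    print(shlex.join(tmux_start_command("longjob", "your_script.py")))
```

    Building the argument list separately from running it makes it easy to check the exact command before starting a multi-day job.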

    Next, you can modify the SSH server configuration to prevent idle sessions from timing out. Edit /etc/ssh/sshd_config on your VM and set the following parameters:

    ClientAliveInterval 120
    ClientAliveCountMax 720
    

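    With these values, sshd sends a keepalive probe every 120 seconds when the client has been silent, and drops the connection only after 720 consecutive probes go unanswered, so an unresponsive client is tolerated for up to 24 hours (120 × 720 = 86,400 seconds). Note that this only prevents idle disconnects; it does not keep processes alive once the session actually ends. After making these changes, restart the SSH service with sudo systemctl restart sshd.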
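    The 24-hour figure is simply the product of the two settings. A small sketch of that arithmetic is below; the helper names are hypothetical, and count_max_for assumes you keep ClientAliveInterval at 120 seconds and only want to pick a ClientAliveCountMax for a different tolerance.

```python
import math

# Values from the sshd_config snippet above.
CLIENT_ALIVE_INTERVAL = 120   # seconds of client silence before sshd sends a probe
CLIENT_ALIVE_COUNT_MAX = 720  # unanswered probes allowed before sshd disconnects


def max_silent_seconds(interval: int, count_max: int) -> int:
    """How long sshd tolerates a client that answers none of its probes."""
    return interval * count_max


def count_max_for(hours: float, interval: int = CLIENT_ALIVE_INTERVAL) -> int:
    """ClientAliveCountMax needed to tolerate `hours` of silence at the given interval."""
    return math.ceil(hours * 3600 / interval)


seconds = max_silent_seconds(CLIENT_ALIVE_INTERVAL, CLIENT_ALIVE_COUNT_MAX)
print(f"{seconds} s = {seconds / 3600:g} h")  # 86400 s = 24 h
print(count_max_for(48))                       # 1440
```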

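    Additionally, make sure you are using nohup correctly: append & to run the command in the background, and redirect its output to a log file (otherwise nohup writes it to nohup.out). On recent Ubuntu LTS releases the interpreter is usually invoked as python3:

    nohup python3 your_script.py > your_script.log 2>&1 &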
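    If you would rather launch the job from Python than from the shell, here is a minimal sketch of the same idea as the nohup command, assuming the hypothetical names launch_detached, your_script.py and your_script.log. It uses subprocess.Popen with start_new_session=True, which calls setsid() in the child so the job runs in its own session with no controlling terminal, and it sends stdin from /dev/null and output to a log file. Like nohup, this only guards against the hangup sent when the terminal closes; it does not help if the process is being killed for another reason.

```python
import os
import subprocess
import sys


def launch_detached(script: str, log_path: str) -> int:
    """Run `script` with this Python interpreter, detached from the terminal; return its PID."""
    with open(log_path, "ab") as log, open(os.devnull, "rb") as devnull:
        proc = subprocess.Popen(
            [sys.executable, script],
            stdin=devnull,             # no terminal input, as with nohup
            stdout=log,                # append output to the log file
            stderr=subprocess.STDOUT,  # merge tracebacks into the same log
            start_new_session=True,    # setsid(): new session, no controlling terminal
        )
    # The child keeps its own copies of the file descriptors, so closing ours is safe.
    return proc.pid


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 launch.py your_script.py your_script.log
    print("started PID", launch_detached(sys.argv[1], sys.argv[2]))
```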
    

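    The system might also be killing your process because of resource limits, for example the kernel's out-of-memory (OOM) killer when the VM runs out of RAM. You can check the kernel log with sudo dmesg or journalctl -k, and the per-process limits with ulimit -a.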
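    A minimal sketch of both checks from Python is below. It reads a few per-process limits with the standard resource module and searches kernel log text for OOM-killer entries; the function names are hypothetical, and the regular expression assumes the usual kernel wording ("Out of memory: Killed process PID (name)", or "Kill process" on older kernels).

```python
from __future__ import annotations

import re
import resource

# Per-process limits that can stop a long job. On Linux, reaching the RLIMIT_CPU soft
# limit sends SIGXCPU and reaching the hard limit sends SIGKILL.
LIMITS = {
    "cpu_seconds": resource.RLIMIT_CPU,
    "address_space_bytes": resource.RLIMIT_AS,
    "open_files": resource.RLIMIT_NOFILE,
}

# Kernel OOM-killer lines, e.g. "Out of memory: Killed process 1234 (python3) ...".
OOM_RE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def _fmt(value: int) -> int | str:
    return "unlimited" if value == resource.RLIM_INFINITY else value


def current_limits() -> dict[str, tuple[int | str, int | str]]:
    """Soft and hard limits of this process (inherited from the shell that started it)."""
    limits = {}
    for name, res in LIMITS.items():
        soft, hard = resource.getrlimit(res)
        limits[name] = (_fmt(soft), _fmt(hard))
    return limits


def find_oom_kills(kernel_log: str) -> list[tuple[int, str]]:
    """Return (pid, process name) for every OOM-killer entry in kernel log text."""
    return [(int(pid), name) for pid, name in OOM_RE.findall(kernel_log)]


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name}: soft={soft} hard={hard}")
```

    To scan the kernel log, pass its text in, for example find_oom_kills(subprocess.run(["journalctl", "-k", "--no-pager"], capture_output=True, text=True).stdout); reading the kernel log may require sudo.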

    If the above methods don't work, consider using Azure Batch or Azure Kubernetes Service (AKS) for your long-running jobs. These services are designed to handle long-running, resource-intensive tasks.

    I hope these suggestions help you keep your Python code running smoothly on your Azure Linux VM. If you have any further questions or need more assistance, feel free to ask!
