Resolve batch node creation delays when restarting or reimaging

This article discusses how to resolve delays in Microsoft Azure Batch node creation when you restart or reimage a node. Installing the large Python runtime and Python packages in a start task can cause long delays and unexpected errors when a batch node is first added to a batch pool, or when the node is restarted or reimaged.

Symptoms

When you create a batch pool and add a node to the pool, or you restart or reimage the node, application installation takes too long to finish, or it fails after a lengthy installation time.

Cause

This issue occurs because the Python runtime and its packages are too large to install efficiently in a start task.

When you create a batch pool in Azure Batch and add a batch node, the recommended process is to use a start task to prepare the operating environment. This start task can do the following things:

  • Install the applications that your tasks run.

  • Start background processes when the batch node is first added to the pool, or when the node is restarted or reimaged.

However, for the Python language runtime and applications that require Python to run, the package might be so large that it takes a long time for the start task to install. Even after most of the installation has occurred, the installation might still fail because of an unexpected problem.
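
To make the problem pattern concrete, the following minimal sketch builds a start-task definition shaped like the `startTask` object of the Batch REST API. The helper name `build_start_task`, the package list, and the command line are illustrative assumptions, not taken from any real pool.

```python
import json


def build_start_task(packages):
    """Return a dict shaped like the Batch REST API startTask object."""
    # Hypothetical command line: installs every package each time the node
    # is added, restarted, or reimaged.
    command = 'cmd /c "python -m pip install ' + " ".join(packages) + '"'
    return {
        "commandLine": command,
        "userIdentity": {"autoUser": {"scope": "pool", "elevationLevel": "admin"}},
        "maxTaskRetryCount": 1,
        # When True, the node does not accept tasks until the start task succeeds.
        "waitForSuccess": True,
    }


if __name__ == "__main__":
    print(json.dumps(build_start_task(["numpy", "pandas"]), indent=2))
```

Because the install command runs on every node start, its duration is added to each allocation, restart, and reimage. That is the delay that a preinstalled image avoids.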

Solution

To fix this issue, match the virtual machine (VM) and batch account locations and OS versions, and preinstall Python and its packages before you capture a Gen1 image.

Use a prepackaged custom image to allocate batch nodes. For general information about this process, see Use a managed image to create a custom image pool.

To prevent lengthy installation times and help avoid installation failure, follow these general practices:

  • Specify the same location or region to use when you create the batch account and create the VM batch node.

  • Select an image that has Gen1 in the name, such as Windows Server 2019 Datacenter - Gen1 (Windows node) or Ubuntu Server 18.04 LTS - Gen1 (Linux node). A Gen1 image is necessary because some VM families don't support Gen2 images.

  • When you create a pool of nodes in the Azure portal, make sure that the Sku list in the Operating system section contains the system version that you specified.

  • Customize your Python installation so that it gets preinstalled on the image and works for all users.

System-specific procedures

The following section describes the steps that you have to take for a Windows batch node.

Windows node: Install the required version of Python on the C drive, and manually append the system path

Procedure summary

After you create and start the VM, connect to the VM by using Remote Desktop Protocol (RDP). Then, install everything that you need on the VM, including the required version of the Python runtime (for example, Python 3.10.4), and edit the system path. Finally, capture the VM image, create a batch pool that uses the image, connect to the new batch pool node, and then test the node to make sure that the preinstalled Python runtime and packages work correctly.

Procedure steps

  1. Create a Windows VM in the Azure portal by specifying the following settings.

    Setting name | Setting value
    Region | The same region that's assigned to your batch account
    Image | A Windows image that has Gen1 (not Gen2) in the name and is supported by the batch service

  2. Connect to the VM by using RDP.

  3. Run the Python Setup wizard, and then select the Customize installation option. Then, on the Advanced Options page, go to the Customize install location box, and specify a path on the C drive.

  4. Edit the system environment variable for Path so that it includes the Python installation path (such as C:\Python\) and the path to the Python installed packages (such as C:\Python\Scripts\).

    Note

    The Python Setup wizard adds these paths only to the user environment variable for Path. Therefore, you have to add the paths to the corresponding system environment variable. Otherwise, when you capture the VM image, the Python settings and the extra software packages that the user installed are deleted from the image.

    To append these paths to the system environment variable, follow these steps:

    1. Select Start, and then search for and select Settings.

    2. In the Settings app, select System > About > Advanced system settings.

    3. In the System Properties dialog box, select the Advanced tab, and then select Environment variables.

    4. In the Environment Variables dialog box, go to the System variables section, select the Path variable, and then select Edit.

    5. In the Edit environment variable dialog box, select New, and then enter the path to the Python installed packages. Then, repeat this step to enter the path to the Python runtime.

    6. Select OK three times to apply the changes in the three dialog boxes.
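
After you apply the change, you can confirm that both directories are present in the system Path. The following sketch is one possible check, not part of the original procedure. The function names are hypothetical, and the `C:\Python` locations are the example paths from step 4. `read_system_path` reads the machine-level value from the registry and works only on Windows. `missing_from_path` is a plain string comparison that runs anywhere.

```python
import ntpath


def read_system_path():
    """Read the machine-level Path value from the registry (Windows only)."""
    import winreg  # standard library module, available only on Windows

    key_name = r"SYSTEM\CurrentControlSet\Control\Session Manager\Environment"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_name) as key:
        value, _ = winreg.QueryValueEx(key, "Path")
    return value


def missing_from_path(path_value, required_dirs):
    """Return the required directories that are absent from a Windows Path string."""

    def norm(entry):
        # Compare case-insensitively and ignore quotes and trailing backslashes.
        return ntpath.normcase(entry.strip().strip('"')).rstrip("\\")

    present = {norm(entry) for entry in path_value.split(";") if entry.strip()}
    return [d for d in required_dirs if norm(d) not in present]


if __name__ == "__main__":
    required = [r"C:\Python", r"C:\Python\Scripts"]  # example paths from step 4
    print(missing_from_path(read_system_path(), required))
```

An empty list means that both directories are on the system Path. Any directory that is printed still has to be added in the Environment Variables dialog box.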

  5. Test the Python installation in a console.

    For example, after you run python --version to verify the version of Python that you installed, you can run the Python interpreter to try to import a package that hasn't been installed yet (such as numpy). After you get the expected ModuleNotFoundError exception, run the pip install <package-name> command to install the package, and then run the Python interpreter again to import that package. The import command should now succeed.
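
The same check can be scripted so that you can repeat it later on the batch pool node. The following sketch is a substitute for the interactive import test described above. Instead of importing the package and catching `ModuleNotFoundError`, it uses `importlib.util.find_spec` from the standard library to report whether each top-level package can be found. The function name `check_packages` is hypothetical, and `numpy` is used only because the text mentions it.

```python
import importlib.util
import sys


def check_packages(names):
    """Map each top-level package name to True if it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_packages(["numpy"]).items():
        if found:
            print(name, "is installed")
        else:
            print(name, "is missing; try: pip install", name)
```

Run the script before and after `pip install <package-name>`. The reported status for the package should change from missing to installed.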

  6. Capture the VM image by following the steps in Create an image of a VM in the portal, but apply only the following settings on the Create an image page.

    Setting name | Setting value
    Resource group | Select from the list of resource groups, or select Create new to create a resource group.
    Target Azure compute gallery | Select from the list of Azure compute galleries, or select Create new to create an Azure compute gallery.
    Target VM image definition | Select Create new. In the Create a VM image definition page, enter only the VM image definition name that you want to give to the image. (The page will automatically provide the Publisher, Offer, and SKU settings.)
    Version number | Enter a version number that you want to give to the VM image.

  7. After you apply the VM image settings, select Review + create to verify the settings, and then select Create to create the image. This step creates the following three resource types:

    • Azure Compute Gallery
    • Custom Image Definition
    • Custom Image Definition Version

  8. After the three resources are created, follow these steps to create a batch pool that uses the VM image:

    1. In the Azure portal, search for and select Batch accounts.

    2. In the list of batch accounts, select your account.

    3. In the menu pane of your batch account, select Pools, and then select Add to create a batch pool.

    4. On the Add pool page, specify the following settings, and then select OK.

      Setting name | Setting value
      Pool ID | The new name for your pool
      Image Type | Custom image - Shared Image Gallery *
      Shared image gallery | The name of the target Azure compute gallery that you specified when you created the VM image
      Image | The VM image definition name that you specified when you created the VM image
      Version | The version number that you specified when you created the VM image
      Operating system | Windows
      OS distribution | windowsserver
      OS sku | The product code of the OS version that you selected (for example, microsoftwindowsserver-2019-datacenter)

      * Shared Image Gallery is another name for Azure Compute Gallery. This is one of the resources that were created during your VM image creation.
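
If you create pools programmatically instead of in the portal, the same settings correspond roughly to a request body for the Batch REST API, in which the gallery image version is referenced by its Azure resource ID. The following sketch only builds that body. It doesn't send a request. The helper names and every value in the usage example (subscription ID, resource group, gallery, image definition, version, VM size) are placeholders that you must replace with your own.

```python
import json


def gallery_image_id(subscription_id, resource_group, gallery, definition, version):
    """Build the resource ID of an Azure Compute Gallery image version."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Compute/galleries/{gallery}"
        f"/images/{definition}/versions/{version}"
    )


def build_pool_body(pool_id, image_id, vm_size, node_count):
    """Return a dict shaped like the body of a Batch REST API request to add a pool."""
    return {
        "id": pool_id,
        "vmSize": vm_size,
        "targetDedicatedNodes": node_count,
        "virtualMachineConfiguration": {
            # The custom image is referenced by ID instead of publisher/offer/sku.
            "imageReference": {"virtualMachineImageId": image_id},
            # Node agent for Windows images.
            "nodeAgentSKUId": "batch.node.windows amd64",
        },
    }


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    image_id = gallery_image_id(
        "00000000-0000-0000-0000-000000000000",
        "my-resource-group",
        "myGallery",
        "myPythonImage",
        "1.0.0",
    )
    print(json.dumps(build_pool_body("python-pool", image_id, "STANDARD_D2S_V3", 1), indent=2))
```

The body contains no start task because Python and its packages are already part of the image.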

  9. After the batch pool node is allocated, go to the batch pool node page, and then select Connect in the heading menu.

  10. In the Connect pane, select the Generate a user option, select the Generate a random user button, and then select Download RDP file.

  11. Run the downloaded RDP file to connect to the new batch node.

  12. Test the Python installation again to make sure that the preinstalled version works correctly.
