Can I build the environment in the computing cluster using pip?

futo.mitsuishi 6 Reputation points
2021-08-10T15:34:12.357+00:00

I want to train an AI model. On the VM instance, running the commands below worked well:

pip install -r requirements.txt
python ~

Then, to train the AI model in the same environment on a compute cluster, I ran the code below in a Python 3.8 - AzureML notebook (sorry, I couldn't attach a screenshot):

import azureml.core
from azureml.core import Workspace
import os
from azureml.core import ScriptRunConfig
from azureml.core import Datastore
from azureml.core import Experiment
from azureml.core import Dataset
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
from azureml.core import Environment
import datetime

cluster_name = 'high-2x-v100-1'
gpu_name = 'Standard_NC12s_v3'
experiment_name = 'training_agent_print'
hyperparameters = [
    '--max_train_time', '172800'
]
script_folder = './script_folder'

# workspace
ws = Workspace.from_config()
print(ws.name, ws.location, ws.resource_group, sep='\t')

# compute cluster
compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME", cluster_name)
# environment variables are strings, so convert the node counts to int
compute_min_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MIN_NODES", 0))
compute_max_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MAX_NODES", 4))
vm_size = os.environ.get("AML_COMPUTE_CLUSTER_SKU", gpu_name)

if compute_name in ws.compute_targets:
    compute_target = ws.compute_targets[compute_name]
    if compute_target and type(compute_target) is AmlCompute:
        print('found compute target. just use it. ' + compute_name)
else:
    print('creating a new compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(vm_size=vm_size,
                                                                min_nodes=compute_min_nodes,
                                                                max_nodes=compute_max_nodes)
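    compute_target = ComputeTarget.create(
        ws, compute_name, provisioning_config)
    # block until the cluster is provisioned before submitting runs to it
    compute_target.wait_for_completion(show_output=True)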

# environment
env = Environment.from_pip_requirements(name="m8-pip-training", file_path="./requirements.txt")
exp = Experiment(workspace=ws, name=experiment_name)

# run
src = ScriptRunConfig(source_directory=script_folder,
    script='main.py',
    arguments=hyperparameters,
    compute_target=compute_target,
    environment=env
)
run = exp.submit(config=src)

As a result, the 20_image_build_log.txt file contained the following log:

==> WARNING: A newer version of conda exists. <==
  current version: 4.9.2
  latest version: 4.10.3

Please update conda by running

    $ conda update -n base -c defaults conda


Pip subprocess error:
ERROR: Could not find a version that satisfies the requirement parlai==1.3.0 (from -r /azureml-environment-setup/condaenv.5svatkzc.requirements.txt (line 55)) (from versions: 0.1.20200409, 0.1.20200416, 0.1.20200610, 0.1.20200713, 0.1.20200716, 0.8.0, 0.9.0, 0.9.1, 0.9.2, 0.9.3, 0.9.4)
ERROR: No matching distribution found for parlai==1.3.0 (from -r /azureml-environment-setup/condaenv.5svatkzc.requirements.txt (line 55))


CondaEnvException: Pip failed

The command '/bin/sh -c ldconfig /usr/local/cuda/lib64/stubs && conda env create -p /azureml-envs/azureml_ba289e67ead35c3dbaac125150111737 -f azureml-environment-setup/mutated_conda_dependencies.yml && rm -rf "$HOME/.cache/pip" && conda clean -aqy && CONDA_ROOT_DIR=$(conda info --root) && rm -rf "$CONDA_ROOT_DIR/pkgs" && find "$CONDA_ROOT_DIR" -type d -name __pycache__ -exec rm -rf {} + && ldconfig' returned a non-zero code: 1
2021/08/10 15:13:41 Container failed during run: acb_step_0. No retries remaining.
failed to run step ID: acb_step_0: exit status 1

Run ID: caj failed after 2m24s. Error: failed during run, err: exit status 1

And the experiment failed. I have three questions:

  1. Why is the compute cluster using conda to build the image even though I exported the requirements file from pip?
  2. Can I build the environment using pip?
  3. Since there is a WARNING, updating conda to the latest version might keep the experiment from failing. Can I update conda in the compute cluster?

Thank you so much

Azure Machine Learning

1 answer

  1. Sietse Brouwer 6 Reputation points
    2021-08-11T10:13:07.957+00:00

    Summary:

    • Option 1: try to create a working Conda environment, either on your own computer or in the VM; run conda env export > my-conda-specification.yml, and specify your Environment with Environment.from_conda_specification('my-env-name', 'my-conda-specification.yml')
    • Option 2: create a Docker image, for example FROM mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04, and install your Python packages in there. Once it's working, publish the Docker image and tell Environment to use it.

    More details and some links below.

    1. Why is the compute cluster using conda to build the image even though I exported the file from pip?

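    Why Microsoft uses Conda: unlike Pip, Conda can also manage non-Python dependencies, and it is better at managing precompiled packages and resolving their dependencies. Under the hood, your requirements.txt becomes the pip section of a conda environment file, and conda env create hands that section to Pip (the log shows conda env create ... -f azureml-environment-setup/mutated_conda_dependencies.yml). That is why you're seeing "Pip subprocess error" in 20_image_build_log.txt.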
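    The line that actually stops the build is Pip's "Could not find a version" error, not the conda warning. The sketch below (the function name is hypothetical, standard library only) pulls the requested pin and the versions Pip could see out of that log line, showing that the newest release visible inside the image was 0.9.4. Why Pip sees only these versions is not explained in this thread; Pip hides releases whose declared Python requirement does not match the interpreter, so comparing the Python version in the image with the one on the VM instance may be worth checking (an assumption, not confirmed here).

```python
import re

# The pip error line from 20_image_build_log.txt, copied verbatim.
LOG_LINE = (
    "ERROR: Could not find a version that satisfies the requirement parlai==1.3.0 "
    "(from -r /azureml-environment-setup/condaenv.5svatkzc.requirements.txt (line 55)) "
    "(from versions: 0.1.20200409, 0.1.20200416, 0.1.20200610, 0.1.20200713, "
    "0.1.20200716, 0.8.0, 0.9.0, 0.9.1, 0.9.2, 0.9.3, 0.9.4)"
)


def diagnose_pip_error(line):
    """Return (package, pinned version, newest visible version, pin visible?)."""
    req = re.search(r"satisfies the requirement ([A-Za-z0-9_.\-]+)==(\S+)", line)
    seen = re.search(r"\(from versions: ([^)]*)\)", line)
    if req is None or seen is None:
        raise ValueError("not a pip 'Could not find a version' line")
    name, pin = req.group(1), req.group(2)
    # pip prints "(from versions: none)" when it sees no release at all.
    versions = [v.strip() for v in seen.group(1).split(",")]
    versions = [v for v in versions if v and v != "none"]
    # Naive numeric ordering: fine for 0.9.4 or 0.1.20200409, but not for
    # pre-releases such as 1.0rc1 (packaging.version handles those).
    newest = max(versions, key=lambda v: tuple(int(p) for p in v.split(".")), default=None)
    return name, pin, newest, pin in versions


name, pin, newest, visible = diagnose_pip_error(LOG_LINE)
print(f"{name}=={pin} requested; newest version pip could see: {newest}; pin visible: {visible}")
# prints: parlai==1.3.0 requested; newest version pip could see: 0.9.4; pin visible: False
```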

    It is not very hard to translate Pip's requirements.txt into something Conda understands: the requirements are listed under a pip: entry in a conda environment file, and Conda passes them to Pip. That is how a requirements.txt exported from Pip can end up being installed by Conda.
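    That translation can be sketched in a few lines, assuming a simple requirements.txt with one requirement per line. The function name, the python=3.8 pin (chosen only to match the Python 3.8 notebook) and the exact layout are illustrative assumptions; the file Azure ML actually generates (mutated_conda_dependencies.yml in the log) is not shown in this thread and may differ.

```python
def requirements_to_conda_yaml(env_name, requirements_text, python_version="3.8"):
    """Build conda environment YAML whose pip: section holds the requirements."""
    pip_lines = []
    for raw in requirements_text.splitlines():
        # Naive: drops everything after '#', which would also cut a URL's #egg= part.
        line = raw.split("#", 1)[0].strip()
        if line:  # skip blank and comment-only lines
            pip_lines.append(line)
    out = [
        f"name: {env_name}",
        "dependencies:",
        f"  - python={python_version}",  # assumption: pin the interpreter explicitly
        "  - pip",                        # install pip itself as a conda package
        "  - pip:",                       # conda hands everything below to pip
    ]
    out += [f"    - {req}" for req in pip_lines]
    return "\n".join(out) + "\n"


print(requirements_to_conda_yaml("m8-pip-training", "parlai==1.3.0\n# comment\nnumpy\n"), end="")
# prints:
# name: m8-pip-training
# dependencies:
#   - python=3.8
#   - pip
#   - pip:
#     - parlai==1.3.0
#     - numpy
```

    The resulting text could be saved as a .yml file and passed to Environment.from_conda_specification; writing the Python version explicitly makes the interpreter in the image a deliberate choice rather than a default.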

    2. Can I build the environment using pip?

    There are two ways you can reproducibly specify the environment you need: either create a conda specification, or install your packages with pip inside a Docker image and use the resulting image.

    A. Create a conda specification that successfully builds the environment.

    If you have a working conda environment:

    • activate it and run conda list --export > conda-specification.txt to get the specification file (it will include any pip-installed dependencies!)
    • you can create a new environment from that file using conda create --name my_env_name --file conda-specification.txt
    • Environment.from_conda_specification('my-env-name', 'my-conda-specification.yml') expects a conda environment YAML file rather than the plain-text file that conda list --export creates. To get a YAML file, which also lists pip-installed packages under pip:, run conda env export > my-conda-specification.yml in the activated environment.
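    If you only have the plain-text output of conda list --export but need a YAML file (for example for Environment.from_conda_specification), the conversion can be sketched as follows. The function name is hypothetical, and the sketch assumes each non-comment line has the form name=version=build and that pip-installed packages carry a build string starting with pypi (such as pypi_0); check both assumptions against your own export file.

```python
def export_to_env_yaml(env_name, export_text):
    """Turn `conda list --export` text into conda environment YAML.

    Assumptions: every non-comment line is name=version=build, and packages
    installed with pip carry a build string starting with "pypi" (e.g. pypi_0).
    """
    conda_deps, pip_deps = [], []
    for raw in export_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # header lines such as "# platform: linux-64"
            continue
        name, version, build = line.split("=")
        if build.startswith("pypi"):
            pip_deps.append(f"{name}=={version}")  # pip uses == for exact pins
        else:
            # Build strings are platform-specific, so keep only name=version.
            conda_deps.append(f"{name}={version}")
    out = [f"name: {env_name}", "dependencies:"]
    out += [f"  - {dep}" for dep in conda_deps]
    if pip_deps:
        if not any(dep.split("=")[0] == "pip" for dep in conda_deps):
            out.append("  - pip")  # make sure conda installs pip for the pip: section
        out.append("  - pip:")
        out += [f"    - {dep}" for dep in pip_deps]
    return "\n".join(out) + "\n"
```

    Dropping the build strings trades exactness for portability: an export made on your own computer may contain builds that do not exist for the cluster's Linux image, while name=version lets conda pick a matching build.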

    If you're creating a YAML file to specify an environment, it probably looks like this (below) and is called something like conda-spec.yml.

       # conda-spec.yml
       name: img-classification-part3-deploy-encrypted
       dependencies:
        - package1  # installed by `conda install`
        - package2  # installed by conda
        - pip:
          - azureml-sdk
          - matplotlib
          - pandas
          - azureml-opendatasets
          - encrypted-inference==0.9
          - azure-storage-blob
    

    Creation, again, takes place via one of

    • conda env create --name my_env_name --file my-conda-yaml.yml (for a YAML file, use conda env create rather than conda create)
    • Environment.from_conda_specification('my-env-name', 'my-conda-yaml.yml')

    More details in these two URLs:

    B. Build a Docker image with a working environment, and tell Environment to use that Docker image.

    3. Since there is a WARNING, updating conda to the latest version might keep the experiment from failing. Can I update conda in the compute cluster?

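    I don't know specifically whether you can update conda in the cluster, but updating conda should not change which packages it finds or tries to install, so this probably will not help. The conda message is only a warning; the build fails because Pip cannot find parlai==1.3.0, and the conda version does not change which parlai versions Pip can see.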

    I hope something of the above will help you. Good luck!

    ----

    EDIT 2021-08-17:

    • Use the correct command to export a conda env definition. I accidentally wrote the create command, instead...
