Can I build the environment in the computing cluster using pip?

Question

futo.mitsuishi 6

I want to train an AI model. On the VM instance, running the commands below worked well:

pip install -r requirement.txt
python ~

Then, to train the AI model in the same environment on the compute cluster, I ran the code below in a Python 3.8 - AzureML notebook (I'm sorry, I couldn't attach a screenshot):

import azureml.core
from azureml.core import Workspace
import os
from azureml.core import ScriptRunConfig
from azureml.core import Datastore
from azureml.core import Experiment
from azureml.core import Dataset
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
from azureml.core import Environment
import datetime

cluster_name = 'high-2x-v100-1'
gpu_name = 'Standard_NC12s_v3'
experiment_name = 'training_agent_print'
hyperparameters = [
    '--max_train_time', '172800'
]
script_folder = './script_folder'

# workspace
ws = Workspace.from_config()
print(ws.name, ws.location, ws.resource_group, sep='\t')

# compute cluster
compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME", cluster_name)
# environment variables are strings, so convert the node counts to int
compute_min_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MIN_NODES", 0))
compute_max_nodes = int(os.environ.get("AML_COMPUTE_CLUSTER_MAX_NODES", 4))
vm_size = os.environ.get("AML_COMPUTE_CLUSTER_SKU", gpu_name)

if compute_name in ws.compute_targets:
    compute_target = ws.compute_targets[compute_name]
    if compute_target and type(compute_target) is AmlCompute:
        print('found compute target. just use it. ' + compute_name)
else:
    print('creating a new compute target...')
    provisioning_config = AmlCompute.provisioning_configuration(vm_size=vm_size,
                                                                min_nodes=compute_min_nodes,
                                                                max_nodes=compute_max_nodes)
    compute_target = ComputeTarget.create(
        ws, compute_name, provisioning_config)

# environment
env = Environment.from_pip_requirements(name = "m8-pip-training", file_path = "./requirements.txt")
exp = Experiment(workspace=ws,name=experiment_name)

# run
src = ScriptRunConfig(source_directory=script_folder,
    script='main.py',
    arguments=hyperparameters,
    compute_target=compute_target,
    environment=env
)
run = exp.submit(config=src)

As a result, I got the following log in the 20_image_build_log.txt file:

==> WARNING: A newer version of conda exists. <==
  current version: 4.9.2
  latest version: 4.10.3

Please update conda by running

    $ conda update -n base -c defaults conda


Pip subprocess error:
ERROR: Could not find a version that satisfies the requirement parlai==1.3.0 (from -r /azureml-environment-setup/condaenv.5svatkzc.requirements.txt (line 55)) (from versions: 0.1.20200409, 0.1.20200416, 0.1.20200610, 0.1.20200713, 0.1.20200716, 0.8.0, 0.9.0, 0.9.1, 0.9.2, 0.9.3, 0.9.4)
ERROR: No matching distribution found for parlai==1.3.0 (from -r /azureml-environment-setup/condaenv.5svatkzc.requirements.txt (line 55))


CondaEnvException: Pip failed

The command '/bin/sh -c ldconfig /usr/local/cuda/lib64/stubs && conda env create -p /azureml-envs/azureml_ba289e67ead35c3dbaac125150111737 -f azureml-environment-setup/mutated_conda_dependencies.yml && rm -rf "$HOME/.cache/pip" && conda clean -aqy && CONDA_ROOT_DIR=$(conda info --root) && rm -rf "$CONDA_ROOT_DIR/pkgs" && find "$CONDA_ROOT_DIR" -type d -name __pycache__ -exec rm -rf {} + && ldconfig' returned a non-zero code: 1
2021/08/10 15:13:41 Container failed during run: acb_step_0. No retries remaining.
failed to run step ID: acb_step_0: exit status 1

Run ID: caj failed after 2m24s. Error: failed during run, err: exit status 1

And the experiment failed. I have 3 questions:

Why is the compute cluster using conda to build the image even though I exported the file from pip?
Can I build the environment using pip?
There is a WARNING in the log. If I could update conda to the latest version, the experiment might not fail. Can I update conda on the compute cluster?

Thank you so much

Answer 1

Summary:

Option 1: create a working Conda environment, either on your own computer or in the VM; run conda env export > my-conda-specification.yml, and specify your Environment with Environment.from_conda_specification('my-env-name', 'my-conda-specification.yml')
Option 2: create a Docker image, for example FROM mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04, and install your Python packages in there. Once it's working, publish the Docker image and tell Environment to use it.

More details and some links below.

Why is the compute cluster using conda to build the image even though I exported the file from pip?

Why MS uses Conda: unlike Pip, Conda can also manage non-Python dependencies. Conda is also better at managing precompiled packages and at tracking and solving their dependencies. (Under the hood, Conda calls Pip to install any packages listed in the pip section of the environment specification, which is why you're seeing "Pip subprocess error" in 20_image_build_log.txt.)

It is not very hard to translate Pip's requirements.txt file to something Conda understands; I think Conda can even read requirements.txt directly. That is how it is possible that you can export a requirements.txt file from Pip, and Conda reads it.
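
To make that translation concrete, here is a minimal sketch of how pip requirements can be wrapped in a Conda environment specification. It is an illustration only, not Azure ML's actual implementation; the function name requirements_to_conda_spec and the default python_version are assumptions made for this example.

```python
def requirements_to_conda_spec(requirements_text, env_name, python_version="3.8"):
    """Wrap pip requirement lines in a Conda environment specification (as a dict)."""
    pip_packages = []
    for line in requirements_text.splitlines():
        # Drop comments and whitespace. Simplification: this would also
        # truncate requirement URLs that contain a '#' fragment.
        line = line.split("#", 1)[0].strip()
        if line:
            pip_packages.append(line)
    return {
        "name": env_name,
        "dependencies": [
            f"python={python_version}",  # installed by conda
            "pip",                       # conda installs pip itself...
            {"pip": pip_packages},       # ...and then hands these lines to pip
        ],
    }


spec = requirements_to_conda_spec("parlai==1.3.0\n# a comment\ntorch\n", "m8-pip-training")
print(spec)
```

With a specification shaped like this, the image build runs conda first, and pip only runs as a subprocess for the entries in the pip list.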

Can I build the environment using pip?

There are two ways to reproducibly specify the environment you need: either create a conda specification, or install your packages with pip inside a Docker image and use the resulting image.

A. Create a conda specification that successfully builds the environment.

If you have a working conda environment:

you can activate it and run conda list --export > conda-specification.txt to get the specification file (it will include any pip-installed dependencies!)
you can create a new environment from that file using conda create --name my_env_name --file conda-specification.txt
Hopefully it is also possible to run Environment.from_conda_specification('my-env-name', 'conda-specification.txt'). The reason I'm not sure is that conda list --export creates a plain text file, whereas Environment.from_conda_specification expects a YAML file; conda env export > conda-spec.yml writes the environment in that YAML format.
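
In case a YAML file is needed, the plain-text export can be converted with a few lines of Python. This is a rough sketch under two assumptions: each export line has the form name=version=build, and a build string starting with pypi marks a pip-installed package. The helper name export_to_yaml is made up for illustration.

```python
def export_to_yaml(export_text, env_name):
    """Turn `conda list --export` output into Conda environment YAML text."""
    conda_pkgs, pip_pkgs = [], []
    for line in export_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comments
        parts = line.split("=")
        if len(parts) < 2:
            continue  # not a name=version[=build] line
        name, version = parts[0], parts[1]
        build = parts[2] if len(parts) > 2 else ""
        if build.startswith("pypi"):
            pip_pkgs.append(f"{name}=={version}")  # assumed to be pip-installed
        else:
            conda_pkgs.append(f"{name}={version}")
    lines = [f"name: {env_name}", "dependencies:"]
    lines += [f"  - {pkg}" for pkg in conda_pkgs]
    if pip_pkgs:
        # Make sure pip itself is listed before the pip section.
        if not any(pkg.split("=")[0] == "pip" for pkg in conda_pkgs):
            lines.append("  - pip")
        lines.append("  - pip:")
        lines += [f"    - {pkg}" for pkg in pip_pkgs]
    return "\n".join(lines) + "\n"


yaml_text = export_to_yaml(
    "# platform: linux-64\nsqlite=3.13.0=0\ntqdm=4.62.1=pypi_0\n", "my-env-name"
)
print(yaml_text)
```

The build strings are dropped on purpose, so the result is less exact than the original export but more likely to resolve on a different machine.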

If you're creating a YAML file to specify an environment, it probably looks like this (below) and is called something like conda-spec.yml.

   # conda-spec.yml
   name: img-classification-part3-deploy-encrypted
   dependencies:
     - package1  # installed by `conda install`
     - package2  # installed by conda
     - pip:
       - azureml-sdk
       - matplotlib
       - pandas
       - azureml-opendatasets
       - encrypted-inference==0.9
       - azure-storage-blob

Creation, again, takes place via one of

conda env create --name my_env_name --file conda-spec.yml
Environment.from_conda_specification('my-env-name', 'conda-spec.yml')

More details in these two URLs:

B. Build a Docker image with a working environment, and tell Environment to use that Docker image.

To get the Azure requirements, use FROM mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04 (or, if you need GPU, one of the image tags in https://github.com/Azure/AzureML-Containers#featured-tags)
Example of telling Environment to use a Docker image : https://azure.github.io/azureml-cheatsheets/docs/cheatsheets/python/v1/environment/
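
As a hedged sketch of option B with the v1 Python SDK (azureml-core): the Dockerfile text below installs the packages with pip, and the Environment is told to build from that Dockerfile and not to create a conda environment on top of it. The helper names build_dockerfile and make_docker_environment are made up for this example, and it assumes that pip is on the PATH in the base image.

```python
def build_dockerfile(base_image, pip_packages):
    """Return Dockerfile text that pip-installs the given packages on top of base_image."""
    lines = [f"FROM {base_image}"]
    if pip_packages:
        quoted = " ".join(f'"{pkg}"' for pkg in pip_packages)
        lines.append(f"RUN pip install --no-cache-dir {quoted}")
    return "\n".join(lines) + "\n"


def make_docker_environment(name, dockerfile):
    """Create an Azure ML (SDK v1) Environment that is built from Dockerfile text."""
    from azureml.core import Environment  # imported lazily; requires azureml-core

    env = Environment(name=name)
    env.docker.base_image = None                 # use the Dockerfile, not a prebuilt image
    env.docker.base_dockerfile = dockerfile
    env.python.user_managed_dependencies = True  # packages come from the image, not from conda
    return env


dockerfile = build_dockerfile(
    "mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04",
    ["parlai==1.3.0"],
)
print(dockerfile)
```

The object returned by make_docker_environment('my-docker-env', dockerfile) could then be passed as environment= to ScriptRunConfig, in the same place as the env built from requirements.txt in the question.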

There is a WARNING in the log. If I could update conda to the latest version, the experiment might not fail. Can I update conda on the compute cluster?

I don't specifically know if you can update conda in the cluster; but I know that updating Conda should not change which packages Conda finds or (tries to) install, so this probably will not help.

I hope something of the above will help you. Good luck!

----

EDIT 2021-08-17:

Use the correct command to export a conda env definition. I accidentally wrote the create command, instead...

futo.mitsuishi 6 Reputation points

2021-08-12T18:07:39.09+00:00

Thank you for the nice reply!!!

Option A failed, so I'll try B.
I have one more question about option A. In the conda environment, I ran conda install pip and pip install -r requirement.txt to get the environment that I want to reproduce on the cluster, and exported it to a YAML file. Then I created an environment from that YAML file. However, it turned out that the new environment did not perfectly reproduce the one I wanted, because a module I had installed was missing.

My question is: are there cases where exporting the YAML file and creating an environment from it can't perfectly reproduce the original environment? Is there any possible solution?

When I ran pip install -r requirement.txt, one dependency conflict occurred, but pip resolved it.

Can I ask some additional questions as well? I'd be happy to get an answer.

Every time I run conda remove -n {myenv} --all, the terminal ends up in the azureml_py38 environment. Is this expected behavior? I always close the terminal and re-open it.

When I run conda env create -n {myenv} -f {yaml file}, I sometimes get the error /anaconda/pkge/~ can't be deleted. Please remove manually. Is this expected behavior?

Thank you so much
futo.mitsuishi 6 Reputation points

2021-08-14T15:34:42.843+00:00

While trying the options, I noticed that my real problem was not an environment issue, and I was able to solve it!
I will refer to this the next time I run into an environment problem!

Thank you so much!!!!!!!
Sietse Brouwer 6 Reputation points

2021-08-17T12:31:23.52+00:00

Is there any case that exporting the yaml file and creating the environment based on it can't perfectly reproduce the environment? Is there any possible solution?

May I ask what command you used to export the environment? In my original answer, I wrongly said to use conda create --name {myenv} --file {myfile}, which creates an environment instead of exporting it; I have just now edited my answer to use conda list --export, but I am curious if you used the same command.

For me, conda list --export produces very specific dependency strings, and even includes packages that I installed via pip, like this fragment:

...
sqlite=3.13.0=0
tqdm=4.62.1=pypi_0  # <-- I installed this one via pip, probably that is what `=pypi_0` means.
traitlets=4.3.1=py27_0
wcwidth=0.1.7=py27_0
wheel=0.29.0=py27_0
zlib=1.2.8=3
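
One way to check whether a recreated environment really matches the original is to run conda list --export in both and compare the results. Below is a small sketch of such a comparison; the function names are made up for illustration, and it assumes every package line has the form name=version=build.

```python
def parse_export(export_text):
    """Map package name -> version from `conda list --export` output."""
    packages = {}
    for line in export_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        parts = line.split("=")
        if len(parts) >= 2:
            packages[parts[0]] = parts[1]
    return packages


def diff_exports(original_text, recreated_text):
    """Return (missing, changed): packages absent from, or at another version in, the recreated env."""
    original = parse_export(original_text)
    recreated = parse_export(recreated_text)
    missing = sorted(set(original) - set(recreated))
    changed = sorted(
        name for name in original
        if name in recreated and original[name] != recreated[name]
    )
    return missing, changed


original_export = "sqlite=3.13.0=0\ntqdm=4.62.1=pypi_0\nwheel=0.29.0=py27_0\n"
recreated_export = "sqlite=3.13.0=0\nwheel=0.30.0=py27_0\n"
missing, changed = diff_exports(original_export, recreated_export)
print("missing:", missing)
print("changed:", changed)
```

A package that shows up under missing would be a candidate for the module that did not make it into the recreated environment.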

In regards to the two additional questions you asked, I regret that I do not know the answers. I wish we could sit behind one computer, so that we could figure it out together and laugh together when we solved it; but alas, we are strangers on the Internet, connected only by copper and fiber, electricity and light, the thinnest of virtual threads.

Finally, in reply to your other comment below: congratulations on getting it working!

Kind regards,

Sietse
