Azure ML - Customizing a curated environment after cloning it doesn't seem to work

ThierryL 146 Reputation points
2022-03-17T02:27:07.33+00:00

Hello,

I am building an ML pipeline whose data preparation and training scripts rely on both the Scikit-learn and TensorFlow libraries.

Since each Azure ML curated environment includes only one of these libraries, I followed the instructions on customizing a curated environment (https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-environments) to add the missing one.

I am adding the 'tensorflow' package to an existing curated environment as follows:

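from azureml.core import Environment  
from azureml.core.conda_dependencies import CondaDependencies  

# `ws` (Workspace) and `aml_run_config` (RunConfiguration) are defined earlier in the script  
USE_CURATED_ENV = True  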

# Use and customize a curated environment provided by Azure  
if USE_CURATED_ENV :  

    curated_environment = Environment.get(workspace=ws, name="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu")  
      
    # Clone the curated environment in order to add customized libraries  
    curated_clone = curated_environment.clone("customize_curated")  
      
    # Add necessary libraries to the existing curated environment  
    conda_dep = CondaDependencies()  
    conda_dep.add_conda_package("tensorflow")  
    curated_clone.python.conda_dependencies=conda_dep  
  
    # Associate the environment with the run configuration  
    aml_run_config.environment = curated_clone  

# Use a customized environment with specified packages only  
else:  
    aml_run_config.environment.python.user_managed_dependencies = False  
      
    # Add some packages relied on by data preparation step  
    aml_run_config.environment.python.conda_dependencies = CondaDependencies.create(  
        conda_packages=['pandas','scikit-learn','tensorflow'],   
        pip_packages=['azureml-sdk', 'azureml-dataset-runtime[fuse,pandas]'],   
        pin_sdk_version=False)  

However, when I run the pipeline, it fails on "import tensorflow" with an error saying that no such package exists.
I tried replacing conda_dep.add_conda_package("tensorflow") with conda_dep.add_pip_package("tensorflow"), but I get the same error.

The alternative (when USE_CURATED_ENV = False) seems to work.
I don't understand why it doesn't work when cloning an existing curated environment.

Azure Machine Learning

2 answers

  1. romungi-MSFT 48,911 Reputation points Microsoft Employee Moderator
    2022-03-17T11:10:34.533+00:00

    @ThierryL-3166 I think you should be using the following for the conda tensorflow package:

    add_tensorflow_conda_package(core_type='cpu', version=None)

    Similarly, for the pip tensorflow package:

    add_tensorflow_pip_package(core_type='cpu', version=None)

    Once you add these, you can list the packages and check that tensorflow is part of the conda dependencies.

    if curated_clone.python.conda_dependencies is not None:  
        print("packages", curated_clone.python.conda_dependencies.serialize_to_string())  
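
The check above can also be done programmatically rather than by reading the printed text. The following is a minimal sketch, assuming the serialized specification has the usual conda environment file layout (`dependencies:` and `channels:` as top-level keys). `spec_packages` and `sample_spec` are hypothetical names used only for illustration, and the function does a simple line scan instead of using a YAML parser.

```python
import re


def spec_packages(spec_text):
    """Return the package names listed under 'dependencies:' in a conda spec."""
    names = set()
    section = None
    for line in spec_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        # A top-level key such as 'dependencies:' or 'channels:' starts at column 0
        if not line[0].isspace() and not stripped.startswith("-"):
            section = stripped.split(":", 1)[0]
            continue
        if section != "dependencies" or not stripped.startswith("- "):
            continue
        entry = stripped[2:].strip()
        if entry.endswith(":"):
            continue  # e.g. '- pip:' only opens a nested list
        # Drop version specifiers such as '=2.7', '==1.39.0' or '>=3.3'
        name = re.split(r"[=<>~! ]", entry, maxsplit=1)[0]
        names.add(name.lower())
    return names


# Hypothetical sample standing in for serialize_to_string() output
sample_spec = """\
name: project_environment
dependencies:
- python=3.6.2
- pip:
  - azureml-defaults
- tensorflow=2.7
channels:
- anaconda
- conda-forge
"""

print("tensorflow" in spec_packages(sample_spec))
```

With the SDK, the same function could be called on `curated_clone.python.conda_dependencies.serialize_to_string()`. Note that this only confirms what the conda specification contains; it does not by itself show what ends up installed in the built image.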
    

    If an answer is helpful, please accept it or upvote it, which might help other community members reading this thread.


  2. ThierryL 146 Reputation points
    2022-03-18T06:35:02.53+00:00

    Actually this didn't solve the problem.
    I can see 'tensorflow' is added to the conda dependencies, but it doesn't seem to be linked to my cloned environment.
    I registered the cloned environment in my workspace in order to see the resulting Dockerfile. It doesn't include tensorflow.

    curated_environment = Environment.get(workspace=ws, name="AzureML-sklearn-0.24-ubuntu18.04-py37-cpu")
    curated_clone = curated_environment.clone("customize_curated")
    conda_dep = CondaDependencies()
    conda_dep.add_tensorflow_conda_package(core_type='cpu', version='2.7')
    curated_clone.python.conda_dependencies=conda_dep
    curated_clone.register(workspace=ws)
    
    if curated_clone.python.conda_dependencies is not None:
        print("packages", curated_clone.python.conda_dependencies.serialize_to_string())
    

    The output of the 'print' is as follows:

    Running
    packages # Conda environment specification. The dependencies defined in this file will
    # be automatically provisioned for runs with userManagedDependencies=False.
    
    # Details about the Conda environment file format:
    # https://conda.io/docs/user-guide/tasks/manage-environments.html#create-env-file-manually
    
    name: project_environment
    dependencies:
      # The python interpreter version.
      # Currently Azure ML only supports 3.5.2 and later.
    - python=3.6.2
    
    - pip:
        # Required packages for AzureML execution, history, and data preparation.
      - azureml-defaults
    
    - tensorflow=2.7
    channels:
    - anaconda
    - conda-forge
    

    And the Dockerfile is:

    FROM mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20220314.v1
    
    ENV AZUREML_CONDA_ENVIRONMENT_PATH /azureml-envs/sklearn-0.24.1
    
    # Create conda environment
    RUN conda create -p $AZUREML_CONDA_ENVIRONMENT_PATH \
        python=3.7 pip=20.2.4
    
    # Prepend path to AzureML conda environment
    ENV PATH $AZUREML_CONDA_ENVIRONMENT_PATH/bin:$PATH
    
    # Install pip dependencies
    RUN pip install 'matplotlib>=3.3,<3.4' \
                    'psutil>=5.8,<5.9' \
                    'tqdm>=4.59,<4.60' \
                    'pandas>=1.1,<1.2' \
                    'scipy>=1.5,<1.6' \
                    'numpy>=1.10,<1.20' \
                    'ipykernel~=6.0' \
                    'azureml-core==1.39.0' \
                    'azureml-defaults==1.39.0' \
                    'azureml-mlflow==1.39.0.post1' \
                    'azureml-telemetry==1.39.0' \
                    'scikit-learn==0.24.1'
    
    # This is needed for mpi to locate libpython
    ENV LD_LIBRARY_PATH $AZUREML_CONDA_ENVIRONMENT_PATH/lib:$LD_LIBRARY_PATH
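
The Dockerfile above installs every package with `RUN pip install` and shows no trace of the conda specification, which suggests that this curated environment is built from its own Dockerfile and that `conda_dependencies` is not applied to it. One possible workaround, offered as an untested sketch, is to append an extra `pip install` layer to that Dockerfile text. `append_pip_install` is a hypothetical helper, `base` is a shortened excerpt of the Dockerfile above, and the commented usage assumes the clone exposes its Dockerfile as `curated_clone.docker.base_dockerfile`.

```python
def append_pip_install(dockerfile, packages):
    """Return the Dockerfile text with one extra 'RUN pip install' layer appended."""
    if not packages:
        return dockerfile
    quoted = " ".join(f"'{p}'" for p in packages)
    return (
        dockerfile.rstrip("\n")
        + "\n\n# Extra packages added on top of the curated image\n"
        + f"RUN pip install {quoted}\n"
    )


# Shortened excerpt of the curated Dockerfile, for illustration only
base = """\
FROM mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20220314.v1
ENV AZUREML_CONDA_ENVIRONMENT_PATH /azureml-envs/sklearn-0.24.1
ENV PATH $AZUREML_CONDA_ENVIRONMENT_PATH/bin:$PATH
RUN pip install 'scikit-learn==0.24.1'
"""

patched = append_pip_install(base, ["tensorflow==2.7.*"])
print(patched)

# Assumed usage with the Azure ML SDK v1 (not verified here):
# curated_clone.docker.base_dockerfile = append_pip_install(
#     curated_clone.docker.base_dockerfile, ["tensorflow==2.7.*"])
# curated_clone.register(workspace=ws)
```

Because the Dockerfile prepends the environment's `bin` directory to `PATH`, the appended `pip` call would be expected to install into the same environment as the other packages. Whether the pinned versions are mutually compatible would still need to be checked by building the image.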
    
