Azure Machine Learning AutoML Online Endpoint Deploy Failure (ImageBuildFailure, LibMambaUnsatisfiableError)

cynicdog 30 Reputation points
2024-10-15T05:58:25.53+00:00

Hey everyone, my Azure Machine Learning pipeline script now fails with an error. A month ago the same script was working fine and successfully deploying an online endpoint with an AutoML model attached to it. It looks like the conda environment configuration file (conda.yml) generated for AutoML has been updated, and the new library versions conflict with each other.

Note:

My concern is not about manually modifying files or scripts; rather, it's that the conda.yml file generated during the AutoML learning process contains multiple version conflicts. This issue should not occur in the first place, as the configuration file used to be conflict-free just a month ago.

Error says:

Code: ImageBuildFailure
Message: Deployment failed due to no Environment Image available. Check Environment Build Log in ML Studio Workspace or Workspace storage for potential failures. Environment info: Name: CliV2AnonymousEnvironment, Version: 87cf64356c88c5359f71987fba138854, you may be able to find the build log under the storage account 'inbreinazureml6931662388' in the container 'aml-environment-image-build' at the path 'CliV2AnonymousEnvironment/87cf64356c88c5359f71987fba138854/imgbldrun_6cce906/image_build_log.txt'. Please see troubleshooting guide, available here: https://aka.ms/oe-tsg#error-imagebuildfailure

Following the troubleshooting guide, I found some information in the build log about possible conda library version conflicts. It reads:

2024-10-15T00:37:41: LibMambaUnsatisfiableError: Encountered problems while solving:
2024-10-15T00:37:41:   - nothing provides libcublas >=12.1.0.26,<12.1.3.1 needed by pytorch-cuda-12.1-ha16c6d3_5
2024-10-15T00:37:41:   - nothing provides cuda-cudart >=11.8,<12.0 needed by pytorch-cuda-11.8-h7e8668a_3
2024-10-15T00:37:41: Could not solve for environment specs
2024-10-15T00:37:41: The following packages are incompatible
2024-10-15T00:37:41: ├─ blas 2.16**  is installable with the potential options
2024-10-15T00:37:41: │  ├─ blas 2.16 would require
2024-10-15T00:37:41: │  │  └─ liblapacke 3.8.0 16_mkl, which requires
2024-10-15T00:37:41: │  │     ├─ blas * mkl, which can be installed;
2024-10-15T00:37:41: │  │     └─ libblas 3.8.0 16_mkl, which requires
2024-10-15T00:37:41: │  │        └─ liblapack 3.8.0 16_mkl, which can be installed;
2024-10-15T00:37:41: │  ├─ blas 2.16 conflicts with any installable versions previously reported;
2024-10-15T00:37:41: │  └─ blas [2.10|2.11|...|2.9] would require
2024-10-15T00:37:41: │     └─ liblapacke [3.8.0 10_openblas|3.8.0 11_openblas|...|3.8.0 9_openblas] but there are no viable options
2024-10-15T00:37:41: │        ├─ liblapacke 3.8.0 would require
2024-10-15T00:37:41: │        │  └─ blas * openblas, which conflicts with any installable versions previously reported;
2024-10-15T00:37:41: │        └─ liblapacke 3.8.0 would require
2024-10-15T00:37:41: │           └─ blas * openblas, which conflicts with any installable versions previously reported;
2024-10-15T00:37:41: ├─ libuuid 2.38.1**  is requested and can be installed;
2024-10-15T00:37:41: ├─ llvm-openmp 15.0.7**  is requested and can be installed;
2024-10-15T00:37:41: ├─ mkl 2020.2**  is requested and can be installed;
2024-10-15T00:37:41: ├─ numpy 1.23.5**  is installable with the potential options
2024-10-15T00:37:41: │  ├─ numpy 1.23.5 would require
2024-10-15T00:37:41: │  │  └─ python_abi 3.10.* *_cp310, which can be installed;
2024-10-15T00:37:41: │  ├─ numpy 1.23.5 would require
2024-10-15T00:37:41: │  │  └─ python_abi 3.11.* *_cp311, which can be installed;
2024-10-15T00:37:41: │  ├─ numpy 1.23.5 would require
...

(Full log: image_build_log.txt)
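To make the solver output above easier to scan, here is a minimal sketch that pulls the "nothing provides ... needed by ..." lines out of a build log. The function name `unsatisfied_requirements` is hypothetical, and the sample text is copied from the log excerpt above; the real image_build_log.txt may contain other message shapes that this pattern does not cover.

```python
import re

# Matches mamba solver lines of the form
# "- nothing provides <requirement> needed by <package>"
NOTHING_PROVIDES = re.compile(
    r"nothing provides (?P<req>.+?) needed by (?P<pkg>\S+)"
)


def unsatisfied_requirements(log_text):
    """Return (requirement, requiring_package) pairs found in a build log."""
    return [
        (m.group("req"), m.group("pkg"))
        for m in NOTHING_PROVIDES.finditer(log_text)
    ]


# Sample lines taken from the log excerpt in this post.
sample = """\
2024-10-15T00:37:41: LibMambaUnsatisfiableError: Encountered problems while solving:
2024-10-15T00:37:41:   - nothing provides libcublas >=12.1.0.26,<12.1.3.1 needed by pytorch-cuda-12.1-ha16c6d3_5
2024-10-15T00:37:41:   - nothing provides cuda-cudart >=11.8,<12.0 needed by pytorch-cuda-11.8-h7e8668a_3
"""

for req, pkg in unsatisfied_requirements(sample):
    print(f"{pkg} needs {req}")
```

Each pair names a package the solver wanted and the requirement it could not find in the configured channels, which narrows down which pins to look at first.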

The conda dependencies are specified in the conda.yml file I downloaded from the best AutoML model, which I then passed as an argument in the command. Here are the Azure CLI commands in my pipeline script:

        # Download the model artifact to the specified directory
        az ml job download --name $(cat ${{inputs.automl_best_model}}/job_name.txt) \
          --all --download-path /app/configs/downloaded_artifacts \
          --workspace-name $WS_NAME \
          --resource-group $RG_NAME

        az ml online-deployment create \
          --name $DEPLOYMENT_NAME \
          --endpoint-name $ENDPOINT_NAME \
          --file /app/configs/automl_deployment_settings.yml \
          --workspace-name $WS_NAME \
          --resource-group $RG_NAME

where the automl_deployment_settings.yml reads as:

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: my-model-deployment
endpoint_name: my-endpoint
model:
  path: ./downloaded_artifacts/named-outputs/best_model/
code_configuration:
  code: ./
  scoring_script: score.py
environment: 
  conda_file: ./downloaded_artifacts/named-outputs/best_model/conda.yaml
  image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
instance_type: Standard_E2s_v3
instance_count: 1

The complete conda.yml file looks like this:


channels:
- conda-forge
- anaconda
- pytorch
dependencies:
- python=3.9.19
- pip:
  - adal==1.2.7
  - annotated-types==0.7.0
  - applicationinsights==0.11.10
  - arch==5.6.0
  - argcomplete==3.5.0
  - asttokens==2.4.1
  - attrs==24.2.0
  - azure-common==1.1.28
  - azure-core==1.31.0
  - azure-graphrbac==0.61.1
  - azure-identity==1.18.0
  - azure-mgmt-authorization==4.0.0
  - azure-mgmt-containerregistry==10.3.0
  - azure-mgmt-core==1.4.0
  - azure-mgmt-keyvault==10.3.1
  - azure-mgmt-network==26.0.0
  - azure-mgmt-resource==23.1.1
  - azure-mgmt-storage==21.2.1
  - azure-storage-blob==12.19.0
  - azure-storage-queue==12.12.0
  - azureml-automl-core==1.57.0
  - azureml-automl-runtime==1.57.0
  - azureml-core==1.57.0.post1
  - azureml-dataprep==5.1.6
  - azureml-dataprep-native==41.0.0
  - azureml-dataprep-rslex==2.22.4
  - azureml-dataset-runtime==1.57.0
  - azureml-defaults==1.57.0.post1
  - azureml-inference-server-http==1.3.2
  - azureml-interpret==1.57.0
  - azureml-mlflow==1.57.0.post1
  - azureml-pipeline-core==1.57.0
  - azureml-responsibleai==1.57.0
  - azureml-telemetry==1.57.0
  - azureml-train-automl==1.57.0
  - azureml-train-automl-client==1.57.0
  - azureml-train-automl-runtime==1.57.0
  - azureml-train-core==1.57.0
  - azureml-train-restclients-hyperdrive==1.57.0
  - azureml-training-tabular==1.57.0.post1
  - backports-tempfile==1.0
  - backports-weakref==1.0.post1
  - bcrypt==4.2.0
  - blinker==1.8.2
  - bokeh==2.4.3
  - cachetools==5.5.0
  - certifi==2024.8.30
  - cffi==1.17.1
  - charset-normalizer==3.3.2
  - cmdstanpy==1.2.4
  - coloredlogs==15.0.1
  - comm==0.2.2
  - contextlib2==21.6.0
  - contourpy==1.3.0
  - convertdate==2.4.0
  - cryptography==43.0.1
  - cuda-libraries==12.1.0
  - cuda-runtime==12.1.0
  - cycler==0.12.1
  - cython==3.0.11
  - dask==2023.2.0
  - databricks-cli==0.18.0
  - dataclasses==0.6
  - debugpy==1.8.5
  - decorator==5.1.1
  - dice-ml==0.11
  - dill==0.3.8
  - distributed==2023.2.0
  - distro==1.9.0
  - docker==7.1.0
  - dotnetcore2==3.1.23
  - econml==0.15.1
  - entrypoints==0.4
  - ephem==4.1.5
  - erroranalysis==0.5.4
  - exceptiongroup==1.2.2
  - executing==2.1.0
  - fairlearn==0.7.0
  - fire==0.6.0
  - flask==2.3.2
  - flask-cors==4.0.2
  - flatbuffers==24.3.25
  - fonttools==4.53.1
  - fsspec==2024.9.0
  - fusepy==3.0.1
  - gensim==4.3.2
  - gitdb==4.0.11
  - gitpython==3.1.43
  - google-api-core==2.20.0
  - google-auth==2.35.0
  - googleapis-common-protos==1.65.0
  - gunicorn==22.0.0
  - humanfriendly==10.0
  - idna==3.10
  - importlib-metadata==7.2.1
  - importlib-resources==6.4.0
  - inference-schema==1.8
  - interpret-community==0.31.0
  - interpret-core==0.5.0
  - ipykernel==6.29.5
  - ipython==8.18.1
  - isodate==0.6.1
  - itsdangerous==2.2.0
  - jedi==0.19.1
  - jeepney==0.8.0
  - jmespath==0.10.0
  - jsonpickle==3.3.0
  - jsonschema==4.23.0
  - jsonschema-specifications==2023.12.1
  - jupyter-client==8.6.3
  - jupyter-core==5.7.2
  - keras2onnx==1.6.0
  - kiwisolver==1.4.7
  - knack==0.11.0
  - libcublas==12.1.0.26
  - libcufft==11.0.2.4
  - libcusolver==11.4.4.55
  - libcusparse==12.0.2.55
  - libnpp==12.0.2.50
  - libnvjpeg==12.1.1.14
  - lightgbm==3.2.1
  - locket==1.0.0
  - lunarcalendar==0.0.9
  - matplotlib==3.9.2
  - matplotlib-inline==0.1.7
  - ml-wrappers==0.5.6
  - mlflow-skinny==2.9.2
  - mltable==1.6.1
  - msal==1.31.0
  - msal-extensions==1.2.0
  - msgpack==1.1.0
  - msrest==0.7.1
  - msrestazure==0.6.4.post1
  - ndg-httpsclient==0.5.1
  - nest-asyncio==1.6.0
  - networkx==2.5
  - numba==0.56.4
  - oauthlib==3.2.2
  - onnx==1.16.1
  - onnxconverter-common==1.13.0
  - onnxmltools==1.11.2
  - onnxruntime==1.17.3
  - opencensus==0.11.4
  - opencensus-context==0.1.3
  - opencensus-ext-azure==1.1.13
  - packaging==23.2
  - paramiko==3.5.0
  - parso==0.8.4
  - partd==1.4.2
  - pathspec==0.12.1
  - patsy==0.5.6
  - pexpect==4.9.0
  - pillow==10.4.0
  - pkginfo==1.11.1
  - platformdirs==4.3.6
  - pmdarima==1.8.5
  - portalocker==2.10.1
  - prompt-toolkit==3.0.47
  - property-cached==1.6.4
  - prophet==1.1.4
  - proto-plus==1.24.0
  - protobuf==3.20.3
  - ptyprocess==0.7.0
  - pure-eval==0.2.3
  - pyarrow==14.0.2
  - pyasn1==0.6.1
  - pyasn1-modules==0.4.1
  - pycparser==2.22
  - pydantic==2.7.4
  - pydantic-core==2.18.4
  - pydantic-settings==2.5.2
  - pygments==2.18.0
  - pyjwt==2.9.0
  - pymeeus==0.5.12
  - pynacl==1.5.0
  - pyopenssl==24.2.1
  - pyparsing==3.1.4
  - pysocks==1.7.1
  - python-dateutil==2.9.0.post0
  - python-dotenv==1.0.1
  - pytz==2023.4
  - pyzmq==26.2.0
  - raiutils==0.4.2
  - referencing==0.35.1
  - requests==2.32.3
  - requests-oauthlib==2.0.0
  - responsibleai==0.36.0
  - rpds-py==0.20.0
  - rsa==4.9
  - s3transfer==0.5.2
  - scipy==1.10.1
  - secretstorage==3.3.3
  - semver==2.13.0
  - six==1.16.0
  - skl2onnx==1.15.0
  - sklearn-pandas==1.7.0
  - slicer==0.0.7
  - smart-open==6.4.0
  - smmap==5.0.1
  - sortedcontainers==2.4.0
  - sparse==0.15.4
  - sqlparse==0.5.1
  - stack-data==0.6.3
  - stanio==0.5.1
  - statsmodels==0.13.5
  - tabulate==0.9.0
  - tblib==3.0.0
  - termcolor==2.4.0
  - threadpoolctl==3.5.0
  - toolz==0.12.1
  - tornado==6.4.1
  - tqdm==4.66.5
  - traitlets==5.14.3
  - urllib3==1.26.20
  - wcwidth==0.2.13
  - werkzeug==3.0.4
  - wrapt==1.16.0
  - xgboost==1.5.2
  - zict==3.0.0
  - zipp==3.20.2
- blas=2.16
- boto3=1.20.19
- botocore=1.23.19
- bzip2=1.0.8
- ca-certificates=2024.8.30
- cloudpickle=2.2.1
- cuda-cudart=12.1.105
- cuda-cudart_linux-64=12.1.105
- cuda-cupti=12.1.105
- cuda-nvrtc=12.1.105
- cuda-nvtx=12.1.105
- cuda-opencl=12.1.105
- cuda-version=12.1
- filelock=3.16.1
- gmp=6.3.0
- gmpy2=2.1.5
- holidays=0.57
- intel-openmp=2022.0.1
- jinja2=3.1.4
- joblib=1.2.0
- libblas=3.8.0
- libcblas=3.8.0
- libcufile=1.6.1.9
- libcurand=10.3.2.106
- libffi=3.4.2
- libgcc=14.1.0
- libgfortran-ng=7.5.0
- libgfortran4=7.5.0
- liblapack=3.8.0
- liblapacke=3.8.0
- libnsl=2.0.1
- libnvjitlink=12.1.105
- libsqlite=3.46.1
- libstdcxx=14.1.0
- libuuid=2.38.1
- libxcrypt=4.4.36
- libzlib=1.3.1
- llvm-openmp=15.0.7
- llvmlite=0.39.1
- markupsafe=2.1.5
- mkl=2020.2
- mpc=1.3.1
- mpfr=4.2.1
- mpmath=1.3.0
- numpy=1.23.5
- ocl-icd=2.3.2
- pandas=1.3.5
- pip=24.2
- psutil=5.9.3
- py-cpuinfo=5.0.0
- python_abi=3.9
- pytorch=2.2.2
- pytorch-cuda=12.1
- pytorch-mutex=1.0
- pyyaml=6.0.2
- scikit-learn=1.5.1
- setuptools=74.1.2
- setuptools-git=1.2
- sympy=1.13.2
- torchtriton=2.2.0
- typing_extensions=4.12.2
- tzdata=2024a
- wheel=0.44.0
- yaml=0.2.5
name: project_environment
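With a file this long it can help to check where the packages named in the solver errors are actually pinned. The sketch below is a small line-based reader for the flat layout shown above; `parse_conda_file` is a hypothetical helper, not part of any Azure ML or conda API, and it is not a general YAML parser (a YAML library would be the usual choice). The sample is a short excerpt of the posted file.

```python
def parse_conda_file(text):
    """Split a flat conda environment file into channels, conda pins, pip pins.

    Assumes top-level 'channels:' and 'dependencies:' lists and a single
    nested '- pip:' list, as in the file posted above.
    """
    channels, conda_pins, pip_pins = [], {}, {}
    section = None
    pip_indent = None  # indentation of the '- pip:' entry, once seen
    for raw in text.splitlines():
        line = raw.split("#")[0].rstrip()  # drop trailing comments
        stripped = line.strip()
        if not stripped:
            continue
        if stripped in ("channels:", "dependencies:"):
            section = stripped[:-1]
            continue
        if not stripped.startswith("- "):
            continue  # e.g. 'name: project_environment'
        item = stripped[2:].strip()
        indent = len(line) - len(line.lstrip())
        if section == "channels":
            channels.append(item)
        elif section == "dependencies":
            if item == "pip:":
                pip_indent = indent
            elif pip_indent is not None and indent > pip_indent:
                name, _, version = item.partition("==")
                pip_pins[name] = version
            else:
                name, _, version = item.partition("=")
                conda_pins[name] = version
    return channels, conda_pins, pip_pins


# Excerpt of the conda file posted above.
sample = """\
channels:
- conda-forge
- anaconda
- pytorch
dependencies:
- python=3.9.19
- pip:
  - cuda-runtime==12.1.0
  - libcublas==12.1.0.26
- cuda-cudart=12.1.105
- pytorch=2.2.2
- pytorch-cuda=12.1
name: project_environment
"""

channels, conda_pins, pip_pins = parse_conda_file(sample)
print("channels:", channels)
for name in ("libcublas", "cuda-cudart", "pytorch-cuda"):
    if name in conda_pins:
        print(f"{name}: conda pin {conda_pins[name]}")
    elif name in pip_pins:
        print(f"{name}: pip pin {pip_pins[name]}")
    else:
        print(f"{name}: not pinned")
```

On this excerpt, libcublas shows up in the pip section while cuda-cudart and pytorch-cuda are conda pins. Whether that placement contributes to the failure is not established here, but it is a concrete thing to compare against the solver messages.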

Is this a known issue, and is there a resolution for the version conflicts in the AutoML conda configuration? Thank you!

Azure Machine Learning

1 answer

  1. Amira Bedhiafi 34,101 Reputation points Volunteer Moderator
    2024-10-15T08:07:49.3166667+00:00

    Review your conda.yaml file and check that the dependencies are compatible with each other. Since the error points to specific version conflicts with cuda, libcublas, and pytorch, you may need to adjust their versions so that they align. Try the following:

    • Update pytorch-cuda and cuda-cudart versions: Ensure that pytorch-cuda and cuda-cudart versions are compatible with each other. You may try setting both to version 12.1 or adjusting them to versions known to work together.
    • Check pytorch version: Make sure the pytorch version is compatible with the pytorch-cuda and cuda libraries. The latest versions of pytorch often require newer CUDA versions.
    • Check blas and liblapacke dependencies: These libraries may also be contributing to the conflicts. Consider pinning versions that are known to work together or switching to an alternative like openblas.
    • Use a pre-built environment: Instead of specifying a custom environment and conda.yaml, you can use one of the pre-built environments provided by Azure ML. This avoids potential dependency conflicts, especially for GPU-enabled models. The following image might work for your deployment:
      
         image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
      
      
      If you're already using a similar image, you can try switching to another supported image or checking if Azure has released a newer version of the same image.
    • Build the environment locally: If you're tied to certain versions (for example for reproducibility), try building the environment locally using conda to identify and resolve conflicts before passing it to Azure ML. This gives more control over the resolution process. You can create and test the environment locally from the YAML file using:
      
         conda env create --name myenv --file conda.yaml
      
      
      This will reveal potential conflicts directly in your development environment.
    • Roll back to previously working versions: If this issue only started occurring after recent updates, you might want to roll back the conda dependencies to versions that were previously working. Pin the package versions to those that were deployed successfully a month ago.
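If the conda file from a previously successful AutoML job is still available, the rollback suggestion can start from a diff of the pins. The following is a minimal sketch under that assumption; `read_pins` and `diff_pins` are hypothetical helper names, and the "old" sample is made up purely for illustration (it is not the file that was actually deployed a month ago).

```python
import re

# Matches list entries such as "- pytorch=2.2.2" or "  - adal==1.2.7".
PIN = re.compile(r"^\s*-\s*([A-Za-z0-9_.\-]+)(==?)([^\s#]+)")


def read_pins(text):
    """Map package name -> pin (operator plus version) for every pinned entry."""
    pins = {}
    for line in text.splitlines():
        m = PIN.match(line)
        if m:
            name, op, version = m.groups()
            pins[name] = op + version
    return pins


def diff_pins(old, new):
    """Return {name: (old_pin, new_pin)} for pins that were added, removed, or changed."""
    changes = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name), new.get(name)
        if before != after:
            changes[name] = (before, after)
    return changes


# Hypothetical older file: the versions here are illustrative only.
old_text = """\
dependencies:
- python=3.9.19
- pip:
  - azureml-defaults==1.56.0
- pytorch=2.2.2
"""

# Excerpt in the style of the file posted in the question.
new_text = """\
dependencies:
- python=3.9.19
- pip:
  - azureml-defaults==1.57.0.post1
- pytorch=2.2.2
- pytorch-cuda=12.1
"""

for name, (before, after) in diff_pins(read_pins(old_text), read_pins(new_text)).items():
    print(f"{name}: {before} -> {after}")
```

A pin reported with `None` on one side was added or removed between the two files; those entries are usually the first candidates to pin back.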

    Example of adjusted conda.yaml:

    dependencies:
      - python=3.9.19
      - pytorch=2.1.0  # Adjusting to a compatible version
      - pytorch-cuda=11.8  # Ensure compatibility with PyTorch and CUDA
      - blas=2.16
      - liblapacke=3.8.0
      - cuda-cudart=11.8  # Matching with pytorch-cuda
      - pip:
          - azureml-automl-runtime==1.57.0  # AutoML runtime
          - numpy==1.23.5
