
Batch endpoint does not work with a specific model

Guilherme Matheus 140 Reputation points Microsoft Employee
2025-10-15T17:37:31.8966667+00:00

I have a model that works fine when I run training followed by batch inference. But when I do the same with another model of mine, I get this error:

2025-10-14T17:05:53: #10 52.37 Collecting pyarrow<20,>=4.0.0 (from mlflow==2.22.2->-r /azureml-environment-setup/condaenv.byjt1f6w.requirements.txt (line 1))
2025-10-14T17:05:53: #10 52.37   Downloading pyarrow-6.0.0.tar.gz (769 kB)
2025-10-14T17:05:53: #10 52.37      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 769.6/769.6 kB 39.2 MB/s eta 0:00:00
2025-10-14T17:05:53: #10 52.37   Installing build dependencies: started
2025-10-14T17:05:53: #10 52.37   Installing build dependencies: finished with status 'error'
2025-10-14T17:05:53: #10 52.39 
2025-10-14T17:05:53: #10 52.39 failed
2025-10-14T17:05:53: #10 52.39 
2025-10-14T17:05:53: #10 52.39 CondaEnvException: Pip failed
2025-10-14T17:05:53: #10 52.39 
2025-10-14T17:05:55: #10 ERROR: process "/bin/sh -c ldconfig /usr/local/cuda/lib64/stubs && conda env create -p /azureml-envs/azureml_f43a770854f1e887af78e52cfb84206a -f azureml-environment-setup/mutated_conda_dependencies.yml && rm -rf \"$HOME/.cache/pip\" && conda clean -aqy && CONDA_ROOT_DIR=$(conda info --root) && rm -rf \"$CONDA_ROOT_DIR/pkgs\" && find \"$CONDA_ROOT_DIR\" -type d -name __pycache__ -exec rm -rf {} + && ldconfig" did not complete successfully: exit code: 1
2025-10-14T17:05:55: ------
2025-10-14T17:05:55:  > [ 6/10] RUN ldconfig /usr/local/cuda/lib64/stubs && conda env create -p /azureml-envs/azureml_f43a770854f1e887af78e52cfb84206a -f azureml-environment-setup/mutated_conda_dependencies.yml && rm -rf "$HOME/.cache/pip" && conda clean -aqy && CONDA_ROOT_DIR=$(conda info --root) && rm -rf "$CONDA_ROOT_DIR/pkgs" && find "$CONDA_ROOT_DIR" -type d -name __pycache__ -exec rm -rf {} + && ldconfig:
2025-10-14T17:05:55: 52.37 Collecting pyarrow<20,>=4.0.0 (from mlflow==2.22.2->-r /azureml-environment-setup/condaenv.byjt1f6w.requirements.txt (line 1))
2025-10-14T17:05:55: 52.37   Downloading pyarrow-6.0.0.tar.gz (769 kB)
2025-10-14T17:05:55: 52.37      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 769.6/769.6 kB 39.2 MB/s eta 0:00:00
2025-10-14T17:05:55: 52.37   Installing build dependencies: started
2025-10-14T17:05:55: 52.37   Installing build dependencies: finished with status 'error'
2025-10-14T17:05:55: 52.39 
2025-10-14T17:05:55: 52.39 failed
2025-10-14T17:05:55: 52.39 
2025-10-14T17:05:55: 52.39 CondaEnvException: Pip failed
2025-10-14T17:05:55: 52.39 
2025-10-14T17:05:55: ------
2025-10-14T17:05:55: Dockerfile:8
2025-10-14T17:05:55: --------------------
2025-10-14T17:05:55:    6 |     RUN if dpkg --compare-versions `conda --version | grep -oE '[^ ]+$'` lt 4.4.11; then conda install conda==4.4.11; fi
2025-10-14T17:05:55:    7 |     COPY azureml-environment-setup/mutated_conda_dependencies.yml azureml-environment-setup/mutated_conda_dependencies.yml
2025-10-14T17:05:55:    8 | >>> RUN ldconfig /usr/local/cuda/lib64/stubs && conda env create -p /azureml-envs/azureml_f43a770854f1e887af78e52cfb84206a -f azureml-environment-setup/mutated_conda_dependencies.yml && rm -rf "$HOME/.cache/pip" && conda clean -aqy && CONDA_ROOT_DIR=$(conda info --root) && rm -rf "$CONDA_ROOT_DIR/pkgs" && find "$CONDA_ROOT_DIR" -type d -name __pycache__ -exec rm -rf {} + && ldconfig
2025-10-14T17:05:55:    9 |     # AzureML Conda environment name: azureml_f43a770854f1e887af78e52cfb84206a
2025-10-14T17:05:55:   10 |     ENV PATH /azureml-envs/azureml_f43a770854f1e887af78e52cfb84206a/bin:$PATH
2025-10-14T17:05:55: --------------------
2025-10-14T17:05:55: ERROR: failed to solve: process "/bin/sh -c ldconfig /usr/local/cuda/lib64/stubs && conda env create -p /azureml-envs/azureml_f43a770854f1e887af78e52cfb84206a -f azureml-environment-setup/mutated_conda_dependencies.yml && rm -rf \"$HOME/.cache/pip\" && conda clean -aqy && CONDA_ROOT_DIR=$(conda info --root) && rm -rf \"$CONDA_ROOT_DIR/pkgs\" && find \"$CONDA_ROOT_DIR\" -type d -name __pycache__ -exec rm -rf {} + && ldconfig" did not complete successfully: exit code: 1


2025-10-14T17:05:55: CalledProcessError(1, ['docker', 'build', '-f', 'azureml-environment-setup/Dockerfile', '.', '-t', 'gmatheus01rcrmlw.azurecr.io/azureml/azureml_72d9e2d0364abe0ef860b2a28faeafa4', '-t', 'gmatheus01rcrmlw.azurecr.io/azureml/azureml_72d9e2d0364abe0ef860b2a28faeafa4:1'])

2025-10-14T17:05:55: Building docker image failed with exit code: 1

2025-10-14T17:05:55: Logging out of Docker registry: gmatheus01rcrmlw.azurecr.io
2025-10-14T17:05:55: Removing login credentials for https://index.docker.io/v1/


2025-10-14T17:05:55: Traceback (most recent call last):
  File "/mnt/azureml/cr/j/3ae789d2668641d0beb32b232a29395c/exe/wd/docker_utilities.py", line 152, in _docker_build_or_error
    docker_execute_function(docker_command, build_command, print_command_args=True)
  File "/mnt/azureml/cr/j/3ae789d2668641d0beb32b232a29395c/exe/wd/docker_utilities.py", line 23, in docker_execute_function
    return killable_subprocess.check_call(command_args, *popen_args,
  File "/mnt/azureml/cr/j/3ae789d2668641d0beb32b232a29395c/exe/wd/killable_subprocess.py", line 261, in check_call
    raise subprocess.CalledProcessError(process.returncode, cmd)
subprocess.CalledProcessError: Command '['docker', 'build', '-f', 'azureml-environment-setup/Dockerfile', '.', '-t', 'gmatheus01rcrmlw.azurecr.io/azureml/azureml_72d9e2d0364abe0ef860b2a28faeafa4', '-t', 'gmatheus01rcrmlw.azurecr.io/azureml/azureml_72d9e2d0364abe0ef860b2a28faeafa4:1']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "script.py", line 162, in <module>
    docker_utilities._docker_build_or_error(
  File "/mnt/azureml/cr/j/3ae789d2668641d0beb32b232a29395c/exe/wd/docker_utilities.py", line 156, in _docker_build_or_error
    _write_error_and_exit(error_msg, error_file_path=error_file_path)
  File "/mnt/azureml/cr/j/3ae789d2668641d0beb32b232a29395c/exe/wd/docker_utilities.py", line 217, in _write_error_and_exit
    sys.exit(1)
SystemExit: 1

I don't know if this is the cause, but I was previously getting a pyarrow version error:

[Screenshot: pyarrow version error]

After I updated my environment, I got the error shared above. Interestingly, I am not using my custom environment in this batch deployment: we don't have a scoring script, so we wanted to use the auto-generated scoring script instead.

My failed batch:

[Screenshot: failed batch job]

By the way, this job was submitted by ADF using the REST API.

Azure Machine Learning

1 answer

  1. Aryan Parashar 3,690 Reputation points Microsoft External Staff Moderator
    2025-10-23T11:55:20.4666667+00:00

    Hi Guilherme,

    When you submitted the job through ADF, the pipeline attempted to build the environment defined in your YAML file inside a Docker container within the Azure ML workspace.

    All dependencies specified in your YAML must be compatible with the base Docker image and the Python version being used during the build.
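One way to narrow down which dependency is incompatible is to scan the build log for packages that pip downloaded as a source archive (`.tar.gz`) instead of a prebuilt wheel and then failed to build. This often means no wheel of that version matched the Python version in the image. The following is a minimal sketch using only the standard library; the function name `find_failed_source_builds` is hypothetical, and the matching is based solely on the log lines shown in the question, so other pip versions may format their output differently.

```python
import re

# Matches pip output such as "Downloading pyarrow-6.0.0.tar.gz (769 kB)".
SDIST_DOWNLOAD = re.compile(
    r"Downloading\s+(?P<name>[A-Za-z0-9_.-]+?)-(?P<version>\d[^\s-]*)\.tar\.gz"
)
# Text pip prints when a build step for the current package fails.
BUILD_ERROR = "finished with status 'error'"


def find_failed_source_builds(log_text):
    """Return (name, version) pairs for source archives whose build failed."""
    failed = []
    last_sdist = None  # most recent source archive seen in the log
    for line in log_text.splitlines():
        match = SDIST_DOWNLOAD.search(line)
        if match:
            last_sdist = (match.group("name"), match.group("version"))
        elif BUILD_ERROR in line and last_sdist is not None:
            # Docker repeats the failing step, so skip pairs already recorded.
            if last_sdist not in failed:
                failed.append(last_sdist)
            last_sdist = None
    return failed


if __name__ == "__main__":
    # Lines copied from the build log in the question.
    sample = "\n".join([
        "2025-10-14T17:05:53: #10 52.37   Downloading pyarrow-6.0.0.tar.gz (769 kB)",
        "2025-10-14T17:05:53: #10 52.37   Installing build dependencies: started",
        "2025-10-14T17:05:53: #10 52.37   Installing build dependencies: finished with status 'error'",
    ])
    print(find_failed_source_builds(sample))  # [('pyarrow', '6.0.0')]
```

For the log in the question this reports pyarrow 6.0.0, which pip selected to satisfy `pyarrow<20,>=4.0.0` from `mlflow==2.22.2`. Pinning a pyarrow version that ships wheels for the image's Python version is one thing to try.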

    Since your previous model worked fine, try running this new model locally with the YAML dependencies before submitting it through ADF.

    Additionally, you can try one of the following:

    1. Use a predefined (curated) environment available in the Azure ML workspace.
    2. Submit the pipeline job using the Azure ML Python SDK instead of the REST API.
    3. Fix your YAML file and adjust the structure of the job submission through ADF (REST API) to ensure that all dependencies are compatible.
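For option 2, a submission through the Azure ML Python SDK v2 might look roughly like the sketch below. It assumes the `azure-ai-ml` and `azure-identity` packages are installed and that the input is a folder of data. The function name `submit_batch_job` and all of its parameters are hypothetical placeholders, and the call signatures should be checked against the SDK version you have installed.

```python
def submit_batch_job(subscription_id, resource_group, workspace_name,
                     endpoint_name, input_path, deployment_name=None):
    """Invoke a batch endpoint via the Azure ML SDK v2 and return the job name."""
    # Imported inside the function so this file still loads where the
    # SDK packages are not installed.
    from azure.ai.ml import Input, MLClient
    from azure.ai.ml.constants import AssetTypes
    from azure.identity import DefaultAzureCredential

    ml_client = MLClient(
        DefaultAzureCredential(),
        subscription_id=subscription_id,
        resource_group_name=resource_group,
        workspace_name=workspace_name,
    )
    # With deployment_name=None the endpoint's default deployment is used.
    job = ml_client.batch_endpoints.invoke(
        endpoint_name=endpoint_name,
        deployment_name=deployment_name,
        input=Input(type=AssetTypes.URI_FOLDER, path=input_path),
    )
    return job.name
```

Submitting this way does not change how the environment image is built, so a dependency conflict in the model's environment would still need to be resolved. It mainly makes the failure easier to reproduce and inspect outside ADF.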

    Thank you for reaching out to the Microsoft Q&A portal.
