How do I fix a persistent Image Build Failure in Azure Machine Learning Batch Endpoint?

Thomas McGuckin 0 Reputation points
2024-10-05T14:03:49.1366667+00:00

I'm encountering a persistent error when trying to run a batch inference job on my Azure Machine Learning batch endpoint. The job fails during the image build process with the following error message in the 20_image_build_log.txt file:

2024-10-03T12:03:35: Pip subprocess error:

2024-10-03T12:03:35: /usr/bin/bash: /azureml-envs/azureml-automl-dnn-gpu/lib/libtinfo.so.6: no version information available (required by /usr/bin/bash)

2024-10-03T12:03:35: WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='automlresources-prod.azureedge.net', port=443): Read timed out. (read timeout=15)")': /spacy-tokenizer-for-bilstm/en_core_web_sm-3.4.1.tar.gz

2024-10-03T12:03:35: ERROR: Ignored the following versions that require a different python version: 0.16.0a1 Requires-Python >=3.10

AND

2024-10-03T12:03:35: ERROR: Could not find a version that satisfies the requirement tzdata==2024a (from versions: 1.2016.5a1, 1.2016.6, 1.2016.7, 1.2016.8, 2020.1rc0, 2020.1rc1, 2020.1, 2020.2, 2020.3, 2020.4, 2020.5, 2021.1, 2021.2, 2021.2.post0, 2021.3, 2021.4, 2021.5, 2022.1, 2022.2, 2022.3, 2022.4, 2022.5, 2022.6, 2022.7, 2023.1, 2023.2, 2023.3, 2023.4, 2024.1, 2024.2)

2024-10-03T12:03:35: ERROR: No matching distribution found for tzdata==2024a

I've tried the following troubleshooting steps:

* Reinstalling Git Bash
* Updating conda and its packages
* Installing the tzdata package (version 2024a)
* Clearing conda and pip caches

However, the error persists. I'm using Windows 10 with Git Bash and the Azure CLI, and my batch endpoint was created using Azure AutoML. Has anyone else encountered this issue, or does anyone have suggestions for resolving it? Any help would be greatly appreciated!

Azure Machine Learning
An Azure machine learning service for building and deploying models.

3 answers

  1. Amira Bedhiafi 31,391 Reputation points
    2024-10-06T12:26:03.1033333+00:00

    It looks like you're encountering an issue with the image build process for your Azure Machine Learning batch endpoint, caused by dependency resolution problems (particularly the tzdata==2024a pin) and connection timeouts while downloading packages such as the en_core_web_sm spaCy model. Here are a few potential solutions you can try:

    1. Specify Compatible Package Versions

    The error shows that pip cannot find tzdata==2024a: the versions it lists as available all use calendar-style numbers (2023.4, 2024.1, 2024.2, and so on), so 2024a will never resolve. Pin one of the listed versions manually in your environment's requirements.txt or conda.yml file.

    Example:

    
       tzdata==2023.4
    
    

    2. Upgrade Python Version

    Since the error message mentions that some versions of packages require a different Python version (specifically Python >= 3.10), it might be worthwhile to check the Python version you're using in the environment. You can either upgrade the Python version in your environment or adjust the version of dependencies to match your current Python setup.

    You can define the Python version in the conda.yml file:

    
       dependencies:
    
         - python=3.9
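
    For context, a minimal complete conda.yml sketch combining the pins from steps 1 and 2 might look like this (the environment name and exact versions are placeholders; keep whatever the AutoML-generated file already contains and only adjust the conflicting lines):

       name: automl-batch-env
       channels:
         - conda-forge
       dependencies:
         - python=3.9
         - pip
         - pip:
           - tzdata==2023.4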
    
    

    3. Timeout Fix: Increase the Pip Timeout

    The connection timeout error (ReadTimeoutError) can be addressed by increasing pip's timeout and retry settings. For example, pass a longer timeout directly on the install command:

       pip install --default-timeout=100 -r requirements.txt
    
    

    Alternatively, make the longer timeout persistent through pip configuration rather than on each command, as sketched below.
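
    A minimal sketch, assuming the machine where pip runs picks up your pip configuration file:

       # pip.conf (Linux/macOS) or pip.ini (Windows); pip also reads the
       # equivalent PIP_DEFAULT_TIMEOUT and PIP_RETRIES environment variables
       [global]
       timeout = 100
       retries = 10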

    4. Review Proxy and Network Settings

    The timeout might also be related to network or proxy settings. Ensure that your environment can access external resources like automlresources-prod.azureedge.net. If you're behind a corporate firewall, you might need to configure the proxy settings for both conda and pip to allow outgoing HTTPS connections.
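
    For example, assuming a proxy at http://proxy.example.com:8080 (a placeholder), pip and conda can be pointed at it like this:

       # pip: pass the proxy on the command line
       pip install --proxy http://proxy.example.com:8080 -r requirements.txt

       # conda: add the proxy to ~/.condarc
       proxy_servers:
         http: http://proxy.example.com:8080
         https: http://proxy.example.com:8080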

    5. Rebuild the Environment

    You can try manually building the environment locally to isolate issues with dependencies:

    
       conda env create -f environment.yml
    
    

    Once the environment is built successfully, try running the image build again in Azure ML.
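
    If the local build succeeds, one way to push the same definition to Azure ML is to register it as an environment with the CLI v2 ml extension. This is only a sketch; the base image and file names are illustrative, so adjust them to your workspace:

       # azureml-env.yml - Azure ML environment definition that references
       # the conda file used above
       name: my-batch-env
       image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04
       conda_file: environment.yml

       az ml environment create --file azureml-env.yml --resource-group <your-rg> --workspace-name <your-workspace>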

    6. Clear Cache & Reinstall Packages

    You mentioned clearing the cache, but double-check that both pip and conda caches are thoroughly cleared, as lingering cache issues can sometimes cause persistent problems:

    
       pip cache purge
    
       conda clean --all
    
    

    7. Check AutoML Dependencies

    Since you're using AutoML, make sure your AutoML environment is up to date. Sometimes, AutoML-specific dependencies may cause compatibility issues, and updating them can help:

    
       az ml environment list
    
       az ml environment update --name <your_env_name> --version <version>
    
    

    If none of these steps resolve the issue, it may be worth creating a custom Docker image with the necessary dependencies pre-installed and using that image for your Azure ML batch job; a rough sketch follows. This gives you more control over the build process.
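
    As an illustrative sketch of that approach (the base image and pins are examples, not specific to your AutoML model, and it assumes the base image already provides Python and pip):

       FROM mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04
       # Pre-install the packages that time out or fail to resolve during the
       # managed image build, pinning versions that actually exist on PyPI
       RUN pip install --default-timeout=100 tzdata==2023.4
       # Then copy in and install the rest of the AutoML-generated requirements:
       # COPY requirements.txt .
       # RUN pip install --default-timeout=100 -r requirements.txt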

    Let me know if you need more specific help with any of these steps!

    1 person found this answer helpful.

  2. Thomas McGuckin 0 Reputation points
    2024-10-06T16:07:27.8966667+00:00

    The answer requires detailed programming experience, for example modifying an environment file. This whole Azure AutoML was sold as a no- or low-code platform; going into a Linux environment is a black hole for me.


  3. Sina Salam 19,136 Reputation points
    2024-10-07T19:55:41.36+00:00

    Hello Thomas McGuckin,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you are having Image Build Failure in Azure Machine Learning Batch Endpoint.

    Since you have already cleared all existing cached environments and rebuilt your environment from scratch, the next step is to build your Docker image separately so that all dependencies are installed up front. This way, the image only needs to be pulled during deployment.

       # Build, tag, and push the image to a registry (Azure Container Registry shown as an example)
       az acr login --name <your-registry>
       docker build -t <your-registry>.azurecr.io/my-azureml-image:v1 .
       docker push <your-registry>.azurecr.io/my-azureml-image:v1
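
    Once the image is in the registry, it can be registered as an Azure ML environment that your batch deployment references. A sketch, assuming the CLI v2 ml extension (all names and versions are placeholders):

       az ml environment create --name my-custom-env --version 1 --image <your-registry>.azurecr.io/my-azureml-image:v1 --resource-group <your-rg> --workspace-name <your-workspace>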
    

    Then, follow what @Amira Bedhiafi recommended above.

    After that, if the issue persists, contact Azure Support and make sure you are on a paid subscription plan.

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.


    Please don't forget to close up the thread here by upvoting and accepting this as an answer if it is helpful.

