How do I fix a persistent Image Build Failure in Azure Machine Learning Batch Endpoint?

Thomas McGuckin 0 Reputation points
2024-10-05T14:03:49.1366667+00:00

I'm encountering a persistent error when trying to run a batch inference job on my Azure Machine Learning batch endpoint. The job fails during the image build process with the following error message in the 20_image_build_log.txt file:

2024-10-03T12:03:35: Pip subprocess error:

2024-10-03T12:03:35: /usr/bin/bash: /azureml-envs/azureml-automl-dnn-gpu/lib/libtinfo.so.6: no version information available (required by /usr/bin/bash)

2024-10-03T12:03:35: WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='automlresources-prod.azureedge.net', port=443): Read timed out. (read timeout=15)")': /spacy-tokenizer-for-bilstm/en_core_web_sm-3.4.1.tar.gz

2024-10-03T12:03:35: ERROR: Ignored the following versions that require a different python version: 0.16.0a1 Requires-Python >=3.10

AND

2024-10-03T12:03:35: ERROR: Could not find a version that satisfies the requirement tzdata==2024a (from versions: 1.2016.5a1, 1.2016.6, 1.2016.7, 1.2016.8, 2020.1rc0, 2020.1rc1, 2020.1, 2020.2, 2020.3, 2020.4, 2020.5, 2021.1, 2021.2, 2021.2.post0, 2021.3, 2021.4, 2021.5, 2022.1, 2022.2, 2022.3, 2022.4, 2022.5, 2022.6, 2022.7, 2023.1, 2023.2, 2023.3, 2023.4, 2024.1, 2024.2)

2024-10-03T12:03:35: ERROR: No matching distribution found for tzdata==2024a

I've tried the following troubleshooting steps:

* Reinstalling Git Bash
* Updating conda and its packages
* Installing the tzdata package (version 2024a)
* Clearing conda and pip caches

However, the error persists. I'm using Windows 10 with Git Bash and the Azure CLI, and my batch endpoint was created using Azure AutoML. Has anyone else encountered this issue, or does anyone have suggestions for resolving it? Any help would be greatly appreciated!

Azure Machine Learning
An Azure machine learning service for building and deploying models.

3 answers

  1. Amira Bedhiafi 31,391 Reputation points
    2024-10-06T12:26:03.1033333+00:00

    It looks like you're encountering an issue with the image build process for your Azure Machine Learning batch endpoint, caused by dependency resolution problems (particularly the tzdata==2024a pin) and connection timeouts while downloading packages such as the en_core_web_sm spaCy model. Here are a few potential solutions you can try:

    1. Specify Compatible Package Versions

    The error shows that pip cannot find tzdata==2024a: the versions it lists as available all use calendar-style numbers (2023.4, 2024.1, 2024.2, and so on), so 2024a will never resolve. Pin one of the listed versions manually in your environment's requirements.txt or conda.yml file.

    Example:

    
       tzdata==2023.4
    
    

    2. Upgrade Python Version

    Since the error message mentions that some versions of packages require a different Python version (specifically Python >= 3.10), it might be worthwhile to check the Python version you're using in the environment. You can either upgrade the Python version in your environment or adjust the version of dependencies to match your current Python setup.

    You can define the Python version in the conda.yml file:

    
       dependencies:
    
         - python=3.9
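
    For context, a minimal complete conda.yml sketch combining the pins from steps 1 and 2 might look like this (the environment name and exact versions are placeholders; keep whatever the AutoML-generated file already contains and only adjust the conflicting lines):

       name: automl-batch-env
       channels:
         - conda-forge
       dependencies:
         - python=3.9
         - pip
         - pip:
           - tzdata==2023.4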
    
    

    3. Timeout Fix: Increase the Pip Timeout

    The connection timeout error (ReadTimeoutError) can be addressed by increasing pip's timeout and retry settings. For example, pass a longer timeout directly on the install command:

       pip install --default-timeout=100 -r requirements.txt
    
    

    Alternatively, make the longer timeout persistent through pip configuration rather than on each command, as sketched below.
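
    A minimal sketch, assuming the machine where pip runs picks up your pip configuration file:

       # pip.conf (Linux/macOS) or pip.ini (Windows); pip also reads the
       # equivalent PIP_DEFAULT_TIMEOUT and PIP_RETRIES environment variables
       [global]
       timeout = 100
       retries = 10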

    4. Review Proxy and Network Settings

    The timeout might also be related to network or proxy settings. Ensure that your environment can access external resources like automlresources-prod.azureedge.net. If you're behind a corporate firewall, you might need to configure the proxy settings for both conda and pip to allow outgoing HTTPS connections.
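
    For example, assuming a proxy at http://proxy.example.com:8080 (a placeholder), pip and conda can be pointed at it like this:

       # pip: pass the proxy on the command line
       pip install --proxy http://proxy.example.com:8080 -r requirements.txt

       # conda: add the proxy to ~/.condarc
       proxy_servers:
         http: http://proxy.example.com:8080
         https: http://proxy.example.com:8080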

    5. Rebuild the Environment

    You can try manually building the environment locally to isolate issues with dependencies:

    
       conda env create -f environment.yml
    
    

    Once the environment is built successfully, try running the image build again in Azure ML.
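
    If the local build succeeds, one way to push the same definition to Azure ML is to register it as an environment with the CLI v2 ml extension. This is only a sketch; the base image and file names are illustrative, so adjust them to your workspace:

       # azureml-env.yml - Azure ML environment definition that references
       # the conda file used above
       name: my-batch-env
       image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04
       conda_file: environment.yml

       az ml environment create --file azureml-env.yml --resource-group <your-rg> --workspace-name <your-workspace>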

    6. Clear Cache & Reinstall Packages

    You mentioned clearing the cache, but double-check that both pip and conda caches are thoroughly cleared, as lingering cache issues can sometimes cause persistent problems:

    
       pip cache purge
    
       conda clean --all
    
    

    7. Check AutoML Dependencies

    Since you're using AutoML, make sure your AutoML environment is up to date. Sometimes, AutoML-specific dependencies may cause compatibility issues, and updating them can help:

    
       az ml environment list
    
       az ml environment update --name <your_env_name> --version <version>
    
    

    If none of these steps resolve the issue, it may be worth creating a custom Docker image with the necessary dependencies pre-installed and using that image for your Azure ML batch job; a rough sketch follows. This gives you more control over the build process.
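
    As an illustrative sketch of that approach (the base image and pins are examples, not specific to your AutoML model, and it assumes the base image already provides Python and pip):

       FROM mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04
       # Pre-install the packages that time out or fail to resolve during the
       # managed image build, pinning versions that actually exist on PyPI
       RUN pip install --default-timeout=100 tzdata==2023.4
       # Then copy in and install the rest of the AutoML-generated requirements:
       # COPY requirements.txt .
       # RUN pip install --default-timeout=100 -r requirements.txt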

    Let me know if you need more specific help with any of these steps!

    1 person found this answer helpful.

  2. Thomas McGuckin 0 Reputation points
    2024-10-06T16:07:27.8966667+00:00

    The answer requires detailed programming experience, for example modifying an environment file. This whole Azure AutoML was sold as a no- or low-code platform; going into a Linux environment is a black hole for me.


  3. Sina Salam 19,136 Reputation points
    2024-10-07T19:55:41.36+00:00

    Hello Thomas McGuckin,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you are having Image Build Failure in Azure Machine Learning Batch Endpoint.

    Since you have already cleared all existing cached environments and rebuilt your environment from scratch, the next step is to build your Docker image separately so that all dependencies are installed up front. This way, the image only needs to be pulled during deployment.

       # Build, tag, and push the image to a registry (Azure Container Registry shown as an example)
       az acr login --name <your-registry>
       docker build -t <your-registry>.azurecr.io/my-azureml-image:v1 .
       docker push <your-registry>.azurecr.io/my-azureml-image:v1
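
    Once the image is in the registry, it can be registered as an Azure ML environment that your batch deployment references. A sketch, assuming the CLI v2 ml extension (all names and versions are placeholders):

       az ml environment create --name my-custom-env --version 1 --image <your-registry>.azurecr.io/my-azureml-image:v1 --resource-group <your-rg> --workspace-name <your-workspace>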
    

    Then, follow what @Amira Bedhiafi recommended above.

    After that, if the issue persists, contact Azure Support and make sure you are on a paid subscription plan.

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.


    Please don't forget to close up the thread here by upvoting and accepting this as an answer if it is helpful.

