Troubleshooting prebuilt Docker images for inference

08/28/2024

Learn how to troubleshoot problems you might see when using prebuilt Docker images for inference with Azure Machine Learning.

Important

Using Python package extensibility for prebuilt Docker images with Azure Machine Learning is currently in preview. Preview functionality is provided "as-is", with no guarantee of support or service level agreement. For more information, see the Supplemental terms of use for Microsoft Azure previews.

Model deployment failed

If model deployment fails, you won't see logs in Azure Machine Learning studio, and service.get_logs() returns None. The exception is a problem in the init() function of score.py: in that case, service.get_logs() does return the logs for that failure.

When no logs are returned, run the container locally by using one of the following commands, replacing <mcr-path> with an image path. For a list of the images and paths, see Prebuilt Docker images for inference.

Mounting extensibility solution

Go to the directory containing score.py and run:

```
docker run -it -v $(pwd):/var/azureml-app -e AZUREML_EXTRA_PYTHON_LIB_PATH="myenv/lib/python3.7/site-packages" <mcr-path>
```

requirements.txt extensibility solution

Go to the directory containing score.py and run:

```
docker run -it -v $(pwd):/var/azureml-app -e AZUREML_EXTRA_REQUIREMENTS_TXT="requirements.txt" <mcr-path>
```
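If you script these local checks, both commands can be built as an argument list in Python. The sketch below assumes only what the two commands show: the current directory is mounted at /var/azureml-app and one AZUREML_EXTRA_* variable is passed with -e. The helper name docker_run_args is hypothetical.

```python
import os
import shlex

APP_MOUNT = "/var/azureml-app"  # container path used by both commands above


def docker_run_args(mcr_path, app_dir=None, extra_env=None):
    """Build the `docker run` argument list for either extensibility solution.

    Hypothetical helper. app_dir defaults to the current directory, like
    $(pwd) in the shell commands; extra_env holds the AZUREML_EXTRA_*
    variable(s) to pass with -e.
    """
    app_dir = os.path.abspath(app_dir or os.getcwd())
    args = ["docker", "run", "-it", "-v", f"{app_dir}:{APP_MOUNT}"]
    for name, value in (extra_env or {}).items():
        # No shell is involved, so the value needs no extra quoting.
        args += ["-e", f"{name}={value}"]
    args.append(mcr_path)
    return args


if __name__ == "__main__":
    # requirements.txt solution; replace <mcr-path> with a real image path.
    cmd = docker_run_args(
        "<mcr-path>",
        extra_env={"AZUREML_EXTRA_REQUIREMENTS_TXT": "requirements.txt"},
    )
    print(shlex.join(cmd))
```

Once a real image path is in place, the list can be passed directly to subprocess.run(cmd, check=True). Because no shell parses it, the quotes around the variable values in the original commands aren't needed.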

Enable local debugging

The local inference server lets you quickly debug your entry script (score.py). If the score script has a bug, the server fails to initialize or serve the model, and instead throws an exception that shows where the issue occurred. For more information, see Azure Machine Learning inference HTTP server.
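Before starting any server, it can also help to call init() and run() directly in a plain Python process, so a bug surfaces as an ordinary traceback. The following is a minimal sketch of an entry script shaped for that. The model is a hypothetical stand-in (it sums each input row), and the {"data": [...]} request format is an assumption; substitute your own model loading and input parsing.

```python
import json
import logging
import traceback

logger = logging.getLogger(__name__)
model = None


def _dummy_model(rows):
    # Hypothetical stand-in for a real model: returns the sum of each row.
    return [sum(row) for row in rows]


def init():
    """Called once at startup; load the model here."""
    global model
    model = _dummy_model


def run(raw_data):
    """Called per request with the raw JSON body (assumed {"data": [...]})."""
    try:
        rows = json.loads(raw_data)["data"]
        return {"result": model(rows)}
    except Exception:
        # Log the full traceback so the failing line appears in the logs.
        logger.exception("run() failed")
        # Returning the traceback is convenient locally; don't do this in production.
        return {"error": traceback.format_exc()}


if __name__ == "__main__":
    init()
    print(run(json.dumps({"data": [[1, 2], [3, 4]]})))
```

Running python score.py exercises the same init() and run() functions the server calls, which narrows down whether a failure comes from the script itself or from the container setup.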

For common model deployment issues

For problems when deploying a model from Azure Machine Learning to Azure Container Instances (ACI) or Azure Kubernetes Service (AKS), see Troubleshoot model deployment.

init() or run() failing to write a file

The HTTP server in the prebuilt Docker images runs as a non-root user, so it might not have write access to every directory. Write only to directories that the server's user can write to, such as the /tmp directory in the container.
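A minimal sketch of writing scratch files from init() or run() to the temp directory instead of the app directory. It uses tempfile.gettempdir(), which normally resolves to /tmp in a Linux container unless TMPDIR or a similar variable overrides it. The helper names are hypothetical.

```python
import os
import tempfile


def is_writable(directory):
    """True if the current (possibly non-root) user can write into directory."""
    return os.path.isdir(directory) and os.access(directory, os.W_OK)


def write_scratch_file(name, content):
    """Write to the system temp directory rather than a root-owned path."""
    directory = tempfile.gettempdir()  # normally /tmp in the container
    path = os.path.join(directory, name)
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return path
```

Calling is_writable() on a target directory inside init() and logging the result is a quick way to confirm a permission problem before the first write fails.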

Extra Python packages not installed

Check for a typo in the environment variable or file name.
Check the container log to see whether pip install -r <your_requirements.txt> ran.
Check that the source directory is set correctly in the inference configuration (InferenceConfig) constructor.
If the installation isn't found and the log says "file not found", check that the file name shown in the log is correct.
If the installation started but failed or timed out, try to install the same requirements.txt locally with the same Python and pip versions in a clean environment (that is, with no cache directory: pip install --no-cache-dir -r requirements.txt). See whether the problem reproduces locally.
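The first and last checks in this list can be partly automated. The sketch below assumes, as in the local docker run command, that AZUREML_EXTRA_REQUIREMENTS_TXT is resolved relative to the source directory that contains score.py. The helper names are hypothetical. Run the install command with the interpreter of a fresh virtual environment whose Python version matches the image, not with your everyday environment.

```python
import os
import sys
from pathlib import Path

REQS_VAR = "AZUREML_EXTRA_REQUIREMENTS_TXT"


def requirements_problems(source_dir, env=None):
    """Return a list of problems with the requirements setting (empty = looks OK)."""
    env = os.environ if env is None else env
    value = env.get(REQS_VAR)
    if value is None:
        return [f"{REQS_VAR} is not set; check the variable name for typos"]
    # Assumption: the value is relative to the directory containing score.py.
    path = Path(source_dir) / value
    if not path.is_file():
        return [f"{path} not found; check the file name for typos"]
    return []


def clean_install_command(requirements_path, python=sys.executable):
    """pip command for a clean, cache-free install of the same requirements."""
    return [python, "-m", "pip", "install", "--no-cache-dir", "-r", str(requirements_path)]
```

Passing the returned list to subprocess.run() inside the fresh environment reproduces the install step outside the container, which separates package problems from container problems.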

Mounting solution failed

Check for a typo in the environment variable or directory name.
The environment variable must be set to a path relative to the directory that contains score.py.
Check that the source directory is set correctly in the inference configuration (InferenceConfig) constructor.
The directory needs to be the "site-packages" directory of the environment.
If score.py still raises ModuleNotFoundError and the module is supposed to be in the mounted directory, print sys.path in init() or run() to see whether any path is missing.
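The checks above can be sketched as two small helpers: one validates AZUREML_EXTRA_PYTHON_LIB_PATH against the app directory, and one prints sys.path so you can call it from init() or run(). The app directory is /var/azureml-app in the local docker run command; resolving the variable against it is an assumption based on the relative-path rule above. The helper names are hypothetical.

```python
import os
import sys
from pathlib import Path

LIB_PATH_VAR = "AZUREML_EXTRA_PYTHON_LIB_PATH"


def extra_lib_problems(app_dir, env=None):
    """Return a list of problems with the mounted site-packages setting."""
    env = os.environ if env is None else env
    value = env.get(LIB_PATH_VAR)
    if value is None:
        return [f"{LIB_PATH_VAR} is not set; check the variable name for typos"]
    problems = []
    if os.path.isabs(value):
        problems.append(f"{value} is absolute; it must be relative to score.py's directory")
    path = Path(app_dir) / value  # assumption: resolved against the app directory
    if path.name != "site-packages":
        problems.append(f"{value} must point at the environment's site-packages directory")
    if not path.is_dir():
        problems.append(f"{path} is not a directory; check the directory name for typos")
    return problems


def log_sys_path():
    """Call from init() or run(): prints each import search path to the log."""
    for entry in sys.path:
        print("sys.path entry:", entry)
```

If the site-packages directory passes these checks but is absent from the printed sys.path, the problem is likely in how the variable reached the container rather than in the mounted files.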

Building an image based on the prebuilt Docker image failed

If the build fails during apt package installation, check whether the user is set to root before the apt command runs. Make sure to switch back to the non-root user afterward.
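As a rough pre-build check, a Dockerfile can be scanned for apt commands that run while the active USER isn't root, and for a final USER that is still root. This is a hypothetical, simplified linter, not a Dockerfile parser: it joins backslash continuations, tracks USER lines, and assumes that before any USER instruction the prebuilt base image's own non-root user is active.

```python
import re

APT_PATTERN = re.compile(r"\bapt(-get)?\b")


def logical_lines(text):
    """Yield (first_line_number, instruction_text), joining backslash continuations."""
    buf, start = "", None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not buf and (not line or line.startswith("#")):
            continue
        if start is None:
            start = lineno
        if line.endswith("\\"):
            buf += line[:-1] + " "
            continue
        yield start, buf + line
        buf, start = "", None
    if buf:
        yield start, buf


def is_root(user):
    return user is not None and user.split(":")[0] in ("root", "0")


def check_apt_users(dockerfile_text):
    """Return problems: apt run as non-root, or a final USER of root."""
    problems = []
    user = None  # None: base image user, assumed non-root for prebuilt images
    for lineno, line in logical_lines(dockerfile_text):
        instruction, _, args = line.partition(" ")
        instruction = instruction.upper()
        if instruction == "USER":
            user = args.strip()
        elif instruction == "RUN" and APT_PATTERN.search(args):
            if not is_root(user):
                who = user or "the base image user"
                problems.append(f"line {lineno}: apt runs as {who}, not root")
    if is_root(user):
        problems.append("the final USER is root; switch back to the non-root user")
    return problems
```

A Dockerfile that passes has USER root before its apt RUN lines and ends with a USER line naming the non-root user (dockeruser for the prebuilt images).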

Run doesn't complete on GPU local deployment

GPU base images can't be used for local deployment unless the local deployment is on an Azure Machine Learning compute instance. GPU base images are supported only on Azure services such as Azure Machine Learning compute clusters and instances, Azure Container Instances (ACI), Azure VMs, and Azure Kubernetes Service (AKS).

Image built based on the prebuilt Docker image can't boot up

The non-root user needs to be dockeruser. If you want to run the image as a different user, set the owner of the following directories to that user name:
```
/var/runit
/var/log
/var/lib/nginx
/run
/opt/miniconda
/var/azureml-app
```
If you change the ENTRYPOINT in the newly built image, the HTTP server and related components must be started by runsvdir /var/runit.
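To confirm the ownership requirement, a minimal sketch like the following can be run with Python inside the built image (for example, from a shell session in the container). It checks only the top-level directories listed above and reports a numeric UID when the owner has no entry in the container's user database. It relies on the Unix-only pwd module, and the helper names are hypothetical.

```python
import os
import pwd

REQUIRED_DIRS = (
    "/var/runit",
    "/var/log",
    "/var/lib/nginx",
    "/run",
    "/opt/miniconda",
    "/var/azureml-app",
)


def owner_name(path):
    """User name owning path, or the numeric UID if it has no passwd entry."""
    uid = os.stat(path).st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def ownership_problems(user, dirs=REQUIRED_DIRS):
    """List directories that don't exist or aren't owned by user."""
    problems = []
    for d in dirs:
        if not os.path.exists(d):
            problems.append(f"{d}: not found")
            continue
        owner = owner_name(d)
        if owner != user:
            problems.append(f"{d}: owned by {owner}, expected {user}")
    return problems


if __name__ == "__main__":
    for problem in ownership_problems("dockeruser"):
        print(problem)
```

Replace "dockeruser" with the user name you run the image as; any line printed points at a directory whose owner needs to change.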
