Troubleshooting with a local model deployment

Try a local model deployment as a first step in troubleshooting deployment to Azure Container Instances (ACI) or Azure Kubernetes Service (AKS). Using a local web service makes it easier to spot and fix common Azure Machine Learning Docker web service deployment errors.

Prerequisites

  • An Azure subscription. Try the free or paid version of Azure Machine Learning.
  • Option A (Recommended) - Debug locally on Azure Machine Learning Compute Instance
  • Option B - Debug locally on your compute
  • Option C - Enable local debugging with the Azure Machine Learning inference HTTP server.
    • The Azure Machine Learning inference HTTP server (preview) is a Python package that lets you easily validate your entry script in a local development environment. If there's a problem with the scoring script, the server returns an error along with the location where the error occurred.
    • The server can also be used when creating validation gates in a continuous integration and deployment pipeline. For example, start the server with the candidate script and run the test suite against the local endpoint.
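The validation-gate idea can be sketched in Python with only the standard library. This is a minimal sketch, not an official pipeline step: it assumes `azmlinfsrv` is on the `PATH`, that the server accepts a `--port` option (5001 is its default port), and that it answers HTTP requests on `/` once it is up. The names `server_command`, `wait_until_ready`, and `local_inference_server` are hypothetical helpers introduced for illustration.

```python
import contextlib
import subprocess
import time
import urllib.error
import urllib.request


def server_command(entry_script, port=5001):
    """Command line that starts the local inference server for a candidate entry script."""
    return ["azmlinfsrv", "--entry_script", entry_script, "--port", str(port)]


def wait_until_ready(url, timeout_s=60.0, interval_s=1.0):
    """Poll url until the server sends any HTTP response; return False after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval_s):
                return True
        except urllib.error.HTTPError:
            # An HTTP error status still means the server is listening.
            return True
        except OSError:
            # Connection refused or timed out: the server isn't up yet.
            time.sleep(interval_s)
    return False


@contextlib.contextmanager
def local_inference_server(entry_script, port=5001, startup_timeout_s=60.0):
    """Start azmlinfsrv for the candidate script, yield its base URL, then stop it."""
    proc = subprocess.Popen(server_command(entry_script, port))
    base_url = f"http://127.0.0.1:{port}"
    try:
        if not wait_until_ready(base_url + "/", timeout_s=startup_timeout_s):
            raise RuntimeError("inference server did not become ready")
        yield base_url
    finally:
        # Always stop the server so the CI job doesn't leave it running.
        proc.terminate()
        proc.wait()
```

A CI job could then wrap its tests in `with local_inference_server("candidate_score.py") as base_url:` and send scoring requests to `base_url + "/score"`; `candidate_score.py` is a hypothetical file name. Treating any HTTP response, including an error status, as "ready" keeps the readiness check independent of what the scoring script returns.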

Azure Machine Learning inference HTTP server

The local inference server lets you quickly debug your entry script. If the underlying scoring script has a bug, the server fails to initialize or serve the model; instead, it throws an exception and reports the location where the issue occurred. Learn more about the Azure Machine Learning inference HTTP server.

  1. Install the azureml-inference-server-http package from the pypi feed:

    python -m pip install azureml-inference-server-http
  2. Start the server, passing your scoring script as the entry script:

    azmlinfsrv --entry_script <path-to-entry-script>
  3. Send a scoring request to the server using curl:

    curl -p
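If curl isn't available, or you want to script the request, the following sketch sends a JSON payload to the server with Python's standard library. It assumes the server is listening on its default address, `http://127.0.0.1:5001`, and serves scoring requests on `/score`; the payload shape is whatever your entry script expects, so `{"data": ...}` is only an illustration. `build_score_request` and `score` are hypothetical helper names.

```python
import json
import urllib.error
import urllib.request


def build_score_request(payload, host="127.0.0.1", port=5001):
    """Build a POST request that sends payload as JSON to the server's /score route."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/score",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score(payload, host="127.0.0.1", port=5001, timeout_s=30.0):
    """Send the request; return (status, body) so error details from the server stay visible."""
    request = build_score_request(payload, host, port)
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Non-2xx responses raise HTTPError; its body carries the server's error message.
        return err.code, err.read().decode("utf-8", errors="replace")
```

For example, `status, body = score({"data": [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]})`. Returning the body of an error response, rather than raising, keeps the server's error message, including the location of the failure in the scoring script, in front of you.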


See the frequently asked questions about the Azure Machine Learning inference HTTP server.

Debug locally

You can find a sample local deployment notebook in the MachineLearningNotebooks repo to explore a runnable example.


Local web service deployments are not supported for production scenarios.

To deploy locally, modify your code to use LocalWebservice.deploy_configuration() to create a deployment configuration. Then use Model.deploy() to deploy the service. The following example deploys a model (contained in the model variable) as a local web service:

APPLIES TO: Python SDK azureml v1

from azureml.core import Workspace
from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import LocalWebservice

# Connect to the workspace described by your config.json file
ws = Workspace.from_config()

# Create inference configuration based on the environment definition and the entry script
# (set entry_script to the path of your scoring script)
myenv = Environment.from_conda_specification(name="env", file_path="myenv.yml")
inference_config = InferenceConfig(entry_script="", environment=myenv)
# Create a local deployment, using port 8890 for the web service endpoint
deployment_config = LocalWebservice.deploy_configuration(port=8890)
# Deploy the service (the model to deploy is held in the model variable)
service = Model.deploy(
    ws, "mymodel", [model], inference_config, deployment_config)
# Wait for the deployment to complete
service.wait_for_deployment(show_output=True)
# Display the port that the web service is available on
print(service.port)

If you are defining your own conda specification YAML, list azureml-defaults version >= 1.0.45 as a pip dependency. This package is needed to host the model as a web service.
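As an illustration, the following sketch produces a minimal conda specification that lists `azureml-defaults>=1.0.45` under the pip dependencies. The Python version (3.8) and any extra packages are assumptions; pin whatever your model actually needs. `conda_spec` and `write_conda_spec` are hypothetical helpers, and writing the YAML file by hand works just as well.

```python
from pathlib import Path


def conda_spec(name="env", python_version="3.8", pip_packages=()):
    """Return conda specification YAML text that lists azureml-defaults as a pip dependency."""
    # azureml-defaults is needed to host the model as a web service.
    pip_lines = ["azureml-defaults>=1.0.45", *pip_packages]
    lines = [
        f"name: {name}",
        "dependencies:",
        f"  - python={python_version}",  # assumed version; pin what your model needs
        "  - pip",
        "  - pip:",
    ]
    lines += [f"    - {pkg}" for pkg in pip_lines]
    return "\n".join(lines) + "\n"


def write_conda_spec(path="myenv.yml", **kwargs):
    """Write the specification to path and return the text that was written."""
    text = conda_spec(**kwargs)
    Path(path).write_text(text, encoding="utf-8")
    return text
```

Calling `write_conda_spec("myenv.yml", pip_packages=["scikit-learn"])` would produce a file that `Environment.from_conda_specification(name="env", file_path="myenv.yml")` can read.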

At this point, you can work with the service as normal. The following code demonstrates sending data to the service:

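import json

test_sample = json.dumps({'data': [
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
]})

test_sample = bytes(test_sample, encoding='utf8')

# Send the data to the local web service and print the result
prediction = service.run(input_data=test_sample)
print(prediction)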

For more information on customizing your Python environment, see Create and manage environments for training and deployment.

Update the service

During local testing, you may need to update the entry script to add logging or to resolve problems that you've discovered. To reload changes to the entry script, use reload(). For example, the following code reloads the script for the service and then sends data to it. The data is scored using the updated entry script:


The reload method is only available for local deployments. For information on updating a deployment to another compute target, see how to update your webservice.

service.reload()
print(service.run(input_data=test_sample))

The script is reloaded from the location specified by the InferenceConfig object used by the service.

To change the model, Conda dependencies, or deployment configuration, use update(). The following example updates the model used by the service:

service.update([different_model], inference_config, deployment_config)

Delete the service

To delete the service, use delete().

Inspect the Docker log

You can print detailed Docker engine log messages from the service object. You can view the log for ACI, AKS, and local deployments. The following example demonstrates how to print the logs.

# If you already have the service object handy
print(service.get_logs())

# If you only know the name of the service (note that there might be multiple services with the same name but different version numbers)
print(ws.webservices["mysvc"].get_logs())  # "mysvc" is a placeholder; use your service name

If the line Booting worker with pid: <pid> appears multiple times in the logs, there isn't enough memory to start the worker. To address the error, increase the value of memory_gb in deployment_config.
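To check a log for this symptom programmatically, a small helper can count the worker-boot lines. This is a hypothetical sketch: it assumes the log text comes from `service.get_logs()` and that the deployment normally starts `expected_workers` workers (one by default here), because a deployment configured with several workers legitimately logs one boot line per worker.

```python
import re

# Matches lines such as "Booting worker with pid: 41" anywhere in the log text.
BOOT_PATTERN = re.compile(r"Booting worker with pid: \d+")


def count_worker_boots(log_text):
    """Count 'Booting worker with pid: <pid>' occurrences in a Docker log string."""
    return len(BOOT_PATTERN.findall(log_text))


def likely_out_of_memory(log_text, expected_workers=1):
    """True when workers booted more often than expected, the restart symptom above."""
    return count_worker_boots(log_text) > expected_workers
```

For ACI and AKS deployments, `memory_gb` is a parameter of `AciWebservice.deploy_configuration()` and `AksWebservice.deploy_configuration()`, so the fix is to pass a larger `memory_gb` value when you create the deployment configuration.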

Next steps

Learn more about deployment: