AzureML endpoint - gunicorn worker timeout

Matteo 26 Reputation points
2021-08-27T13:30:44.527+00:00

Hello everyone,
I am trying to deploy a large model using AzureML endpoint.
The model is made up of many sub-models which get loaded by the init() method as described in the documentation here.
The model is trained and then registered in AzureML.

When I deploy the model I can see in the logs that the gunicorn workers restart after 300 seconds, so the whole ensemble of sub-models never has time to load completely.

Is there a way to manually set the timeout of the gunicorn workers?

Azure Machine Learning
An Azure machine learning service for building and deploying models.

4 answers

  1. Matteo 26 Reputation points
    2021-08-31T08:14:45.617+00:00

    Hi @Ramr-msft thanks for the time.
    So I have a notebook that I use for deploying. The steps are the following:
    First I connect to the Workspace and select a registered model and a registered env.

    ws = Workspace.from_config(path="./config.json")  
    model = Model(ws, 'Model_Name')  
    env = Environment.get(workspace=ws, name="condaenv")  
    

    Then I create an inference config and deploy config with this cell:

    inference_config = InferenceConfig(  
        environment=env,  
        source_directory="../lib",  
        entry_script="azureml/score.py",  
    )  
    deploy_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=4)  
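
    One knob worth checking at this step (an assumption to verify against your image version, not a confirmed fix): the AzureML inference HTTP server reads a `WORKER_TIMEOUT` environment variable, in seconds, so setting it on the environment before building the inference config may raise the worker timeout past the default 300 s:

    ```python
    # Sketch, assuming the scoring image is built on
    # azureml-inference-server-http, which honors a WORKER_TIMEOUT
    # environment variable (seconds) -- verify for your image.
    # `env` is the registered Environment from the previous cell.
    env.environment_variables["WORKER_TIMEOUT"] = "3600"  # set before InferenceConfig
    ```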
    

    My score.py has the two required methods, init() and run(), plus a couple of helpers to postprocess the inferred data:

    import os  
      
    def init():  
        global model  
        model = MyModel.load(os.getenv('AZUREML_MODEL_DIR'))  
      
    def run(request):  
        predictions = model.predict(request)  
        return postprocess(predictions)  
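
    One pattern that can sidestep the 300 s limit regardless of gunicorn settings is to return from init() quickly and load the ensemble on a background thread, answering run() with a "still loading" message until the load finishes. A generic sketch with a placeholder load, not the post's actual MyModel:

    ```python
    import threading

    model = None
    _loaded = threading.Event()

    def _load_models():
        """Placeholder for the slow ensemble load (MyModel.load in the post)."""
        global model
        model = object()  # stand-in for the real sub-models
        _loaded.set()

    def init():
        # Return immediately so the gunicorn worker is not killed for
        # exceeding its timeout while the ensemble loads.
        threading.Thread(target=_load_models, daemon=True).start()

    def run(request):
        if not _loaded.is_set():
            return {"status": "model still loading, retry later"}
        # Real code would call model.predict(request) and postprocess here.
        return {"status": "ok"}
    ```

    The trade-off is that the endpoint reports healthy before it can actually serve predictions, so callers must handle the "still loading" response.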
    

    The actual deployment is then done by running the next cell:

    service = Model.deploy(  
        ws,  
        "model-service",  
        [model],  
        inference_config,  
        deploy_config,   
        overwrite=True,  
        show_output=True,  
    )  
      
    service.wait_for_deployment(show_output=True)  
    

    AzureML then works its magic until it runs the init() method in score.py.
    The problem starts here: loading all the sub-models takes more than 300 s, so the worker times out and is killed, a new one is spawned, and the cycle starts again.

    In other projects (which did not involve AzureML endpoints) I had more direct access to the gunicorn configuration, and I could change the timeout or have the module finish loading before the workers were spawned.

    Is something similar possible with AzureML endpoints?
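
    For reference, the kind of direct control described above looks like this in a standalone gunicorn deployment. A generic gunicorn.conf.py, shown only to illustrate the two knobs mentioned; AzureML's managed images do not expose this file directly:

    ```python
    # gunicorn.conf.py -- generic gunicorn settings, not AzureML-specific.
    timeout = 3600      # seconds before a silent worker is killed (gunicorn's default is 30)
    preload_app = True  # import the app (and load models) once, before forking workers
    workers = 1
    ```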

    2 people found this answer helpful.

  2. Funzo69 1 Reputation point
    2021-10-31T20:35:13.437+00:00

    @Matteo @Ramr-msft Was there ever a solution found to this problem?


  3. Daksh Sinha 0 Reputation points
    2024-04-15T18:25:25.88+00:00

    I found a workaround for this. Package the model before deployment.
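
    If the suggestion means building the inference image up front with the SDK's Model.package, a sketch reusing the names from answer 1 (ws, model, inference_config) would be; verify the call against your azureml-core version:

    ```python
    from azureml.core.model import Model

    # Sketch: build the scoring image ahead of deployment, so slow
    # model loading is exercised at build time rather than inside a
    # live gunicorn worker. ws, model, and inference_config are the
    # objects from the deployment notebook in answer 1.
    package = Model.package(ws, [model], inference_config)
    package.wait_for_creation(show_output=True)
    ```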


  4. Tommaso Berritto 0 Reputation points
    2024-04-23T13:21:07.82+00:00

    Anyone found a solution? I have a similar issue.
