Hi @Ramr-msft, thanks for your time.
So I have a notebook that I use for deploying. The steps are the following:
First I connect to the Workspace and select a registered model and a registered environment:

```python
from azureml.core import Workspace, Model, Environment

ws = Workspace.from_config(path="./config.json")
model = Model(ws, 'Model_Name')
env = Environment.get(workspace=ws, name="condaenv")
```
Then I create an inference config and a deploy config with this cell:

```python
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice

inference_config = InferenceConfig(
    environment=env,
    source_directory="../lib",
    entry_script="azureml/score.py",
)
deploy_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=4)
```
My score.py has the two required methods, init() and run(), plus a couple of methods to postprocess the inferred data:

```python
import os

def init():
    global model
    # MyModel wraps several sub-models; loading them all is the slow part
    model = MyModel.load(os.getenv('AZUREML_MODEL_DIR'))

def run(request):
    predictions = model.predict(request)
    return postprocess(predictions)
```
The actual deployment is then done by running the next cell:

```python
service = Model.deploy(
    ws,
    "model-service",
    [model],
    inference_config,
    deploy_config,
    overwrite=True,
    show_output=True,
)
service.wait_for_deployment(show_output=True)
```
AzureML then works its magic until it runs the init() method in score.py.
This is where the problem starts: loading all the sub-models takes more than 300 s, so the gunicorn workers time out and are killed; a replacement worker is spawned, starts loading again, and the cycle repeats.
In another project (which did not involve AzureML endpoints) I had more direct access to the gunicorn configuration, so I could raise the worker timeout or make the module finish loading before the workers were forked; a sketch of that config is below.
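For reference, this is roughly what that looked like in a plain gunicorn config file (file name and values are illustrative):

```python
# gunicorn.conf.py -- plain gunicorn, outside AzureML (illustrative values)
workers = 2

# Allow slow startup: only kill a worker after 600 s of silence.
timeout = 600

# Load the application module in the master process *before* forking,
# so workers inherit the already-loaded models instead of each one
# loading them from scratch against the timeout clock.
preload_app = True
```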
Is something similar possible with AzureML endpoints?
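For example, if the AzureML scoring server honored some environment variable for the worker timeout, I would expect to be able to set it on the registered environment before building the InferenceConfig. The WORKER_TIMEOUT name below is a guess on my part, not something I found documented:

```python
# Assumption: the name WORKER_TIMEOUT is hypothetical; I don't know
# which knob (if any) the scoring server actually exposes for this.
env.environment_variables["WORKER_TIMEOUT"] = "600"  # seconds
```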