Hi @Ramr-msft, thanks for your time.
So I have a notebook that I use for deploying. The steps are the following:
First I connect to the Workspace and select a registered model and a registered environment:

```python
from azureml.core import Workspace, Model, Environment

ws = Workspace.from_config(path="./config.json")
model = Model(ws, 'Model_Name')
env = Environment.get(workspace=ws, name="condaenv")
```
Then I create an inference config and a deploy config with this cell:

```python
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice

inference_config = InferenceConfig(
    environment=env,
    source_directory="../lib",
    entry_script="azureml/score.py",
)
deploy_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=4)
```
My score.py has the two required methods, init() and run(), plus a couple of methods to postprocess the inferred data:

```python
import os

def init():
    global model
    # MyModel wraps several sub-models; loading them all is the slow part
    model = MyModel.load(os.getenv('AZUREML_MODEL_DIR'))

def run(request):
    predictions = model.predict(request)
    return postprocess(predictions)
```
The actual deployment is then done by running the next cell:

```python
service = Model.deploy(
    ws,
    "model-service",
    [model],
    inference_config,
    deploy_config,
    overwrite=True,
    show_output=True,
)
service.wait_for_deployment(show_output=True)
```
AzureML then works its magic until it runs the init() method in score.py.
This is where the problem starts: loading all the sub-models takes more than 300 s, so the gunicorn workers time out and are killed; a replacement worker is spawned, starts loading again, and the cycle repeats.
In another project (which did not involve AzureML endpoints) I had more direct access to the gunicorn configuration, so I could raise the worker timeout or make the module finish loading before the workers were forked; a sketch of that config is below.
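For reference, this is roughly what that looked like in a plain gunicorn config file (file name and values are illustrative):

```python
# gunicorn.conf.py -- plain gunicorn, outside AzureML (illustrative values)
workers = 2

# Allow slow startup: only kill a worker after 600 s of silence.
timeout = 600

# Load the application module in the master process *before* forking,
# so workers inherit the already-loaded models instead of each one
# loading them from scratch against the timeout clock.
preload_app = True
```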
Is something similar possible with AzureML endpoints?
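For example, if the AzureML scoring server honored some environment variable for the worker timeout, I would expect to be able to set it on the registered environment before building the InferenceConfig. The WORKER_TIMEOUT name below is a guess on my part, not something I found documented:

```python
# Assumption: the name WORKER_TIMEOUT is hypothetical; I don't know
# which knob (if any) the scoring server actually exposes for this.
env.environment_variables["WORKER_TIMEOUT"] = "600"  # seconds
```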