@G Cocci Thanks, You can use the AKS recipes for real time inference and alternatively take the ParallelRunStep approach for handling offline batch inferences for many models.
The approaches are captured in the solution accelerator here:
https://github.com/microsoft/solution-accelerator-many-models
Here are the steps to Group all models into a single routing endpoint.
We can now group all the services into a single entry point, so that we don't have to handle each endpoint separately. For that, we'll register the endpoints object as a model, and deploy it as a webservice. This webservice will receive the incoming requests and route them to the appropiate model service, acting as the unique entry point for outside requests.
5.1 Register endpoints dict as an AML model
import joblib
joblib.dump(models_deployed, 'models_deployed.pkl')
dep_model = Model.register(
workspace=ws,
model_path ='models_deployed.pkl',
model_name='deployed_models_info',
tags={'ModelType': '_meta_'},
description='Dictionary of the service endpoint where each model is deployed'
)
5.2 Deploy routing webservice
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import DEFAULT_CPU_IMAGE
routing_env = Environment(name="many_models_routing_environment")
routing_env_deps = CondaDependencies.create(pip_packages=['azureml-defaults', 'joblib'])
routing_env.python.conda_dependencies = routing_env_deps
routing_infconfig = InferenceConfig(
entry_script='routing_webservice.py',
source_directory='./scripts',
environment=routing_env
)
Reuse deployment config with lower capacity
deployment_config.cpu_cores = 0.1
deployment_config.memory_gb = 0.5
routing_service = Model.deploy(
workspace=ws,
name='routing-manymodels',
models=[dep_model],
inference_config=routing_infconfig,
deployment_config=deployment_config,
deployment_target=deployment_target,
overwrite=True
)
routing_service.wait_for_deployment(show_output=True)
assert routing_service.state == 'Healthy'
print('Routing endpoint deployed with URL: {}'.format(routing_service.scoring_uri))