Best practice for model deployment (Real-time Endpoints vs Compute Inference Cluster)

Question

Hello community!
Reaching out for some help here.

ManagedOnlineDeployment vs KubernetesOnlineDeployment

Goal:

Host a large number of distinct models on Azure ML.

Description:

After a thorough investigation, I found that there are two ways to host a pre-trained model for real-time inference on Azure ML:

Real-time Endpoints - Managed Online Deployment
Compute Inference cluster - kubernetes-online-endpoints
The differences between the two options are detailed here.
I want to host a large number of distinct models (i.e., endpoints) while having the best price/performance/ease-of-deployment ratio.

Details:

What I tried
I have 4 running VMs as a result of my creation of 4 real-time endpoints. Those endpoints use Curated Environments that are provided by Microsoft.

Issues
1. When I want to create a custom environment from a Dockerfile and use it as the base image for an endpoint, the process is long:
  Build Image > Push Image to CR > Create Custom Environment in AzureML > Create and Deploy Endpoint
  If something goes wrong, it only shows up once the whole pipeline has finished. It doesn't feel like the correct way to deploy a model.
  This process is needed when I cannot use one of the curated environments because I need a dependency that cannot be installed through the conda.yml file.
  For example:

RUN apt-get update -y && apt-get install build-essential cmake pkg-config -y
RUN python setup.py build_ext --inplace

2. Although I'm using one instance per endpoint (Instance count = 1), each endpoint creates its own dedicated VM, which will cost me a lot in the long run (i.e., when I have many endpoints). It is currently costing me around $20 per day.

Note: Each endpoint has a distinct set of dependencies/versions...
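The four-step pipeline from issue 1 could be scripted so that it stops at the first failing stage instead of surfacing errors only at the end. The following is a minimal sketch, not a definitive setup: the image name and the `environment.yml` and `deployment.yml` files are hypothetical placeholders, and it assumes the Docker CLI and the Azure CLI `ml` extension are installed with a default workspace and resource group already configured.

```python
import subprocess

# Hypothetical placeholder; replace with your own registry and image tag.
IMAGE = "myregistry.azurecr.io/my-model-env:1"

# Each stage is (name, command). The YAML file names are assumptions.
STAGES = [
    ("build image", ["docker", "build", "-t", IMAGE, "."]),
    ("push image", ["docker", "push", IMAGE]),
    ("create environment",
     ["az", "ml", "environment", "create", "--file", "environment.yml"]),
    ("create deployment",
     ["az", "ml", "online-deployment", "create", "--file", "deployment.yml"]),
]


def run_command(argv):
    """Run one command and return its exit code."""
    return subprocess.run(argv).returncode


def run_stages(stages, runner=run_command):
    """Run stages in order and stop at the first failure.

    Returns the name of the failing stage, or None if all stages succeed.
    """
    for name, argv in stages:
        print(f"[{name}] {' '.join(argv)}")
        if runner(argv) != 0:
            return name
    return None
```

Calling `run_stages(STAGES)` would report which stage failed, so a broken Dockerfile is caught at the local build step before anything is pushed or deployed.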

Questions
1- Am I following the best practice? Or do I need to drastically change my deployment strategy (Move from ManagedOnlineDeployment to KubernetesOnlineDeployment or even another option that I don't know of)?
2- Is there a way to host all the endpoints on a single VM, rather than creating a VM for each endpoint, to make it affordable?
3- Is there a way to host the endpoints and get charged per transaction?

General recommendations and clarification questions are more than welcome.

Thank you!

Answer

@Ammar Mohanna If you are using these deployments for testing scenarios, you can deploy them as a web service and choose Azure Container Instances (ACI) as the target. This is only available for certain frameworks. If the Deploy button shows an option to deploy your model as a web service, you can follow the steps, provide the required details, and deploy to ACI. Azure Container Instances is intended for testing or development. Use ACI for low-scale, CPU-based workloads that require less than 48 GB of RAM.

I do not think managed online endpoints offer a way to share a single VM across endpoints. You can try the ACI option, which is cheaper and easier to use for simple deployments.

The pricing is based on compute used and there is no Azure Machine Learning surcharge. The classic version of the Azure Machine Learning workspace used to offer per transaction pricing but this is not applicable to the current workspaces.
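As a rough illustration of what compute-based pricing means for the figures in the question (about $20 per day for 4 endpoints with one instance each), the sketch below projects the cost for more endpoints. It assumes every instance costs the same and that cost scales linearly with instance count; actual rates depend on the VM size and region, and the function name is only illustrative.

```python
# Figures taken from the question: roughly $20 per day for 4 endpoints,
# each running one instance on its own VM.
OBSERVED_DAILY_COST = 20.0
OBSERVED_ENDPOINTS = 4

# Assumption: all instances use the same VM size, so the rate is uniform.
COST_PER_INSTANCE_DAY = OBSERVED_DAILY_COST / OBSERVED_ENDPOINTS


def daily_cost(endpoints, instances_per_endpoint=1,
               cost_per_instance_day=COST_PER_INSTANCE_DAY):
    """Estimate daily compute cost when every instance runs on its own VM."""
    return endpoints * instances_per_endpoint * cost_per_instance_day


if __name__ == "__main__":
    for n in (4, 10, 20):
        print(f"{n} endpoints: ${daily_cost(n):.2f} per day")
```

Under these assumptions, the cost grows in direct proportion to the number of endpoints, since each endpoint keeps at least one VM running whether or not it receives traffic.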

I hope this helps!!
