Best practice for model deployment (Real-time Endpoints vs Compute Inference Cluster)

Ammar Mohanna 26 Reputation points
2022-11-24T12:01:23.107+00:00

Hello community!
Reaching out for some help here.

ManagedOnlineDeployment vs KubernetesOnlineDeployment

Goal:

Host a large number of distinct models on Azure ML.

Description:

After a thorough investigation, I found that there are two ways to host a pre-trained model for real-time inference on Azure ML.

Details:

  • What I tried
    I have 4 running VMs as a result of creating 4 real-time endpoints. These endpoints use the curated environments provided by Microsoft.
    [Screenshots attached: 263904-epsquotas.png (endpoint quotas), 263922-realtimeendpoints.png (real-time endpoints list)]

  • Issues
    1. When I want to create a custom environment from a Dockerfile and then use it as the base image for a given endpoint, the process is long:
      Build image > Push image to the container registry > Create a custom environment in Azure ML > Create and deploy the endpoint
      If something goes wrong, it only shows up once the whole pipeline finishes. It just doesn't feel like the correct way to deploy a model.
      This process is needed whenever I cannot use one of the curated environments because I need a dependency that cannot be installed through the conda.yml file.
      For example:

RUN apt-get update -y && apt-get install build-essential cmake pkg-config -y
RUN python setup.py build_ext --inplace
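
For what it's worth, the Azure ML CLI v2 can shorten this flow: an environment YAML can point at a local Docker build context, and Azure ML builds and pushes the image for you. A minimal sketch (the name my-custom-env and the folder paths are placeholders, not from my actual setup; the build context would contain the Dockerfile above):

```yaml
# environment.yml -- hypothetical example
$schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json
name: my-custom-env
version: "1"
build:
  path: ./docker-context        # local folder containing the Dockerfile
  dockerfile_path: Dockerfile
```

Running `az ml environment create --file environment.yml` then has Azure ML build the image and push it to the workspace's container registry in one step, so a build failure surfaces here rather than only after deploying the endpoint.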

  2. Although I'm using one instance per endpoint (instance count = 1), each endpoint creates its own dedicated VM, which will cost a lot in the long run (i.e., when I have many endpoints); it is currently costing me around $20 per day.

  • Note: Each endpoint has a distinct set of dependencies/versions...

  • Questions
    1- Am I following best practice, or do I need to drastically change my deployment strategy (move from ManagedOnlineDeployment to KubernetesOnlineDeployment, or even another option I don't know of)?
    2- Is there a way to host all the endpoints on a single VM, rather than creating a VM for each endpoint, to make this affordable?
    3- Is there a way to host the endpoints and get charged per transaction?

General recommendations and clarification questions are more than welcome.

Thank you!

Azure Container Registry
Azure Machine Learning
Azure Kubernetes Service (AKS)

1 answer

  1. romungi-MSFT 41,961 Reputation points Microsoft Employee
    2022-11-25T05:27:57.867+00:00

    @Ammar Mohanna If you are using these deployments for testing scenarios, you can deploy them as a web service and choose Azure Container Instances (ACI) as the deployment target. This is only available for certain frameworks; if you see the option for your model when you click the Deploy button to deploy as a web service, you can follow the steps to provide the required details and deploy to ACI. Azure Container Instances is intended for testing or development. Use ACI for low-scale, CPU-based workloads that require less than 48 GB of RAM.

    I do not think hosting multiple managed online endpoints on a single VM is available. You can try the ACI option, which is cheaper and easier to use for simple deployments.
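
    As a rough sketch of the ACI route with the v1 CLI, the deployment target is described in a small config file (the service and model names below are placeholders, and inferenceconfig.json would point at your own scoring script and environment):

    ```json
    // deploymentconfig.json -- hypothetical example; ACI target sized
    // well under the 48 GB RAM limit mentioned above
    {
      "computeType": "ACI",
      "containerResourceRequirements": {"cpu": 1, "memoryInGB": 4}
    }
    ```

    You would then deploy with something like `az ml model deploy -n my-aci-service -m my-model:1 --ic inferenceconfig.json --dc deploymentconfig.json`, which provisions a container instance instead of a dedicated endpoint VM.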

    The pricing is based on the compute used, and there is no Azure Machine Learning surcharge. The classic version of the Azure Machine Learning workspace used to offer per-transaction pricing, but this does not apply to current workspaces.

    I hope this helps!!

    1 person found this answer helpful.