Model Deployment into Production: Strategies

Our primary deployment artifact in this study is a model (in general, a binary file or a set of files) together with the related inference (scoring) code (a minimal sketch of such a scoring script appears after the list below). We make the following assumptions:

  • A model has been trained in the Azure Machine Learning service and is available in the model repository of the development environment.
  • Several models may need to be used in the inference pipeline; therefore, no limits are placed on the execution time of the inference workload.
  • The production environment will likely differ from the development environment, and some services may not be available in the development environment.
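
To make the scoring-code artifact concrete, the sketch below shows a minimal Azure ML-style scoring script using the standard init()/run() contract. The model file name (model.pkl), the use of scikit-learn/joblib, and the request schema are assumptions for illustration only.

```python
# score.py -- minimal scoring script sketch for an Azure ML deployment.
# Assumptions: the model is a scikit-learn estimator serialized with joblib
# as "model.pkl", and requests send JSON of the form {"data": [[...], ...]}.
import json
import os

import joblib

model = None


def init():
    # Called once when the serving container starts; load the model into memory.
    global model
    # AZUREML_MODEL_DIR points to the folder where the registered model files are mounted.
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model.pkl")
    model = joblib.load(model_path)


def run(raw_data):
    # Called for every scoring request; return a JSON-serializable result.
    payload = json.loads(raw_data)
    predictions = model.predict(payload["data"])
    return predictions.tolist()
```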

Different solutions are often used for training and for hosting AI models. For example, models trained in Azure ML might not be hosted in Azure ML for inferencing.

All our inferencing workloads can be divided into three scoring groups: real-time, near real-time, and batch. The following table summarizes, for each group, the options that work with an Azure ML-managed deployment approach and with a custom deployment approach.

| Scoring type | Managed by Azure ML | Custom deployment |
| --- | --- | --- |
| Real-time scoring | Azure Kubernetes Service (AKS)/Arc-enabled Kubernetes, Azure ML Online Endpoints | Azure Functions, Azure App Service, Azure Container Instances (ACI), unmanaged Kubernetes, Azure Container Apps, IoT Edge |
| Near real-time scoring | N/A | Azure Durable Functions; AKS with KEDA, a queue service, Triton, or the Azure Functions runtime |
| Batch scoring | Azure ML Pipelines, Batch Scoring Endpoints | Batch scoring using Databricks |
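
As an example of the custom real-time row, a small model can be hosted behind an HTTP-triggered Azure Function. The sketch below uses the Azure Functions Python v2 programming model; the model file, input schema, and route name are assumptions.

```python
# function_app.py -- sketch of real-time scoring with an HTTP-triggered Azure
# Function (Python v2 programming model). Model path and schema are illustrative.
import json

import azure.functions as func
import joblib

app = func.FunctionApp()

# Load the (small) model once per worker process, not per request.
model = joblib.load("model.pkl")  # assumption: the model ships with the function package


@app.route(route="score", auth_level=func.AuthLevel.FUNCTION)
def score(req: func.HttpRequest) -> func.HttpResponse:
    try:
        payload = req.get_json()  # expected shape: {"data": [[...feature values...]]}
    except ValueError:
        return func.HttpResponse("Request body must be JSON", status_code=400)

    predictions = model.predict(payload["data"]).tolist()
    return func.HttpResponse(
        json.dumps({"predictions": predictions}),
        mimetype="application/json",
    )
```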

Other options, such as Kubeflow or even Azure Durable Functions, are possible for batch scoring, but they are less common and come with more limitations and complexity.

Deployment options

Different options are available, each with its own advantages; they are compared by scoring type above and summarized below.

Production vs. development subscriptions

It is common for development and training to take place in one subscription while the inference service is deployed into production in another subscription. NOTE: Batch Endpoints and Online Endpoints cannot be deployed outside the subscription where the Azure ML workspace is located. Therefore, if you want to use Online Endpoints, you need to deploy a separate Azure ML workspace in the production subscription, copy your model into that workspace during the deployment, and execute the deployment from there. A separate Azure ML workspace is required for Azure ML Pipelines as well.
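
The snippet below is a sketch of that cross-subscription flow with the Azure ML Python SDK v2 (azure-ai-ml): it downloads a registered model from the development workspace, re-registers it in the production workspace, and deploys it to a managed online endpoint. All subscription IDs, resource groups, workspace and endpoint names, the curated environment, and the instance size are placeholders/assumptions.

```python
# Sketch: copy a model from the dev workspace into a production workspace and
# deploy it to a managed online endpoint (Azure ML Python SDK v2). All names,
# IDs, the environment, and the VM size below are placeholders.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    CodeConfiguration,
    ManagedOnlineDeployment,
    ManagedOnlineEndpoint,
    Model,
)
from azure.identity import DefaultAzureCredential

cred = DefaultAzureCredential()
dev = MLClient(cred, "<dev-subscription-id>", "<dev-resource-group>", "<dev-workspace>")
prod = MLClient(cred, "<prod-subscription-id>", "<prod-resource-group>", "<prod-workspace>")

# 1. Pull the trained model out of the development workspace ...
dev.models.download(name="my-model", version="1", download_path="./model_copy")

# 2. ... and register it in the production workspace.
prod_model = prod.models.create_or_update(
    Model(name="my-model", version="1", path="./model_copy/my-model")
)

# 3. Create the endpoint and a deployment in the production workspace.
endpoint = ManagedOnlineEndpoint(name="my-scoring-endpoint", auth_mode="key")
prod.online_endpoints.begin_create_or_update(endpoint).result()

deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="my-scoring-endpoint",
    model=prod_model,
    code_configuration=CodeConfiguration(code="./src", scoring_script="score.py"),
    environment="azureml:AzureML-sklearn-1.0-ubuntu20.04-py38-cpu@latest",  # assumed curated env
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
prod.online_deployments.begin_create_or_update(deployment).result()
```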

Deployment summary

| Option | Batch inferencing | Near real-time | Real-time |
| --- | --- | --- | --- |
| Azure ML Pipelines | Recommended approach for any workload | N/A | N/A |
| Batch online endpoints | Single-step inferencing workloads | N/A | N/A |
| Databricks | Custom workloads integrated with MLflow and other Spark features | N/A | N/A |
| Managed AKS | N/A | N/A | Effective when the customer wants to manage and control the infrastructure; AKS can be in a different subscription |
| Azure Container Instances | N/A | N/A | Efficient way to deploy approved, tested, and versioned images/environments, regardless of which subscriptions are used for training and deployment |
| Online endpoints | N/A | N/A | Recommended way for real-time scoring; infrastructure is managed by Azure ML; simple deployment process with scaling based on Azure ML compute |
| Azure Functions | N/A | N/A | Simple workloads and small models; Azure ML is not required |
| Azure Durable Functions | N/A | CPU-based workloads, small models | N/A |
| Unmanaged AKS with KEDA | N/A | Custom workloads of any complexity, but deep technical knowledge is required | Can be used for custom images when Azure ML is not available |
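
To make the batch column concrete, the sketch below submits a scoring job against an existing batch endpoint with the Azure ML Python SDK v2. The endpoint name and input data path are assumptions, and the exact invoke parameters can vary slightly between SDK versions.

```python
# Sketch: trigger a batch scoring job against an existing batch endpoint
# (Azure ML Python SDK v2). Endpoint name and data path are placeholders.
from azure.ai.ml import Input, MLClient
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    "<subscription-id>",
    "<resource-group>",
    "<workspace-name>",
)

# Point the job at a folder of input files reachable from the workspace.
job = ml_client.batch_endpoints.invoke(
    endpoint_name="my-batch-endpoint",
    input=Input(
        type=AssetTypes.URI_FOLDER,
        path="azureml://datastores/workspaceblobstore/paths/scoring-input/",
    ),
)

# Block until the scoring job finishes and stream its logs.
ml_client.jobs.stream(job.name)
```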

For more information

More general content outlining the different Azure compute options available can be found here: