Deploy AI and machine learning computing on-premises and to the edge

Azure Container Registry
Azure IoT Edge
Azure Machine Learning
Azure Stack Edge

This reference architecture illustrates how to use Azure Stack Edge to extend rapid machine learning inference from the cloud to on-premises or edge scenarios. Azure Stack Edge delivers Azure capabilities such as compute, storage, networking, and hardware-accelerated machine learning to any edge location.


Architecture diagram: on-premises data training a model in Azure Machine Learning, with model deployed back to the edge for inference.

Download a Visio file of this architecture.


The architecture consists of the following components:

  • Azure Machine Learning. Machine Learning lets you build, train, deploy, and manage machine learning models in a cloud-based environment. These models can then deploy to Azure services, including (but not limited to) Azure Container Instances, Azure Kubernetes Service (AKS), and Azure Functions.
  • Azure Container Registry. Container Registry is a managed, private Docker registry service. Container Registry builds, stores, and manages Docker container images and can store containerized machine learning models.
  • Azure Stack Edge. Azure Stack Edge is an edge computing device that's designed for machine learning inference at the edge. Data is preprocessed at the edge before transfer to Azure. Azure Stack Edge includes compute acceleration hardware that's designed to improve performance of AI inference at the edge.
  • Local data. Local data references any data that's used in the training of the machine learning model. The data can be in any local storage solution, including Azure Arc deployments.


Scenario details

Potential use cases

This solution is ideal for the telecommunications industry. Typical uses for extending inference include when you need to:

  • Run local, rapid machine learning inference against data as it's ingested and you have a significant on-premises hardware footprint.
  • Create long-term research solutions where existing on-premises data is cleaned and used to generate a model. The model is then used both on-premises and in the cloud; it's retrained regularly as new data arrives.
  • Build software applications that need to make inferences about users, both at a physical location and online.


Ingesting, transforming, and transferring data stored locally

Azure Stack Edge can transform data sourced from local storage before transferring that data to Azure. This transformation is done by an Azure IoT Edge device that's deployed on the Azure Stack Edge device. These IoT Edge devices are associated with an Azure IoT Hub resource on the Azure cloud platform.

Each IoT Edge module is a Docker container that does a specific task in an ingest, transform, and transfer workflow. For example, an IoT Edge module can collect data from an Azure Stack Edge local share and transform the data into a format that's ready for machine learning. Then, the module transfers the transformed data to an Azure Stack Edge cloud share. You can add built-in or third-party modules to your IoT Edge device, or develop your own custom IoT Edge modules.
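The transform step of such a module can be sketched as plain Python. This is a minimal illustration only: the share mount points (`/localshare`, `/cloudshare`) and the CSV normalization task are hypothetical, and a real module would also use the Azure IoT Edge module SDK for lifecycle and messaging.

```python
import csv
from pathlib import Path

# Hypothetical mount points for the Azure Stack Edge local and cloud shares.
LOCAL_SHARE = Path("/localshare")
CLOUD_SHARE = Path("/cloudshare")

def transform_row(row: dict) -> dict:
    """Example transformation: trim whitespace and lowercase column names
    so the data matches the schema the training pipeline expects."""
    return {key.strip().lower(): value.strip() for key, value in row.items()}

def ingest_transform_transfer(src: Path = LOCAL_SHARE, dst: Path = CLOUD_SHARE) -> int:
    """Read each CSV in the local share, normalize it, and write the result
    to the cloud share, which Azure Stack Edge uploads to Blob storage.
    Returns the number of files transferred."""
    count = 0
    for csv_path in src.glob("*.csv"):
        with csv_path.open(newline="") as f:
            rows = [transform_row(r) for r in csv.DictReader(f)]
        if not rows:
            continue
        out_path = dst / csv_path.name
        with out_path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)
        count += 1
    return count
```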


IoT Edge modules are registered as Docker container images in Container Registry.

In the Azure Stack Edge resource on the Azure cloud platform, the cloud share is backed by an Azure Blob storage account resource. All data in the cloud share automatically uploads to the associated storage account. You can verify the data transformation and transfer either by mounting the local or cloud share, or by browsing the Azure storage account.
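One way to script that verification is to compare the file names in the mounted share against the blob names in the storage account. The comparison below is pure Python; in practice, the `blob_names` argument would come from the Azure Storage SDK (for example, `ContainerClient.list_blob_names`) or from Azure Storage Explorer.

```python
from pathlib import Path
from typing import Iterable

def missing_uploads(share_dir: Path, blob_names: Iterable[str]) -> set[str]:
    """Return the names of files present in the mounted share that have
    no matching blob in the storage account yet."""
    local_names = {p.name for p in share_dir.iterdir() if p.is_file()}
    return local_names - set(blob_names)
```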

Training and deploying a model

After preparing and storing data in Blob storage, you can create a Machine Learning dataset that connects to Azure Storage. A dataset represents a single copy of your data in storage that's directly referenced by Machine Learning.

You can use the Machine Learning command-line interface (CLI), the R SDK, the Python SDK, the designer, or Visual Studio Code to build the scripts that are required to train your model.
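A training script typically receives the mounted dataset path as a command-line argument and writes the trained model to an output path. The skeleton below is illustrative only: the argument names and the one-feature least-squares model stand in for your real training code.

```python
import argparse
import csv
from pathlib import Path

def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def main() -> None:
    parser = argparse.ArgumentParser()
    # The job runner mounts the dataset and passes its path at run time.
    parser.add_argument("--data-path", type=Path, required=True)
    parser.add_argument("--output-path", type=Path, default=Path("model.txt"))
    args = parser.parse_args()

    with args.data_path.open(newline="") as f:
        rows = list(csv.DictReader(f))
    slope, intercept = fit_line([float(r["x"]) for r in rows],
                                [float(r["y"]) for r in rows])
    args.output_path.write_text(f"{slope},{intercept}")

# Invoked by the training job as:
#   python train.py --data-path <mounted-dataset>/train.csv
```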

After training the model and readying it for deployment, you can deploy it to various Azure services, including but not limited to:

  • Azure Container Instances
  • Azure Kubernetes Service (AKS)
  • Azure Functions

For this reference architecture, the model deploys to Azure Stack Edge to make the model available for inference on-premises. The model also deploys to Container Registry to ensure that the model is available for inference across the widest variety of Azure services.

Inference with a newly deployed model

Azure Stack Edge can quickly run machine learning models locally against data on-premises by using its built-in compute acceleration hardware. This computation occurs entirely at the edge. The result is rapid insights from data by using hardware that's closer to the data source than a public cloud region.
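A model deployed this way is served through an entry script that follows the Machine Learning `init()`/`run()` convention. The sketch below stubs the model with a simple threshold so it stays self-contained; a real entry script would deserialize the registered model artifact in `init()`.

```python
import json

model = None  # populated once per container in init()

def init() -> None:
    """Called once when the serving container starts; load the model here.
    A real script would load the registered model artifact instead."""
    global model
    model = lambda features: float(sum(features) > 1.0)  # stub model

def run(raw_request: str) -> str:
    """Called for each request with a JSON payload; returns JSON predictions."""
    data = json.loads(raw_request)
    predictions = [model(row) for row in data["data"]]
    return json.dumps({"predictions": predictions})
```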

Additionally, Azure Stack Edge continues to transfer data to Machine Learning for continuous retraining and improvement by using a machine learning pipeline that's associated with the model that's already running against data stored locally.


Considerations

These considerations implement the pillars of the Azure Well-Architected Framework, which is a set of guiding tenets that you can use to improve the quality of a workload. For more information, see Microsoft Azure Well-Architected Framework.


  • Consider placing your Azure Stack Edge resource in the same Azure region as other Azure services that will access it. To optimize upload performance, consider placing your Azure Blob storage account in the region where your appliance has the best network connection.
  • Consider Azure ExpressRoute for a stable, redundant connection between your device and Azure.


  • Administrators can verify that data sourced from local storage has transferred correctly to the Azure Stack Edge resource. They can verify the transfer by mounting the Server Message Block (SMB) or Network File System (NFS) file share, or by connecting to the associated Blob storage account by using Azure Storage Explorer.
  • Use Machine Learning datasets to reference your data in Blob storage while training your model. Referencing storage eliminates the need to embed secrets, data paths, or connection strings in your training scripts.
  • In your Machine Learning workspace, register and track ML models to track differences between your models at different points in time. You can similarly mirror the versioning and tracking metadata in the tags that you use for the Docker container images that deploy to Container Registry.
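One way to mirror model registry metadata into Container Registry is to encode the model's registered name and version into the Docker image tag. The helper below shows one possible tag convention; the registry and model names are placeholders.

```python
def image_tag(registry: str, model_name: str, model_version: int) -> str:
    """Build a Container Registry image reference whose tag mirrors the
    Machine Learning model registry name and version, for example for use
    with `docker push`."""
    return f"{registry}.azurecr.io/models/{model_name}:{model_version}"
```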


  • Review the MLOps lifecycle management approach for Machine Learning. For example, use GitHub or Azure Pipelines to create a continuous integration process that automatically trains and retrains a model. Training can be triggered either when new data populates the dataset or when a change is made to the training scripts.
  • The Azure Machine Learning workspace automatically registers and manages Docker container images for machine learning models and IoT Edge modules.
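The script-change trigger described above can be expressed as a minimal Azure Pipelines definition. This is a config sketch only: the `training/**` path filter, the script name, and the `DATASET_PATH` variable are illustrative, and triggering on new data would instead use a Machine Learning schedule.

```yaml
# Hypothetical azure-pipelines.yml: retrain when the training scripts change.
trigger:
  branches:
    include: [ main ]
  paths:
    include:
      - training/**

steps:
  - script: python training/train.py --data-path $(DATASET_PATH)
    displayName: Retrain model
```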

Cost optimization

Cost optimization is about looking at ways to reduce unnecessary expenses and improve operational efficiencies. For more information, see Overview of the cost optimization pillar.

Next steps

Product documentation

Microsoft Learn modules: