Video ingestion and object detection on the edge and in the cloud

Azure Stack Edge
Azure Kubernetes Service (AKS)
Azure SQL Edge
Azure Container Registry

This article describes how to use a mobile robot with a live streaming camera to implement various use cases. The solution combines a system that runs locally on Azure Stack Edge to ingest and process the video stream with Azure AI services that perform object detection in the cloud.

Architecture

Diagram that shows an architecture for video ingestion and object detection.

Download a Visio file of this architecture.

Workflow

This workflow describes how the system processes the incoming data:

  1. A camera that's installed on the robot streams video in real time by using Real Time Streaming Protocol (RTSP).

  2. A container in the Kubernetes cluster on Azure Stack Edge reads the incoming stream and splits the video into separate images. The open-source tool FFmpeg ingests and processes the video stream.

  3. Images are stored in the local Azure Stack Edge storage account.

  4. Each time a new key frame is saved in the storage account, an AI Vision container picks it up. For information about the separation of logic into multiple containers, see Scenario details.

  5. When it loads a key frame from the storage container, the AI Vision container sends it to Azure AI services in the cloud. This architecture uses Azure AI Vision, which enables object detection via image analysis.

  6. The results of image analysis (detected objects and their confidence scores) are sent to the anomaly detection container.

  7. The anomaly detection container stores the results of image analysis and anomaly detection in the local Azure SQL Edge instance on Azure Stack Edge for future reference. Using a local database instance minimizes delays in data access.

  8. The system processes the data to detect anomalies in the incoming real-time video stream. If it detects an anomaly, a front-end UI displays an alert.
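The key-frame extraction in steps 1 through 4 can be sketched as a small wrapper around FFmpeg. The RTSP URL, output pattern, and filter options below are illustrative assumptions, not the exact command that the solution's ingestion container runs:

```python
import subprocess

def build_keyframe_command(rtsp_url: str, output_pattern: str) -> list[str]:
    """Build an FFmpeg command that reads an RTSP stream and writes
    only key frames (I-frames) as individual JPEG images."""
    return [
        "ffmpeg",
        "-rtsp_transport", "tcp",            # TCP is more reliable than UDP on lossy links
        "-i", rtsp_url,                      # incoming camera stream
        "-vf", "select='eq(pict_type,I)'",   # keep only key frames
        "-vsync", "vfr",                     # one output image per selected frame
        output_pattern,                      # e.g. /data/frames/frame_%05d.jpg
    ]

def extract_keyframes(rtsp_url: str, output_pattern: str) -> None:
    """Run FFmpeg until the stream ends; raise if it exits with an error."""
    subprocess.run(build_keyframe_command(rtsp_url, output_pattern), check=True)
```

In the architecture, the output pattern would point at the local Azure Stack Edge storage account mount, so each written image triggers the next stage of the pipeline.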

Components

  • Azure Stack Edge is an Azure managed device that brings the compute, storage, and intelligence of Azure to the edge. This architecture uses it to host Azure services on-premises, close to the location where anomaly detection occurs, which reduces latency.

  • Azure Kubernetes Service (AKS) on Azure Stack Edge is a managed Kubernetes service that you can use to deploy and manage containerized applications. In this architecture, AKS runs on the Azure Stack Edge device and manages the containers that implement the system's logic.

  • Azure Arc is a bridge that extends Azure services to the edge. In this architecture, Azure Arc enables you to manage the edge services through the Azure portal.

  • Azure AI Vision is a unified service that offers computer vision capabilities. In this architecture, the image analysis feature is used to detect objects in key frames of the video stream.

  • Azure Blob Storage is a Microsoft object storage solution for the cloud. In this architecture, it's used to store images of key frames that are extracted from the video stream.

  • Azure SQL Edge is a small-footprint, edge-optimized SQL engine with built-in AI. In this architecture, the edge version of the SQL engine stores image analysis metadata close to the service that consumes and processes it.

  • Azure Container Registry is a registry of Docker and Open Container Initiative (OCI) images, with support for all OCI artifacts. In this architecture, the registry stores the Docker container images for the anomaly detection and AI Vision containers.

  • Azure Key Vault is a service that provides secure key management in the cloud. In this architecture, it stores the secrets and keys that the system's logic uses to interact with external services when managed identities aren't available.

  • Azure Monitor is a comprehensive monitoring solution for collecting, analyzing, and responding to monitoring data from your cloud and on-premises environments. In this architecture, this service is the primary observability platform for the workload.
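The AI Vision container's call to the Image Analysis service might be built like the following minimal sketch. It uses the REST endpoint shape of the Image Analysis 4.0 API; the resource endpoint, key, and `api-version` value are assumptions to verify against your deployment, and a managed identity token is preferable to a key where available:

```python
import urllib.request

def build_analyze_request(endpoint: str, key: str,
                          image_bytes: bytes) -> urllib.request.Request:
    """Build an Image Analysis request that asks the service to detect
    objects in a single key-frame image."""
    url = (f"{endpoint}/computervision/imageanalysis:analyze"
           "?api-version=2024-02-01&features=objects")
    return urllib.request.Request(
        url,
        data=image_bytes,                       # raw JPEG bytes of the key frame
        headers={
            "Ocp-Apim-Subscription-Key": key,   # or an Entra ID bearer token
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )
```

The container would send this request with `urllib.request.urlopen` (or an HTTP client of your choice) and pass the JSON response to the anomaly detection container.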

Scenario details

This architecture demonstrates a system that processes a real-time video stream, compares the extracted real-time data with a set of reference data, and makes decisions based on the results. For example, you can use it to perform scheduled inspections of a fenced perimeter around a secure location.

The architecture uses Azure Stack Edge to ensure that the most resource-intensive processes are performed on-premises, close to the source of the video. This design significantly improves the response time of the system, which is important when an immediate response to an anomaly is critical.

Because the parts of the system are deployed as independent containers in a Kubernetes cluster, you can scale only the required subsystems according to demand. For example, if you increase the number of cameras for the video feed, you can scale the container that's responsible for video ingestion and processing to handle the demand but keep the rest of the cluster at the original level.

Offloading the object detection functionality to Azure AI services significantly reduces the expertise that you need to deploy this architecture. Unless your requirements for object detection are highly specialized, the out-of-the-box approach you get from the Image Analysis service is sufficient and doesn't require knowledge of machine learning.
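The comparison of real-time detections against reference data can be sketched as a simple filter over the Image Analysis results. The reference set, object names, and confidence threshold below are illustrative assumptions:

```python
def find_anomalies(detections, allowed_objects, min_confidence=0.5):
    """Return the names of detected objects that are confident enough
    and not in the reference set of objects expected in the scene."""
    return [
        d["name"]
        for d in detections
        if d["confidence"] >= min_confidence and d["name"] not in allowed_objects
    ]

# Example: a person appears inside a perimeter where only fences and
# trees are expected.
detections = [
    {"name": "fence", "confidence": 0.92},
    {"name": "person", "confidence": 0.81},
    {"name": "bird", "confidence": 0.30},   # below the threshold, ignored
]
print(find_anomalies(detections, {"fence", "tree"}))  # ['person']
```

In the architecture, the anomaly detection container would run this kind of check for each analyzed key frame, store the result in Azure SQL Edge, and raise a UI alert when the returned list isn't empty.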

Potential use cases

  • Monitoring the security of a perimeter

  • Detecting an unsafe working environment in a factory

  • Detecting anomalies in an automated assembly line

  • Detecting a lack of de-icing fluid on aircraft

Considerations

These considerations implement the pillars of the Azure Well-Architected Framework, which is a set of guiding tenets that you can use to improve the quality of a workload. For more information, see Microsoft Azure Well-Architected Framework.

Reliability

Reliability ensures your application can meet the commitments you make to your customers. For more information, see Overview of the reliability pillar.

One of the biggest advantages of using Azure Stack Edge is that you get fully managed components on your on-premises hardware. The fully managed Azure components in this architecture are automatically resilient at a regional level.

In addition, running the system in a Kubernetes cluster enables you to offload the responsibility for keeping the subsystems healthy to the Kubernetes orchestration system.

Security

Security provides assurances against deliberate attacks and the abuse of your valuable data and systems. For more information, see Overview of the security pillar.

Microsoft Entra managed identities provide security for all components of this architecture. Using managed identities eliminates the need to store secrets in code or configuration files. It simplifies access control, credential management, and role assignment.

Cost optimization

Cost optimization is about reducing unnecessary expenses and improving operational efficiencies. For more information, see Overview of the cost optimization pillar.

To see a pricing example for this scenario, use the Azure pricing calculator. The most expensive components in the scenario are Azure Stack Edge and Azure Kubernetes Service. These services provide capacity for scaling the system to address increased demand in the future.

The cost of using Azure AI services for object detection varies based on how long the system runs. The preceding pricing example is based on a system that produces one image per second and operates eight hours per day. One frame per second is sufficient for this scenario. However, if your system needs to run for longer periods of time, the cost of using Azure AI services is higher.
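The transaction volume behind such an estimate is simple arithmetic. The per-transaction price below is a placeholder, not a published rate; use the Azure pricing calculator for actual prices in your region and tier:

```python
# One key frame per second, eight operating hours per day.
FRAMES_PER_SECOND = 1
HOURS_PER_DAY = 8
DAYS_PER_MONTH = 30  # assumption for a monthly estimate

images_per_day = FRAMES_PER_SECOND * HOURS_PER_DAY * 3600   # 28,800 transactions/day
images_per_month = images_per_day * DAYS_PER_MONTH          # 864,000 transactions/month

# Placeholder price per 1,000 Image Analysis transactions.
PRICE_PER_1000 = 1.00
monthly_cost = images_per_month / 1000 * PRICE_PER_1000
```

Doubling the operating hours or adding a second camera doubles the transaction count, and the AI services cost scales linearly with it.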

Performance efficiency

Performance efficiency is the ability of your workload to scale to meet the demands placed on it by users in an efficient manner. For more information, see Performance efficiency pillar overview.

Because the code is deployed in a Kubernetes cluster, you can take advantage of the benefits of this powerful orchestration system. Because the various subsystems are separated into containers, you can scale only the most demanding parts of the application. At a basic level, with one incoming video feed, the system can contain just one node in a cluster. This design significantly simplifies the initial configuration. As demand for data processing grows, you can easily scale the cluster by adding nodes.

Contributors

This article is maintained by Microsoft. It was originally written by the following contributors.

Principal author:

Other contributors:


Next steps

Product documentation:

Guided learning path: