Create an image recognition solution with Azure IoT Edge, Azure AI Custom Vision, and Azure Speech
Build an Azure IoT Edge image recognition solution for a self-checkout scenario. The exported Azure AI Custom Vision classification model runs locally in an IoT Edge module, and the Camera Capture module calls Azure Speech (via a Foundry resource) at runtime to synthesize item labels as audio.
Learning objectives
In this module, you will:
- Describe how Azure IoT Edge, Azure IoT Hub, a camera, and containerized modules support image recognition at the edge
- Use a prebuilt, exported Azure AI Custom Vision classification model packaged in an IoT Edge module
- Create the required Azure resources, including an IoT Hub device identity and a Foundry resource for Speech (in Southeast Asia, unless you update the sample code and deployment template to make the Speech region configurable)
- Build and deploy the solution to an Azure IoT Edge device by using Visual Studio Code
- Verify module status and monitor events from the edge device and Azure IoT Hub
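The region constraint in the objectives above comes from where the Speech endpoint is resolved at runtime. As a minimal sketch of what making the Speech region configurable could look like, the following reads the region from an environment variable and falls back to Southeast Asia. The variable name `SPEECH_REGION`, the function names, and the voice name are assumptions made for illustration; they are not taken from the lab's sample code. The URL pattern follows the Azure Speech text to speech REST endpoint format.

```python
import os
from xml.sax.saxutils import escape

# Region this lab assumes when nothing else is configured.
DEFAULT_REGION = "southeastasia"


def speech_region(env=None):
    """Return the Speech region from SPEECH_REGION (hypothetical name), or the default."""
    env = os.environ if env is None else env
    region = env.get("SPEECH_REGION", "").strip().lower()
    return region or DEFAULT_REGION


def tts_endpoint(region):
    """Build the regional text to speech REST endpoint URL."""
    return f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"


def build_ssml(text, voice="en-US-JennyNeural"):
    """Wrap an item label in SSML, escaping XML special characters.

    The voice name is an illustrative choice, not the lab's setting.
    """
    return (
        "<speak version='1.0' xml:lang='en-US'>"
        f"<voice name='{voice}'>{escape(text)}</voice>"
        "</speak>"
    )
```

In a deployment, a value such as `SPEECH_REGION` would typically be supplied through the module's environment settings in the deployment template, so the same container image could target a different region without a rebuild.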
Note
This lab uses the Azure IoT Edge Tools extension for Visual Studio Code because the existing lab assets and steps depend on that workflow. Microsoft documentation states that the extension is in maintenance mode and currently identifies the Azure IoT Edge Dev Tool CLI (iotedgedev) as the preferred tool for new IoT Edge module development. The public iotedgedev PyPI release has not been updated since November 2022 (v3.3.7), so check its release status before adopting it for new production development.
Important
Azure AI Custom Vision is planned for retirement on September 25, 2028. Existing Custom Vision customers are supported until then, but new production plans that create or export Custom Vision models should evaluate migration options or alternatives.
Produced in partnership with the University of Oxford – Ajit Jaokar, Artificial Intelligence: Cloud and Edge Implementations course.
Prerequisites
- An Azure subscription
- Basic knowledge of Azure IoT Edge concepts
- Basic awareness of the Azure AI services used in this lab: Azure Speech text to speech and Azure AI Custom Vision model export
- A Linux computer or IoT Edge device running a supported Ubuntu Server release (24.04 or 22.04) on amd64, the target architecture for this lab
- A USB camera and speaker or audio output connected to the Linux computer or IoT Edge device
- Visual Studio Code with the Azure IoT Hub extension, the Azure IoT Edge Tools extension (in maintenance mode), and the Container Tools extension (the replacement for the previous Docker extension) installed for this exercise's Visual Studio Code workflow. VS Code's built-in JSON support is sufficient; the community-maintained JSON Tools extension (eriklynd.json-tools) is optional if you prefer its formatting features. The Azure Account extension was deprecated in January 2025, but the legacy Azure IoT Hub and Azure IoT Edge VS Code extensions may still install it transitively as a dependency; let Azure Account install if those extensions require it.
- Docker-compatible container development tooling, with Moby engine installed as part of the IoT Edge runtime setup
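The operating system and architecture prerequisites can be checked before you start. The following is a minimal sketch, assuming the device exposes the standard `/etc/os-release` file; the function names are hypothetical and the check covers only the Ubuntu release and CPU architecture, not the camera, audio output, or tooling.

```python
import platform

# Values taken from the prerequisites list for this lab.
SUPPORTED_UBUNTU = {"22.04", "24.04"}
SUPPORTED_ARCH = {"x86_64", "amd64"}


def parse_os_release(text):
    """Parse KEY=value lines from os-release content into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"')
    return fields


def meets_prerequisites(os_release_text, machine):
    """Return True for Ubuntu 22.04 or 24.04 on an amd64 machine."""
    fields = parse_os_release(os_release_text)
    return (
        fields.get("ID") == "ubuntu"
        and fields.get("VERSION_ID") in SUPPORTED_UBUNTU
        and machine.lower() in SUPPORTED_ARCH
    )


def check_local_host(path="/etc/os-release"):
    """Run the check against the machine this script runs on."""
    with open(path, encoding="utf-8") as handle:
        return meets_prerequisites(handle.read(), platform.machine())
```

Calling `check_local_host()` on the target device returns `True` only when both conditions hold; a `False` result indicates that either the release or the architecture differs from what this lab targets.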