Control IoT devices with a voice assistant app

Bot Service
IoT Hub
Language Understanding
Speech

Solution ideas

This article is a solution idea. If you'd like us to expand the content with more information, such as potential use cases, alternative services, implementation considerations, or pricing guidance, let us know by providing GitHub feedback.

This solution idea describes how to create conversational voice interfaces for Internet of Things (IoT) devices. You can combine Azure Speech Service, Language Understanding Service (LUIS), and the Azure Bot Framework to create natural, human-like interfaces that control IoT devices through Azure IoT Hub.

Potential use cases

  • Control internet-accessible home devices like televisions and refrigerators by voice command.
  • Use voice and natural language to report issues with IoT-connected devices.

Architecture

Diagram showing the architecture of a voice assistant app.


Dataflow

  1. Through a voice device, the user asks the voice assistant app to turn on the exterior house lights.

  2. The app connects to the Direct Line Speech channel of Bot Service by using the Azure Speech SDK. Once keyword recognition detects a configured keyword, Direct Line Speech transcribes the speech to text and sends the text to the Bot Service app hosted on Azure App Service.

  3. The Bot Service connects to the Language Understanding (LUIS) service. LUIS determines the intent of the user's request, which in this example is TurnOnLight.

  4. LUIS returns the intent to the Bot Service.

  5. If the devices are connected to Azure IoT Hub, Bot Service relays the request through Azure IoT Hub to turn on the exterior lights. Bot Service uses the IoT Hub API to send the command to the devices in one of three ways: invoking a direct method, updating a desired property on the device twin, or sending a cloud-to-device message.

    If the devices are connected to a third-party IoT installation, Bot Service connects through the third-party API to send a command to the devices.

  6. The Bot Service returns the results of the command to the user by generating a response. The text-to-speech service turns the response into audio and passes it back to the voice assistant app with Direct Line Speech.

  7. Application Insights gathers runtime telemetry on bot performance and usage to support development.
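
Steps 3 through 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation behind this architecture: it uses only the standard library and makes no network calls. The input follows the general shape of a LUIS v3 prediction response. The intent-to-command table, the `setLight` method name, the `light` twin property, and the 0.7 score threshold are all hypothetical. The three output shapes correspond to the three delivery options in step 5.

```python
import json
from typing import Optional

# Hypothetical mapping from LUIS intent names to a device method and payload.
# Only TurnOnLight appears in this article; everything else is illustrative.
INTENT_TO_COMMAND = {
    "TurnOnLight": ("setLight", {"state": "on"}),
    "TurnOffLight": ("setLight", {"state": "off"}),
}

MIN_SCORE = 0.7  # assumed confidence threshold; tune for your own model


def top_intent(luis_response: dict) -> tuple:
    """Return (intent name, score) from a LUIS v3-style prediction response."""
    prediction = luis_response["prediction"]
    name = prediction["topIntent"]
    return name, prediction["intents"][name]["score"]


def build_request(luis_response: dict, delivery: str = "direct_method") -> Optional[dict]:
    """Translate a recognized intent into one of the three delivery shapes.

    Returns None when the intent is unknown or the score is too low, so the
    bot can ask the user to rephrase instead of acting on a guess.
    """
    name, score = top_intent(luis_response)
    if score < MIN_SCORE or name not in INTENT_TO_COMMAND:
        return None
    method, payload = INTENT_TO_COMMAND[name]

    if delivery == "direct_method":
        # Request/response call; the device must be online to answer.
        return {"methodName": method, "responseTimeoutInSeconds": 30, "payload": payload}
    if delivery == "twin":
        # Desired-property patch; the device applies it when it next syncs.
        return {"properties": {"desired": {"light": payload}}}
    if delivery == "c2d":
        # One-way cloud-to-device message with a JSON body.
        return {"body": json.dumps({"command": method, **payload})}
    raise ValueError(f"unknown delivery option: {delivery}")


sample = {
    "query": "turn on the exterior house lights",
    "prediction": {
        "topIntent": "TurnOnLight",
        "intents": {"TurnOnLight": {"score": 0.97}},
        "entities": {},
    },
}
print(build_request(sample))
print(build_request(sample, delivery="twin"))
```

In a real bot, the returned dictionary would be handed to the IoT Hub service API, or to a third-party API as described in step 5. The choice among the three options typically depends on whether the bot needs an immediate acknowledgement (direct method), a persistent target state (device twin), or a queued one-way notification (cloud-to-device message).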

Components

  • Bot Service provides an integrated environment for bot development.
  • Speech Service offers speech capabilities such as speech-to-text, text-to-speech, speech translation, and speaker recognition.
  • Language Understanding Service (LUIS) applies custom machine-learning intelligence to conversational, natural language text to predict meaning and pull out relevant information.
  • IoT Hub is a central cloud message hub for bi-directional communications between IoT applications and devices.
  • Application Insights is a feature of Azure Monitor that provides extensible application performance management and monitoring for live web apps.
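
To illustrate how Bot Service and the text-to-speech capability of Speech Service fit together when the bot replies, the following sketch builds a reply that carries both display text and a spoken form. It assumes the reply is a Bot Framework message activity whose `speak` field holds SSML. The voice name, the wording of the replies, and the helper names `to_ssml` and `build_reply` are assumptions made for illustration. The code only constructs the payload and does not call any Azure service.

```python
from xml.sax.saxutils import escape


def to_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML document (voice name is an assumption)."""
    return (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="{lang}">'
        f'<voice name="{voice}">{escape(text)}</voice></speak>'
    )


def build_reply(device: str, succeeded: bool) -> dict:
    """Build a message-activity-shaped dict with text and spoken output."""
    if succeeded:
        text = f"The {device} are now on."
    else:
        text = f"Sorry, I couldn't reach the {device}."
    # "text" is shown in a transcript; "speak" is what gets synthesized.
    return {"type": "message", "text": text, "speak": to_ssml(text)}


reply = build_reply("exterior lights", succeeded=True)
print(reply["text"])
print(reply["speak"])
```

Escaping the text before embedding it keeps characters such as `&` and `<` from producing invalid SSML when a device name or error message contains them.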

Contributors

This article is maintained by Microsoft. It was originally written by the following contributors.

Principal author:

Next steps