Azure Percept Audio device overview

Important

Retirement of Azure Percept DK:

Update 22 February 2023: A firmware update for the Percept DK Vision and Audio accessory components (also known as Vision and Audio SOM) is now available here, and will enable the accessory components to continue functioning beyond the retirement date.

The Azure Percept public preview will be evolving to support new edge device platforms and developer experiences. As part of this evolution, the Azure Percept DK and Audio Accessory, along with the associated supporting Azure services for the Percept DK, will be retired on March 30th, 2023.

Effective March 30th, 2023, the Azure Percept DK and Audio Accessory will no longer be supported by any Azure services, including Azure Percept Studio, OS updates, container updates, view web stream, and Custom Vision integration. Microsoft will no longer provide customer success support or any associated supporting services. For more information, please visit the Retirement Notice Blog Post.

Azure Percept Audio is an accessory device that adds speech AI capabilities to Azure Percept DK. It contains a preconfigured audio processor and a four-microphone linear array, enabling you to use voice commands, keyword spotting, and far-field speech with the help of Azure Cognitive Services. It is integrated out of the box with Azure Percept DK, Azure Percept Studio, and other Azure edge management services.

Azure Percept Audio components

Azure Percept Audio contains the following major components:

  • Production-ready Azure Percept Audio device (SoM) with a four-microphone linear array and audio processing via XMOS Codec
  • Developer (interposer) board: 2x buttons, 3x LEDs, Micro USB, and 3.5 mm audio jack
  • Required cables: FPC cable, USB Micro Type-B to USB-A
  • Welcome card
  • Mechanical mounting plate with integrated 80/20 1010 series mount

Compute capabilities

Azure Percept Audio passes audio input through the speech stack, which runs on the CPU of the Azure Percept DK carrier board in a hybrid edge-cloud manner. Azure Percept Audio therefore requires a carrier board with an OS that supports the speech stack in order to function.

The audio processing is done as follows:

  • Azure Percept Audio: captures and converts the audio and sends it to the DK and audio jack.

  • Azure Percept DK: the speech stack performs beamforming and echo cancellation and processes the incoming audio to optimize for speech. After processing, it performs keyword spotting.

  • Cloud: processes natural language commands and phrases, and handles keyword verification and retraining.

  • Offline: if the device is offline, it still detects the keyword and captures internet connection status telemetry. An increased false accept rate for keyword spotting may be observed, because keyword verification in the cloud cannot be performed.
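The division of work between the device and the cloud can be illustrated with a small sketch. This is not the Azure Percept speech stack or any Azure API. The names `handle_audio`, `KeywordDecision`, `cloud_verify`, and `local_threshold` are hypothetical, and the threshold value is illustrative only. The sketch shows why the false accept rate can rise when the device is offline: the second, cloud-side check is skipped, so every local detection is accepted.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class KeywordDecision:
    accepted: bool                   # whether the keyword activation is accepted
    verified_in_cloud: bool          # True only if the cloud stage was consulted
    telemetry: Optional[str] = None  # connection-status note recorded when offline


def handle_audio(
    local_score: float,
    online: bool,
    cloud_verify: Callable[[], bool],
    local_threshold: float = 0.5,
) -> KeywordDecision:
    """Decide whether to accept a keyword activation.

    local_score     -- hypothetical confidence from on-device keyword spotting
    online          -- whether the device currently has connectivity
    cloud_verify    -- hypothetical callback standing in for cloud keyword verification
    local_threshold -- illustrative cutoff, not a documented value
    """
    # Stage 1 (edge): reject audio that the local keyword spotter does not flag.
    if local_score < local_threshold:
        return KeywordDecision(accepted=False, verified_in_cloud=False)

    # Stage 2 (cloud): when online, the cloud check can veto a local detection.
    if online:
        return KeywordDecision(accepted=cloud_verify(), verified_in_cloud=True)

    # Offline: accept the local detection unverified and note the connection status.
    return KeywordDecision(accepted=True, verified_in_cloud=False, telemetry="offline")
```

In this sketch, a borderline local detection that the cloud check would reject is dropped when the device is online but accepted when it is offline, which matches the behavior described in the Offline item above.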

Getting started

Build a no-code prototype

Build a no-code speech solution in Azure Percept Studio using Azure Percept voice assistant templates for hospitality, healthcare, inventory, and automotive scenarios.

Manage your no-code speech solution

Additional technical information