Important
The Mixed Reality Academy tutorials were designed with HoloLens (1st gen), Unity 2017, and Mixed Reality Immersive Headsets in mind. As such, we feel it is important to leave these tutorials in place for developers who are still looking for guidance in developing for those devices. These tutorials will not be updated with the latest toolsets or interactions being used for HoloLens 2 and may not be compatible with newer versions of Unity. They will be maintained to continue working on the supported devices. A new series of tutorials has been posted for HoloLens 2.
Voice input gives us another way to interact with our holograms. Voice commands work in a very natural and easy way. Design your voice commands so that they are:
- Natural
- Easy to remember
- Context appropriate
- Sufficiently distinct from other options within the same context
In MR Basics 101, we used the KeywordRecognizer to build two simple voice commands. In MR Input 212, we'll dive deeper and learn how to:
- Design voice commands that are optimized for the speech engine.
- Make users aware of which voice commands are available.
- Acknowledge that we've heard the user's voice command.
- Understand what the user is saying, using a Dictation Recognizer.
- Use a Grammar Recognizer to listen for commands based on an SRGS, or Speech Recognition Grammar Specification, file.
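As a refresher, the KeywordRecognizer pattern from MR Basics 101 looks roughly like the sketch below. The phrases and handlers here are placeholders, not the course's actual commands:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using UnityEngine;
    using UnityEngine.Windows.Speech;

    public class SimpleVoiceCommands : MonoBehaviour
    {
        private KeywordRecognizer keywordRecognizer;
        private readonly Dictionary<string, Action> keywords = new Dictionary<string, Action>();

        void Start()
        {
            // Placeholder phrases; substitute the short, natural commands your app needs.
            keywords.Add("Expand Model", () => Debug.Log("Expand command heard."));
            keywords.Add("Reset Model", () => Debug.Log("Reset command heard."));

            // The recognizer listens only for the exact phrases it was constructed with.
            keywordRecognizer = new KeywordRecognizer(keywords.Keys.ToArray());
            keywordRecognizer.OnPhraseRecognized += KeywordRecognizer_OnPhraseRecognized;
            keywordRecognizer.Start();
        }

        private void KeywordRecognizer_OnPhraseRecognized(PhraseRecognizedEventArgs args)
        {
            // Look up and invoke the handler registered for the recognized phrase.
            Action action;
            if (keywords.TryGetValue(args.text, out action))
            {
                action.Invoke();
            }
        }
    }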
In this course, we'll revisit Model Explorer, which we built in MR Input 210 and MR Input 211.
Important
The videos embedded in each of the chapters below were recorded using an older version of Unity and the Mixed Reality Toolkit. While the step-by-step instructions are accurate and current, you may see scripts and visuals in the corresponding videos that are out-of-date. The videos remain included for posterity and because the concepts covered still apply.
| Course | HoloLens | Immersive headsets |
| --- | --- | --- |
| MR Input 212: Voice | ✔️ | ✔️ |
Note
If you want to look through the source code before downloading, it's available on GitHub.
When Unity is done, a File Explorer window will appear.
If deploying to HoloLens:
1. In the Visual Studio toolbar, change the target from Debug to Release and from ARM to x86.
2. Click the drop-down arrow next to the Local Machine button, and select Remote Machine.
3. Enter your HoloLens device IP address and set Authentication Mode to Universal (Unencrypted Protocol).
4. In the top menu bar, click Debug > Start Without Debugging, or press Ctrl+F5. If this is your first time deploying to the device, you'll need to pair it with Visual Studio.
If deploying to an immersive headset:
1. In the Visual Studio toolbar, change the target from Debug to Release and from ARM to x64.
2. Make sure the deployment target is set to Local Machine.
3. In the top menu bar, click Debug > Start Without Debugging, or press Ctrl+F5.
Note
You might notice some red errors in the Visual Studio Errors panel. It is safe to ignore them. Switch to the Output panel to view actual build progress. Errors in the Output panel will require you to make a fix (most often they are caused by a mistake in a script).
In this chapter, you'll learn the dos and don'ts of designing voice commands.
(If you already built and deployed this project in Visual Studio during setup, you can open that instance of Visual Studio and click 'Reload All' when prompted.)
Note
The Microphone capability must be declared for an app to record from the microphone. This is done for you already in MR Input 212, but keep this in mind for your own projects.
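If you need to declare the capability in your own project, you can check Microphone under Publishing Settings > Capabilities in Unity, or set it from an editor script. A minimal sketch, assuming a UWP (WSA) build target; the menu path is an arbitrary choice:

    #if UNITY_EDITOR
    using UnityEditor;

    public static class CapabilitySetup
    {
        [MenuItem("Tools/Enable Microphone Capability")]
        public static void EnableMicrophone()
        {
            // Writes the Microphone capability into the generated Package.appxmanifest.
            PlayerSettings.WSA.SetCapability(PlayerSettings.WSACapability.Microphone, true);
        }
    }
    #endif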
Communicator.cs is responsible for setting the proper button states on the communicator device. This will allow our users to record a message, play it back, and send the message to the astronaut. It will also start and stop an animated waveform, to acknowledge to the user that their voice was heard.
// TODO: 2.a Delete the following two lines:
RecordButton.SetActive(false);
MessageUIRenderer.gameObject.SetActive(false);
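For context, the button-state pattern described above boils down to toggling GameObjects as the recording state changes. The sketch below is illustrative only; the field names are assumptions, and the project's Communicator.cs is more complete:

    using UnityEngine;

    // A minimal sketch of the button-state pattern; names are placeholders.
    public class CommunicatorButtons : MonoBehaviour
    {
        [SerializeField] private GameObject recordButton;
        [SerializeField] private GameObject recordingIndicator;
        [SerializeField] private GameObject waveform;

        public void OnRecordStart()
        {
            recordButton.SetActive(false);      // hide the idle Record button
            recordingIndicator.SetActive(true); // show that recording is in progress
            waveform.SetActive(true);           // start the animated waveform feedback
        }

        public void OnRecordStop()
        {
            recordingIndicator.SetActive(false);
            waveform.SetActive(false);
            recordButton.SetActive(true);
        }
    }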
In this chapter, we'll use the Dictation Recognizer to create a message for the astronaut. When using the Dictation Recognizer, keep in mind that:
- You must be connected to Wi-Fi for the Dictation Recognizer to work.
- Timeouts occur after a set period of time. There are two timeouts to be aware of:
  - If the recognizer starts and doesn't hear any audio for the first five seconds, it will time out.
  - If the recognizer has given a result and then hears silence for twenty seconds, it will time out.
- Only one type of recognizer (Keyword or Dictation) can run at a time.
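Both timeouts can be tuned on the recognizer itself. A small sketch; the values shown are the defaults mentioned in the code comments later in this chapter:

    using UnityEngine;
    using UnityEngine.Windows.Speech;

    public class DictationTimeouts : MonoBehaviour
    {
        private DictationRecognizer recognizer;

        void Start()
        {
            recognizer = new DictationRecognizer();
            // Seconds of silence allowed before any speech is heard (default: 5).
            recognizer.InitialSilenceTimeoutSeconds = 5f;
            // Seconds of silence allowed after a recognition (default: 20).
            recognizer.AutoSilenceTimeoutSeconds = 20f;
        }

        void OnDestroy()
        {
            // DictationRecognizer holds native resources, so dispose it explicitly.
            recognizer.Dispose();
        }
    }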
Note
The Microphone capability must be declared for an app to record from the microphone. This is done for you already in MR Input 212, but keep this in mind for your own projects.
We're going to edit MicrophoneManager.cs to use the Dictation Recognizer. This is what we'll add:
- When the Record button is pressed, start the DictationRecognizer.
- Show the hypothesis of what the recognizer understood.
- Lock in the results of what the recognizer understood.
- Check if the recognizer has timed out.
Let's get started. Complete all coding exercises for 3.a in MicrophoneManager.cs, or copy and paste the finished code found below:
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License. See LICENSE in the project root for license information.

using System.Collections;
using System.Text;
using UnityEngine;
using UnityEngine.UI;
using UnityEngine.Windows.Speech;

namespace Academy
{
    public class MicrophoneManager : MonoBehaviour
    {
        [Tooltip("A text area for the recognizer to display the recognized strings.")]
        [SerializeField]
        private Text dictationDisplay;

        private DictationRecognizer dictationRecognizer;

        // Use this string to cache the text currently displayed in the text box.
        private StringBuilder textSoFar;

        // Using an empty string specifies the default microphone.
        private static string deviceName = string.Empty;
        private int samplingRate;
        private const int messageLength = 10;

        // Use this to reset the UI once the Microphone is done recording after it was started.
        private bool hasRecordingStarted;

        void Awake()
        {
            /* TODO: DEVELOPER CODING EXERCISE 3.a */

            // 3.a: Create a new DictationRecognizer and assign it to dictationRecognizer variable.
            dictationRecognizer = new DictationRecognizer();

            // 3.a: Register for dictationRecognizer.DictationHypothesis and implement DictationHypothesis below
            // This event is fired while the user is talking. As the recognizer listens, it provides text of what it's heard so far.
            dictationRecognizer.DictationHypothesis += DictationRecognizer_DictationHypothesis;

            // 3.a: Register for dictationRecognizer.DictationResult and implement DictationResult below
            // This event is fired after the user pauses, typically at the end of a sentence. The full recognized string is returned here.
            dictationRecognizer.DictationResult += DictationRecognizer_DictationResult;

            // 3.a: Register for dictationRecognizer.DictationComplete and implement DictationComplete below
            // This event is fired when the recognizer stops, whether from Stop() being called, a timeout occurring, or some other error.
            dictationRecognizer.DictationComplete += DictationRecognizer_DictationComplete;

            // 3.a: Register for dictationRecognizer.DictationError and implement DictationError below
            // This event is fired when an error occurs.
            dictationRecognizer.DictationError += DictationRecognizer_DictationError;

            // Query the maximum frequency of the default microphone. Use 'unused' to ignore the minimum frequency.
            int unused;
            Microphone.GetDeviceCaps(deviceName, out unused, out samplingRate);

            // Use this string to cache the text currently displayed in the text box.
            textSoFar = new StringBuilder();

            // Use this to reset the UI once the Microphone is done recording after it was started.
            hasRecordingStarted = false;
        }

        void Update()
        {
            // 3.a: Add condition to check if dictationRecognizer.Status is Running
            if (hasRecordingStarted && !Microphone.IsRecording(deviceName) && dictationRecognizer.Status == SpeechSystemStatus.Running)
            {
                // Reset the flag now that we're cleaning up the UI.
                hasRecordingStarted = false;

                // This acts like pressing the Stop button and sends the message to the Communicator.
                // If the microphone stops as a result of timing out, make sure to manually stop the dictation recognizer.
                // Look at the StopRecording function.
                SendMessage("RecordStop");
            }
        }

        /// <summary>
        /// Turns on the dictation recognizer and begins recording audio from the default microphone.
        /// </summary>
        /// <returns>The audio clip recorded from the microphone.</returns>
        public AudioClip StartRecording()
        {
            // 3.a: Shutdown the PhraseRecognitionSystem. This controls the KeywordRecognizers
            PhraseRecognitionSystem.Shutdown();

            // 3.a: Start dictationRecognizer
            dictationRecognizer.Start();

            // 3.a: Uncomment this line
            dictationDisplay.text = "Dictation is starting. It may take time to display your text the first time, but begin speaking now...";

            // Set the flag that we've started recording.
            hasRecordingStarted = true;

            // Start recording from the microphone for 10 seconds.
            return Microphone.Start(deviceName, false, messageLength, samplingRate);
        }

        /// <summary>
        /// Ends the recording session.
        /// </summary>
        public void StopRecording()
        {
            // 3.a: Check if dictationRecognizer.Status is Running and stop it if so
            if (dictationRecognizer.Status == SpeechSystemStatus.Running)
            {
                dictationRecognizer.Stop();
            }

            Microphone.End(deviceName);
        }

        /// <summary>
        /// This event is fired while the user is talking. As the recognizer listens, it provides text of what it's heard so far.
        /// </summary>
        /// <param name="text">The currently hypothesized recognition.</param>
        private void DictationRecognizer_DictationHypothesis(string text)
        {
            // 3.a: Set DictationDisplay text to be textSoFar and new hypothesized text
            // We don't want to append to textSoFar yet, because the hypothesis may have changed on the next event
            dictationDisplay.text = textSoFar.ToString() + " " + text + "...";
        }

        /// <summary>
        /// This event is fired after the user pauses, typically at the end of a sentence. The full recognized string is returned here.
        /// </summary>
        /// <param name="text">The text that was heard by the recognizer.</param>
        /// <param name="confidence">A representation of how confident (rejected, low, medium, high) the recognizer is of this recognition.</param>
        private void DictationRecognizer_DictationResult(string text, ConfidenceLevel confidence)
        {
            // 3.a: Append textSoFar with latest text
            textSoFar.Append(text + ". ");

            // 3.a: Set DictationDisplay text to be textSoFar
            dictationDisplay.text = textSoFar.ToString();
        }

        /// <summary>
        /// This event is fired when the recognizer stops, whether from Stop() being called, a timeout occurring, or some other error.
        /// Typically, this will simply return "Complete". In this case, we check to see if the recognizer timed out.
        /// </summary>
        /// <param name="cause">An enumerated reason for the session completing.</param>
        private void DictationRecognizer_DictationComplete(DictationCompletionCause cause)
        {
            // If Timeout occurs, the user has been silent for too long.
            // With dictation, the default timeout after a recognition is 20 seconds.
            // The default timeout with initial silence is 5 seconds.
            if (cause == DictationCompletionCause.TimeoutExceeded)
            {
                Microphone.End(deviceName);

                dictationDisplay.text = "Dictation has timed out. Please press the record button again.";
                SendMessage("ResetAfterTimeout");
            }
        }

        /// <summary>
        /// This event is fired when an error occurs.
        /// </summary>
        /// <param name="error">The string representation of the error reason.</param>
        /// <param name="hresult">The int representation of the hresult.</param>
        private void DictationRecognizer_DictationError(string error, int hresult)
        {
            // 3.a: Set DictationDisplay text to be the error string
            dictationDisplay.text = error + "\nHRESULT: " + hresult;
        }

        /// <summary>
        /// The dictation recognizer may not turn off immediately, so this call blocks on
        /// the recognizer reporting that it has actually stopped.
        /// </summary>
        public IEnumerator WaitForDictationToStop()
        {
            while (dictationRecognizer != null && dictationRecognizer.Status == SpeechSystemStatus.Running)
            {
                yield return null;
            }
        }
    }
}
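Note that StartRecording shuts down the PhraseRecognitionSystem, so keyword commands stay disabled until something restarts it. Below is a minimal sketch of how a caller could use WaitForDictationToStop for that hand-off; the class name here is illustrative, and in the project Communicator.cs plays this role:

    using System.Collections;
    using UnityEngine;
    using UnityEngine.Windows.Speech;

    namespace Academy
    {
        // Hypothetical caller showing the dictation-to-keywords hand-off.
        public class SpeechSystemRestarter : MonoBehaviour
        {
            public IEnumerator RestartSpeechSystem(MicrophoneManager microphoneManager)
            {
                // Block until the DictationRecognizer reports it has fully stopped.
                yield return StartCoroutine(microphoneManager.WaitForDictationToStop());

                // Re-enable the KeywordRecognizers that were shut down for dictation.
                PhraseRecognitionSystem.Restart();
            }
        }
    }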
Note
The Microphone capability must be declared for an app to record from the microphone. This is done for you already in MR Input 212, but keep this in mind for your own projects.
In our SRGS file, we have three types of rules:
- A rule which lets you say one of twelve colors.
- Three rules which listen for a combination of the color rule and one of the three shapes.
- The root rule, colorChooser, which listens for any combination of the three "color + shape" rules.
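For reference, loading an SRGS grammar in Unity looks roughly like the sketch below. The file name SRGSColor.xml and its StreamingAssets location follow this course's setup; treat those details as assumptions for your own project:

    using UnityEngine;
    using UnityEngine.Windows.Speech;

    public class ColorGrammar : MonoBehaviour
    {
        private GrammarRecognizer grammarRecognizer;

        void Start()
        {
            // Load the SRGS grammar file from StreamingAssets (path assumed).
            grammarRecognizer = new GrammarRecognizer(
                Application.streamingAssetsPath + "/SRGS/SRGSColor.xml");
            grammarRecognizer.OnPhraseRecognized += Grammar_OnPhraseRecognized;
            grammarRecognizer.Start();
        }

        private void Grammar_OnPhraseRecognized(PhraseRecognizedEventArgs args)
        {
            // Semantic meanings carry the key/value pairs defined by the grammar's tags.
            foreach (var meaning in args.semanticMeanings)
            {
                Debug.Log(meaning.key + ": " + meaning.values[0]);
            }
        }
    }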
Congratulations! You have now completed MR Input 212: Voice.