SpeechRecognizer class

 

The SpeechRecognizer class provides the means to start, stop, and monitor speech recognition in an application. You can bind it to a SpeechRecognizerUx control or create a custom UI to expose its methods and events.

Syntax

public sealed class SpeechRecognizer : IDisposable, ISpeechRecognizerStateControl

The SpeechRecognizer class has the following members.

Constructors

Name

Description

SpeechRecognizer(string, SpeechAuthorizationParameters)

Initializes a new instance of the SpeechRecognizer class.

Methods

Name

Description

RecognizeSpeechToTextAsync()

Starts a speech recognition session, which captures and interprets user speech, and then returns the results as a SpeechRecognitionResult object.

StopListeningAndProcessAudio()

Interrupts the current audio capture and starts analysis on the captured audio data.

RequestCancelOperation()

Interrupts speech recognition and returns control to the caller. This method can be called at any point in the speech recognition process.

Dispose()

Removes the current SpeechRecognizer instance and all speech data artifacts from memory.
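The sample later on this page reads two things from the object returned by RecognizeSpeechToTextAsync(): its text and its list of alternates. As a minimal sketch of that step, the helper below gathers the alternate strings while skipping duplicates and the primary text. The collectAlternateTexts function and the fakeResult object are hypothetical and exist only for illustration; fakeResult mimics just the text and getAlternates members that the sample uses, and is not the real SpeechRecognitionResult.

```javascript
// Collect the text of each alternate, skipping non-strings, duplicates,
// and any entry that matches the primary result text.
// `result` is assumed to expose `text` and `getAlternates(max)`,
// as used in the JavaScript sample on this page.
function collectAlternateTexts(result, max) {
    var texts = [];
    var alternates = result.getAlternates(max);
    for (var i = 0; i < alternates.length; i++) {
        var text = alternates[i].text;
        if (typeof text === "string" &&
            text !== result.text &&
            texts.indexOf(text) === -1) {
            texts.push(text);
        }
    }
    return texts;
}

// Hypothetical stand-in for a recognition result (illustration only).
var fakeResult = {
    text: "turn on the light",
    getAlternates: function (max) {
        return [
            { text: "turn on the light" },
            { text: "turn on the lights" },
            { text: "turn on a light" }
        ].slice(0, max);
    }
};

var alternateTexts = collectAlternateTexts(fakeResult, 5);
```

Keeping this logic in a function that takes a plain object makes it possible to exercise the filtering without a microphone or a web service call.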

Events

Name

Description

AudioCaptureStateChanged

Raised when the current speech recognition session moves from one state to another.

AudioLevelChanged

Raised when the user changes their speaking volume. Use the SpeechRecognitionAudioLevelChangedEventArgs object associated with this event to get the current audio level.

RecognizerResultReceived

Raised when the SpeechRecognizer identifies a possible interpretation of user speech.
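A custom UI typically reacts to AudioCaptureStateChanged by showing one panel per state. The following is a minimal sketch of that mapping as a lookup table instead of a switch statement. The CaptureState object is a hypothetical stand-in for the capture-state enumeration (its numeric values are invented here), and the panel names are the ones used by the sample on this page.

```javascript
// Hypothetical stand-in for the capture-state enumeration. The real
// values come from Bing.Speech and may differ from these numbers.
var CaptureState = { initializing: 0, listening: 1, thinking: 2, complete: 3 };

// Map each state to the id of the panel that should be visible.
var panelByState = {};
panelByState[CaptureState.initializing] = "InitPanel";
panelByState[CaptureState.listening] = "ListenPanel";
panelByState[CaptureState.thinking] = "ThinkPanel";
panelByState[CaptureState.complete] = "CompletePanel";

// Return the panel id to show, or null when the UI should stay as it is
// (unknown state, or a session that completed because the user cancelled).
function panelForState(state, cancelled) {
    if (state === CaptureState.complete && cancelled) {
        return null;
    }
    return Object.prototype.hasOwnProperty.call(panelByState, state)
        ? panelByState[state]
        : null;
}
```

An event handler could call panelForState with the state from the event arguments and pass a non-null return value to a function such as the sample's SetPanel.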

Example

The following code sample creates a complete custom speech UI, using a series of StackPanel objects (XAML) or DIVs (HTML) that are made visible at different times to reflect different phases of the speech recognition process. Before adding the code to your application, you must complete the steps described in How to: Enable a project for the Bing Speech Recognition Control.

<Page
    x:Class="SpeechCustomUi.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:local="using:SpeechCustomUi"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    mc:Ignorable="d">

    <!--This application demonstrates a complete custom speech recognition UI-->
    <Grid Background="{StaticResource ApplicationPageBackgroundThemeBrush}">
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="632*"/>
            <ColumnDefinition Width="51*"/>
        </Grid.ColumnDefinitions>

        <!--Panel to show at application start and after cancel.-->
        <StackPanel x:Name="StartPanel" Visibility="Visible">
            <TextBlock Text="Click the microphone and get ready to say something" 
                       FontSize="50" HorizontalAlignment="Center" 
                       VerticalAlignment="Center" />
            <!--Starts speech recognition, but may not be ready immediately. -->
            <AppBarButton x:Name="SpeakButton" Icon="Microphone" Click="SpeakButton_Click"></AppBarButton>
        </StackPanel>

        <!--Panel to show while initializing the SpeechRecognizer.
            This panel may not be seen if initialization happens quickly.-->
        <StackPanel x:Name="InitPanel" Visibility="Collapsed" >
            <TextBlock Text="Ready, set..." FontSize="50" 
                       HorizontalAlignment="Center" VerticalAlignment="Center" />
        </StackPanel>

        <!--Panel to show while listening for user speech.-->        
        <StackPanel x:Name="ListenPanel" Visibility="Collapsed" >
            <TextBlock Text="Speak!" FontSize="80" 
                       HorizontalAlignment="Center" />
            <!--Shows at different opacity levels depending on speech volume.-->            
            <TextBlock x:Name="VolumeMeter" Text="Volume" FontSize="60" 
                       HorizontalAlignment="Center" Margin="0,80,0,0" />
            <!--Click when done speaking, or wait for app to recognize end of 
                speech.-->
            <AppBarButton x:Name="StopButton" Icon="Stop" 
                          HorizontalAlignment="Center" Margin="0,70, 0, 0" 
                          Click="StopButton_Click">Done
            </AppBarButton>
        </StackPanel>

        <!--Panel to show while interpreting speech input.-->        
        <StackPanel x:Name="ThinkPanel" Visibility="Collapsed" >
            <TextBlock Text="Thinking..." FontSize="60" 
                       HorizontalAlignment="Center" />
            <TextBlock Text="You might have said:" FontSize="40" 
                       HorizontalAlignment="Center" Margin="0,50,0,0" />
            <!--Shows possible text before deciding on final interpretation.
                May flash too quickly to see for easy phrases.-->
            <TextBlock x:Name="IntermediateResults" FontSize="40" 
                       HorizontalAlignment="Center" Margin="0,30,0,0" />
        </StackPanel>

        <!--Panel to show when speech recognition complete.
            May also be shown in case of exceptions.-->
        <StackPanel x:Name="CompletePanel" Visibility="Collapsed" >
            <TextBlock Text="Done." FontSize="60" 
                       HorizontalAlignment="Center" />
            <!--Displays confidence level of final result.-->
            <TextBlock x:Name="ConfidenceText" FontSize="40" 
                       HorizontalAlignment="Center" Margin="0,50,0,0" />
            <!--Displays final result text.-->
            <TextBlock x:Name="FinalResult" FontSize="40" 
                       HorizontalAlignment="Center" Margin="0,30,0,0" />
            <TextBlock x:Name="AlternatesTitle" Text="But you might have said:" 
                       FontSize="40" HorizontalAlignment="Center" 
                       Margin="0,50,0,0" />
            <!--Displays alternate results. Copies selected text to 
                FinalResult.-->
            <ListBox x:Name="AlternatesListBox" HorizontalAlignment="Center" 
                     SelectionChanged="AlternatesListBox_SelectionChanged" />
        </StackPanel>

        <!--Cancel button, to be shown in all states except for application
            start -->
        <AppBarButton x:Name="CancelButton" Icon="Cancel" Click="CancelButton_Click"
                      VerticalAlignment="Bottom" HorizontalAlignment="Center"
                      Visibility="Collapsed">
        </AppBarButton>
    </Grid>
</Page>
<!DOCTYPE html>
<!--This application demonstrates a complete custom speech recognition UI-->
<html>
<head>
    <meta charset="utf-8" />
    <title>SpeechCustomUI_JS</title>

    <!-- WinJS references -->
    <link href="//Microsoft.WinJS.2.0/css/ui-dark.css" rel="stylesheet" />
    <script src="//Microsoft.WinJS.2.0/js/base.js"></script>
    <script src="//Microsoft.WinJS.2.0/js/ui.js"></script>

    <!-- SpeechCustomUI_JS references -->
    <link href="/css/default.css" rel="stylesheet" />
    <script src="/js/default.js"></script>

    <link href="Bing.Speech/css/voiceuicontrol.css" rel="stylesheet" />
    <script src="Bing.Speech/js/voiceuicontrol.js"></script>

    <style>
        body {
            text-align: center;
        }
        .panel {
            display: none;
        }
        .instructionText {
            font-size: x-large;
        }
        .explanatory {
            font-size: large;
        }
        .spokenText {
            font-size: large;
            font-style: italic;
        }
        .bigText {
            font-size: xx-large;
        }
        .biggestText {
            font-size: xx-large;
            font-weight: bold;
        }
        .buttonDiv {
            width: 2em;
            margin: 0 auto;
            font-size: xx-large;
        }
        .subTitle {
            font-size: medium;
        }
        .listBox {
            margin: 0 auto;
            display: block;
        }

    </style>
</head>
<body onload="Body_OnLoad();" >

    <!--Panel to show at application start and after cancel.-->  
    <div id="StartPanel" class="panel">
        <p class="instructionText">
            Click the microphone and get ready to say something
        </p>
        <div>
            <!--Starts speech recognition, but may not be ready immediately. -->
            <div id="SpeakButton" onclick="SpeakButton_Click();" class="buttonDiv">
                &#xe1d6;
            </div> 
        </div>
    </div>

    <!--Panel to show while initializing the SpeechRecognizer.
        This panel may not be seen if initialization happens quickly.-->
    <div id="InitPanel" class="panel">
        <p class="instructionText">
            Ready, set...
        </p>
    </div>

    <!--Panel to show while listening for user speech.-->    
    <div id="ListenPanel" class="panel">
        <p class="biggestText">
            Speak!
        </p>
        <!--Shows at different opacity levels depending on speech volume.--> 
        <div id="VolumeMeter" class="bigText" style="opacity:0">
            Volume
        </div>

        <!--Click when done speaking, or wait for app to recognize end of 
            speech.-->
        <div>
            <div id="StopButton" onclick="StopButton_Click();" 
                 class="buttonDiv">
                &#xe15b;
            </div>
            <div class="subTitle">
                Stop
            </div>
        </div>
    </div>

    <!--Panel to show while interpreting speech input.--> 
    <div id="ThinkPanel" class="panel">
        <p class="instructionText">
            Thinking...
        </p>
    </div>

    <!--Panel to show when speech recognition complete.
        May also be shown in case of exceptions.-->
    <div id="CompletePanel" class="panel">
        <p class="instructionText">
            Done.
        </p>
        <br />
        <!--Displays confidence level of final result.-->
        <div id="ConfidenceText" class="explanatory"></div>
        <!--Displays final result text.-->
        <div id="ResultText" class="spokenText"></div>
        <br />
        <!--Displays alternate results. Copies selected text to
        ResultText.-->
        <div id="AlternatesArea">
            <div class="explanatory">But you might have said:</div>
            <div>
                <select id="AlternatesListBox" class="spokenText listBox" 
                        onchange="AlternatesListBox_SelectionChanged();">
                </select>
            </div>
        </div>
    </div>

    <!--Shows possible text before deciding on final interpretation.
        May flash too quickly to see for easy phrases.-->
    <div id="IntResults" class="panel">
        <div class="instructionText">You might have said...</div>
        <div id="IntermediateResults" class="spokenText"></div>
    </div>

    <!--Cancel button, to be shown in all states except for application
        start -->
    <div style="position: absolute; bottom: 0; width: 100%">
        <div id="CancelButton" onclick="CancelButton_Click();" 
             class="buttonDiv">
            &#xe10a;
            <div class="subTitle">
                Cancel
            </div>
        </div>
    </div>

</body>
</html>

The <AppBarButton> XAML element is supported only in Windows 8.1 and later. If your XAML app must support Windows 8, either replace the <AppBarButton> elements with regular Button elements and define your own styles, or complete the following additional steps.

To recreate the AppBarButtons

  1. From Solution Explorer, expand the Common folder and open StandardStyles.xaml.

  2. The middle portion of the file consists of <Style> elements which have been commented out. These <Style> elements define standard styles for use in Windows Store applications, and are identified by the x:Key attribute.

    Uncomment the style definitions for MicrophoneAppBarButtonStyle, StopAppBarButtonStyle, and ClosePaneAppBarButtonStyle, and then save and close the file.

  3. In MainPage.xaml, replace the AppBarButton elements with the following Button elements.

    <Button x:Name="SpeakButton" Click="SpeakButton_Click" 
            Style="{StaticResource MicrophoneAppBarButtonStyle}" 
            HorizontalAlignment="Center" />
    
    <Button x:Name="StopButton" Click="StopButton_Click"
            Style="{StaticResource StopAppBarButtonStyle}"  
            AutomationProperties.Name="Done"  
            HorizontalAlignment="Center" Margin="0,70, 0, 0" />
    
    <Button x:Name="CancelButton" Visibility="Collapsed" Content="&#xE10A;" 
            Style="{StaticResource ClosePaneAppBarButtonStyle}" 
            AutomationProperties.Name="Cancel" HorizontalAlignment="Center" 
            VerticalAlignment="Bottom" Click="CancelButton_Click" />
    

Example

The following code creates a SpeechRecognizer object and handles its events. It shows or hides the different panels to reflect UI state changes, and then displays the final text in FinalResult (XAML) or ResultText (HTML) and the alternate text in AlternatesListBox. Replace the ClientId and ClientSecret placeholder values with your own before building the project.

using System;
using Windows.UI.Xaml;
using Windows.UI.Xaml.Controls;
using Bing.Speech;

namespace SpeechCustomUi
{
    public sealed partial class MainPage : Page
    {
        public MainPage()
        {
            this.InitializeComponent();
            this.Loaded += MainPage_Loaded;
        }

        SpeechRecognizer SR;
        private void MainPage_Loaded(object sender, RoutedEventArgs e)
        {
            // Apply credentials from the Windows Azure Data Marketplace.
            var credentials = new SpeechAuthorizationParameters();
            credentials.ClientId = "<YOUR CLIENT ID>";
            credentials.ClientSecret = "<YOUR CLIENT SECRET>";

            // Initialize the speech recognizer.
            SR = new SpeechRecognizer("en-US", credentials);

            // Add speech recognition event handlers.
            SR.AudioCaptureStateChanged += SR_AudioCaptureStateChanged;
            SR.AudioLevelChanged += SR_AudioLevelChanged;
            SR.RecognizerResultReceived += SR_RecognizerResultReceived;
        }

        void SR_RecognizerResultReceived(SpeechRecognizer sender,
            SpeechRecognitionResultReceivedEventArgs args)
        {
            IntermediateResults.Text = args.Text;
        }

        void SR_AudioLevelChanged(SpeechRecognizer sender,
            SpeechRecognitionAudioLevelChangedEventArgs args)
        {
            var v = args.AudioLevel;
            if (v > 0) VolumeMeter.Opacity = v / 50;
            else VolumeMeter.Opacity = Math.Abs((v - 50) / 100);
        }

        void SR_AudioCaptureStateChanged(SpeechRecognizer sender,
            SpeechRecognitionAudioCaptureStateChangedEventArgs args)
        {
            // Show the panel that corresponds to the current state.
            switch (args.State)
            {
                case SpeechRecognizerAudioCaptureState.Complete:
                    if (uiState == "ListenPanel" || uiState == "ThinkPanel")
                    {
                        SetPanel(CompletePanel);  
                    }
                    break;
                case SpeechRecognizerAudioCaptureState.Initializing:
                    SetPanel(InitPanel);
                    break;
                case SpeechRecognizerAudioCaptureState.Listening:
                    SetPanel(ListenPanel);
                    break;
                case SpeechRecognizerAudioCaptureState.Thinking:
                    SetPanel(ThinkPanel);
                    break;
                default:
                    break;
            }
        }

        string uiState = "";
        private void SetPanel(StackPanel panel)
        {
            // Hide all the panels.
            InitPanel.Visibility = Visibility.Collapsed;
            ListenPanel.Visibility = Visibility.Collapsed;
            ThinkPanel.Visibility = Visibility.Collapsed;
            CompletePanel.Visibility = Visibility.Collapsed;
            StartPanel.Visibility = Visibility.Collapsed;

            // Show the selected panel and the cancel button.
            panel.Visibility = Visibility.Visible;
            CancelButton.Visibility = Visibility.Visible;

            uiState = panel.Name;
        }


        private async void SpeakButton_Click(object sender, RoutedEventArgs e)
        {
            // Always use a try block because RecognizeSpeechToTextAsync
            // depends on a web service.
            try
            {
                // Start speech recognition.
                var result = await SR.RecognizeSpeechToTextAsync();

                // Display the text.
                FinalResult.Text = result.Text;

                // Show the TextConfidence.
                ShowConfidence(result.TextConfidence);

                // Fill a string array with the alternate results. The loop
                // starts at index 1, so the array is sized to match and
                // does not begin with a null entry.
                var alternates = result.GetAlternates(5);
                if (alternates.Count > 1)
                {
                    string[] s = new string[alternates.Count - 1];
                    for (int i = 1; i < alternates.Count; i++)
                    {
                        s[i - 1] = alternates[i].Text;
                    }

                    // Populate the alternates ListBox with the array.
                    AlternatesListBox.ItemsSource = s;
                    AlternatesTitle.Visibility = Visibility.Visible;
                }
                else
                {
                    AlternatesTitle.Visibility = Visibility.Collapsed;
                }

            }
            catch (Exception ex)
            {
                // If there's an exception other than a cancellation,
                // show it in the Complete panel.
                if (!(ex is OperationCanceledException))
                {
                    FinalResult.Text = string.Format("{0}: {1}",
                                ex.GetType().ToString(), ex.Message);
                    SetPanel(CompletePanel); 
                }
            }
        }

        private void ShowConfidence(SpeechRecognitionConfidence confidence)
        {
            switch (confidence)
            {
                case SpeechRecognitionConfidence.High:
                    ConfidenceText.Text = "I am almost sure you said:";
                    break;
                case SpeechRecognitionConfidence.Medium:
                    ConfidenceText.Text = "I think you said:";
                    break;
                case SpeechRecognitionConfidence.Low:
                    ConfidenceText.Text = "I think you might have said:";
                    break;
                case SpeechRecognitionConfidence.Rejected:
                    ConfidenceText.Text = "I'm sorry, I couldn't understand you."
                    + " Please click the Cancel button and try again.";
                    break;
            }
        }

        private void CancelButton_Click(object sender, RoutedEventArgs e)
        {
            // Cancel the current speech session and return to start.
            SR.RequestCancelOperation();
            SetPanel(StartPanel);
            CancelButton.Visibility = Visibility.Collapsed;
        }

        private void StopButton_Click(object sender, RoutedEventArgs e)
        {
            // Stop listening and move to Thinking state.
            SR.StopListeningAndProcessAudio();
        }

        private void AlternatesListBox_SelectionChanged(object sender, 
            SelectionChangedEventArgs e)
        {
            // Check in case the ListBox is still empty.
            if (null != AlternatesListBox.SelectedItem)
            {
                // Put the selected text in FinalResult and clear ConfidenceText.
                FinalResult.Text = AlternatesListBox.SelectedItem.ToString();
                ConfidenceText.Text = ""; 
            }
        }
    }
}
var speechRecognizer;
function Body_OnLoad() {
    // Show start panel.
    document.getElementById("StartPanel").style.display = "block";

    // Apply credentials from the Windows Azure Data Marketplace.
    var credentials = new Bing.Speech.SpeechAuthorizationParameters();
    credentials.clientId = "<YOUR CLIENT ID>";
    credentials.clientSecret = "<YOUR CLIENT SECRET>";

    // Initialize the speech recognizer.
    speechRecognizer = new Bing.Speech.SpeechRecognizer("en-US", credentials);

    // Add speech recognition event handlers.
    speechRecognizer.onaudiocapturestatechanged = SpeechRecognizer_AudioCaptureStateChanged;
    speechRecognizer.onaudiolevelchanged = SpeechRecognizer_AudioLevelChanged;
    speechRecognizer.onrecognizerresultreceived = SpeechRecognizer_RecognizerResultReceived;
}

var cancelled;
function SpeechRecognizer_AudioCaptureStateChanged(args) {
    // Show the div that corresponds to the current state.
    switch (args.state) {
        case Bing.Speech.SpeechRecognizerAudioCaptureState.complete:
            document.getElementById("IntResults").style.display = "none";
            if (!cancelled) SetPanel("CompletePanel");
            break;
        case Bing.Speech.SpeechRecognizerAudioCaptureState.initializing:
            SetPanel("InitPanel");
            break;
        case Bing.Speech.SpeechRecognizerAudioCaptureState.listening:
            SetPanel("ListenPanel");
            break;
        case Bing.Speech.SpeechRecognizerAudioCaptureState.thinking:
            SetPanel("ThinkPanel");
            break;
        default:
            break;
    }
}

function SetPanel(panelId) {
    // Hide all the Panels.
    document.getElementById("InitPanel").style.display = "none";
    document.getElementById("ListenPanel").style.display = "none";
    document.getElementById("ThinkPanel").style.display = "none";
    document.getElementById("CompletePanel").style.display = "none";
    document.getElementById("StartPanel").style.display = "none";

    // Show the selected div.
    document.getElementById(panelId).style.display = "block";
}

function SpeakButton_Click() {
    // Reset the cancel state.
    cancelled = false;
    document.getElementById("CancelButton").style.display = "block";
    document.getElementById("StopButton").style.display = "block";

    // Clear the alternates list.
    document.getElementById("AlternatesListBox").innerHTML = "";
    document.getElementById("AlternatesArea").style.display = "none";

    // Declare a string to hold the result text.
    var s = "";

    // Start speech recognition.
    speechRecognizer.recognizeSpeechToTextAsync()
            .then(
                // Write the result to the string.
                function (result) {

                    /* result.text should return a string, but if the user speaks too quietly
                    or is unclear, result.text will return an error object, so we have 
                    to catch it here to prevent interruption. */
                    if (typeof (result.text) == "string") {
                        s = result.text;

                        // Show text confidence.
                        ShowConfidence(result.textConfidence);

                        // If there are alternate results, put them into AlternatesListBox.
                        var alternates = result.getAlternates(5);
                        if (alternates.length > 1) {
                            var listBox = document.getElementById("AlternatesListBox");
                            for (var i = 0; i < alternates.length; i++) {
                                var opt = document.createElement("option");
                                opt.textContent = alternates[i].text;
                                listBox.appendChild(opt);
                            }
                            // Show the alternates area once the list is filled.
                            document.getElementById("AlternatesArea").style.display = "block";
                        }
                    }
                    else {
                        // Handle speech that is too quiet or unclear.
                        s = "I'm sorry. I couldn't understand you.";
                    }
                },
                // If there's another error, write the error number and message to the string.
                function (error) {
                    s = "Error: (" + error.number + ") " + error.message;
                }
            )
        .done(
        // Write the string to ResultText.
        function (result) {
            document.getElementById("ResultText").innerHTML = window.toStaticHTML(s);
        }
    );
}

function AlternatesListBox_SelectionChanged(sender, e) {
    // Set ResultText to display the selected alternate.
    var alts = document.getElementById("AlternatesListBox");
    var item = alts.options[alts.selectedIndex];
    document.getElementById("ResultText").innerText = item.textContent;
    document.getElementById("ConfidenceText").style.display = "none";
}

function CancelButton_Click(sender, e) {
    // Set the cancelled flag and hide the cancel button.
    cancelled = true;
    document.getElementById("CancelButton").style.display = "none";

    // Cancel the current speech session and return to start.
    speechRecognizer.requestCancelOperation();
    SetPanel("StartPanel");
}

function StopButton_Click(sender, e) {
    // Clear the stop button and stop the audio stream.
    document.getElementById("StopButton").style.display = "none";
    speechRecognizer.stopListeningAndProcessAudio();
}

function SpeechRecognizer_AudioLevelChanged(args) {
    // Set the opacity of the volume meter to match the sound coming in.
    var volumeMeter = document.getElementById("VolumeMeter");
    var v = args.audioLevel;
    if (v > 0) volumeMeter.style.opacity = v / 50;
    else volumeMeter.style.opacity = Math.abs((v - 50) / 100);
}

function SpeechRecognizer_RecognizerResultReceived(args) {
    // Write intermediate results to the screen as they come in.
    document.getElementById("IntResults").style.display = "block";
    if (typeof (args.text) == "string") {
        document.getElementById("IntermediateResults").innerText = args.text;
    }
}

function ShowConfidence(confidence) {
    var confidenceText = document.getElementById("ConfidenceText");
    confidenceText.style.display = "block";

    switch (confidence) {
        case Bing.Speech.SpeechRecognitionConfidence.high:
            confidenceText.innerText = "I am almost sure you said:";
            break;
        case Bing.Speech.SpeechRecognitionConfidence.medium:
            confidenceText.innerText = "I think you said:";
            break;
        case Bing.Speech.SpeechRecognitionConfidence.low:
            confidenceText.innerText = "I think you might have said:";
            break;
        case Bing.Speech.SpeechRecognitionConfidence.rejected:
            confidenceText.innerText = "I'm sorry, I couldn't understand you."
            + "\nPlease click the Cancel button and try again.";
            break;
    }
}
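Both AudioLevelChanged handlers above convert the reported audio level into an opacity value inline. As a sketch, the same arithmetic can be moved into a small pure function and clamped to the 0 to 1 range that opacity accepts. The levelToOpacity name is hypothetical, and the clamp is an assumption: this page does not state the range of values that audioLevel can take.

```javascript
// Convert an audio level to an opacity between 0 and 1.
// The formula mirrors the sample's handlers; the clamp is an added
// assumption, because the range of the audio level is not documented here.
function levelToOpacity(level) {
    var raw = level > 0
        ? level / 50
        : Math.abs((level - 50) / 100);
    return Math.min(1, Math.max(0, raw));
}
```

A handler could then be reduced to a single assignment, for example volumeMeter.style.opacity = levelToOpacity(args.audioLevel).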

Requirements

Minimum Supported Client

Windows 8

Required Extensions

Bing.Speech

Namespace

Bing.Speech