Custom speech recognition

20 minutes

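If we want to do better than the default Windows Speech Recognition, we need to write an app-specific speech recognition system designed to handle whole-sentence input.

This requires quite a bit of code, so it might be more worthwhile to spend the time improving the AutomationProperties.Name properties of your app and retesting them against the Windows Speech Recognizer instead. True, we already have a system that is accessible to this input method, if a little clunky. But for truly fluent speech input in a specialized context, we will need to turn it into a custom system.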

Reference

For a full list of commands, refer to Windows Speech Recognition commands.

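Get permission to accept input from the microphone and perform speech recognition on it

We must set several permissions and capabilities before we can start using custom speech recognition.

In Visual Studio, with the calculator project loaded, open the Package.appxmanifest file and select Capabilities. Turn on the Microphone capability.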

Setting the microphone capability.

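Setting this capability gives the app access to the microphone's audio feed. Save and close the manifest file.
This is all that is required of the app, but not all that is required for speech recognition to work. The user must enable both the microphone and speech recognition for the app, and the latter is off by default. When testing, the developer is also the user, so type "Privacy settings" into the Windows search bar.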

Setting the privacy settings.

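Select Speech and make sure Online speech recognition is turned on. Select Microphone and make sure Allow apps to access your microphone is turned on. Close or minimize the settings window.

We will try turning these settings off later, just to test that the app handles those situations correctly.

Add code to match words and phrases to UI elements

Supporting a custom speech recognizer takes quite a bit of code, so let's start with the using statements and global variables.

Add the following using statements to the top of your code.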

using Windows.Media.SpeechRecognition;
using Windows.Media.Capture;

Add the following global variables and a new enumeration.

        enum eElements
        {
            Button,
            ToggleSwitch,
            Unknown
        }

        bool isRecognitionAvailable;
        SpeechRecognizer speechRecognizer;

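To handle the permission issues described above, add the following class to your code. A single call to RequestMicrophonePermission checks the microphone permission. This is generic code that you can use in any app you develop for Windows 10 to support the permission needed for speech recognition through a microphone, but it does not handle the Cortana/Dictation privacy setting.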

        public class AudioCapturePermissions
        {
            // If no microphone is present, an exception is thrown with the following HResult value.
            private static readonly int NoCaptureDevicesHResult = -1072845856;

            /// <summary>
            ///  Note that this method only checks the Settings->Privacy->Microphone setting, it does not handle
            /// the Cortana/Dictation privacy check.
            /// </summary>
            /// <returns>True, if the microphone is available.</returns>
            public async static Task<bool> RequestMicrophonePermission()
            {
                try
                {
                    // Request access to the audio capture device.
                    var settings = new MediaCaptureInitializationSettings
                    {
                        StreamingCaptureMode = StreamingCaptureMode.Audio,
                        MediaCategory = MediaCategory.Speech,
                    };
                    var capture = new MediaCapture();

                    await capture.InitializeAsync(settings);
                }
                catch (TypeLoadException)
                {
                    // Thrown when a media player is not available.
                    var messageDialog = new Windows.UI.Popups.MessageDialog("Media player components are unavailable.");
                    await messageDialog.ShowAsync();
                    return false;
                }
                catch (UnauthorizedAccessException)
                {
                    // Thrown when permission to use the audio capture device is denied.
                    var messageDialog = new Windows.UI.Popups.MessageDialog("Permission to use the audio capture device is denied.");
                    await messageDialog.ShowAsync();
                    return false;
                }
                catch (Exception exception)
                {
                    // Thrown when an audio capture device is not present.
                    if (exception.HResult == NoCaptureDevicesHResult)
                    {
                        var messageDialog = new Windows.UI.Popups.MessageDialog("No Audio Capture devices are present on this system.");
                        await messageDialog.ShowAsync();
                        return false;
                    }
                    else
                    {
                        throw;
                    }
                }
                return true;
            }
        }

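Ideally, there would be a setting that turns speech recognition on and off. Define another toggle switch in the MainPage.xaml file. Note that we set the keyboard accelerator L (for "Listener") to trigger the toggle. Add this immediately before the ListConstants entry.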

        <ToggleSwitch x:Name="ToggleSpeechRecognition"
            Margin="685,409,0,0"
            HorizontalAlignment="Left"
            VerticalAlignment="Top"
            Header="Speech recognition"
            IsOn="False"
            Toggled="ToggleSpeechRecognition_Toggled">
            <ToggleSwitch.KeyboardAccelerators>
                <KeyboardAccelerator Key="L" Modifiers="None" />
            </ToggleSwitch.KeyboardAccelerators>
        </ToggleSwitch>

Now, back in the MainPage.xaml.cs file, define the ToggleSpeechRecognition_Toggled event handler specified in the XAML, along with some supporting methods.

        private async Task InitSpeechRecognition()
        {
            isRecognitionAvailable = await AudioCapturePermissions.RequestMicrophonePermission();

            if (isRecognitionAvailable)
            {
                // Create an instance of SpeechRecognizer.
                speechRecognizer = new SpeechRecognizer();

                // Compile the dictation grammar by default.
                await speechRecognizer.CompileConstraintsAsync();

                speechRecognizer.UIOptions.ShowConfirmation = true;
            }
            else
            {
                ToggleSpeechRecognition.IsOn = false;
                isRecognitionAvailable = false;
            }
        }

        private async void ToggleSpeechRecognition_Toggled(object sender, RoutedEventArgs e)
        {
            if (ToggleSpeechRecognition.IsOn)
            {
                await InitSpeechRecognition();
                await StartListening();
            }
            else
            {
                isRecognitionAvailable = false;
            }
        }

        private async Task StartListening()
        {
            if (isRecognitionAvailable)
            {
                try
                {
                    // Start recognition.
                    var speechRecognitionResult = await speechRecognizer.RecognizeWithUIAsync();
                    ParseSpokenCalculationAsync(speechRecognitionResult.Text);

                    // Turn off the Toggle each time.
                    ToggleSpeechRecognition.IsOn = false;
                }
                catch (Exception ex)
                {
                    var messageDialog = new Windows.UI.Popups.MessageDialog(ex.Message);
                    await messageDialog.ShowAsync();
                    ToggleSpeechRecognition.IsOn = false;
                    isRecognitionAvailable = false;
                }
            }
        }
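
The exception handler in StartListening shows ex.Message for every failure. As a minimal sketch, known HResult values could be mapped to friendlier text first. The SpeechErrors class and its Describe method are hypothetical names introduced for illustration, and the privacy HResult value is an assumption based on Microsoft's UWP speech samples, so verify it against your SDK before relying on it.

```csharp
using System;

public static class SpeechErrors
{
    // From the AudioCapturePermissions class above: no microphone is present.
    public const int NoCaptureDevicesHResult = -1072845856;

    // Assumption: HResult reported when the speech privacy policy has not been accepted.
    public const int PrivacyStatementDeclinedHResult = unchecked((int)0x80045509);

    // Hypothetical helper: picks a message for a failure, falling back to the exception text.
    public static string Describe(int hresult, string fallbackMessage)
    {
        if (hresult == PrivacyStatementDeclinedHResult)
        {
            return "Turn on Online speech recognition in the Speech privacy settings.";
        }
        if (hresult == NoCaptureDevicesHResult)
        {
            return "No audio capture devices are present on this system.";
        }
        return fallbackMessage;
    }
}
```

Inside the catch block, this would be called as SpeechErrors.Describe(ex.HResult, ex.Message) and the result passed to the MessageDialog.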

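The ParseSpokenCalculationAsync method receives the recognized speech as a string. Handling that string requires a good deal of app-specific code.

This code takes the sentence the user spoke and tries to match its phrases and words against the app's buttons, toggle switches, or constants. Words that do not match anything are ignored. The following code is a brute-force solution to the problem.

Paste the following code into your app.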

        private bool FindConstantFromSpeech(string spokenText, ref string value)
        {
            bool isLocated = false;
            int n = 0;
            string[] nameValue;

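            // Remove the word "constant" from the start of the spoken text.
            // If nothing follows it, there is no constant name to look up.
            int firstSpace = spokenText.IndexOf(' ');
            if (firstSpace < 0)
            {
                return false;
            }
            spokenText = spokenText.Remove(0, firstSpace).Trim();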

            while (n < ListConstants.Items.Count && !isLocated)
            {
                nameValue = ListConstants.Items[n].ToString().Split('=');

                if (spokenText == nameValue[0].Trim().ToLower())
                {
                    value = nameValue[1].Trim();
                    isLocated = true;
                }
                else
                {
                    ++n;
                }
            }
            return isLocated;
        }

        private async void SayCurrentCalculationAsync()
        {
            if (TextDisplay.Text.Length == 0)
            {
                await SayAsync("The current calculation is empty.");
            }
            else
            {
                await SayAsync($"The current calculation is: {TextDisplay.Text}.");
            }
        }

        private async void ParseSpokenCalculationAsync(string spokenText)
        {
            spokenText = spokenText.ToLower().Trim();
            if (spokenText.Length == 0)
            {
                return;
            }

            // First check for specific control phrases.
            if (spokenText == "say memory")
            {
                await SayAsync($"The current memory is: {TextMemory.Text}.");
            }
            else if (spokenText == "say calculation")
            {
                SayCurrentCalculationAsync();
            }
            else if (spokenText.StartsWith("const"))
            {
                string value = "";
                if (FindConstantFromSpeech(spokenText, ref value))
                {
                    MathEntry(value, "Number");
                    SayCurrentCalculationAsync();
                }
                else
                {
                    await SayAsync("Sorry, I did not recognize that constant.");
                }
            }
            else
            {
                // Ensure + is a word in its own right.
                // Sometimes the speech recognizer will enter "+N" and we need "+ N".
                spokenText = spokenText.Replace("+", "+ ");
                spokenText = spokenText.Replace("  ", " ");

                double d;
                string[] words = spokenText.Split(' ');
                int w = 0;
                ToggleSwitch ts;
                object obj;
                var eType = eElements.Unknown;

                while (w < words.Length)
                {
                    try
                    {
                        // Is the word a number?
                        d = double.Parse(words[w]);
                        MathEntry(d.ToString(), "Number");
                    }
                    catch
                    {
                        try
                        {
                            // Is the word a ratio?
                            string[] ratio = words[w].Split('/');
                            d = double.Parse(ratio[0]) / double.Parse(ratio[1]);
                            MathEntry(d.ToString(), "Number");
                        }
                        catch
                        {
                            // Check if a word or phrase refers to a button, test phrases up to 4 words long.
                            // There are only buttons in gridButtons, so no need to test for anything else.
                            obj = FindElementFromString(GridButtons.Children, words, w, 4, ref w, ref eType);
                            if (obj != null)
                            {
                                Button_Click(obj, null);
                            }
                            else
                            {
                                // Controls can be up to three words in our app.
                                obj = FindElementFromString(GridCalculator.Children, words, w, 3, ref w, ref eType);
                                if (obj != null)
                                {
                                    switch (eType)
                                    {
                                        case eElements.Button:
                                            Button_Click(obj, null);
                                            break;

                                        case eElements.ToggleSwitch:
                                            ts = (ToggleSwitch)obj;
                                            ts.IsOn = !ts.IsOn;
                                            break;

                                        default:
                                            break;
                                    }
                                }
                            }
                        }
                    }
                    ++w;
                }
                if (mode != Emode.CalculateDone)
                {
                    SayCurrentCalculationAsync();
                }
            }
        }

        private bool IsMatchingElementText(eElements elementType, object obj, string textToMatch)
        {
            string name = "";
            string accessibleName = "";

            switch (elementType)
            {
                case eElements.Button:
                    var b = (Button)obj;
                    name = b.Content.ToString().ToLower();
                    accessibleName = b.GetValue(AutomationProperties.NameProperty).ToString().ToLower();
                    break;

                case eElements.ToggleSwitch:
                    var ts = (ToggleSwitch)obj;
                    name = ts.Header.ToString().ToLower();
                    accessibleName = ts.GetValue(AutomationProperties.NameProperty).ToString().ToLower();
                    break;
            }

            // Return true if the name or accessibleName matches the spoken text.
            if ((textToMatch == name && name.Length > 0) || (textToMatch == accessibleName && accessibleName.Length > 0))
            {
                return true;
            }

            return false;
        }

        private object FindElementFromString(UIElementCollection elements, string[] words, int startIndex, int maxConcatenatedWords, ref int updatedIndex, ref eElements elementType)
        {
            // Return the button or toggle switch whose text matches the spoken phrase, or null if none matches.
            int n;
            Button b;
            ToggleSwitch ts;

            // Longer phrases take precedence over shorter ones, so start with the longest allowed and work down.
            for (int c = maxConcatenatedWords; c > 0; c--)
            {
                if (startIndex + c - 1 < words.Length)
                {
                    // Build the phrase from the following words.
                    string txt = words[startIndex];
                    for (n = 1; n < c; n++)
                    {
                        txt += " " + words[startIndex + n];
                    }

                    // Test the word or phrase against the content/tag/name of each button.
                    for (int i = 0; i < elements.Count; i++)
                    {
                        // Is the UI element a button?
                        try
                        {
                            b = (Button)elements[i];
                            if (IsMatchingElementText(eElements.Button, (object)b, txt))
                            {
                                updatedIndex = startIndex + c - 1;
                                elementType = eElements.Button;
                                return (object)b;
                            }
                        }
                        catch
                        {
                            // UI element is not a button, is it a ToggleSwitch?
                            try
                            {
                                ts = (ToggleSwitch)elements[i];
                                if (IsMatchingElementText(eElements.ToggleSwitch, (object)ts, txt))
                                {
                                    updatedIndex = startIndex + c - 1;
                                    elementType = eElements.ToggleSwitch;
                                    return (object)ts;
                                }
                            }
                            catch
                            {
                                // Ignore the UI element.
                            }
                        }
                    }
                }
            }
            updatedIndex = startIndex;
            return null;
        }
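
The longest-phrase-first search in FindElementFromString can be tried out separately from the UI. The following is a minimal sketch, not the app's code: PhraseMatcher and its Match method are hypothetical names, and a plain set of strings stands in for the button and toggle switch names.

```csharp
using System;
using System.Collections.Generic;

public static class PhraseMatcher
{
    // Returns the known names found in the spoken words, in order.
    // Words that match nothing are skipped.
    public static List<string> Match(string[] words, ISet<string> knownNames, int maxWords)
    {
        var matches = new List<string>();
        int w = 0;
        while (w < words.Length)
        {
            bool found = false;

            // Longer phrases take precedence, so try the longest candidate first.
            for (int c = Math.Min(maxWords, words.Length - w); c > 0 && !found; c--)
            {
                string phrase = string.Join(" ", words, w, c);
                if (knownNames.Contains(phrase))
                {
                    matches.Add(phrase);
                    w += c;       // Consume every word of the matched phrase.
                    found = true;
                }
            }

            if (!found)
            {
                w++;              // No match at this position: ignore the word.
            }
        }
        return matches;
    }
}
```

With the illustrative names "square", "root", and "square root", the sentence "what is square root of 9" yields the single match "square root", because the two-word phrase is tested before either single word.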

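Note

To enter a constant, say "constant" followed by the full name of the constant. "Say memory" reads the contents of memory aloud. "Say calculation" reads the current calculation aloud.

Build and run the app, then turn on the speech recognition toggle switch.
With your microphone ready, say "What is 1.23456 times 2.789". What you say should appear in a dialog titled "Listening", which closes shortly after you stop talking. The calculation should then appear in the display.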
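
The constant lookup can be sketched without the list control. This is an illustration under assumptions: ConstantLookup and TryFind are hypothetical names, and the entries are assumed to be "name = value" strings, which is the shape FindConstantFromSpeech expects from ListConstants.

```csharp
using System;
using System.Collections.Generic;

public static class ConstantLookup
{
    // Drops the leading word ("constant") and looks the rest up among "name = value" entries.
    public static bool TryFind(string spokenText, IEnumerable<string> entries, out string value)
    {
        value = "";
        int firstSpace = spokenText.IndexOf(' ');
        if (firstSpace < 0)
        {
            return false;   // Only one word was spoken, so no constant name was given.
        }

        string name = spokenText.Substring(firstSpace + 1).Trim();
        foreach (string entry in entries)
        {
            string[] nameValue = entry.Split('=');
            if (nameValue.Length == 2 &&
                string.Equals(nameValue[0].Trim(), name, StringComparison.OrdinalIgnoreCase))
            {
                value = nameValue[1].Trim();
                return true;
            }
        }
        return false;
    }
}
```

The comparison ignores case, so a list entry written as "Speed of light = 299792458" (an invented example) still matches the lower-cased text "constant speed of light".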

Speaking a natural addition.

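Note

If the Listening dialog displays "Sorry, didn't catch that", press the spacebar to bring the Listener up again.

Try entering some simple calculations using your voice.

Note

Just press the L key and say "clear" to erase any calculation.

You can build up a calculation in sections, because a short pause is enough to close the Listener and parse the input. For example, say "sine 30 times". Then select the L key and, when the Listener reappears, say "cosine 30". Then select L again and say "equals". You should get the result.
Note that filler words such as "what is", "of", and "the" can be included in a sentence and are correctly ignored.
Try making up equations that test all of the buttons and toggle switches, including the Clr and Del buttons, the memory store buttons, the constants, and everything else (they do not need to make any mathematical sense). This should help ensure that the code handles all of them correctly.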
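
ParseSpokenCalculationAsync decides whether a word is a number or a ratio by catching exceptions from double.Parse. As a minimal sketch of the same decision without exceptions, the hypothetical SpokenToken.TryParseNumber helper below uses double.TryParse. It assumes invariant-culture number formatting (the app's code uses the current culture) and, unlike the app's code, rejects a ratio with a zero denominator.

```csharp
using System;
using System.Globalization;

public static class SpokenToken
{
    // Tries to read a token as a plain number ("2.789") or as a ratio ("3/4").
    public static bool TryParseNumber(string token, out double value)
    {
        if (double.TryParse(token, NumberStyles.Float, CultureInfo.InvariantCulture, out value))
        {
            return true;
        }

        string[] parts = token.Split('/');
        if (parts.Length == 2
            && double.TryParse(parts[0], NumberStyles.Float, CultureInfo.InvariantCulture, out double numerator)
            && double.TryParse(parts[1], NumberStyles.Float, CultureInfo.InvariantCulture, out double denominator)
            && denominator != 0)
        {
            value = numerator / denominator;
            return true;
        }

        // Not a number: the caller would go on to test the word against the UI elements.
        value = 0;
        return false;
    }
}
```

A token that fails this check, such as "what", is either matched against a button or toggle switch name or ignored.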
