Custom speech recognition

20 minutes

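To do better than the default Windows Speech Recognition, you need to code an app-specific speech recognition system designed to handle full-sentence input.

This is a considerable amount of coding work, so you might prefer to improve your app's AutomationProperties.Name properties and retest against the Windows speech recognizer. That route is a little clunky, but it relies on a system that is already in place and already gives users access to this input method. For smooth speech input in this specialized context, however, you need to build the custom system described here.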

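Note

For the full list of commands, see Windows Speech Recognition commands.

Get permission to capture microphone input and run speech recognition on it

Before you can start using custom speech recognition, you need to set several permissions and capabilities.

With the calculator project loaded in Visual Studio, open the Package.appxmanifest file and select Capabilities. Turn on the Microphone capability.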

Setting the microphone capability.

Setting this capability gives the app access to the microphone's audio feed. Save and close the manifest file.
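Turning on the capability in the designer writes a device capability entry into the manifest. If you open Package.appxmanifest in the XML editor instead, the relevant part should look roughly like the fragment below. This is a sketch that shows only the microphone entry; any other capabilities your project already declares are omitted.

```xml
<Capabilities>
  <!-- Grants the app access to the microphone's audio feed. -->
  <DeviceCapability Name="microphone" />
</Capabilities>
```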
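That is everything the app itself needs, but it is not everything speech recognition needs in order to work. The user must enable both the microphone and speech recognition for apps, and speech recognition is turned off by default. As the developer you are also the user when testing, so type "privacy settings" into the Windows search bar.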

Setting the privacy settings.

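Select Speech and make sure Online speech recognition is turned on. Select Microphone and make sure Allow apps to access your microphone is turned on. Close or minimize the Settings window.

Later, you will turn these settings off to test that the app handles that situation correctly.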
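When the speech privacy setting is off, recognition typically surfaces the problem as an exception rather than as a result. The following is a minimal sketch of how an app might tell that failure apart from others. The HResult value 0x80045509 is an assumption based on Microsoft's UWP speech samples, so verify it on your target Windows version. SpeechErrors, IsPrivacyDeclined, and Describe are hypothetical names, not part of the calculator project.

```csharp
using System;

public static class SpeechErrors
{
    // Assumed HResult reported when the speech privacy policy has not been accepted.
    public const uint PrivacyStatementDeclined = 0x80045509;

    // True if the exception carries the assumed "privacy declined" HResult.
    public static bool IsPrivacyDeclined(Exception ex)
    {
        return unchecked((uint)ex.HResult) == PrivacyStatementDeclined;
    }

    // Returns a friendlier message for the privacy case, otherwise the original message.
    public static string Describe(Exception ex)
    {
        return IsPrivacyDeclined(ex)
            ? "Speech recognition is turned off in the Windows privacy settings."
            : ex.Message;
    }
}
```

A catch block that currently shows ex.Message in a dialog could show SpeechErrors.Describe(ex) instead, so the user is pointed at the setting that needs to change.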

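Add code to match words and phrases to UI elements

Supporting a custom speech recognizer requires quite a lot of code. Start with the using statements and global variables.

Add the following using statements at the top of your code.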

using Windows.Media.SpeechRecognition;
using Windows.Media.Capture;

Add the following global variables and one new enumeration.

        enum eElements
        {
            Button,
            ToggleSwitch,
            Unknown
        }

        bool isRecognitionAvailable;
        SpeechRecognizer speechRecognizer;

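To handle the permission issues described above, add the following class to your code. A single call to RequestMicrophonePermission checks the microphone permission. The code is generic and can be reused in any app you develop for Windows 10 that needs microphone access for speech recognition, but it does not handle the Cortana/Dictation privacy setting.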

        public class AudioCapturePermissions
        {
            // If no microphone is present, an exception is thrown with the following HResult value.
            private static readonly int NoCaptureDevicesHResult = -1072845856;

            /// <summary>
            ///  Note that this method only checks the Settings->Privacy->Microphone setting, it does not handle
            /// the Cortana/Dictation privacy check.
            /// </summary>
            /// <returns>True, if the microphone is available.</returns>
            public async static Task<bool> RequestMicrophonePermission()
            {
                try
                {
                    // Request access to the audio capture device.
                    var settings = new MediaCaptureInitializationSettings
                    {
                        StreamingCaptureMode = StreamingCaptureMode.Audio,
                        MediaCategory = MediaCategory.Speech,
                    };
                    var capture = new MediaCapture();

                    await capture.InitializeAsync(settings);
                }
                catch (TypeLoadException)
                {
                    // Thrown when a media player is not available.
                    var messageDialog = new Windows.UI.Popups.MessageDialog("Media player components are unavailable.");
                    await messageDialog.ShowAsync();
                    return false;
                }
                catch (UnauthorizedAccessException)
                {
                    // Thrown when permission to use the audio capture device is denied.
                    var messageDialog = new Windows.UI.Popups.MessageDialog("Permission to use the audio capture device is denied.");
                    await messageDialog.ShowAsync();
                    return false;
                }
                catch (Exception exception)
                {
                    // Thrown when an audio capture device is not present.
                    if (exception.HResult == NoCaptureDevicesHResult)
                    {
                        var messageDialog = new Windows.UI.Popups.MessageDialog("No Audio Capture devices are present on this system.");
                        await messageDialog.ShowAsync();
                        return false;
                    }
                    else
                    {
                        throw;
                    }
                }
                return true;
            }
        }

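It is a good idea to include a setting that turns speech recognition on and off. In the MainPage.xaml file, define another toggle switch. Set the accelerator key L (for "listener") to trigger the toggle. Add this entry just before the ListConstants entry.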

        <ToggleSwitch x:Name="ToggleSpeechRecognition"
            Margin="685,409,0,0"
            HorizontalAlignment="Left"
            VerticalAlignment="Top"
            Header="Speech recognition"
            IsOn="False"
            Toggled="ToggleSpeechRecognition_Toggled">
            <ToggleSwitch.KeyboardAccelerators>
                <KeyboardAccelerator Key="L" Modifiers="None" />
            </ToggleSwitch.KeyboardAccelerators>
        </ToggleSwitch>

Now, back in the MainPage.xaml.cs file, define the ToggleSpeechRecognition_Toggled event handler named in the XAML, along with some supporting methods.

        private async Task InitSpeechRecognition()
        {
            isRecognitionAvailable = await AudioCapturePermissions.RequestMicrophonePermission();

            if (isRecognitionAvailable)
            {
                // Create an instance of SpeechRecognizer.
                speechRecognizer = new SpeechRecognizer();

                // Compile the dictation grammar by default.
                await speechRecognizer.CompileConstraintsAsync();

                speechRecognizer.UIOptions.ShowConfirmation = true;
            }
            else
            {
                ToggleSpeechRecognition.IsOn = false;
                isRecognitionAvailable = false;
            }
        }

        private async void ToggleSpeechRecognition_Toggled(object sender, RoutedEventArgs e)
        {
            if (ToggleSpeechRecognition.IsOn)
            {
                await InitSpeechRecognition();
                await StartListening();
            }
            else
            {
                isRecognitionAvailable = false;
            }
        }

        private async Task StartListening()
        {
            if (isRecognitionAvailable)
            {
                try
                {
                    // Start recognition.
                    var speechRecognitionResult = await speechRecognizer.RecognizeWithUIAsync();
                    ParseSpokenCalculationAsync(speechRecognitionResult.Text);

                    // Turn off the Toggle each time.
                    ToggleSpeechRecognition.IsOn = false;
                }
                catch (Exception ex)
                {
                    var messageDialog = new Windows.UI.Popups.MessageDialog(ex.Message);
                    await messageDialog.ShowAsync();
                    ToggleSpeechRecognition.IsOn = false;
                    isRecognitionAvailable = false;
                }
            }
        }

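The ParseSpokenCalculationAsync method receives the recognized speech as a string. Processing that string requires a good deal of app-specific code.

This code takes the sentence that was heard and matches its words and phrases against the app's buttons, toggle switches, or constants. Words that do not match anything are ignored. The code is a brute-force approach to the problem.

Paste the following code into your app.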

        private bool FindConstantFromSpeech(string spokenText, ref string value)
        {
            bool isLocated = false;
            int n = 0;
            string[] nameValue;

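            // Remove the leading word ("constant") from the spoken text.
            // If nothing follows it, there is no constant name to look up.
            int firstSpace = spokenText.IndexOf(' ');
            if (firstSpace < 0)
            {
                return false;
            }
            spokenText = spokenText.Substring(firstSpace).Trim();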

            while (n < ListConstants.Items.Count && !isLocated)
            {
                nameValue = ListConstants.Items[n].ToString().Split('=');

                if (spokenText == nameValue[0].Trim().ToLower())
                {
                    value = nameValue[1].Trim();
                    isLocated = true;
                }
                else
                {
                    ++n;
                }
            }
            return isLocated;
        }

        private async void SayCurrentCalculationAsync()
        {
            if (TextDisplay.Text.Length == 0)
            {
                await SayAsync("The current calculation is empty.");
            }
            else
            {
                await SayAsync($"The current calculation is: {TextDisplay.Text}.");
            }
        }

        private async void ParseSpokenCalculationAsync(string spokenText)
        {
            spokenText = spokenText.ToLower().Trim();
            if (spokenText.Length == 0)
            {
                return;
            }

            // First check for specific control phrases.
            if (spokenText == "say memory")
            {
                await SayAsync($"The current memory is: {TextMemory.Text}.");
            }
            else if (spokenText == "say calculation")
            {
                SayCurrentCalculationAsync();
            }
            else if (spokenText.StartsWith("const"))
            {
                string value = "";
                if (FindConstantFromSpeech(spokenText, ref value))
                {
                    MathEntry(value, "Number");
                    SayCurrentCalculationAsync();
                }
                else
                {
                    await SayAsync("Sorry, I did not recognize that constant.");
                }
            }
            else
            {
                // Ensure + is a word in its own right.
                // Sometimes the speech recognizer will enter "+N" and we need "+ N".
                spokenText = spokenText.Replace("+", "+ ");
                spokenText = spokenText.Replace("  ", " ");

                double d;
                string[] words = spokenText.Split(' ');
                int w = 0;
                ToggleSwitch ts;
                object obj;
                var eType = eElements.Unknown;

                while (w < words.Length)
                {
                    try
                    {
                        // Is the word a number?
                        d = double.Parse(words[w]);
                        MathEntry(d.ToString(), "Number");
                    }
                    catch
                    {
                        try
                        {
                            // Is the word a ratio?
                            string[] ratio = words[w].Split('/');
                            d = double.Parse(ratio[0]) / double.Parse(ratio[1]);
                            MathEntry(d.ToString(), "Number");
                        }
                        catch
                        {
                            // Check if a word or phrase refers to a button, test phrases up to 4 words long.
                            // There are only buttons in gridButtons, so no need to test for anything else.
                            obj = FindElementFromString(GridButtons.Children, words, w, 4, ref w, ref eType);
                            if (obj != null)
                            {
                                Button_Click(obj, null);
                            }
                            else
                            {
                                // Controls can be up to three words in our app.
                                obj = FindElementFromString(GridCalculator.Children, words, w, 3, ref w, ref eType);
                                if (obj != null)
                                {
                                    switch (eType)
                                    {
                                        case eElements.Button:
                                            Button_Click(obj, null);
                                            break;

                                        case eElements.ToggleSwitch:
                                            ts = (ToggleSwitch)obj;
                                            ts.IsOn = !ts.IsOn;
                                            break;

                                        default:
                                            break;
                                    }
                                }
                            }
                        }
                    }
                    ++w;
                }
                if (mode != Emode.CalculateDone)
                {
                    SayCurrentCalculationAsync();
                }
            }
        }

        private bool IsMatchingElementText(eElements elementType, object obj, string textToMatch)
        {
            string name = "";
            string accessibleName = "";

            switch (elementType)
            {
                case eElements.Button:
                    var b = (Button)obj;
                    name = b.Content.ToString().ToLower();
                    accessibleName = b.GetValue(AutomationProperties.NameProperty).ToString().ToLower();
                    break;

                case eElements.ToggleSwitch:
                    var ts = (ToggleSwitch)obj;
                    name = ts.Header.ToString().ToLower();
                    accessibleName = ts.GetValue(AutomationProperties.NameProperty).ToString().ToLower();
                    break;
            }

            // Return true if the name or accessibleName matches the spoken text.
            if ((textToMatch == name && name.Length > 0) || (textToMatch == accessibleName && accessibleName.Length > 0))
            {
                return true;
            }

            return false;
        }

        private object FindElementFromString(UIElementCollection elements, string[] words, int startIndex, int maxConcatenatedWords, ref int updatedIndex, ref eElements elementType)
        {
            // Return the first button or toggle switch whose text matches the spoken phrase, or null if none matches.
            int n;
            Button b;
            ToggleSwitch ts;

            // Longer phrases take precedence over shorter ones, so start with the longest allowed and work down.
            for (int c = maxConcatenatedWords; c > 0; c--)
            {
                if (startIndex + c - 1 < words.Length)
                {
                    // Build the phrase from the following words.
                    string txt = words[startIndex];
                    for (n = 1; n < c; n++)
                    {
                        txt += " " + words[startIndex + n];
                    }

                    // Test the word or phrase against the content/tag/name of each button.
                    for (int i = 0; i < elements.Count; i++)
                    {
                        // Is the UI element a button?
                        try
                        {
                            b = (Button)elements[i];
                            if (IsMatchingElementText(eElements.Button, (object)b, txt))
                            {
                                updatedIndex = startIndex + c - 1;
                                elementType = eElements.Button;
                                return (object)b;
                            }
                        }
                        catch
                        {
                            // UI element is not a button, is it a ToggleSwitch?
                            try
                            {
                                ts = (ToggleSwitch)elements[i];
                                if (IsMatchingElementText(eElements.ToggleSwitch, (object)ts, txt))
                                {
                                    updatedIndex = startIndex + c - 1;
                                    elementType = eElements.ToggleSwitch;
                                    return (object)ts;
                                }
                            }
                            catch
                            {
                                // Ignore the UI element.
                            }
                        }
                    }
                }
            }
            updatedIndex = startIndex;
            return null;
        }
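ParseSpokenCalculationAsync decides whether a word is a number or a ratio by calling double.Parse inside nested try/catch blocks. As a possible alternative, the sketch below performs the same check with double.TryParse, which avoids throwing an exception for every non-numeric word. WordClassifier and TryReadNumber are hypothetical names. Two things differ from the app code and are assumptions of this sketch: it parses with the invariant culture, and it rejects a zero denominator.

```csharp
using System.Globalization;

public static class WordClassifier
{
    // Tries to read a word as a plain number ("1.5") or as a ratio ("3/4").
    // Returns false for anything else, so the caller can go on to match UI elements.
    public static bool TryReadNumber(string word, out double value)
    {
        // Is the word a number?
        if (double.TryParse(word, NumberStyles.Float, CultureInfo.InvariantCulture, out value))
        {
            return true;
        }

        // Is the word a ratio with exactly one '/' and a non-zero denominator?
        string[] ratio = word.Split('/');
        if (ratio.Length == 2
            && double.TryParse(ratio[0], NumberStyles.Float, CultureInfo.InvariantCulture, out double top)
            && double.TryParse(ratio[1], NumberStyles.Float, CultureInfo.InvariantCulture, out double bottom)
            && bottom != 0)
        {
            value = top / bottom;
            return true;
        }

        value = 0;
        return false;
    }
}
```

With a helper like this, the body of the word loop could become a single if/else: pass the value to MathEntry when TryReadNumber succeeds, otherwise fall through to FindElementFromString.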

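Note

To enter a constant, say "constant" followed by the full name of the constant. "Say memory" speaks the contents of memory. "Say calculation" speaks the contents of the current calculation.

Build and run the app, then turn on the speech recognition toggle switch.
When the microphone is ready, say "What is 1.23456 times 2.789". What you say appears in the Listening dialog. The dialog closes shortly after you stop speaking, and the calculation appears in the display.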

Speaking a natural addition.

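Note

If the Listening dialog displays "Sorry, didn't catch that", press the spacebar to bring the listener up again.

Try entering several simple calculations using speech.

Note

Press the L key and say "clear" to clear an unwieldy calculation.

A short pause is enough for the listener to close and the input to be parsed, so a calculation can be entered in several parts. For example, say "What is sine 30 times". Then press the L key and, when the listener reappears, say "cosine of 30". Press the L key again and say "equals". The result is displayed.
Filler words such as "what", "is", "of", and "the" can be included in a sentence, but they are ignored.
Try constructing formulas (it does not matter if they make no mathematical sense) that exercise every button and toggle switch, including the Clr and Del buttons, the memory storage buttons, the constants, and everything else. This helps confirm that the code handles them all correctly.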
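Filler words are ignored because the matcher tries the longest candidate phrase at each position first and skips any word that matches nothing. The sketch below isolates that idea from the UI code. PhraseMatcher and Match are hypothetical names, and a plain set of strings stands in for the text of the app's buttons and toggle switches.

```csharp
using System;
using System.Collections.Generic;

public static class PhraseMatcher
{
    // Returns the known names found in the words, in spoken order.
    // At each position the longest phrase (up to maxWords) is tried first.
    public static List<string> Match(string[] words, ISet<string> knownNames, int maxWords)
    {
        var matches = new List<string>();
        int w = 0;
        while (w < words.Length)
        {
            int consumed = 1; // An unmatched word is skipped on its own.
            for (int c = Math.Min(maxWords, words.Length - w); c > 0; c--)
            {
                string phrase = string.Join(" ", words, w, c);
                if (knownNames.Contains(phrase))
                {
                    matches.Add(phrase);
                    consumed = c; // Skip every word that was part of the match.
                    break;
                }
            }
            w += consumed;
        }
        return matches;
    }
}
```

For example, with the known names "square", "square root", and "plus", the words of "what is square root plus square" yield "square root", "plus", and "square": the two-word name wins over its one-word prefix, and "what" and "is" are dropped.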
