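Speech recognition

Article
03/05/2024

Use speech recognition to provide input, specify an action or command, and accomplish tasks.

Important APIs: Windows.Media.SpeechRecognition

Speech recognition is made up of a speech runtime, recognition APIs for programming the runtime, ready-to-use grammars for dictation and web search, and a default system UI that helps users discover and use speech recognition features.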

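Configure speech recognition

To support speech recognition in your app, the user must connect and enable a microphone on their device, and accept the Microsoft Privacy Policy granting permission for your app to use it.

To automatically prompt the user with a system dialog requesting permission to access and use the microphone's audio feed (the example shown below is from the Speech recognition and speech synthesis sample), just set the Microphone device capability in the app package manifest. For more detail, see App capability declarations.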
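
As a minimal sketch of that declaration, the Capabilities element of Package.appxmanifest might look like the fragment below. The internetClient entry is an assumption added here for apps that rely on the online predefined grammars; only the microphone entry relates to audio capture.

```xml
<Capabilities>
  <!-- Assumption: included because predefined grammars are processed by a web service. -->
  <Capability Name="internetClient" />
  <!-- Lets the system prompt the user for microphone access. -->
  <DeviceCapability Name="microphone" />
</Capabilities>
```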

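If the user clicks Yes to grant access to the microphone, your app is added to the list of approved applications on the Settings -> Privacy -> Microphone page. However, because the user can turn this setting off at any time, you should confirm that your app has access to the microphone before attempting to use it.

If you also want to support dictation, Cortana, or other speech recognition services (such as a predefined grammar defined in a topic constraint), you must also confirm that Online speech recognition (Settings -> Privacy -> Speech) is enabled.

This snippet shows how your app can check whether a microphone is present and whether it has permission to use it.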

C#

using System;
using System.Threading.Tasks;
using Windows.Media.Capture;

public class AudioCapturePermissions
{
    // If no microphone is present, an exception is thrown with the following HResult value.
    private static int NoCaptureDevicesHResult = -1072845856;

    /// <summary>
    /// Note that this method only checks the Settings->Privacy->Microphone setting, it does not handle
    /// the Cortana/Dictation privacy check.
    ///
    /// You should perform this check every time the app gets focus, in case the user has changed
    /// the setting while the app was suspended or not in focus.
    /// </summary>
    /// <returns>True, if the microphone is available.</returns>
    public async static Task<bool> RequestMicrophonePermission()
    {
        try
        {
            // Request access to the audio capture device.
            MediaCaptureInitializationSettings settings = new MediaCaptureInitializationSettings();
            settings.StreamingCaptureMode = StreamingCaptureMode.Audio;
            settings.MediaCategory = MediaCategory.Speech;
            MediaCapture capture = new MediaCapture();

            await capture.InitializeAsync(settings);
        }
        catch (TypeLoadException)
        {
            // Thrown when a media player is not available.
            var messageDialog = new Windows.UI.Popups.MessageDialog("Media player components are unavailable.");
            await messageDialog.ShowAsync();
            return false;
        }
        catch (UnauthorizedAccessException)
        {
            // Thrown when permission to use the audio capture device is denied.
            // If this occurs, show an error or disable recognition functionality.
            return false;
        }
        catch (Exception exception)
        {
            // Thrown when an audio capture device is not present.
            if (exception.HResult == NoCaptureDevicesHResult)
            {
                var messageDialog = new Windows.UI.Popups.MessageDialog("No Audio Capture devices are present on this system.");
                await messageDialog.ShowAsync();
                return false;
            }
            else
            {
                throw;
            }
        }
        return true;
    }
}

C++/CX

/// <summary>
/// Note that this method only checks the Settings->Privacy->Microphone setting, it does not handle
/// the Cortana/Dictation privacy check.
///
/// You should perform this check every time the app gets focus, in case the user has changed
/// the setting while the app was suspended or not in focus.
/// </summary>
/// <returns>True, if the microphone is available.</returns>
IAsyncOperation<bool>^  AudioCapturePermissions::RequestMicrophonePermissionAsync()
{
    return create_async([]() 
    {
        try
        {
            // Request access to the audio capture device.
            MediaCaptureInitializationSettings^ settings = ref new MediaCaptureInitializationSettings();
            settings->StreamingCaptureMode = StreamingCaptureMode::Audio;
            settings->MediaCategory = MediaCategory::Speech;
            MediaCapture^ capture = ref new MediaCapture();

            return create_task(capture->InitializeAsync(settings))
                .then([](task<void> previousTask) -> bool
            {
                try
                {
                    previousTask.get();
                }
                catch (AccessDeniedException^)
                {
                    // Thrown when permission to use the audio capture device is denied.
                    // If this occurs, show an error or disable recognition functionality.
                    return false;
                }
                catch (Exception^ exception)
                {
                    // Thrown when an audio capture device is not present.
                    if (exception->HResult == AudioCapturePermissions::NoCaptureDevicesHResult)
                    {
                        auto messageDialog = ref new Windows::UI::Popups::MessageDialog("No Audio Capture devices are present on this system.");
                        create_task(messageDialog->ShowAsync());
                        return false;
                    }

                    throw;
                }
                return true;
            });
        }
        catch (Platform::ClassNotRegisteredException^ ex)
        {
            // Thrown when a media player is not available. 
            auto messageDialog = ref new Windows::UI::Popups::MessageDialog("Media Player Components unavailable.");
            create_task(messageDialog->ShowAsync());
            return create_task([] {return false; });
        }
    });
}

JavaScript

var AudioCapturePermissions = WinJS.Class.define(
    function () { }, {},
    {
        requestMicrophonePermission: function () {
            /// <summary>
            /// Note that this method only checks the Settings->Privacy->Microphone setting, it does not handle
            /// the Cortana/Dictation privacy check.
            ///
            /// You should perform this check every time the app gets focus, in case the user has changed
            /// the setting while the app was suspended or not in focus.
            /// </summary>
            /// <returns>True, if the microphone is available.</returns>
            return new WinJS.Promise(function (completed, error) {

                try {
                    // Request access to the audio capture device.
                    var captureSettings = new Windows.Media.Capture.MediaCaptureInitializationSettings();
                    captureSettings.streamingCaptureMode = Windows.Media.Capture.StreamingCaptureMode.audio;
                    captureSettings.mediaCategory = Windows.Media.Capture.MediaCategory.speech;

                    var capture = new Windows.Media.Capture.MediaCapture();
                    capture.initializeAsync(captureSettings).then(function () {
                        completed(true);
                    },
                    function (err) {
                        // Audio Capture can fail to initialize if there are no audio devices on the system, or if
                        // the user has disabled permission to access the microphone in the Privacy settings.
                        if (err.number == -2147024891) { // Access denied (microphone disabled in settings)
                            completed(false);
                        } else if (err.number == -1072845856) { // No recording device present.
                            var messageDialog = new Windows.UI.Popups.MessageDialog("No Audio Capture devices are present on this system.");
                            messageDialog.showAsync();
                            completed(false);
                        } else {
                            // Forward unexpected failures to the promise's error callback.
                            error(err);
                        }
                    });
                } catch (exception) {
                    if (exception.number == -2147221164) { // REGDB_E_CLASSNOTREG
                        var messageDialog = new Windows.UI.Popups.MessageDialog("Media Player components not available on this system.");
                        messageDialog.showAsync();
                        // Complete the promise; a bare return would leave it pending.
                        completed(false);
                    } else {
                        error(exception);
                    }
                }
            });
        }
    })
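
One way the check might be used before starting recognition is sketched below in C#, assuming the AudioCapturePermissions class shown above is in scope. The RecognitionGate class and TryRecognizeAsync method are hypothetical names used only for illustration; this is not part of the sample.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Media.SpeechRecognition;

public static class RecognitionGate
{
    // Hypothetical helper: returns the recognized text, or null if the
    // microphone is unavailable or recognition does not succeed.
    public static async Task<string> TryRecognizeAsync()
    {
        // Re-check permission on every attempt, because the user can revoke it at any time.
        bool permissionGained = await AudioCapturePermissions.RequestMicrophonePermission();
        if (!permissionGained)
        {
            return null;
        }

        using (var recognizer = new SpeechRecognizer())
        {
            // No constraints added, so the predefined dictation grammar is compiled.
            SpeechRecognitionCompilationResult compilation = await recognizer.CompileConstraintsAsync();
            if (compilation.Status != SpeechRecognitionResultStatus.Success)
            {
                return null;
            }

            // RecognizeAsync listens without showing the default recognition UI.
            SpeechRecognitionResult result = await recognizer.RecognizeAsync();
            return result.Status == SpeechRecognitionResultStatus.Success ? result.Text : null;
        }
    }
}
```

Returning null instead of throwing keeps the caller simple; an app could equally disable its speech button when the permission check fails.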

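Recognize speech input

A constraint defines the words and phrases (vocabulary) that an app recognizes in speech input. Constraints are at the core of speech recognition and give your app greater control over recognition accuracy.

You can use the following types of constraints for recognizing speech input.

Predefined grammars

Predefined dictation and web-search grammars provide speech recognition for your app without requiring you to author a grammar. When you use these grammars, speech recognition is performed by a remote web service and the results are returned to the device.

The default free-text dictation grammar can recognize most words and phrases that a user can say in a particular language, and is optimized to recognize short phrases. The predefined dictation grammar is used if you don't specify any constraints for your SpeechRecognizer object. Free-text dictation is useful when you don't want to limit the kinds of things a user can say. Typical uses include creating notes or dictating the content of a message.

The web-search grammar, like the dictation grammar, contains a large number of words and phrases that a user might say. However, it is optimized to recognize terms that people typically use when searching the web.

Note

Because the predefined dictation and web-search grammars can be large, and because they are online (not on the device), performance might not be as fast as with a custom grammar installed on the device.

These predefined grammars can recognize up to 10 seconds of speech input and require no authoring effort on your part. However, they do require a network connection.

To use web-service constraints, speech input and dictation support must be enabled in Settings by turning on the "Get to know me" option on the Settings -> Privacy -> Speech, inking, and typing page.

Here, we show how to test whether speech input is enabled and, if it is not, open the Settings -> Privacy -> Speech, inking, and typing page.

First, we initialize a global variable (HResultPrivacyStatementDeclined) to the HResult value 0x80045509. For more information, see Exception handling in C# or Visual Basic.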

private static uint HResultPrivacyStatementDeclined = 0x80045509;

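We then catch any standard exceptions during recognition and test whether the HResult value equals the value of the HResultPrivacyStatementDeclined variable. If it does, we display a warning and call await Windows.System.Launcher.LaunchUriAsync(new Uri("ms-settings:privacy-accounts")); to open the settings page.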

catch (Exception exception)
{
  // Handle the speech privacy policy error.
  if ((uint)exception.HResult == HResultPrivacyStatementDeclined)
  {
    resultTextBlock.Visibility = Visibility.Visible;
    resultTextBlock.Text = "The privacy statement was declined. " +
      "Go to Settings -> Privacy -> Speech, inking and typing, and ensure you " +
      "have viewed the privacy policy, and 'Get To Know You' is enabled.";
    // Open the privacy/speech, inking, and typing settings page.
    await Windows.System.Launcher.LaunchUriAsync(new Uri("ms-settings:privacy-accounts")); 
  }
  else
  {
    var messageDialog = new Windows.UI.Popups.MessageDialog(exception.Message, "Exception");
    await messageDialog.ShowAsync();
  }
}
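
Exception.HResult is a signed int, while the constant above is declared as a uint, so the comparison depends on a cast. The following is a minimal sketch of isolating that comparison in a helper; SpeechErrors and IsPrivacyStatementDeclined are hypothetical names, and the unchecked keyword is an addition here so the cast also works in projects compiled with overflow checking enabled.

```csharp
using System;

public static class SpeechErrors
{
    // HResult reported when the speech privacy statement has been declined.
    public const uint HResultPrivacyStatementDeclined = 0x80045509;

    // Reinterprets the signed HResult as unsigned before comparing.
    public static bool IsPrivacyStatementDeclined(Exception exception)
    {
        return unchecked((uint)exception.HResult) == HResultPrivacyStatementDeclined;
    }
}
```

A catch block could then test SpeechErrors.IsPrivacyStatementDeclined(exception) instead of repeating the cast.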

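See SpeechRecognitionTopicConstraint.

Programmatic list constraints

Programmatic list constraints provide a lightweight way to create simple grammars from a list of words or phrases. A list constraint works well for recognizing short, distinct phrases. Explicitly specifying all the words in a grammar also improves recognition accuracy, because the speech recognition engine only has to process speech to confirm a match. The list can also be updated programmatically.

A list constraint consists of an array of strings that represents the speech input your app will accept for a recognition operation. You create a list constraint in your app by creating a speech recognition list constraint object and passing it an array of strings. Then, add that object to the constraints collection of the recognizer. Recognition succeeds when the speech recognizer recognizes any one of the strings in the array.

See SpeechRecognitionListConstraint.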
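
The steps for a list constraint can be sketched as follows. The ListConstraintExample class, the "Yes"/"No" phrases, and the "yesOrNo" tag are illustrative assumptions, not part of the sample.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Media.SpeechRecognition;

public static class ListConstraintExample
{
    // Hypothetical helper: returns "Yes" or "No" if one of them is recognized, otherwise null.
    public static async Task<string> RecognizeAnswerAsync()
    {
        using (var recognizer = new SpeechRecognizer())
        {
            // The array of phrases the app accepts for this recognition operation.
            string[] responses = { "Yes", "No" };

            // Create the list constraint and add it to the recognizer's constraints collection.
            var listConstraint = new SpeechRecognitionListConstraint(responses, "yesOrNo");
            recognizer.Constraints.Add(listConstraint);

            // Constraints must be compiled before recognition starts.
            SpeechRecognitionCompilationResult compilation = await recognizer.CompileConstraintsAsync();
            if (compilation.Status != SpeechRecognitionResultStatus.Success)
            {
                return null;
            }

            SpeechRecognitionResult result = await recognizer.RecognizeAsync();
            return result.Status == SpeechRecognitionResultStatus.Success ? result.Text : null;
        }
    }
}
```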

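SRGS grammars

A Speech Recognition Grammar Specification (SRGS) grammar is a static document that, unlike a programmatic list constraint, uses the XML format defined by SRGS Version 1.0. An SRGS grammar provides the greatest control over the speech recognition experience by letting you capture multiple semantic meanings in a single recognition.

See SpeechRecognitionGrammarFileConstraint.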
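
Loading an SRGS file as a constraint might look like the sketch below. The Colors.grxml file name, the "colors" tag, and the "color" semantic key are hypothetical; the keys actually available depend on the tag elements in your own grammar.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Media.SpeechRecognition;
using Windows.Storage;

public static class GrammarFileExample
{
    // Hypothetical helper: recognizes speech against a grammar file packaged with the app.
    public static async Task<string> RecognizeWithGrammarAsync()
    {
        // Assumption: Colors.grxml is included in the app package root.
        StorageFile grammarFile = await StorageFile.GetFileFromApplicationUriAsync(
            new Uri("ms-appx:///Colors.grxml"));

        using (var recognizer = new SpeechRecognizer())
        {
            recognizer.Constraints.Add(
                new SpeechRecognitionGrammarFileConstraint(grammarFile, "colors"));

            SpeechRecognitionCompilationResult compilation = await recognizer.CompileConstraintsAsync();
            if (compilation.Status != SpeechRecognitionResultStatus.Success)
            {
                return null;
            }

            SpeechRecognitionResult result = await recognizer.RecognizeAsync();
            if (result.Status != SpeechRecognitionResultStatus.Success)
            {
                return null;
            }

            // Prefer the semantic value set by the grammar, if the hypothetical "color" key exists.
            if (result.SemanticInterpretation.Properties.TryGetValue("color", out var values)
                && values.Count > 0)
            {
                return values[0];
            }

            // Otherwise fall back to the raw recognized text.
            return result.Text;
        }
    }
}
```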

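Voice command constraints

Use a Voice Command Definition (VCD) XML file to define the commands that the user can say to initiate actions when activating your app. For more detail, see Activate a foreground app with voice commands through Cortana.

See SpeechRecognitionVoiceCommandDefinitionConstraint.

Note: The type of constraint you use depends on the complexity of the recognition experience you want to create. Any of them could be the best choice for a specific recognition task, and you might find uses for all types of constraints in your app. To get started with constraints, see Define custom recognition constraints.

The predefined Universal Windows app dictation grammar recognizes most words and short phrases in a language. It is activated by default when a speech recognizer object is instantiated without custom constraints.

In this example, we show how to:

Create a speech recognizer.
Compile the default Universal Windows app constraints (no grammars have been added to the speech recognizer's grammar set).
Start listening for speech by using the basic recognition UI and TTS feedback provided by the RecognizeWithUIAsync method. Use the RecognizeAsync method if the default UI is not required.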

private async void StartRecognizing_Click(object sender, RoutedEventArgs e)
{
    // Create an instance of SpeechRecognizer.
    var speechRecognizer = new Windows.Media.SpeechRecognition.SpeechRecognizer();

    // Compile the dictation grammar by default.
    await speechRecognizer.CompileConstraintsAsync();

    // Start recognition.
    Windows.Media.SpeechRecognition.SpeechRecognitionResult speechRecognitionResult = await speechRecognizer.RecognizeWithUIAsync();

    // Do something with the recognition result.
    var messageDialog = new Windows.UI.Popups.MessageDialog(speechRecognitionResult.Text, "Text spoken");
    await messageDialog.ShowAsync();
}

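Customize the recognition UI

When your app attempts speech recognition by calling SpeechRecognizer.RecognizeWithUIAsync, several screens are shown in the following order.

If you're using a constraint based on a predefined grammar (dictation or web search):

The Listening screen.
The Thinking screen.
The Heard you say screen or the error screen.

If you're using a constraint based on a list of words or phrases, or a constraint based on an SRGS grammar file:

The Listening screen.
The Did you say screen, if what the user said could be interpreted as more than one potential result.
The Heard you say screen or the error screen.

The following images show an example of the flow between screens for a speech recognizer that uses a constraint based on an SRGS grammar file. In this example, speech recognition was successful.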

[Image: initial recognition screen for a constraint based on an SRGS grammar file]

[Image: intermediate recognition screen for a constraint based on an SRGS grammar file]

[Image: final recognition screen for a constraint based on an SRGS grammar file]

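The Listening screen can provide examples of words or phrases that the app can recognize. Here, we show how to use the properties of the SpeechRecognizerUIOptions class (obtained through the SpeechRecognizer.UIOptions property) to customize the content of the Listening screen.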

private async void WeatherSearch_Click(object sender, RoutedEventArgs e)
{
    // Create an instance of SpeechRecognizer.
    var speechRecognizer = new Windows.Media.SpeechRecognition.SpeechRecognizer();

    // Listen for audio input issues.
    speechRecognizer.RecognitionQualityDegrading += speechRecognizer_RecognitionQualityDegrading;

    // Add a web search grammar to the recognizer.
    var webSearchGrammar = new Windows.Media.SpeechRecognition.SpeechRecognitionTopicConstraint(Windows.Media.SpeechRecognition.SpeechRecognitionScenario.WebSearch, "webSearch");


    speechRecognizer.UIOptions.AudiblePrompt = "Say what you want to search for...";
    speechRecognizer.UIOptions.ExampleText = @"Ex. 'weather for London'";
    speechRecognizer.Constraints.Add(webSearchGrammar);

    // Compile the constraint.
    await speechRecognizer.CompileConstraintsAsync();

    // Start recognition.
    Windows.Media.SpeechRecognition.SpeechRecognitionResult speechRecognitionResult = await speechRecognizer.RecognizeWithUIAsync();

    // Do something with the recognition result.
    var messageDialog = new Windows.UI.Popups.MessageDialog(speechRecognitionResult.Text, "Text spoken");
    await messageDialog.ShowAsync();
}
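
The example above subscribes to RecognitionQualityDegrading but does not show the handler. A minimal sketch of one is given below, assuming it lives in the same page class (MainPage is a hypothetical name) and that logging a hint is sufficient. The hint strings are illustrative.

```csharp
using System.Diagnostics;
using Windows.Media.SpeechRecognition;

public sealed partial class MainPage
{
    // Raised when an audio problem might reduce recognition accuracy.
    private void speechRecognizer_RecognitionQualityDegrading(
        SpeechRecognizer sender,
        SpeechRecognitionQualityDegradingEventArgs args)
    {
        string hint;
        switch (args.Problem)
        {
            case SpeechRecognitionAudioProblem.TooNoisy:
                hint = "Try moving to a quieter place.";
                break;
            case SpeechRecognitionAudioProblem.NoSignal:
                hint = "No audio signal. Check that the microphone is connected.";
                break;
            case SpeechRecognitionAudioProblem.TooQuiet:
                hint = "Try speaking louder or moving closer to the microphone.";
                break;
            case SpeechRecognitionAudioProblem.TooLoud:
                hint = "Try speaking more softly or moving away from the microphone.";
                break;
            default:
                hint = "Audio problem: " + args.Problem;
                break;
        }

        // Logged rather than shown in the UI, because this sketch does not
        // assume the event arrives on the UI thread.
        Debug.WriteLine(hint);
    }
}
```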

Speech interactions

Samples

Speech recognition and speech synthesis sample