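How to recognize intents from speech using the Speech SDK for C#

Article
03/12/2024

The Azure AI services Speech SDK integrates with the Language Understanding (LUIS) service to provide intent recognition. An intent is something the user wants to do: book a flight, check the weather, or make a call. The user can use whatever terms feel natural. LUIS maps user requests to the intents you've defined.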

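Note

A LUIS application defines the intents and entities you want to recognize. It's separate from the C# application that uses the Speech service. In this article, "app" means the LUIS app, while "application" means the C# code.

In this guide, you use the Speech SDK to develop a C# console application that derives intents from user utterances through your device's microphone. You learn how to: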

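Create a Visual Studio project that references the Speech SDK NuGet package
Create a speech configuration and get an intent recognizer
Get the model for your LUIS app and add the intents you need
Specify the language for speech recognition
Recognize speech from a file
Use asynchronous, event-driven continuous recognition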

Prerequisites

Before you begin this guide, make sure you have the following items:

A LUIS account. You can get one for free through the LUIS portal.
Visual Studio 2019 (any edition).

LUIS and speech

LUIS integrates with the Speech service to recognize intents from speech. You don't need a Speech service subscription, just LUIS.

LUIS uses two kinds of keys:

Key type	Purpose
Authoring	Lets you create and modify LUIS apps programmatically
Prediction	Used to access the LUIS app at runtime

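For this guide, you need the prediction key type. This guide uses the example Home Automation LUIS app, which you can create by following the Use prebuilt Home automation app quickstart. If you've created a LUIS app of your own, you can use it instead.

When you create a LUIS app, LUIS automatically generates an authoring key so you can test the app using text queries. This key doesn't enable the Speech service integration and won't work with this guide. Create a LUIS resource in the Azure dashboard and assign it to the LUIS app. You can use the free subscription tier for this guide.

After you create the LUIS resource in the Azure dashboard, sign in to the LUIS portal, choose your app on the My Apps page, then switch to the app's Manage page. Finally, select Azure Resources in the sidebar.

On the Azure Resources page:

Select the icon next to a key to copy it to the clipboard. (You may use either key.)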

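Create the project and add the workload

To create a Visual Studio project for Windows development, you need to set up Visual Studio for .NET desktop development, install the Speech SDK, and choose the target architecture.

To start, create the project in Visual Studio, and make sure that Visual Studio is set up for .NET desktop development: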

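Open Visual Studio 2019.
In the Start window, select Create a new project.
In the Create a new project window, choose Console App (.NET Framework), and then select Next.
In the Configure your new project window, enter helloworld in Project name, choose or create the directory path in Location, and then select Create.
From the Visual Studio menu bar, select Tools > Get Tools and Features, which opens Visual Studio Installer and displays the Modifying dialog box.
Check whether the .NET desktop development workload is available. If the workload hasn't been installed, select the check box next to it, and then select Modify to start the installation. It may take a few minutes to download and install.

If the check box next to .NET desktop development is already selected, select Close to exit the dialog box.
Close Visual Studio Installer.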

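Install the Speech SDK

The next step is to install the Speech SDK NuGet package, so you can reference it in the code.

In Solution Explorer, right-click the helloworld project, and then select Manage NuGet Packages to show the NuGet Package Manager.
In the upper-right corner, find the Package Source drop-down box, and select nuget.org.
In the upper-left corner, select Browse.
In the search box, type Microsoft.CognitiveServices.Speech and press Enter.
From the search results, select the Microsoft.CognitiveServices.Speech package, and then select Install to install the latest stable version.
Accept all agreements and licenses to start the installation.

After the package is installed, a confirmation appears in the Package Manager Console window.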

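Choose the target architecture

Now, to build and run the console application, create a platform configuration matching your computer's architecture.

From the menu bar, select Build > Configuration Manager. The Configuration Manager dialog box appears.
In the Active solution platform drop-down box, select New. The New Solution Platform dialog box appears.
In the Type or select the new platform drop-down box:
- If you're running 64-bit Windows, select x64.
- If you're running 32-bit Windows, select x86.
Select OK and then Close.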
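
If you're unsure which Windows variant you're running, a small throwaway console program can report it. The following is a minimal sketch that relies only on `Environment.Is64BitOperatingSystem` from the .NET base class library; the `PlatformAdvisor` class and `SuggestPlatform` method are hypothetical names introduced here for illustration and are not part of this guide's sample.

```csharp
using System;

// Hypothetical helper, not part of the Speech SDK or of this guide's sample.
public static class PlatformAdvisor
{
    // Maps OS bitness to the solution platform named in the steps above.
    public static string SuggestPlatform(bool is64BitOs)
    {
        return is64BitOs ? "x64" : "x86";
    }

    public static void Main()
    {
        string platform = SuggestPlatform(Environment.Is64BitOperatingSystem);
        Console.WriteLine($"Suggested solution platform: {platform}");
    }
}
```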

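Add the code

Next, you add code to the project.

From Solution Explorer, open the file Program.cs.

Replace the block of using statements at the beginning of the file with the following declarations: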

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Intent;

Replace the provided Main() method with the following asynchronous equivalent:

public static async Task Main()
{
    await RecognizeIntentAsync();
    Console.WriteLine("Please press Enter to continue.");
    Console.ReadLine();
}

Create an empty asynchronous method RecognizeIntentAsync(), as shown here:
```
static async Task RecognizeIntentAsync()
{
}
```

Add this code in the body of this new method:

// Creates an instance of a speech config with specified subscription key
// and service region. Note that in contrast to other services supported by
// the Cognitive Services Speech SDK, the Language Understanding service
// requires a specific subscription key from https://www.luis.ai/.
// The Language Understanding service calls the required key 'endpoint key'.
// Once you've obtained it, replace the values below with your own Language Understanding subscription key
// and service region (e.g., "westus").
// The default language is "en-us".
var config = SpeechConfig.FromSubscription("YourLanguageUnderstandingSubscriptionKey", "YourLanguageUnderstandingServiceRegion");

// Creates an intent recognizer using microphone as audio input.
using (var recognizer = new IntentRecognizer(config))
{
    // Creates a Language Understanding model using the app id, and adds specific intents from your model
    var model = LanguageUnderstandingModel.FromAppId("YourLanguageUnderstandingAppId");
    recognizer.AddIntent(model, "YourLanguageUnderstandingIntentName1", "id1");
    recognizer.AddIntent(model, "YourLanguageUnderstandingIntentName2", "id2");
    recognizer.AddIntent(model, "YourLanguageUnderstandingIntentName3", "any-IntentId-here");

    // Starts recognizing.
    Console.WriteLine("Say something...");

    // Starts intent recognition, and returns after a single utterance is recognized. The end of a
    // single utterance is determined by listening for silence at the end or until a maximum of 15
    // seconds of audio is processed.  The task returns the recognition text as result. 
    // Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for single
    // shot recognition like command or query. 
    // For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead.
    var result = await recognizer.RecognizeOnceAsync().ConfigureAwait(false);

    // Checks result.
    if (result.Reason == ResultReason.RecognizedIntent)
    {
        Console.WriteLine($"RECOGNIZED: Text={result.Text}");
        Console.WriteLine($"    Intent Id: {result.IntentId}.");
        Console.WriteLine($"    Language Understanding JSON: {result.Properties.GetProperty(PropertyId.LanguageUnderstandingServiceResponse_JsonResult)}.");
    }
    else if (result.Reason == ResultReason.RecognizedSpeech)
    {
        Console.WriteLine($"RECOGNIZED: Text={result.Text}");
        Console.WriteLine($"    Intent not recognized.");
    }
    else if (result.Reason == ResultReason.NoMatch)
    {
        Console.WriteLine($"NOMATCH: Speech could not be recognized.");
    }
    else if (result.Reason == ResultReason.Canceled)
    {
        var cancellation = CancellationDetails.FromResult(result);
        Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

        if (cancellation.Reason == CancellationReason.Error)
        {
            Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
            Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
            Console.WriteLine($"CANCELED: Did you update the subscription info?");
        }
    }
}

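Replace the placeholders in this method with your LUIS resource key, region, and app ID as follows.

Placeholder	Replace with
`YourLanguageUnderstandingSubscriptionKey`	Your LUIS resource key. Again, you must get this item from your Azure dashboard. You can find it on your app's Azure Resources page (under Manage) in the LUIS portal.
`YourLanguageUnderstandingServiceRegion`	The short identifier for the region your LUIS resource is in, such as `westus` for West US. See Regions.
`YourLanguageUnderstandingAppId`	The LUIS app ID. You can find it on your app's Settings page in the LUIS portal.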

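With these changes made, you can build (Control+Shift+B) and run (F5) the application. When you're prompted, say "Turn off the lights" into your PC's microphone. The application displays the result in the console window.

The following sections include a discussion of the code.

Create an intent recognizer

First, you need to create a speech configuration from your LUIS prediction key and region. You can use speech configurations to create recognizers for the various capabilities of the Speech SDK. The speech configuration has multiple ways to specify the resource you want to use; here, we use FromSubscription, which takes the resource key and region.

Note

Use the key and region of your LUIS resource, not a Speech resource.

Next, create an intent recognizer using new IntentRecognizer(config). Since the configuration already knows which resource to use, you don't need to specify the key again when creating the recognizer.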

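Import a LUIS model and add intents

Now import the model from the LUIS app using LanguageUnderstandingModel.FromAppId() and add the LUIS intents that you wish to recognize via the recognizer's AddIntent() method. These two steps improve the accuracy of speech recognition by indicating words that the user is likely to use in their requests. You don't have to add all the app's intents if you don't need to recognize them all in your application.

To add intents, you must provide three arguments: the LUIS model (named model), the intent name, and an intent ID. The difference between the ID and the name is as follows.

`AddIntent()` argument	Purpose
`intentName`	The name of the intent as defined in the LUIS app. This value must match the LUIS intent name exactly.
`intentID`	An ID assigned to a recognized intent by the Speech SDK. This value can be whatever you like; it doesn't need to correspond to the intent name as defined in the LUIS app. If multiple intents are handled by the same code, for instance, you could use the same ID for them.

The Home Automation LUIS app has two intents: one for turning on a device, and another for turning off a device. The lines below add these intents to the recognizer; replace the three AddIntent lines in the RecognizeIntentAsync() method with this code.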

recognizer.AddIntent(model, "HomeAutomation.TurnOff", "off");
recognizer.AddIntent(model, "HomeAutomation.TurnOn", "on");

Instead of adding individual intents, you can also use the AddAllIntents method to add all the intents in a model to the recognizer.
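
To illustrate how the intent IDs registered through AddIntent() might be consumed, here is a minimal sketch of a dispatcher keyed on the ID string. It makes no Speech SDK calls; the `IntentDispatcher` class, the `Describe` method, and the returned messages are hypothetical and exist only for illustration. In the guide's application, the input would be the value of result.IntentId.

```csharp
using System;

// Hypothetical dispatcher; the IDs "on" and "off" match the AddIntent calls above.
public static class IntentDispatcher
{
    // Returns a description of the action for a recognized intent ID.
    public static string Describe(string intentId)
    {
        switch (intentId)
        {
            case "on":
                return "Turning the device on.";
            case "off":
                return "Turning the device off.";
            default:
                // An empty or unknown ID: no registered intent matched.
                return "No matching intent.";
        }
    }

    public static void Main()
    {
        // In the guide's application, this value would come from result.IntentId.
        Console.WriteLine(Describe("off"));
    }
}
```

Because the ID is chosen by the developer, registering several LUIS intents with the same ID would route all of them through the same case.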

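Start recognition

With the recognizer created and the intents added, recognition can begin. The Speech SDK supports both single-shot and continuous recognition.

Recognition mode	Methods to call	Result
Single-shot	`RecognizeOnceAsync()`	Returns the recognized intent, if any, after one utterance.
Continuous	`StartContinuousRecognitionAsync()` `StopContinuousRecognitionAsync()`	Recognizes multiple utterances; emits events (for example, `Recognizing` and `Recognized`) when results are available.

The application uses single-shot mode and so calls RecognizeOnceAsync() to begin recognition. The result is an IntentRecognitionResult object containing information about the intent recognized. You extract the LUIS JSON response by using the following expression: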

result.Properties.GetProperty(PropertyId.LanguageUnderstandingServiceResponse_JsonResult)

The application doesn't parse the JSON result. It only displays the JSON text in the console window.
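
If you do want to parse the JSON, one option is `System.Text.Json`. The sketch below is not part of the guide's sample and rests on two assumptions: that the System.Text.Json NuGet package is available to the project, and that the response has a LUIS v2-style shape with a `topScoringIntent` object containing an `intent` string. The `LuisJson` class, the `GetTopIntent` method, and the sample payload are hypothetical; check the JSON your own app prints before relying on these property names.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper; assumes a LUIS v2-style response with a "topScoringIntent" object.
public static class LuisJson
{
    // Returns the top-scoring intent name, or null if the JSON lacks one.
    public static string GetTopIntent(string json)
    {
        if (string.IsNullOrWhiteSpace(json))
        {
            return null;
        }

        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            JsonElement root = doc.RootElement;
            if (root.ValueKind == JsonValueKind.Object
                && root.TryGetProperty("topScoringIntent", out JsonElement top)
                && top.ValueKind == JsonValueKind.Object
                && top.TryGetProperty("intent", out JsonElement intent)
                && intent.ValueKind == JsonValueKind.String)
            {
                return intent.GetString();
            }
        }

        return null;
    }

    public static void Main()
    {
        // Sample payload made up for illustration; real responses contain more fields.
        string sample = "{\"query\":\"turn off the lights\",\"topScoringIntent\":{\"intent\":\"HomeAutomation.TurnOff\",\"score\":0.97}}";
        Console.WriteLine(GetTopIntent(sample));
    }
}
```

In the guide's application, the string passed to `GetTopIntent` would be the value returned by the GetProperty expression shown above.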

Figure: Single LUIS recognition results

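Specify recognition language

By default, LUIS recognizes intents in US English (en-us). By assigning a locale code to the SpeechRecognitionLanguage property of the speech configuration, you can recognize intents in other languages. For example, add config.SpeechRecognitionLanguage = "de-de"; in the application before creating the recognizer to recognize intents in German. For more information, see LUIS language support.

Continuous recognition from a file

The following code illustrates two more capabilities of intent recognition using the Speech SDK. The first, mentioned previously, is continuous recognition, where the recognizer emits events when results are available. These events are processed by event handlers that you provide. With continuous recognition, you call the recognizer's StartContinuousRecognitionAsync() method to start recognition instead of RecognizeOnceAsync().

The other capability is reading the audio containing the speech to be processed from a WAV file. Implementation involves creating an audio configuration that can be used when creating the intent recognizer. The file must be single-channel (mono) with a sampling rate of 16 kHz.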
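
Before handing a file to the recognizer, it can help to confirm that it meets the mono, 16 kHz requirement. The following is a minimal sketch, assuming a canonical RIFF/WAVE header in which the "fmt " chunk starts at byte 12 (files with extra chunks before "fmt " are reported as not matching). The `WavCheck` class and `IsMono16kHz` method are hypothetical helpers, not part of the Speech SDK.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper; assumes a canonical RIFF/WAVE header with the "fmt " chunk at byte 12.
public static class WavCheck
{
    // Returns true if the header describes single-channel, 16 kHz audio.
    public static bool IsMono16kHz(byte[] header)
    {
        if (header == null || header.Length < 28)
        {
            return false;
        }

        bool isRiffWave =
            Encoding.ASCII.GetString(header, 0, 4) == "RIFF" &&
            Encoding.ASCII.GetString(header, 8, 4) == "WAVE" &&
            Encoding.ASCII.GetString(header, 12, 4) == "fmt ";
        if (!isRiffWave)
        {
            return false;
        }

        // Fields are little-endian: channel count at byte 22, sample rate at byte 24.
        int channels = header[22] | (header[23] << 8);
        int sampleRate = header[24] | (header[25] << 8) | (header[26] << 16) | (header[27] << 24);
        return channels == 1 && sampleRate == 16000;
    }

    public static void Main(string[] args)
    {
        // Pass the path of the WAV file to check as the first argument.
        if (args.Length == 0)
        {
            Console.WriteLine("Usage: WavCheck <file.wav>");
            return;
        }

        byte[] header = new byte[28];
        int read;
        using (FileStream stream = File.OpenRead(args[0]))
        {
            read = stream.Read(header, 0, header.Length);
        }

        Console.WriteLine(read == header.Length && IsMono16kHz(header)
            ? "File looks like mono 16 kHz audio."
            : "File is not a canonical mono 16 kHz WAV file.");
    }
}
```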

To try out these features, delete or comment out the body of the RecognizeIntentAsync() method, and add the following code in its place.

// Creates an instance of a speech config with specified subscription key
// and service region. Note that in contrast to other services supported by
// the Cognitive Services Speech SDK, the Language Understanding service
// requires a specific subscription key from https://www.luis.ai/.
// The Language Understanding service calls the required key 'endpoint key'.
// Once you've obtained it, replace the values below with your own Language Understanding subscription key
// and service region (e.g., "westus").
var config = SpeechConfig.FromSubscription("YourLanguageUnderstandingSubscriptionKey", "YourLanguageUnderstandingServiceRegion");

// Creates an intent recognizer using file as audio input.
// Replace with your own audio file name.
using (var audioInput = AudioConfig.FromWavFileInput("YourAudioFile.wav"))
{
    using (var recognizer = new IntentRecognizer(config, audioInput))
    {
        // The TaskCompletionSource to stop recognition.
        var stopRecognition = new TaskCompletionSource<int>(TaskCreationOptions.RunContinuationsAsynchronously);

        // Creates a Language Understanding model using the app id, and adds specific intents from your model
        var model = LanguageUnderstandingModel.FromAppId("YourLanguageUnderstandingAppId");
        recognizer.AddIntent(model, "YourLanguageUnderstandingIntentName1", "id1");
        recognizer.AddIntent(model, "YourLanguageUnderstandingIntentName2", "id2");
        recognizer.AddIntent(model, "YourLanguageUnderstandingIntentName3", "any-IntentId-here");

        // Subscribes to events.
        recognizer.Recognizing += (s, e) =>
        {
            Console.WriteLine($"RECOGNIZING: Text={e.Result.Text}");
        };

        recognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedIntent)
            {
                Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
                Console.WriteLine($"    Intent Id: {e.Result.IntentId}.");
                Console.WriteLine($"    Language Understanding JSON: {e.Result.Properties.GetProperty(PropertyId.LanguageUnderstandingServiceResponse_JsonResult)}.");
            }
            else if (e.Result.Reason == ResultReason.RecognizedSpeech)
            {
                Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
                Console.WriteLine($"    Intent not recognized.");
            }
            else if (e.Result.Reason == ResultReason.NoMatch)
            {
                Console.WriteLine($"NOMATCH: Speech could not be recognized.");
            }
        };

        recognizer.Canceled += (s, e) =>
        {
            Console.WriteLine($"CANCELED: Reason={e.Reason}");

            if (e.Reason == CancellationReason.Error)
            {
                Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
                Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
                Console.WriteLine($"CANCELED: Did you update the subscription info?");
            }

            stopRecognition.TrySetResult(0);
        };

        recognizer.SessionStarted += (s, e) =>
        {
            Console.WriteLine("\n    Session started event.");
        };

        recognizer.SessionStopped += (s, e) =>
        {
            Console.WriteLine("\n    Session stopped event.");
            Console.WriteLine("\nStop recognition.");
            stopRecognition.TrySetResult(0);
        };


        // Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
        await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

        // Waits for completion.
        // Use Task.WaitAny to keep the task rooted.
        Task.WaitAny(new[] { stopRecognition.Task });

        // Stops recognition.
        await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
    }
}

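Revise the code to include your LUIS prediction key, region, and app ID and to add the Home Automation intents, as before. Change YourAudioFile.wav to the name of your recorded audio file. Then copy the audio file to the build directory and run the application.

For example, if you say "Turn off the lights", pause, and then say "Turn on the lights" in your recorded audio file, console output similar to the following may appear: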

Figure: Audio file LUIS recognition results

The Speech SDK team actively maintains a large set of examples in an open-source repository. For the sample source code repository, see the Azure AI Speech SDK on GitHub. There are samples for C#, C++, Java, Python, Objective-C, Swift, JavaScript, UWP, Unity, and Xamarin. The code used in this article can be found in the samples/csharp/sharedcontent/console folder.

Next steps

Quickstart: Recognize speech from a microphone