How to recognize intents from speech using the Speech SDK for C#

Article
08/28/2024

The Azure AI services Speech SDK integrates with the Language Understanding service (LUIS) to provide intent recognition. An intent is something the user wants to do: book a flight, check the weather, or make a call. The user can use whatever terms feel natural. LUIS maps user requests to the intents you defined.

Note

A LUIS application defines the intents and entities you want to recognize. It's separate from the C# application that uses the Speech service. In this article, "app" means the LUIS app, while "application" means the C# code.

In this guide, you use the Speech SDK to develop a C# console application that derives intents from user utterances through your device's microphone. You learn how to:

Create a Visual Studio project referencing the Speech SDK NuGet package
Create a speech configuration and get an intent recognizer
Get the model for your LUIS app and add the intents you need
Specify the language for speech recognition
Recognize speech from a file
Use asynchronous, event-driven continuous recognition

Prerequisites

Be sure you have the following items before you begin this guide:

A LUIS account. You can get one for free through the LUIS portal.
Visual Studio 2019 (any edition).

LUIS and speech

LUIS integrates with the Speech service to recognize intents from speech. You don't need a Speech service subscription, just LUIS.

LUIS uses two kinds of keys:

Key type	Purpose
Authoring	Lets you create and modify LUIS apps programmatically
Prediction	Used to access the LUIS app at runtime

For this guide, you need the prediction key type. This guide uses the example Home Automation LUIS app, which you can create by following the Use prebuilt Home automation app quickstart. If you created a LUIS app of your own, you can use it instead.

When you create a LUIS app, LUIS automatically generates an authoring key so you can test the app using text queries. This key doesn't enable the Speech service integration and doesn't work with this guide. Create a LUIS resource in the Azure dashboard and assign it to the LUIS app. You can use the free subscription tier for this guide.

After you create the LUIS resource in the Azure dashboard, log into the LUIS portal, choose your app on the My Apps page, then switch to the app's Manage page. Finally, select Azure Resources in the sidebar.

On the Azure Resources page, select the icon next to a key to copy it to the clipboard. (You can use either key.)

Create the project and add the workload

To create a Visual Studio project for Windows development, you need to create the project, set up Visual Studio for .NET desktop development, install the Speech SDK, and choose the target architecture.

To start, create the project in Visual Studio, and make sure that Visual Studio is set up for .NET desktop development:

Open Visual Studio 2019.
In the Start window, select Create a new project.
In the Create a new project window, choose Console App (.NET Framework), and then select Next.
In the Configure your new project window, enter helloworld in Project name, choose or create the directory path in Location, and then select Create.
From the Visual Studio menu bar, select Tools > Get Tools and Features, which opens Visual Studio Installer and displays the Modifying dialog box.
Check whether the .NET desktop development workload is available. If the workload isn't installed, select the check box next to it, and then select Modify to start the installation. It might take a few minutes to download and install.

If the check box next to .NET desktop development is already selected, select Close to exit the dialog box.
Close Visual Studio Installer.

Install the Speech SDK

The next step is to install the Speech SDK NuGet package, so you can reference it in the code.

In the Solution Explorer, right-click the helloworld project, and then select Manage NuGet Packages to show the NuGet Package Manager.
In the upper-right corner, find the Package Source drop-down box, and make sure that nuget.org is selected.
In the upper-left corner, select Browse.
In the search box, type Microsoft.CognitiveServices.Speech and select Enter.
From the search results, select the Microsoft.CognitiveServices.Speech package, and then select Install to install the latest stable version.
Accept all agreements and licenses to start the installation.

After the package is installed, a confirmation appears in the Package Manager Console window.

Choose the target architecture

Now, to build and run the console application, create a platform configuration matching your computer's architecture.

From the menu bar, select Build > Configuration Manager. The Configuration Manager dialog box appears.
In the Active solution platform drop-down box, select New. The New Solution Platform dialog box appears.
In the Type or select the new platform drop-down box:
- If you're running 64-bit Windows, select x64.
- If you're running 32-bit Windows, select x86.
Select OK and then Close.

Add the code

Next, you add code to the project.

From Solution Explorer, open the file Program.cs.

Replace the block of using statements at the beginning of the file with the following declarations:

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;
using Microsoft.CognitiveServices.Speech.Intent;

Replace the provided Main() method with the following asynchronous equivalent:

public static async Task Main()
{
    await RecognizeIntentAsync();
    Console.WriteLine("Please press Enter to continue.");
    Console.ReadLine();
}

Create an empty asynchronous method RecognizeIntentAsync(), as shown here:
```
static async Task RecognizeIntentAsync()
{
}
```

In the body of this new method, add this code:

// Creates an instance of a speech config with specified subscription key
// and service region. Note that in contrast to other services supported by
// the Cognitive Services Speech SDK, the Language Understanding service
// requires a specific subscription key from https://www.luis.ai/.
// The Language Understanding service calls the required key 'endpoint key'.
// Once you've obtained it, replace the values below with your own Language Understanding subscription key
// and service region (e.g., "westus").
// The default language is "en-us".
var config = SpeechConfig.FromSubscription("YourLanguageUnderstandingSubscriptionKey", "YourLanguageUnderstandingServiceRegion");

// Creates an intent recognizer using microphone as audio input.
using (var recognizer = new IntentRecognizer(config))
{
    // Creates a Language Understanding model using the app id, and adds specific intents from your model
    var model = LanguageUnderstandingModel.FromAppId("YourLanguageUnderstandingAppId");
    recognizer.AddIntent(model, "YourLanguageUnderstandingIntentName1", "id1");
    recognizer.AddIntent(model, "YourLanguageUnderstandingIntentName2", "id2");
    recognizer.AddIntent(model, "YourLanguageUnderstandingIntentName3", "any-IntentId-here");

    // Starts recognizing.
    Console.WriteLine("Say something...");

    // Starts intent recognition, and returns after a single utterance is recognized. The end of a
    // single utterance is determined by listening for silence at the end or until a maximum of 15
    // seconds of audio is processed.  The task returns the recognition text as result. 
    // Note: Since RecognizeOnceAsync() returns only a single utterance, it is suitable only for single
    // shot recognition like command or query. 
    // For long-running multi-utterance recognition, use StartContinuousRecognitionAsync() instead.
    var result = await recognizer.RecognizeOnceAsync().ConfigureAwait(false);

    // Checks result.
    if (result.Reason == ResultReason.RecognizedIntent)
    {
        Console.WriteLine($"RECOGNIZED: Text={result.Text}");
        Console.WriteLine($"    Intent Id: {result.IntentId}.");
        Console.WriteLine($"    Language Understanding JSON: {result.Properties.GetProperty(PropertyId.LanguageUnderstandingServiceResponse_JsonResult)}.");
    }
    else if (result.Reason == ResultReason.RecognizedSpeech)
    {
        Console.WriteLine($"RECOGNIZED: Text={result.Text}");
        Console.WriteLine($"    Intent not recognized.");
    }
    else if (result.Reason == ResultReason.NoMatch)
    {
        Console.WriteLine($"NOMATCH: Speech could not be recognized.");
    }
    else if (result.Reason == ResultReason.Canceled)
    {
        var cancellation = CancellationDetails.FromResult(result);
        Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

        if (cancellation.Reason == CancellationReason.Error)
        {
            Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
            Console.WriteLine($"CANCELED: ErrorDetails={cancellation.ErrorDetails}");
            Console.WriteLine($"CANCELED: Did you update the subscription info?");
        }
    }
}

Replace the placeholders in this method with your LUIS resource key, region, and app ID as follows.

Placeholder	Replace with
`YourLanguageUnderstandingSubscriptionKey`	Your LUIS resource key. Again, you must get this item from your Azure dashboard. You can find it on your app's Azure Resources page (under Manage) in the LUIS portal.
`YourLanguageUnderstandingServiceRegion`	The short identifier for the region your LUIS resource is in, such as `westus` for West US. See Regions.
`YourLanguageUnderstandingAppId`	The LUIS app ID. You can find it on your app's Settings page in the LUIS portal.

With these changes made, you can build (Ctrl+Shift+B) and run (F5) the application. When you're prompted, try saying "Turn off the lights" into your PC's microphone. The application displays the result in the console window.

The following sections include a discussion of the code.

Create an intent recognizer

First, you need to create a speech configuration from your LUIS prediction key and region. You can use speech configurations to create recognizers for the various capabilities of the Speech SDK. The speech configuration has multiple ways to specify the resource you want to use; here, we use FromSubscription, which takes the resource key and region.

Note

Use the key and region of your LUIS resource, not a Speech resource.

Next, create an intent recognizer using new IntentRecognizer(config). Since the configuration already knows which resource to use, you don't need to specify the key again when creating the recognizer.
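The two steps above can be combined into a small factory method. The following is a minimal sketch, not part of the original sample: it reads the key and region from environment variables instead of hardcoding them. The variable names `LUIS_PREDICTION_KEY` and `LUIS_REGION` and the class name `RecognizerFactory` are hypothetical and chosen only for illustration.

```csharp
using System;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Intent;

static class RecognizerFactory
{
    // LUIS_PREDICTION_KEY and LUIS_REGION are hypothetical names used in this sketch.
    public static IntentRecognizer CreateFromEnvironment()
    {
        string key = Environment.GetEnvironmentVariable("LUIS_PREDICTION_KEY");
        string region = Environment.GetEnvironmentVariable("LUIS_REGION");

        if (string.IsNullOrWhiteSpace(key) || string.IsNullOrWhiteSpace(region))
        {
            throw new InvalidOperationException(
                "Set LUIS_PREDICTION_KEY and LUIS_REGION before running.");
        }

        // Same call as in the article; only the source of the values differs.
        var config = SpeechConfig.FromSubscription(key, region);

        // No audio configuration is passed, so the default microphone is used.
        return new IntentRecognizer(config);
    }
}
```

The caller owns the returned recognizer and should dispose it, for example with `using (var recognizer = RecognizerFactory.CreateFromEnvironment()) { ... }`.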

Import a LUIS model and add intents

Now import the model from the LUIS app using LanguageUnderstandingModel.FromAppId() and add the LUIS intents that you wish to recognize via the recognizer's AddIntent() method. These two steps improve the accuracy of speech recognition by indicating words that the user is likely to use in their requests. You don't have to add all the app's intents if you don't need to recognize them all in your application.

To add intents, you must provide three arguments: the LUIS model (named model), the intent name, and an intent ID. The difference between the ID and the name is as follows.

`AddIntent()` argument	Purpose
`intentName`	The name of the intent as defined in the LUIS app. This value must match the LUIS intent name exactly.
`intentId`	An ID assigned to a recognized intent by the Speech SDK. This value can be whatever you like; it doesn't need to correspond to the intent name as defined in the LUIS app. If multiple intents are handled by the same code, for instance, you could use the same ID for them.

The Home Automation LUIS app has two intents: one for turning on a device, and another for turning off a device. The lines below add these intents to the recognizer; replace the three AddIntent lines in the RecognizeIntentAsync() method with this code.

recognizer.AddIntent(model, "HomeAutomation.TurnOff", "off");
recognizer.AddIntent(model, "HomeAutomation.TurnOn", "on");

Instead of adding individual intents, you can also use the AddAllIntents method to add all the intents in a model to the recognizer.
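Because the intent ID is a value you choose, it can serve as a lookup key for your own handling code. The sketch below is one possible way to do that, assuming the IDs "on" and "off" from the AddIntent calls above. The class name `IntentDispatcher` and the handler messages are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class IntentDispatcher
{
    // Keys are the intent IDs passed as the third argument to AddIntent().
    private static readonly Dictionary<string, Func<string>> Handlers =
        new Dictionary<string, Func<string>>(StringComparer.Ordinal)
        {
            ["on"] = () => "Turning the device on.",
            ["off"] = () => "Turning the device off.",
        };

    // Returns the handler's message, or a fallback when no handler matches.
    public static string Handle(string intentId)
    {
        if (intentId != null && Handlers.TryGetValue(intentId, out Func<string> handler))
        {
            return handler();
        }

        return "No handler registered for this intent.";
    }
}
```

In the recognition code, the call could look like `Console.WriteLine(IntentDispatcher.Handle(result.IntentId));` inside the `ResultReason.RecognizedIntent` branch.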

Start recognition

With the recognizer created and the intents added, recognition can begin. The Speech SDK supports both single-shot and continuous recognition.

Recognition mode	Methods to call	Result
Single-shot	`RecognizeOnceAsync()`	Returns the recognized intent, if any, after one utterance.
Continuous	`StartContinuousRecognitionAsync()` `StopContinuousRecognitionAsync()`	Recognizes multiple utterances; emits events (for example, `Recognizing` and `Recognized`) when results are available.

The application uses single-shot mode and so calls RecognizeOnceAsync() to begin recognition. The result is an IntentRecognitionResult object containing information about the intent recognized. You extract the LUIS JSON response by using the following expression:

result.Properties.GetProperty(PropertyId.LanguageUnderstandingServiceResponse_JsonResult)

The application doesn't parse the JSON result. It only displays the JSON text in the console window.

[Screenshot: single LUIS recognition results]
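If you want to go one step further than printing the JSON, you could parse it. The sketch below rests on two assumptions that the article doesn't confirm: that the System.Text.Json NuGet package is added to the project (it isn't part of .NET Framework by default), and that the response contains a `topScoringIntent` object with an `intent` string, as in LUIS v2 prediction responses. Compare it against the JSON your own app prints before relying on it. The class name `LuisJson` is hypothetical.

```csharp
using System;
using System.Text.Json;

static class LuisJson
{
    // Returns the value of topScoringIntent.intent, or null when the JSON
    // is empty, invalid, or does not have the assumed shape.
    public static string TryGetTopIntent(string json)
    {
        if (string.IsNullOrWhiteSpace(json))
        {
            return null;
        }

        try
        {
            using (JsonDocument doc = JsonDocument.Parse(json))
            {
                JsonElement root = doc.RootElement;
                if (root.ValueKind == JsonValueKind.Object &&
                    root.TryGetProperty("topScoringIntent", out JsonElement top) &&
                    top.ValueKind == JsonValueKind.Object &&
                    top.TryGetProperty("intent", out JsonElement intent) &&
                    intent.ValueKind == JsonValueKind.String)
                {
                    return intent.GetString();
                }
            }
        }
        catch (JsonException)
        {
            // The text was not valid JSON; treat it as "no intent found".
        }

        return null;
    }
}
```

You would pass it the string returned by `result.Properties.GetProperty(PropertyId.LanguageUnderstandingServiceResponse_JsonResult)`.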

Specify recognition language

By default, LUIS recognizes intents in US English (en-us). By assigning a locale code to the SpeechRecognitionLanguage property of the speech configuration, you can recognize intents in other languages. For example, add config.SpeechRecognitionLanguage = "de-de"; in our application before creating the recognizer to recognize intents in German. For more information, see LUIS language support.
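As a minimal sketch of the step described above, the helper below sets the recognition language on the configuration before any recognizer is created from it. The class name `LocalizedConfig` and its parameters are hypothetical; only `SpeechConfig.FromSubscription` and `SpeechRecognitionLanguage` come from the article.

```csharp
using Microsoft.CognitiveServices.Speech;

static class LocalizedConfig
{
    // Builds a speech configuration for the given locale, for example "de-de".
    public static SpeechConfig Create(string key, string region, string locale)
    {
        var config = SpeechConfig.FromSubscription(key, region);

        // Set the language before constructing the IntentRecognizer from this configuration.
        config.SpeechRecognitionLanguage = locale;
        return config;
    }
}
```

For German, the call would be `LocalizedConfig.Create(key, region, "de-de")`, and the result is passed to `new IntentRecognizer(config)` as before.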

Continuous recognition from a file

The following code illustrates two more capabilities of intent recognition using the Speech SDK. The first, previously mentioned, is continuous recognition, where the recognizer emits events when results are available. These events are processed by event handlers that you provide. With continuous recognition, you call the recognizer's StartContinuousRecognitionAsync() method to start recognition instead of RecognizeOnceAsync().

The other capability is reading the audio containing the speech to be processed from a WAV file. Implementation involves creating an audio configuration that can be used when creating the intent recognizer. The file must be single-channel (mono) with a sampling rate of 16 kHz.
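To check the format requirement before handing a file to the recognizer, you could inspect the WAV header yourself. The following sketch is not part of the Speech SDK. It walks the RIFF chunks until it finds the `fmt ` chunk and reads the channel count and sample rate from it. The class name `WavCheck` is hypothetical, and the code assumes a seekable stream such as a file.

```csharp
using System;
using System.IO;
using System.Text;

static class WavCheck
{
    // Returns true when the stream holds a RIFF/WAVE file whose "fmt " chunk
    // reports one channel at 16,000 Hz; returns false otherwise.
    public static bool IsMono16kHz(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true))
        {
            try
            {
                if (ReadTag(reader) != "RIFF") return false;
                reader.ReadUInt32();                    // overall RIFF size, unused here
                if (ReadTag(reader) != "WAVE") return false;

                while (true)
                {
                    string chunkId = ReadTag(reader);
                    uint chunkSize = reader.ReadUInt32();

                    if (chunkId == "fmt ")
                    {
                        reader.ReadUInt16();            // audio format tag, unused here
                        ushort channels = reader.ReadUInt16();
                        uint sampleRate = reader.ReadUInt32();
                        return channels == 1 && sampleRate == 16000;
                    }

                    // Skip any other chunk; chunks are padded to an even length.
                    long skip = (long)chunkSize + (chunkSize & 1);
                    reader.BaseStream.Seek(skip, SeekOrigin.Current);
                }
            }
            catch (EndOfStreamException)
            {
                // Ran out of data before finding a "fmt " chunk.
                return false;
            }
        }
    }

    // Reads a four-character chunk tag, failing if the stream ends early.
    private static string ReadTag(BinaryReader reader)
    {
        byte[] bytes = reader.ReadBytes(4);
        if (bytes.Length < 4) throw new EndOfStreamException();
        return Encoding.ASCII.GetString(bytes);
    }
}
```

A possible use is `using (var file = File.OpenRead("YourAudioFile.wav")) { bool ok = WavCheck.IsMono16kHz(file); }` before the call to `AudioConfig.FromWavFileInput`.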

To try out these features, delete or comment out the body of the RecognizeIntentAsync() method, and add the following code in its place.

// Creates an instance of a speech config with specified subscription key
// and service region. Note that in contrast to other services supported by
// the Cognitive Services Speech SDK, the Language Understanding service
// requires a specific subscription key from https://www.luis.ai/.
// The Language Understanding service calls the required key 'endpoint key'.
// Once you've obtained it, replace the values below with your own Language Understanding subscription key
// and service region (e.g., "westus").
var config = SpeechConfig.FromSubscription("YourLanguageUnderstandingSubscriptionKey", "YourLanguageUnderstandingServiceRegion");

// Creates an intent recognizer using file as audio input.
// Replace with your own audio file name.
using (var audioInput = AudioConfig.FromWavFileInput("YourAudioFile.wav"))
{
    using (var recognizer = new IntentRecognizer(config, audioInput))
    {
        // The TaskCompletionSource to stop recognition.
        var stopRecognition = new TaskCompletionSource<int>(TaskCreationOptions.RunContinuationsAsynchronously);

        // Creates a Language Understanding model using the app id, and adds specific intents from your model
        var model = LanguageUnderstandingModel.FromAppId("YourLanguageUnderstandingAppId");
        recognizer.AddIntent(model, "YourLanguageUnderstandingIntentName1", "id1");
        recognizer.AddIntent(model, "YourLanguageUnderstandingIntentName2", "id2");
        recognizer.AddIntent(model, "YourLanguageUnderstandingIntentName3", "any-IntentId-here");

        // Subscribes to events.
        recognizer.Recognizing += (s, e) =>
        {
            Console.WriteLine($"RECOGNIZING: Text={e.Result.Text}");
        };

        recognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedIntent)
            {
                Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
                Console.WriteLine($"    Intent Id: {e.Result.IntentId}.");
                Console.WriteLine($"    Language Understanding JSON: {e.Result.Properties.GetProperty(PropertyId.LanguageUnderstandingServiceResponse_JsonResult)}.");
            }
            else if (e.Result.Reason == ResultReason.RecognizedSpeech)
            {
                Console.WriteLine($"RECOGNIZED: Text={e.Result.Text}");
                Console.WriteLine($"    Intent not recognized.");
            }
            else if (e.Result.Reason == ResultReason.NoMatch)
            {
                Console.WriteLine($"NOMATCH: Speech could not be recognized.");
            }
        };

        recognizer.Canceled += (s, e) =>
        {
            Console.WriteLine($"CANCELED: Reason={e.Reason}");

            if (e.Reason == CancellationReason.Error)
            {
                Console.WriteLine($"CANCELED: ErrorCode={e.ErrorCode}");
                Console.WriteLine($"CANCELED: ErrorDetails={e.ErrorDetails}");
                Console.WriteLine($"CANCELED: Did you update the subscription info?");
            }

            stopRecognition.TrySetResult(0);
        };

        recognizer.SessionStarted += (s, e) =>
        {
            Console.WriteLine("\n    Session started event.");
        };

        recognizer.SessionStopped += (s, e) =>
        {
            Console.WriteLine("\n    Session stopped event.");
            Console.WriteLine("\nStop recognition.");
            stopRecognition.TrySetResult(0);
        };


        // Starts continuous recognition. Uses StopContinuousRecognitionAsync() to stop recognition.
        await recognizer.StartContinuousRecognitionAsync().ConfigureAwait(false);

        // Waits for completion.
        // Use Task.WaitAny to keep the task rooted.
        Task.WaitAny(new[] { stopRecognition.Task });

        // Stops recognition.
        await recognizer.StopContinuousRecognitionAsync().ConfigureAwait(false);
    }
}

Revise the code to include your LUIS prediction key, region, and app ID and to add the Home Automation intents, as before. Change YourAudioFile.wav to the name of your recorded audio file. Then build, copy the audio file to the build directory, and run the application.

For example, if you say "Turn off the lights", pause, and then say "Turn on the lights" in your recorded audio file, console output similar to the following might appear:

[Screenshot: audio file LUIS recognition results]
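The sample waits for the session to end by blocking on `Task.WaitAny`. If you prefer a non-blocking wait with an upper time limit, one option is sketched below. It is an illustrative variant, not the sample's own approach, and the helper name `WaitForStopAsync` is hypothetical.

```csharp
using System;
using System.Threading.Tasks;

static class RecognitionWait
{
    // Returns true if stopTask completed before the timeout elapsed, false otherwise.
    public static async Task<bool> WaitForStopAsync(Task stopTask, TimeSpan timeout)
    {
        Task finished = await Task.WhenAny(stopTask, Task.Delay(timeout)).ConfigureAwait(false);
        return finished == stopTask;
    }
}
```

In the continuous recognition code, this would replace the `Task.WaitAny` line with something like `await RecognitionWait.WaitForStopAsync(stopRecognition.Task, TimeSpan.FromMinutes(2)).ConfigureAwait(false);`, where the two-minute limit is an arbitrary value for illustration. The call to `StopContinuousRecognitionAsync()` still follows in either case.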

The Speech SDK team actively maintains a large set of examples in an open-source repository. For the sample source code repository, see the Azure AI Speech SDK on GitHub. There are samples for C#, C++, Java, Python, Objective-C, Swift, JavaScript, UWP, Unity, and Xamarin. Look for the code from this article in the samples/csharp/sharedcontent/console folder.

Next steps

Quickstart: Recognize speech from a microphone
