Voice input in DirectX

Article
07/12/2023

Note

This article relates to the legacy WinRT native APIs. For new native app projects, we recommend using the OpenXR API.

This article explains how to implement voice commands, plus small-phrase and sentence recognition, in a DirectX app for Windows Mixed Reality.

Note

The code snippets in this article use C++/CX rather than the C++17-compliant C++/WinRT, which is used in the C++ holographic project template. The concepts are equivalent for a C++/WinRT project, but you need to translate the code.

Use SpeechRecognizer for continuous speech recognition

This section describes how to use continuous speech recognition to enable voice commands in your app. This walk-through uses code from the HolographicVoiceInput sample. When the sample is running, speak the name of one of the registered color commands to change the color of the spinning cube.

First, create a new Windows::Media::SpeechRecognition::SpeechRecognizer instance.

From HolographicVoiceInputSampleMain::CreateSpeechConstraintsForCurrentState:

m_speechRecognizer = ref new SpeechRecognizer();

Create a list of speech commands for the recognizer to listen for. Here, we construct a set of commands to change the color of a hologram. For convenience, we also create the data that we'll use for the commands later.

m_speechCommandList = ref new Platform::Collections::Vector<String^>();
   m_speechCommandData.clear();
   m_speechCommandList->Append(StringReference(L"white"));
   m_speechCommandData.push_back(float4(1.f, 1.f, 1.f, 1.f));
   m_speechCommandList->Append(StringReference(L"grey"));
   m_speechCommandData.push_back(float4(0.5f, 0.5f, 0.5f, 1.f));
   m_speechCommandList->Append(StringReference(L"green"));
   m_speechCommandData.push_back(float4(0.f, 1.f, 0.f, 1.f));
   m_speechCommandList->Append(StringReference(L"black"));
   m_speechCommandData.push_back(float4(0.1f, 0.1f, 0.1f, 1.f));
   m_speechCommandList->Append(StringReference(L"red"));
   m_speechCommandData.push_back(float4(1.f, 0.f, 0.f, 1.f));
   m_speechCommandList->Append(StringReference(L"yellow"));
   m_speechCommandData.push_back(float4(1.f, 1.f, 0.f, 1.f));
   m_speechCommandList->Append(StringReference(L"aquamarine"));
   m_speechCommandData.push_back(float4(0.f, 1.f, 1.f, 1.f));
   m_speechCommandList->Append(StringReference(L"blue"));
   m_speechCommandData.push_back(float4(0.f, 0.f, 1.f, 1.f));
   m_speechCommandList->Append(StringReference(L"purple"));
   m_speechCommandData.push_back(float4(1.f, 0.f, 1.f, 1.f));

To specify commands, you can use phonetic words that might not be in a dictionary.

m_speechCommandList->Append(StringReference(L"SpeechRecognizer"));
   m_speechCommandData.push_back(float4(0.5f, 0.1f, 1.f, 1.f));

To load the list of commands into the speech recognizer's list of constraints, use a SpeechRecognitionListConstraint object.

SpeechRecognitionListConstraint^ spConstraint = ref new SpeechRecognitionListConstraint(m_speechCommandList);
   m_speechRecognizer->Constraints->Clear();
   m_speechRecognizer->Constraints->Append(spConstraint);
   create_task(m_speechRecognizer->CompileConstraintsAsync()).then([this](SpeechRecognitionCompilationResult^ compilationResult)
   {
       if (compilationResult->Status == SpeechRecognitionResultStatus::Success)
       {
           m_speechRecognizer->ContinuousRecognitionSession->StartAsync();
       }
       else
       {
           // Handle errors here.
       }
   });

Subscribe to the ResultGenerated event on the speech recognizer's SpeechContinuousRecognitionSession. This event notifies your app when one of your commands has been recognized.

m_speechRecognizer->ContinuousRecognitionSession->ResultGenerated +=
       ref new TypedEventHandler<SpeechContinuousRecognitionSession^, SpeechContinuousRecognitionResultGeneratedEventArgs^>(
           std::bind(&HolographicVoiceInputSampleMain::OnResultGenerated, this, _1, _2)
           );

Your OnResultGenerated event handler receives event data in a SpeechContinuousRecognitionResultGeneratedEventArgs instance. If the confidence is greater than the threshold you defined, your app should note that the event happened. Save the event data so that you can use it in a later update loop.

From HolographicVoiceInputSampleMain.cpp:

// Change the cube color, if we get a valid result.
   void HolographicVoiceInputSampleMain::OnResultGenerated(SpeechContinuousRecognitionSession ^sender, SpeechContinuousRecognitionResultGeneratedEventArgs ^args)
   {
       if (args->Result->RawConfidence > 0.5f)
       {
           m_lastCommand = args->Result->Text;
       }
   }
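
Saving the event data for a later update loop can be sketched in standard C++ as follows. This is a minimal illustration, not the sample's code: it assumes the handler may run on a different thread than the update loop, and the PendingCommand class with its Offer and Take methods is a hypothetical helper introduced only for this example.

```cpp
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper (not part of the sample): holds the most recent
// recognized command until the update loop consumes it.
class PendingCommand
{
public:
    // Called from the recognition event handler. Results at or below the
    // threshold are ignored, mirroring the RawConfidence > 0.5f check above.
    // Returns true if the command was stored.
    bool Offer(std::wstring text, double rawConfidence, double threshold = 0.5)
    {
        if (rawConfidence <= threshold)
        {
            return false;
        }

        std::lock_guard<std::mutex> lock(m_mutex);
        m_text = std::move(text);
        return true;
    }

    // Called once per frame from the update loop. Returns the stored command,
    // if any, and clears it so that each command is handled at most once.
    std::optional<std::wstring> Take()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        std::optional<std::wstring> result = std::move(m_text);
        m_text.reset();
        return result;
    }

private:
    std::mutex m_mutex;
    std::optional<std::wstring> m_text;
};
```

With a helper like this, the handler would call Offer with the recognized text and raw confidence, and the update loop would call Take once per frame. A newer command overwrites an older one that hasn't been consumed yet, which matches the single m_lastCommand slot used in the sample.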

In our example code, we change the color of the spinning hologram cube according to the user's command.

From HolographicVoiceInputSampleMain::Update:

// Check for new speech input since the last frame.
   if (m_lastCommand != nullptr)
   {
       auto command = m_lastCommand;
       m_lastCommand = nullptr;

       int i = 0;
       for each (auto& iter in m_speechCommandList)
       {
           if (iter == command)
           {
               m_spinningCubeRenderer->SetColor(m_speechCommandData[i]);
               break;
           }

           ++i;
       }
   }
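
The lookup above relies on the command list and the command data being kept in the same order. A minimal standard C++ sketch of that pairing is shown below. The CommandTable type, its Add and Find methods, and the Color alias (a stand-in for float4) are hypothetical names for illustration and don't appear in the sample.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// RGBA color; a stand-in for the float4 type used in the sample.
using Color = std::array<float, 4>;

// Hypothetical container that keeps each command name and its color at the
// same index, like m_speechCommandList and m_speechCommandData.
struct CommandTable
{
    std::vector<std::wstring> names;
    std::vector<Color> colors;

    // Appends a name and its color together so the two lists can't drift apart.
    void Add(std::wstring name, Color color)
    {
        names.push_back(std::move(name));
        colors.push_back(color);
    }

    // Returns the color registered for an exact command match, or an empty
    // optional if the command isn't registered.
    std::optional<Color> Find(const std::wstring& command) const
    {
        for (std::size_t i = 0; i < names.size(); ++i)
        {
            if (names[i] == command)
            {
                return colors[i];
            }
        }
        return std::nullopt;
    }
};
```

Adding both values through one method is a design choice that avoids an off-by-one mismatch if a command is inserted without its data.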

Use "one-shot" recognition

You can configure a speech recognizer to listen for phrases or sentences that the user speaks. In this case, we apply a SpeechRecognitionTopicConstraint that tells the speech recognizer what type of input to expect. Here's an app workflow for this scenario:

1. Your app creates the SpeechRecognizer, provides UI prompts, and starts listening for a spoken command.
2. The user speaks a phrase or sentence.
3. The user's speech is recognized, and a result is returned to the app. At this point, your app should provide a UI prompt to indicate that recognition has occurred.
4. Depending on the phrase and on the confidence level of the recognition result, your app can process the result and respond as appropriate.

This section describes how to create a SpeechRecognizer, compile the constraint, and listen for speech input.

The following code compiles the topic constraint, which in this case is optimized for web search.

auto constraint = ref new SpeechRecognitionTopicConstraint(SpeechRecognitionScenario::WebSearch, L"webSearch");
   m_speechRecognizer->Constraints->Clear();
   m_speechRecognizer->Constraints->Append(constraint);
   return create_task(m_speechRecognizer->CompileConstraintsAsync())
       .then([this](task<SpeechRecognitionCompilationResult^> previousTask)
   {

If compilation succeeds, we can continue with speech recognition.

try
       {
           SpeechRecognitionCompilationResult^ compilationResult = previousTask.get();

           // Check to make sure that the constraints were in a proper format and the recognizer was able to compile it.
           if (compilationResult->Status == SpeechRecognitionResultStatus::Success)
           {
               // If the compilation succeeded, we can start listening for the user's spoken phrase or sentence.
               create_task(m_speechRecognizer->RecognizeAsync()).then([this](task<SpeechRecognitionResult^>& previousTask)
               {

The result is then returned to the app. If we're confident enough in the result, we can process the command. This code example processes results with at least medium confidence.

try
                   {
                       auto result = previousTask.get();

                       if (result->Status != SpeechRecognitionResultStatus::Success)
                       {
                           PrintWstringToDebugConsole(
                               std::wstring(L"Speech recognition was not successful: ") +
                               result->Status.ToString()->Data() +
                               L"\n"
                               );
                       }

                       // In this example, we look for at least medium confidence in the speech result.
                       if ((result->Confidence == SpeechRecognitionConfidence::High) ||
                           (result->Confidence == SpeechRecognitionConfidence::Medium))
                       {
                           // If the user said a color name anywhere in their phrase, it will be recognized in the
                           // Update loop; then, the cube will change color.
                           m_lastCommand = result->Text;

                           PrintWstringToDebugConsole(
                               std::wstring(L"Speech phrase was: ") +
                               m_lastCommand->Data() +
                               L"\n"
                               );
                       }
                       else
                       {
                           PrintWstringToDebugConsole(
                               std::wstring(L"Recognition confidence not high enough: ") +
                               result->Confidence.ToString()->Data() +
                               L"\n"
                               );
                       }
                   }

When you use speech recognition, watch for exceptions that could indicate that the user has turned off the microphone in the system privacy settings. This can happen during initialization or during recognition.

catch (Exception^ exception)
                   {
                       // Note that if you get an "Access is denied" exception, you might need to enable the microphone
                       // privacy setting on the device and/or add the microphone capability to your app manifest.

                       PrintWstringToDebugConsole(
                           std::wstring(L"Speech recognizer error: ") +
                           exception->ToString()->Data() +
                           L"\n"
                           );
                   }
               });

               return true;
           }
           else
           {
               OutputDebugStringW(L"Could not initialize predefined grammar speech engine!\n");

               // Handle errors here.
               return false;
           }
       }
       catch (Exception^ exception)
       {
           // Note that if you get an "Access is denied" exception, you might need to enable the microphone
           // privacy setting on the device and/or add the microphone capability to your app manifest.

           PrintWstringToDebugConsole(
               std::wstring(L"Exception while trying to initialize predefined grammar speech engine:") +
               exception->Message->Data() +
               L"\n"
               );

           // Handle exceptions here.
           return false;
       }
   });
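
A one-shot result is a free-form phrase, so matching it against the command list needs a containment check rather than the exact comparison used for list constraints. The following standard C++ sketch shows one way to do that. FindCommandInPhrase and ToLower are hypothetical helpers, and the case-insensitive matching is an assumption about how the recognized text is capitalized.

```cpp
#include <algorithm>
#include <cstddef>
#include <cwctype>
#include <optional>
#include <string>
#include <vector>

// Returns a lowercase copy of the text so that matching ignores case.
std::wstring ToLower(std::wstring text)
{
    std::transform(text.begin(), text.end(), text.begin(),
        [](wchar_t c) { return static_cast<wchar_t>(std::towlower(c)); });
    return text;
}

// Returns the index of the first command (in list order) that appears
// anywhere in the phrase, or an empty optional if none does.
std::optional<std::size_t> FindCommandInPhrase(
    const std::wstring& phrase,
    const std::vector<std::wstring>& commands)
{
    const std::wstring haystack = ToLower(phrase);

    for (std::size_t i = 0; i < commands.size(); ++i)
    {
        if (!commands[i].empty() &&
            haystack.find(ToLower(commands[i])) != std::wstring::npos)
        {
            return i;
        }
    }
    return std::nullopt;
}
```

Because this is a plain substring search, a short command such as "red" also matches inside a longer word. An app that needs stricter matching could split the phrase into words first.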

Note

There are several predefined SpeechRecognitionScenario values that you can use to optimize speech recognition.

To optimize for dictation, use the dictation scenario.

// Add the dictation topic constraint, which optimizes for speech dictation.
auto dictationConstraint = ref new SpeechRecognitionTopicConstraint(SpeechRecognitionScenario::Dictation, L"dictation");
m_speechRecognizer->Constraints->Append(dictationConstraint);

For speech web searches, use the following web-specific scenario constraint.

// Add a web search topic constraint to the recognizer.
auto webSearchConstraint = ref new SpeechRecognitionTopicConstraint(SpeechRecognitionScenario::WebSearch, L"webSearch");
m_speechRecognizer->Constraints->Append(webSearchConstraint);

To fill out forms, use the form constraint. In this case, it's best to apply your own grammar that's optimized for filling out the form.

// Add a form constraint to the recognizer.
auto formConstraint = ref new SpeechRecognitionTopicConstraint(SpeechRecognitionScenario::FormFilling, L"formFilling");
m_speechRecognizer->Constraints->Append(formConstraint);

You can also provide your own grammar in the SRGS format.

Use continuous recognition

For the continuous-dictation scenario, see the Windows 10 UWP speech code sample.

Handle quality degradation

Environmental conditions sometimes interfere with speech recognition. For example, the room might be too noisy, or the user might speak too loudly. Whenever possible, the speech recognition API provides information about the conditions that caused the quality degradation. This information is pushed to your app through a WinRT event. The following example shows how to subscribe to this event.

m_speechRecognizer->RecognitionQualityDegrading +=
       ref new TypedEventHandler<SpeechRecognizer^, SpeechRecognitionQualityDegradingEventArgs^>(
           std::bind(&HolographicVoiceInputSampleMain::OnSpeechQualityDegraded, this, _1, _2)
           );

In our code sample, we write the condition information to the debug console. An app might want to give the user feedback through the UI, speech synthesis, or another method. Or it might need to behave differently when speech is interrupted by a temporary reduction in quality.

void HolographicSpeechPromptSampleMain::OnSpeechQualityDegraded(SpeechRecognizer^ recognizer, SpeechRecognitionQualityDegradingEventArgs^ args)
   {
       switch (args->Problem)
       {
       case SpeechRecognitionAudioProblem::TooFast:
           OutputDebugStringW(L"The user spoke too quickly.\n");
           break;

       case SpeechRecognitionAudioProblem::TooSlow:
           OutputDebugStringW(L"The user spoke too slowly.\n");
           break;

       case SpeechRecognitionAudioProblem::TooQuiet:
           OutputDebugStringW(L"The user spoke too softly.\n");
           break;

       case SpeechRecognitionAudioProblem::TooLoud:
           OutputDebugStringW(L"The user spoke too loudly.\n");
           break;

       case SpeechRecognitionAudioProblem::TooNoisy:
           OutputDebugStringW(L"There is too much noise in the signal.\n");
           break;

       case SpeechRecognitionAudioProblem::NoSignal:
           OutputDebugStringW(L"There is no signal.\n");
           break;

       case SpeechRecognitionAudioProblem::None:
       default:
           OutputDebugStringW(L"An error was reported with no information.\n");
           break;
       }
   }
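
If the app gives the user feedback instead of only logging, the same switch can return a hint to display or synthesize. The sketch below is standard C++ under stated assumptions: AudioProblem is a hypothetical stand-in for SpeechRecognitionAudioProblem so the example compiles without WinRT, and UserHintFor and its message strings are illustrative only.

```cpp
#include <string>

// Hypothetical stand-in for SpeechRecognitionAudioProblem, so that this
// sketch compiles without the WinRT headers.
enum class AudioProblem
{
    None,
    TooNoisy,
    NoSignal,
    TooLoud,
    TooQuiet,
    TooFast,
    TooSlow
};

// Maps a reported problem to a short hint the app could show or speak.
// Returns an empty string when there's nothing actionable to tell the user.
std::wstring UserHintFor(AudioProblem problem)
{
    switch (problem)
    {
    case AudioProblem::TooFast:
        return L"Try speaking a little more slowly.";
    case AudioProblem::TooSlow:
        return L"Try speaking a little faster.";
    case AudioProblem::TooQuiet:
        return L"Try speaking more loudly.";
    case AudioProblem::TooLoud:
        return L"Try speaking more quietly.";
    case AudioProblem::TooNoisy:
        return L"There is a lot of background noise.";
    case AudioProblem::NoSignal:
        return L"The microphone isn't picking up any sound.";
    case AudioProblem::None:
    default:
        return L"";
    }
}
```

Returning an empty string for the unknown case lets the caller skip the prompt rather than show a message with no useful advice.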

If you're not using ref classes to create your DirectX app, you must unsubscribe from the event before you release or re-create your speech recognizer. The HolographicSpeechPromptSample has a routine that stops recognition and unsubscribes from the events.

Concurrency::task<void> HolographicSpeechPromptSampleMain::StopCurrentRecognizerIfExists()
   {
       return create_task([this]()
       {
           if (m_speechRecognizer != nullptr)
           {
               return create_task(m_speechRecognizer->StopRecognitionAsync()).then([this]()
               {
                   m_speechRecognizer->RecognitionQualityDegrading -= m_speechRecognitionQualityDegradedToken;

                   if (m_speechRecognizer->ContinuousRecognitionSession != nullptr)
                   {
                       m_speechRecognizer->ContinuousRecognitionSession->ResultGenerated -= m_speechRecognizerResultEventToken;
                   }
               });
           }
           else
           {
               return create_task([this]() { m_speechRecognizer = nullptr; });
           }
       });
   }

Use speech synthesis to provide audible prompts

The holographic speech samples use speech synthesis to provide audible instructions to the user. This section shows how to create a synthesized voice sample and then play it back through the HRTF audio APIs.

We recommend that you provide your own speech prompts when you request phrase input. Prompts can also help indicate when speech commands can be spoken in a continuous-recognition scenario. The following example demonstrates how to use a speech synthesizer to do this. You could also use a prerecorded voice clip, a visual UI, or another indicator of what to say, for example in scenarios where the prompt isn't dynamic.

First, create the SpeechSynthesizer object.

auto speechSynthesizer = ref new Windows::Media::SpeechSynthesis::SpeechSynthesizer();

You also need a string that contains the text to synthesize.

// Phrase recognition works best when requesting a phrase or sentence.
   StringReference voicePrompt = L"At the prompt: Say a phrase, asking me to change the cube to a specific color.";

Speech is synthesized asynchronously through SynthesizeTextToStreamAsync. Here, we start an async task to synthesize the speech.

create_task(speechSynthesizer->SynthesizeTextToStreamAsync(voicePrompt), task_continuation_context::use_current())
       .then([this, speechSynthesizer](task<Windows::Media::SpeechSynthesis::SpeechSynthesisStream^> synthesisStreamTask)
   {
       try
       {

The speech synthesis is sent as a byte stream. We can use that byte stream to initialize an XAudio2 voice. For our holographic code samples, we play it back as an HRTF audio effect.

Windows::Media::SpeechSynthesis::SpeechSynthesisStream^ stream = synthesisStreamTask.get();

           auto hr = m_speechSynthesisSound.Initialize(stream, 0);
           if (SUCCEEDED(hr))
           {
               m_speechSynthesisSound.SetEnvironment(HrtfEnvironment::Small);
               m_speechSynthesisSound.Start();

               // Amount of time to pause after the audio prompt is complete, before listening
               // for speech input.
               static const float bufferTime = 0.15f;

               // Wait until the prompt is done before listening.
               m_secondsUntilSoundIsComplete = m_speechSynthesisSound.GetDuration() + bufferTime;
               m_waitingForSpeechPrompt = true;
           }
       }

As with speech recognition, speech synthesis throws an exception if something goes wrong.

catch (Exception^ exception)
       {
           PrintWstringToDebugConsole(
               std::wstring(L"Exception while trying to synthesize speech: ") +
               exception->Message->Data() +
               L"\n"
               );

           // Handle exceptions here.
       }
   });
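
The code above records how long to wait (the prompt duration plus a short buffer) before listening. One way to count that time down in the update loop is sketched below in standard C++. PromptTimer and its methods are hypothetical and are not taken from the sample; the elapsed time per frame is assumed to come from the app's own step timer.

```cpp
// Hypothetical helper: tracks the wait between starting an audio prompt
// and starting to listen for speech input.
class PromptTimer
{
public:
    // Begins waiting for the prompt duration plus a short pause afterward.
    void Start(float promptDurationSeconds, float bufferSeconds = 0.15f)
    {
        m_secondsRemaining = promptDurationSeconds + bufferSeconds;
        m_waiting = true;
    }

    // Advances the timer by one frame's elapsed time. Returns true exactly
    // once: on the frame in which the wait ends.
    bool Tick(float deltaSeconds)
    {
        if (!m_waiting)
        {
            return false;
        }

        m_secondsRemaining -= deltaSeconds;
        if (m_secondsRemaining > 0.f)
        {
            return false;
        }

        m_waiting = false;
        return true;
    }

    bool IsWaiting() const { return m_waiting; }

private:
    float m_secondsRemaining = 0.f;
    bool m_waiting = false;
};
```

In this sketch, the update loop would call Tick each frame and start recognition on the frame where it returns true, so the recognizer doesn't pick up the synthesized prompt itself.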
