Issue
There is a major flaw in the Windows.Media.SpeechRecognition API that appeared with Windows 11, and it has been going on for months (since January). In short, the speech recognizer cannot hear the user while certain audio playback is happening on the system. It only seems to happen on Windows 11 systems (not all of them, but it's becoming more prevalent).
Background
The software I’m developing is complicated. It’s a legacy MFC application that uses a C# .NET Framework 4.7.2 DLL, which in turn uses Microsoft.Windows.SDK.Contracts as detailed in this documentation from Microsoft. I use SRGS grammar constraints to create my own commands. The DLL’s recognizer receives voice commands, parses them as needed, and sends them on to the MFC application, which plays .wav files as feedback. The feedback is played with the old PlaySound() function. If the user tries to speak again while the feedback is playing, the recognizer simply cannot hear them, but ONLY on certain Windows 11 systems. The same software did not and does not have this issue on Windows 10.
Update Testing
I have 2 test computers (one Lenovo and one HP) that are compatible with Windows 11, and the behavior has been identical on both. My initial round of testing was back in March and went like this:
1. I started with Windows 10 and my software could hear and play the .wav files at the same time like normal. Win10 and working.
2. I ran the in-place update to Windows 11. After that, the issue happened and the recognizer could not hear during audio playback. Win11 and broken.
3. I performed an in-place rollback to Windows 10 using the built-in utility. After that, the issue was no longer happening. It could hear and play audio at the same time again. Win10 and working.
4. I created a Windows 11 boot USB and performed a full installation, formatting everything with it. After reinstalling my software it was working and could play audio and hear simultaneously. It was still fine after all rounds of Windows updates, recommended and optional. Win11 and working.
5. I created a Windows 10 boot USB and did the same thing as step 4. Still working. Win10 and working.
6. I performed another in-place update like step 2. Broken again. Win11 and broken.
This led me to believe it was something about the in-place update that screws something up. I remember this kind of thing happening years ago with the jump from Windows 7 to Windows 10. The in-place update back then broke all sorts of drivers, so I assumed this was the same scenario and that I could just tell users to perform a clean Windows 11 installation.
This has since been proven incorrect.
More recently, in May, I used the same Windows 11 boot USB from March to perform a new clean installation for testing, and it was a very different experience. The setup UI was different and it automatically applied a number of updates as part of the installation. When it was done and I had installed my software to test, I was greeted with a fresh Windows 11 system that had the issue. So it has officially gotten worse.
Other Testing
Here is a list of other things I have tried to no effect on the Windows 11 installations that exhibit the issue:
Updating audio drivers
Reinstalling audio drivers
Rolling back audio drivers
Toggling audio enhancements
Running troubleshooters
Recreating my DLL with .NET 7.0 instead of .NET Framework 4.7.2
Recreating the speech recognizer as a Windows Store UWP app as the original documentation intended.
Restarting the Windows Audio and Windows Audio Endpoint services
Switching audio playback from PlaySound() to TTS via SAPI 5.4 Text-To-Speech
Switching audio playback from PlaySound() to System.Media.SoundPlayer
Switching audio playback from PlaySound() to TTS via Windows.Media.SpeechSynthesis
Switching audio playback from PlaySound() to Windows.Media.Playback.MediaPlayer
Switching file format to .mp3 and using Windows.Media.Playback.MediaPlayer
Switching audio playback device
Switching audio recording device
Muting the audio output device (not even this prevented the issue)
Discoveries
Amongst these tests I made a couple of peculiar discoveries.
In my early tests, if I pressed Windows + H to launch the built-in recognizer and then closed it, my software would work as intended again. Something in the built-in recognizer could kick the computer in the pants and back into working order. Unfortunately, it would go back to being broken after a restart.
However, it seems something changed with the built-in recognizer between March and May, as launching it is no longer enough to kick speech recognition for the whole computer back into shape. The change also coincided with the period when dictation for the Windows.Media.SpeechRecognition recognizer was down for everyone. I was involved in reporting that event as well, but that is a separate issue.
The other discovery was actually rather paradoxical. If I played a number of the .wav files on a loop in the Groove Music app, my software would actually work again. It could hear the user again through its own audio feedback and the constant background noise from Groove Music.
(Unsatisfactory) Fix
The second discovery led me to create an additional .NET 7.0 program with the sole purpose of playing an empty .wav file on loop using the Windows.Media.Playback.MediaPlayer class. I tried doing this with multithreading first, but that was not enough. As long as that helper program is playing the empty .wav file, my software can hear the user and play the audio feedback without issue again.
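For anyone who wants to reproduce the workaround, here is a minimal sketch of how a silent .wav file could be generated in portable C++. It is illustrative only: it assumes "empty" means a file containing nothing but zero-valued samples, and the 16-bit PCM mono format and the function names buildSilentWav and writeSilentWav are assumptions, since the format of the file used above is not stated.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Append an unsigned value in little-endian byte order, as RIFF/WAVE requires.
static void appendLE(std::vector<std::uint8_t>& out, std::uint32_t value, int byteCount) {
    for (int i = 0; i < byteCount; ++i) {
        out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
    }
}

// Append a four-character chunk tag such as "RIFF" or "data".
static void appendTag(std::vector<std::uint8_t>& out, const char* tag) {
    out.insert(out.end(), tag, tag + 4);
}

// Build a complete WAV image (hypothetical helper, assumed format:
// 16-bit PCM, mono) whose audio data is entirely zero samples.
std::vector<std::uint8_t> buildSilentWav(std::uint32_t sampleRate, std::uint32_t durationMs) {
    const std::uint32_t channels = 1;
    const std::uint32_t bitsPerSample = 16;
    const std::uint32_t blockAlign = channels * bitsPerSample / 8;
    const std::uint32_t frameCount = static_cast<std::uint32_t>(
        static_cast<std::uint64_t>(sampleRate) * durationMs / 1000);
    const std::uint32_t dataSize = frameCount * blockAlign;

    std::vector<std::uint8_t> wav;
    wav.reserve(44 + dataSize);
    appendTag(wav, "RIFF");
    appendLE(wav, 36 + dataSize, 4);            // size of everything after this field
    appendTag(wav, "WAVE");
    appendTag(wav, "fmt ");
    appendLE(wav, 16, 4);                       // fmt chunk size for plain PCM
    appendLE(wav, 1, 2);                        // format tag 1 = PCM
    appendLE(wav, channels, 2);
    appendLE(wav, sampleRate, 4);
    appendLE(wav, sampleRate * blockAlign, 4);  // bytes per second
    appendLE(wav, blockAlign, 2);
    appendLE(wav, bitsPerSample, 2);
    appendTag(wav, "data");
    appendLE(wav, dataSize, 4);
    wav.resize(wav.size() + dataSize, 0);       // 16-bit PCM silence is all zero bytes

    assert(wav.size() == 44u + dataSize);       // 44-byte header plus sample data
    return wav;
}

// Write the silent WAV image to disk; returns false if the file cannot be written.
bool writeSilentWav(const std::string& path, std::uint32_t sampleRate, std::uint32_t durationMs) {
    const std::vector<std::uint8_t> wav = buildSilentWav(sampleRate, durationMs);
    std::ofstream file(path, std::ios::binary);
    if (!file) {
        return false;
    }
    file.write(reinterpret_cast<const char*>(wav.data()),
               static_cast<std::streamsize>(wav.size()));
    return file.good();
}
```

Calling writeSilentWav("silence.wav", 16000, 1000) would produce a one-second file (the file name and parameters are arbitrary examples) that a separate looping player could then use. Whether a file in this particular format triggers the same effect on an affected system is untested here.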
Hopefully Microsoft will address the root issue that they introduced into the engine, and I can only hope that my testing and discoveries can help with that.