Feedback from Azure TTS to Azure speech recognizer

Fan Yang 20 Reputation points
2023-06-30T23:37:25.49+00:00

Hello, I'm trying out both Azure TTS and speech recognition from the speechsdk in Python. I want to talk to a chatbot and have it talk back to me continuously, as if we were having a conversation. My issue is that whenever the AI talks back to me through TTS, the speech recognizer picks up part of the output and turns it into new input. The speech recognizer runs continuously on the default microphone, and the TTS plays through the default speaker.

This is almost bizarre, since I am wearing headphones. When I played YouTube videos of people talking through the headphones, the speech recognizer only picked up a word or two occasionally. But when the TTS speaks, the recognizer picks up whole sentences.

I know I can manually juggle the audio input and output, but is there an easy way to set a trigger threshold on the speech recognizer?

Thanks,

Fan

Azure AI Speech
An Azure service that integrates speech processing into apps and services.

Accepted answer
  1. Sedat SALMAN 14,145 Reputation points MVP
    2023-07-01T05:23:33.0233333+00:00

    https://github.com/Azure-Samples/cognitive-services-speech-sdk/issues/1793

    One way to solve this issue is to programmatically mute the microphone while Text-to-Speech (TTS) is speaking, which prevents Speech-to-Text (STT) from picking up the TTS output. Instead of switching recognition off and on, you mute the microphone temporarily until the TTS has finished speaking.

    Another suggestion is to use echo cancellation. However, this feature has limited support through the Microsoft Audio Stack (MAS) in the SDK, and Python doesn't have MAS support, so this might not be a practical solution for you.

    Finally, pausing continuous recognition might seem like a good solution, but calling stop_continuous_recognition and then restarting with start_continuous_recognition does not work very well: it has significant overhead and can cause a delay. If you're experiencing such a delay when stopping and restarting continuous recognition, it might be related to an issue with the SDK.
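
The muting idea in the answer above can be approximated in software without touching the audio device. The following is only a minimal sketch, not an SDK feature: `MicGate`, `tts_started`, `tts_finished`, and `tail_seconds` are hypothetical names invented for illustration. Instead of muting the hardware, it drops recognition results that arrive while the TTS is speaking, plus a short tail afterwards, and leaves continuous recognition running so the stop/start overhead is avoided.

```python
import threading
import time


class MicGate:
    """Software 'mute': discards recognition results while TTS is speaking.

    This does not mute the audio device; it only filters callbacks.
    """

    def __init__(self, tail_seconds=0.3):
        self._lock = threading.Lock()
        self._speaking = False
        self._resume_at = 0.0
        # Assumption: results for audio captured during playback may arrive
        # slightly after playback ends, so keep the gate closed a bit longer.
        self._tail = tail_seconds

    def tts_started(self):
        with self._lock:
            self._speaking = True

    def tts_finished(self):
        with self._lock:
            self._speaking = False
            self._resume_at = time.monotonic() + self._tail

    def is_open(self):
        with self._lock:
            return not self._speaking and time.monotonic() >= self._resume_at

    def filter(self, handler):
        """Wrap a callback so it only runs while the gate is open."""
        def wrapped(evt):
            if self.is_open():
                handler(evt)
        return wrapped


# Possible wiring with the Speech SDK (untested sketch; assumes that
# speak_text_async(...).get() returns only after playback has finished):
#
#   gate = MicGate()
#   recognizer.recognized.connect(gate.filter(on_recognized))
#   recognizer.start_continuous_recognition()
#   ...
#   gate.tts_started()
#   synthesizer.speak_text_async(reply_text).get()
#   gate.tts_finished()
```

The tail length is a guess and would need tuning. An utterance that starts during playback and ends after the gate reopens could still get through, so this reduces the feedback rather than guaranteeing its removal.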

